How To Install Hadoop On Ubuntu Step By Step
Hadoop is a free, open-source, Java-based software framework used for the storage and processing of large datasets on clusters of machines. It uses HDFS to store its data and processes that data using MapReduce. It is an ecosystem of Big Data tools that are primarily used for data mining and machine learning.
Apache Hadoop 3.3 comes with noticeable improvements and many bug fixes over the previous releases. It has four major components: Hadoop Common, HDFS, YARN, and MapReduce.
This tutorial will explain how to install and configure Apache Hadoop on an Ubuntu 20.04 LTS Linux system.
Step 1 – Installing Java
Hadoop is written in Java and supports only Java version 8. Hadoop version 3.3 and later also support the Java 11 runtime as well as Java 8.
You can install OpenJDK 11 from the default apt repositories:
sudo apt update
sudo apt install openjdk-11-jdk
Once installed, verify the installed version of Java with the following command:
java -version
You should get the following output:
openjdk version "xi.0.11" 2022-04-20 OpenJDK Runtime Environment (build 11.0.eleven+9-Ubuntu-0ubuntu2.20.04) OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed manner, sharing)
Step 2 – Create a Hadoop User
It is a good idea to create a separate user to run Hadoop for security reasons.
Run the following command to create a new user named hadoop:
sudo adduser hadoop
Provide and confirm the new password as shown below:
Adding user `hadoop' ...
Adding new group `hadoop' (1002) ...
Adding new user `hadoop' (1002) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
    Full Name []:
    Room Number []:
    Work Phone []:
    Home Phone []:
    Other []:
Is the information correct? [Y/n] y
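Optionally, if you want the hadoop user to be able to run administrative commands (such as the firewall changes later in this guide) on its own, you can add it to the sudo group. This is an assumption about your setup, not a requirement for Hadoop itself:
# Optional: grant the hadoop user sudo privileges (run as a user that already has sudo)
sudo usermod -aG sudo hadoop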
Step 3 – Configure SSH Key-based Authentication
Next, you will need to configure passwordless SSH authentication for the local system.
First, change the user to hadoop with the following command:
su - hadoop
Next, run the following command to generate public and private key pairs:
ssh-keygen -t rsa
You will be asked to enter the filename. Just press Enter to complete the process:
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:QSa2syeISwP0hD+UXxxi0j9MSOrjKDGIbkfbM3ejyIk [email protected]
The key's randomart image is:
+---[RSA 3072]----+
|  ..o++=.+       |
|..oo++.O         |
|. oo. B .        |
|o..+ o * .       |
|= ++o o S        |
|.++o+  o         |
|.+.+ + . o       |
|o .  o * o .     |
|   E + .         |
+----[SHA256]-----+
Next, append the generated public key from id_rsa.pub to authorized_keys and set the proper permissions:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys
Next, verify the passwordless SSH authentication with the following command:
ssh localhost
You will be asked to authenticate the host by adding its RSA key to the known hosts. Type yes and hit Enter to authenticate the localhost:
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:JFqDVbM3zTPhUPgD5oMJ4ClviH6tzIRZ2GD3BdNqGMQ.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
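As a quick sanity check (a minimal sketch, not part of the original steps), you can confirm that SSH no longer prompts for a password by forcing non-interactive mode:
# BatchMode=yes makes ssh fail instead of asking for a password,
# so this command succeeds only if key-based login is working.
ssh -o BatchMode=yes localhost 'echo "Passwordless SSH to localhost is working"'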
Step 4 – Installing Hadoop
First, change the user to hadoop with the following command:
su - hadoop
Next, download the latest version of Hadoop using the wget command:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
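Optionally, before extracting, you can verify the integrity of the downloaded archive. Apache publishes a .sha512 checksum file alongside each release; the sketch below assumes the same download mirror and release as above. Compare the two values manually, they should match:
# Download the published checksum and compute the local one
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz.sha512
cat hadoop-3.3.0.tar.gz.sha512
sha512sum hadoop-3.3.0.tar.gz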
Once downloaded, extract the downloaded file:
tar -xvzf hadoop-3.3.0.tar.gz
Next, rename the extracted directory to hadoop:
mv hadoop-3.3.0 hadoop
Next, you will need to configure the Hadoop and Java environment variables on your system.
Open the ~/.bashrc file in your favorite text editor:
nano ~/.bashrc
Append the below lines to the file. You can find the JAVA_HOME location by running the command dirname $(dirname $(readlink -f $(which java))) in the terminal.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Save and close the file. Then, activate the environment variables with the following command:
source ~/.bashrc
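To confirm that the environment variables were loaded correctly, you can run a quick check (a small sketch; the hadoop command becomes available only because $HADOOP_HOME/bin was added to PATH above):
# Should print /home/hadoop/hadoop
echo $HADOOP_HOME
# Should print the Hadoop release information, e.g. Hadoop 3.3.0
hadoop version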
Next, open the Hadoop environment variable file:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Again, set the JAVA_HOME in the Hadoop environment file.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
Save and close the file when you are finished.
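You can quickly confirm the value you just set without reopening the editor, for example with grep (a minimal check, assuming the line was added exactly as shown):
# Shows the active JAVA_HOME line in hadoop-env.sh
grep -E "^export JAVA_HOME" $HADOOP_HOME/etc/hadoop/hadoop-env.sh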
Step 5 – Configuring Hadoop
First, you will need to create the namenode and datanode directories inside the hadoop user's home directory.
Run the following command to create both directories:
mkdir -p ~/hadoopdata/hdfs/namenode
mkdir -p ~/hadoopdata/hdfs/datanode
Next, edit the core-site.xml file and update it with your system hostname:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Change the following value as per your system hostname:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop.tecadmin.com:9000</value>
  </property>
</configuration>
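If the hostname you put in fs.defaultFS does not resolve on your system, HDFS will fail to start. A common fix (a sketch, assuming a hypothetical server IP of 192.168.0.10; replace both the IP and the hostname with your own) is to map it in /etc/hosts:
# Check what your system thinks its hostname is
hostname -f
# Map the hostname used in core-site.xml to this machine's IP (run as a sudo-capable user)
echo "192.168.0.10 hadoop.tecadmin.com" | sudo tee -a /etc/hosts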
Save and close the file. Then, edit the hdfs-site.xml file:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Modify the NameNode and DataNode directory path as shown below:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>
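The replication factor of 1 is appropriate here because this is a single-node setup. Once the cluster is up in Step 6, you can confirm which value Hadoop actually picked up; this is an optional check, not part of the original tutorial:
# Prints the effective value of dfs.replication from the loaded configuration
hdfs getconf -confKey dfs.replication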
Save and close the file. Then, edit the mapred-site.xml file:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Make the following changes:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Save and close the file. Then, edit the yarn-site.xml file:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Make the following changes:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Save and close the file when you are finished.
Step 6 – Start Hadoop Cluster
Before starting the Hadoop cluster, you will need to format the Namenode as the hadoop user.
Run the following command to format the Hadoop Namenode:
hdfs namenode -format
You should get the following output:
2020-11-23 10:31:51,318 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2020-11-23 10:31:51,323 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2020-11-23 10:31:51,323 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop.tecadmin.net/127.0.1.1
************************************************************/
After formatting the Namenode, run the following command to start the Hadoop cluster:
start-dfs.sh
Once HDFS has started successfully, you should get the following output:
Starting namenodes on [hadoop.tecadmin.com]
hadoop.tecadmin.com: Warning: Permanently added 'hadoop.tecadmin.com,fe80::200:2dff:fe3a:26ca%eth0' (ECDSA) to the list of known hosts.
Starting datanodes
Starting secondary namenodes [hadoop.tecadmin.com]
Next, start the YARN service as shown below:
start-yarn.sh
You should get the following output:
Starting resourcemanager Starting nodemanagers
You can now check the status of all Hadoop services using the jps command:
jps
You should see all the running services in the following output:
18194 NameNode
18822 NodeManager
17911 SecondaryNameNode
17720 DataNode
18669 ResourceManager
19151 Jps
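Besides jps, you can ask HDFS itself for a cluster summary. This optional check (a minimal sketch) should report one live datanode and its capacity:
# Prints capacity, remaining space, and the list of live datanodes
hdfs dfsadmin -report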
Step 7 – Adjust Firewall
Hadoop is now started and listening on ports 9870 and 8088. Next, you will need to allow these ports through the firewall.
Run the following command to allow Hadoop connections through the firewall:
firewall-cmd --permanent --add-port=9870/tcp
firewall-cmd --permanent --add-port=8088/tcp
Next, reload the firewalld service to apply the changes:
firewall-cmd --reload
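Note that stock Ubuntu 20.04 ships with UFW rather than firewalld, so the firewall-cmd commands above apply only if firewalld is installed. If your system uses UFW instead (an assumption about your setup), the equivalent rules would be:
# Allow the Namenode and ResourceManager web UIs through UFW
sudo ufw allow 9870/tcp
sudo ufw allow 8088/tcp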
Step 8 – Access Hadoop Namenode and Resource Manager
To access the Namenode, open your web browser and visit the URL http://your-server-ip:9870. You should see the following screen:
http://hadoop.tecadmin.net:9870
To access the Resource Manager, open your web browser and visit the URL http://your-server-ip:8088. You should see the following screen:
http://hadoop.tecadmin.net:8088
Step 9 – Verify the Hadoop Cluster
At this point, the Hadoop cluster is installed and configured. Next, we will create some directories in the HDFS filesystem to test Hadoop.
Let's create some directories in the HDFS filesystem using the following commands:
hdfs dfs -mkdir /test1
hdfs dfs -mkdir /logs
Next, run the following command to list the above directories:
hdfs dfs -ls /
You should get the following output:
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2020-11-23 10:56 /logs
drwxr-xr-x   - hadoop supergroup          0 2020-11-23 10:51 /test1
Also, put some files into the Hadoop file system. For example, put the log files from the host machine into the Hadoop file system:
hdfs dfs -put /var/log/* /logs/
You can also verify the above files and directories in the Hadoop Namenode web interface.
Go to the Namenode web interface and click on Utilities => Browse the file system. You should see the directories which you created earlier in the following screen:
http://hadoop.tecadmin.net:9870/explorer.html
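For a fuller end-to-end test, you can also run one of the bundled MapReduce example jobs. This is an optional sketch; it assumes the examples jar that ships inside the Hadoop 3.3.0 distribution extracted earlier:
# Estimate pi with 2 map tasks and 10 samples per map; the job is submitted to YARN
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 2 10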
Step 10 – Stop Hadoop Cluster
You can also stop the Hadoop Namenode and YARN services at any time by running the stop-dfs.sh and stop-yarn.sh scripts as the hadoop user.
To stop the Hadoop Namenode service, run the following command as a hadoop user:
stop-dfs.sh
To stop the Hadoop Resource Manager service, run the following command:
stop-yarn.sh
Conclusion
This tutorial explained, step by step, how to install and configure Hadoop on an Ubuntu 20.04 Linux system.
Source: https://tecadmin.net/install-hadoop-on-ubuntu-20-04/