Hadoop 3.2.1 on Ubuntu 18.04 (Pseudo-Distributed)
In this short article we will install Hadoop on an Ubuntu OS hosted in Oracle VM VirtualBox. Beforehand, we updated Ubuntu and installed the nano text editor from the console.
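A minimal sketch of those preparation commands, assuming the default Ubuntu repositories:
sudo apt update && sudo apt upgrade
sudo apt install nano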
[ Install SSH ]
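Install it from the default repositories (the ssh metapackage pulls in both the client and the server):
sudo apt install ssh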
When prompted to continue, type Y and hit ENTER.
[ Install PDSH ]
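Install it the same way (pdsh comes from the universe repository, which is usually enabled by default):
sudo apt install pdsh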
When prompted to continue, type Y and hit ENTER.
[ Edit .bashrc ]
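Open the file with nano:
nano ~/.bashrc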
At the end of the file, add the following line and save it:
export PDSH_RCMD_TYPE=ssh
[ New Key ]
Execute the following command to create a new SSH key pair:
ssh-keygen -t rsa -P ""
Hit ENTER when asked for a file name.
Copy the public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Check the SSH setup by connecting to localhost. If asked whether to continue, type yes:
ssh localhost
Exit the localhost session:
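exit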
[ Install JAVA 8 ]
It needs to be Java 8, as Hadoop 3.2 only supports this version.
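Assuming the OpenJDK build from the Ubuntu repositories:
sudo apt install openjdk-8-jdk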
Check the Java installation:
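java -version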
[ Download Hadoop 3.2.1 ]
sudo wget -P ~ https://mirrors.sonic.net/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
Extract it.
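Assuming the archive landed in the home directory, as above:
tar -xzf ~/hadoop-3.2.1.tar.gz -C ~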
For convenience, let's rename the folder:
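mv ~/hadoop-3.2.1 ~/hadoop
The rest of the article assumes this ~/hadoop path.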
[ Check Java's path ]
ls /usr/lib/jvm/java-8-openjdk-amd64/
[ Editing the Configuration Files ]
[ hadoop-env.sh ]
Path:
nano ~/hadoop/etc/hadoop/hadoop-env.sh
Add the following line in the Java implementation section:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
[ core-site.xml ]
Path:
nano ~/hadoop/etc/hadoop/core-site.xml
Add inside the <configuration> tags:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/<USER>/hdata</value>
</property>
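Hadoop normally creates the hadoop.tmp.dir directory itself, but you can also create it up front (keep <USER> as a placeholder for your username):
mkdir -p /home/<USER>/hdata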
[ hdfs-site.xml ]
Path:
nano ~/hadoop/etc/hadoop/hdfs-site.xml
Add inside the <configuration> tags (replication is set to 1 because this pseudo-distributed setup has a single DataNode):
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
[ mapred-site.xml ]
Path:
nano ~/hadoop/etc/hadoop/mapred-site.xml
Add (this snippet already includes the surrounding <configuration> tags):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/home/<USER>/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/home/<USER>/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/home/<USER>/hadoop</value>
  </property>
</configuration>
[ yarn-site.xml ]
Path:
nano ~/hadoop/etc/hadoop/yarn-site.xml
Add inside the <configuration> tags:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
[ Add more code into .bashrc ]
export HADOOP_HOME="/home/<USER>/hadoop"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
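Reload the file so the new variables take effect in the current shell:
source ~/.bashrc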
[ Format the Namenode ]
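HDFS must be formatted before its first start. With $HADOOP_HOME/bin on the PATH:
hdfs namenode -format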
[ Start the HDFS Services ]
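With $HADOOP_HOME/sbin on the PATH, the bundled script starts the NameNode, DataNode and SecondaryNameNode daemons:
start-dfs.sh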
Check the running Java processes with jps:
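jps
NameNode, DataNode and SecondaryNameNode should appear in the list.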
And check the NameNode web UI at localhost:9870.
[ Start the YARN Services ]
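Again with $HADOOP_HOME/sbin on the PATH:
start-yarn.sh
jps should now also list ResourceManager and NodeManager.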
And check the ResourceManager web UI at localhost:8088.
I hope you liked this short article. Best regards, Ricardo Costa (Richards).
This article was created in the context of the Distributed Systems class 2020–21, ESTG-IPG.