Hadoop 3.2.1 on Ubuntu 18.04 (Pseudo-Distributed)
In this short article we will install Hadoop on an Ubuntu OS hosted in Oracle VM VirtualBox. Beforehand, we updated Ubuntu and installed the nano text editor from the console.
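A minimal sketch of those preparation commands, assuming the default Ubuntu repositories:
sudo apt update && sudo apt upgrade
sudo apt install nano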
[ Install SSH ]
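Install it from the default repositories (the ssh metapackage pulls in both the client and the server):
sudo apt install ssh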
When prompted to continue, type Y and hit ENTER.
[ Install PDSH ]
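Install it the same way (pdsh comes from the universe repository, which is usually enabled by default):
sudo apt install pdsh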
When prompted to continue, type Y and hit ENTER.
[ Edit .bashrc ]
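Open the file with nano:
nano ~/.bashrc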
At the end of the file, add the following line and save it:
export PDSH_RCMD_TYPE=ssh
[ New Key ]
Execute the following command to create a new SSH key pair:
ssh-keygen -t rsa -P ""
Hit ENTER when asked for a file name.
Copy the public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Check the SSH setup by connecting to localhost. If asked whether to continue, type yes:
ssh localhost
Exit the localhost session:
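exit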
[ Install JAVA 8 ]
It needs to be Java 8, as Hadoop 3.2 only supports this version.
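Assuming the OpenJDK build from the Ubuntu repositories:
sudo apt install openjdk-8-jdk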
Check the Java installation:
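java -version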
[ Download Hadoop 3.2.1 ]
sudo wget -P ~ https://mirrors.sonic.net/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
Extract it.
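Assuming the archive landed in the home directory, as above:
tar -xzf ~/hadoop-3.2.1.tar.gz -C ~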
For convenience, let's rename the folder:
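mv ~/hadoop-3.2.1 ~/hadoop
The rest of the article assumes this ~/hadoop path.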
[ Check Java's path ]
ls /usr/lib/jvm/java-8-openjdk-amd64/
[ Editing the Configuration Files ]
[ hadoop-env.sh ]
Path:
nano ~/hadoop/etc/hadoop/hadoop-env.sh
Add the following line in the Java implementation section:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
[ core-site.xml ]
Path:
nano ~/hadoop/etc/hadoop/core-site.xml
Add inside the <configuration> tags:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/<USER>/hdata</value>
</property>
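Hadoop normally creates the hadoop.tmp.dir directory itself, but you can also create it up front (keep <USER> as a placeholder for your username):
mkdir -p /home/<USER>/hdata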
[ hdfs-site.xml ]
Path:
nano ~/hadoop/etc/hadoop/hdfs-site.xml
Add inside the <configuration> tags (replication is set to 1 because this pseudo-distributed setup has a single DataNode):
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
[ mapred-site.xml ]
Path:
nano ~/hadoop/etc/hadoop/mapred-site.xml
Add (this snippet already includes the surrounding <configuration> tags):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/home/<USER>/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/home/<USER>/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/home/<USER>/hadoop</value>
  </property>
</configuration>
[ yarn-site.xml ]
Path:
nano ~/hadoop/etc/hadoop/yarn-site.xml
Add inside the <configuration> tags:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
[ Add more code into .bashrc ]
export HADOOP_HOME="/home/<USER>/hadoop"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
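Reload the file so the new variables take effect in the current shell:
source ~/.bashrc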
[ Format the Namenode ]
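HDFS must be formatted before its first start. With $HADOOP_HOME/bin on the PATH:
hdfs namenode -format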
[ Start the HDFS Services ]
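With $HADOOP_HOME/sbin on the PATH, the bundled script starts the NameNode, DataNode and SecondaryNameNode daemons:
start-dfs.sh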
Check the running Java processes with jps:
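jps
NameNode, DataNode and SecondaryNameNode should appear in the list.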
And check the NameNode web UI at localhost:9870.
[ Start the YARN Services ]
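Again with $HADOOP_HOME/sbin on the PATH:
start-yarn.sh
jps should now also list ResourceManager and NodeManager.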
And check the ResourceManager web UI at localhost:8088.
I hope you liked this short article. Best regards, Ricardo Costa (Richards).
This article was created in the context of the Distributed Systems class 2020–21, ESTG-IPG.