Plain Vanilla Hadoop on Ubuntu


Before installing Hadoop, you need an Ubuntu instance. You can create one using VMware or VirtualBox.

Below are the steps that need to be performed to install Hadoop on Ubuntu.

Before performing the steps below, create a sudo user (in my case it's dhruva).
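
If you have not created the user yet, something like the following should work (dhruva is just the example username used throughout this post):

$ sudo adduser dhruva

$ sudo usermod -aG sudo dhruva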

$ sudo apt-get update

$ sudo apt-get install default-jdk

$ java -version

$ sudo apt-get install ssh

$ sudo apt-get install rsync

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa (on newer Ubuntu releases use -t rsa instead, since OpenSSH has deprecated DSA keys)

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
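
Before moving on, it is worth confirming that passwordless SSH to localhost works; this should log you in without prompting for a password:

$ ssh localhost

$ exit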

$ wget -c http://mirror.olnevhost.net/pub/apache/hadoop/common/current/hadoop-2.7.0.tar.gz

$ sudo tar -zxvf hadoop-2.7.0.tar.gz

$ sudo mv hadoop-2.7.0 /usr/local/hadoop

$ sudo update-alternatives --config java (note the Java installation path it prints; that path is used for JAVA_HOME below)

$ sudo gedit ~/.bashrc (paste all these lines at the end of the file)

#Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

$ source ~/.bashrc
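
To confirm the environment is set up, the following should print the version you downloaded (2.7.0 in this walkthrough):

$ hadoop version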

$ cd /usr/local/hadoop/etc/hadoop

$ sudo gedit hadoop-env.sh (find the export JAVA_HOME line and replace it with the line below)

#The java implementation to use.
export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"

$ sudo gedit core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

$ sudo gedit yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

$ sudo cp mapred-site.xml.template mapred-site.xml

$ sudo gedit mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

$ sudo gedit hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
</configuration>

$ cd

$ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode

$ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode

$ sudo chown dhruva:dhruva -R /usr/local/hadoop

$ hdfs namenode -format

$ start-all.sh
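
start-all.sh is deprecated in Hadoop 2.x; it still works, but the equivalent is to start HDFS and YARN separately:

$ start-dfs.sh

$ start-yarn.sh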

$ jps (run it as your user, not with sudo)
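
If everything came up, jps should list the five Hadoop daemons plus itself; the PIDs below are just illustrative:

2481 NameNode
2617 DataNode
2789 SecondaryNameNode
2934 ResourceManager
3076 NodeManager
3358 Jps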

You can open the URLs below to verify that the services are running; if all of them load, Hadoop is configured correctly.

http://localhost:8088/ (YARN ResourceManager)
http://localhost:50070/ (NameNode)
http://localhost:50090/ (Secondary NameNode)
http://localhost:50075/ (DataNode)

$ cd /usr/local/hadoop

$ bin/hdfs dfs -mkdir /user

$ bin/hdfs dfs -mkdir /user/dhruva

$ bin/hdfs dfs -chown dhruva /user/dhruva
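
As a quick sanity check, the new directory should now show up with dhruva as its owner:

$ bin/hdfs dfs -ls /user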

$ cd

$ cd Desktop

$ jps >> textfile.txt

$ hdfs dfs -put textfile.txt

$ hdfs dfs -ls

Running the wordcount MapReduce job from the examples jar:

$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount textfile.txt outputdir

$ hdfs dfs -cat outputdir/part*
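
wordcount writes one line per distinct word, with the word and its count separated by a tab; the actual words depend on what jps wrote into textfile.txt, but the output looks along these lines:

DataNode	1
NameNode	1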
