Prerequisites for setting up a Hadoop cluster on CentOS and bringing up Cloudera Manager

 

 

Below is the list of prerequisites to complete for a smooth, error-free configuration of a Hadoop cluster using CDH and Cloudera Manager on CentOS servers.

First, log in to each server as the root user and execute steps 1 to 10 on every node that will join the cluster.

 

1) Add the hosts to /etc/hosts

sudo vi /etc/hosts

#List every node that will be part of the cluster here. (I'm assuming one namenode and four datanodes.)

xx.xx.xx.xx   Namenode

xxx.xx.xx.xx  Datanode1

xx.xx.xxx.xx   Datanode2

xxx.xxx.xx.x   Datanode3

xxx.xx.xxx.xxx Datanode4

Each x stands for a digit of the node's IP address.
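
A mistyped or missing entry in /etc/hosts usually shows up much later as a node that cannot be reached, so it can help to check the file right after editing it. The following is only a minimal sketch: check_hosts is a hypothetical helper name, and the node names are the ones assumed in this post.

```shell
# Hypothetical helper: report cluster hostnames missing from a hosts file.
# Usage: check_hosts FILE NAME...
check_hosts() {
    hosts_file=$1
    shift
    missing=0
    for name in "$@"; do
        # Skip comment lines, then look for the name in any hostname column.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return $missing
}
```

On a real node you could call it as check_hosts /etc/hosts Namenode Datanode1 Datanode2 Datanode3 Datanode4; it prints one line per missing name and returns a non-zero status if any name is absent.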

 

2) Install the required packages

yum -y install perl ntp wget

 

3) Set the hostname

sudo vi /etc/sysconfig/network

#Set the HOSTNAME value to this node's name (for example Namenode or Datanode1).

 

4) Add groups and users

#Add group hadoop and user hduser

groupadd hadoop

useradd hduser -g hadoop

passwd hduser

usermod -aG wheel hduser

 

5) Give hduser sudo rights by enabling the wheel group in /etc/sudoers (the same change you would otherwise make with visudo).

perl -p -i -e "s/# \%wheel/\%wheel/g" /etc/sudoers

 

6) Stop and disable iptables (the default firewall on Red Hat Linux)

service iptables stop

chkconfig iptables off

 

7) Enable password authentication

perl -p -i -e "s/^PasswordAuthentication no/PasswordAuthentication yes/g" /etc/ssh/sshd_config

service sshd restart

 

8) Disable SELinux

perl -p -i -e "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
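
Note that the substitution above only matches a line that reads SELINUX=enforcing, so a node that was set to permissive would be left unchanged. A small check along these lines can confirm what the file now says; selinux_setting is a hypothetical helper name, and the default path is the one used in this step.

```shell
# Hypothetical helper: print the value of the SELINUX= line in a config file.
# The SELINUXTYPE= line is not matched because the pattern requires "SELINUX=".
selinux_setting() {
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}" | tail -n 1
}
```

Running selinux_setting with no argument should print disabled on a node where step 8 took effect.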

 

9) Set vm.swappiness (virtual memory swappiness) to 0 so the kernel avoids swapping

sysctl -a | grep swappiness

echo "vm.swappiness=0" >> /etc/sysctl.conf
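
Appending with >> adds another copy of the line every time the step is repeated, and it leaves any older vm.swappiness line in place. If you expect to rerun these steps, something like the sketch below keeps a single entry; set_sysctl_key is a hypothetical helper name, not part of any package.

```shell
# Hypothetical helper: make sure FILE contains exactly one KEY=VALUE line.
# Usage: set_sysctl_key FILE KEY VALUE
set_sysctl_key() {
    conf=$1 key=$2 value=$3
    tmp=$(mktemp)
    # Keep every line whose name (text before "=", spaces removed) is not KEY.
    awk -F= -v k="$key" '{
        name = $1
        gsub(/[[:space:]]/, "", name)
        if (name != k) print
    }' "$conf" > "$tmp"
    # Append the single wanted assignment and write the result back.
    echo "${key}=${value}" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

For this step the call would be set_sysctl_key /etc/sysctl.conf vm.swappiness 0, optionally followed by sysctl -p to load the file without waiting for the reboot in step 10.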

 

10) Set up ntpd, create the /stage/cloudera directory and reboot.

service ntpd start; chkconfig ntpd on

mkdir -p /stage/cloudera

init 6

 

11) SSH keygen (perform this step only on the node where you will install Cloudera Manager). This step lets the namenode log in to the other nodes in the cluster without a password.

ssh-keygen

ssh hduser@Datanode1 'mkdir -p ~/.ssh; chmod 700 ~/.ssh'

ssh hduser@Datanode2 'mkdir -p ~/.ssh; chmod 700 ~/.ssh'

ssh hduser@Datanode3 'mkdir -p ~/.ssh; chmod 700 ~/.ssh'

ssh hduser@Datanode4 'mkdir -p ~/.ssh; chmod 700 ~/.ssh'

cat ~/.ssh/id_rsa.pub | ssh hduser@Datanode1 "cat >> ~/.ssh/authorized_keys"

cat ~/.ssh/id_rsa.pub | ssh hduser@Datanode2 "cat >> ~/.ssh/authorized_keys"

cat ~/.ssh/id_rsa.pub | ssh hduser@Datanode3 "cat >> ~/.ssh/authorized_keys"

cat ~/.ssh/id_rsa.pub | ssh hduser@Datanode4 "cat >> ~/.ssh/authorized_keys"
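
The eight commands above repeat the same two actions for each datanode, so they can be collapsed into a loop. The sketch below swaps in ssh-copy-id, which ships with the OpenSSH client and creates ~/.ssh and appends the public key in one go; that substitution is my assumption, not what the steps above use. NODES, DRY_RUN and push_key are hypothetical names, and the function only prints the commands unless DRY_RUN is set to 0.

```shell
# Assumption: the node names and the hduser account match the earlier steps.
NODES="Datanode1 Datanode2 Datanode3 Datanode4"

# Print (DRY_RUN=1, the default here) or run the key-copy command per node.
push_key() {
    for node in $NODES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh-copy-id hduser@$node"
        else
            ssh-copy-id "hduser@$node"
        fi
    done
}
```

Run push_key first to review the commands, then DRY_RUN=0 push_key to copy the key; you will be asked for hduser's password once per node.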

 

12) Install MySQL on any one of the datanodes; in this case let us take Datanode3. Log in to Datanode3.

sudo yum install mysql-server

sudo service mysqld start

sudo yum install mysql-connector-java

sudo /usr/bin/mysql_secure_installation

Enter current password for root (enter for none): (there is none the first time)

Change the root password? [Y/n] Y

Remove anonymous users? [Y/n] Y

Disallow root login remotely? [Y/n] N

Remove test database and access to it? [Y/n] Y

Reload privilege tables now? [Y/n] Y

sudo /sbin/chkconfig mysqld on

sudo /sbin/chkconfig --list mysqld

The result should look like this:

mysqld                   0:off    1:off    2:on    3:on    4:on    5:on    6:off

 

13) Log in to MySQL and create the users and databases listed below (on the same node where you installed MySQL).

mysql -u root -p

#The rman database and rman user are for Cloudera Manager.

create database rman;

create user 'rman' identified by 'rman';

grant all on rman.* to 'rman';

#The hive database and hive user are for the Hive metastore.

create database hive;

create user 'hive' identified by 'hive';

grant all on hive.* to 'hive';
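
Both databases follow the same three-statement pattern, so the statements can also be generated from the shell and piped into the mysql client. This is only a sketch: make_db_sql is a hypothetical helper name, and it does no quoting or escaping, so it suits simple names and passwords only.

```shell
# Hypothetical helper: print the create/grant statements for one database.
# Usage: make_db_sql DATABASE USER PASSWORD
make_db_sql() {
    db=$1 user=$2 pass=$3
    cat <<EOF
create database $db;
create user '$user' identified by '$pass';
grant all on $db.* to '$user';
EOF
}
```

For example, make_db_sql hive hive 'a-password-of-your-choice' | mysql -u root -p would create the hive database and user with a password other than the user name.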

 

14) On the node where you want to install Cloudera Manager, download the Cloudera Manager installer from the link below.

wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin

chmod +x cloudera-manager-installer.bin

sudo ./cloudera-manager-installer.bin

The installer now installs the Cloudera Manager daemons and server. Cloudera Manager listens on port 7180 by default, so once it is installed you can access it at

http://<IP address of the node where it is installed>:7180

For example, if the IP address of the node where you installed it is ww.xx.yy.zz, you can access it at

http://ww.xx.yy.zz:7180
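
The server can take a little while to start answering after the installer finishes. If you would rather not keep reloading the browser, a polling loop like the one below can tell you when the port responds. It is a rough sketch that assumes curl is installed; cm_url and wait_for_cm are hypothetical helper names, and the retry count and 10-second pause are arbitrary choices.

```shell
# Hypothetical helper: build the Cloudera Manager URL (port defaults to 7180).
cm_url() {
    echo "http://$1:${2:-7180}"
}

# Hypothetical helper: poll the URL until it answers or the tries run out.
# Usage: wait_for_cm HOST [TRIES]
wait_for_cm() {
    url=$(cm_url "$1")
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # curl exits 0 as soon as the server returns any HTTP response.
        if curl -s -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        tries=$((tries - 1))
        sleep 10
    done
    echo "not reachable: $url"
    return 1
}
```

Call it with the address of the node where you ran the installer, for example wait_for_cm ww.xx.yy.zz.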
