Saturday, December 29, 2012

Test Hadoop cluster on vmware

SQL Server MVP Jeremiah Peschka posted 2 articles about Hadoop, which makes me be interested on the nosql skill.

I don't have much knowledge on Nosql and Linux system, so I am going to setup a testing environment on my laptop in holidays

1. download CentOS Linux setup iso file

2. download java jdk 1.6

3. download hadoop setup file

I downloaded release 1.0.4

4. Create VM with VMware workstation
I created 3 vm
linux1 :   ----->master

linux2 :   ----->slaver
linux3 :   ----->slaver

5. install Linux OS

6. Configure vm ip address
vi /etc/sysconfig/network-scripts/ifcfg-eth0

7. Configure host name and hosts file
vi /etc/sysconfig/network          --------->set the hostname
vi /etc/hosts                              --------->add ip hostname mapping for all 3 servers, for instance linux1 linux2 linux3

8. Install JDK
Copy the jdk install file to vm with vmware share folders, and unzip it to local folder. I installed the jdk in /usr/jdk1.6.0-37

9. Install Hadoop
Copy the install file to vm with vmware share folders, and unzip it to local folder. I installed the hadoop files in /usr/hadoop-1.0.4

10. create folder to Hadoop
temp folder: /usr/hadoop-1.0.4/temp
Data folder: /usr/hadoopfiles/Data
Name folder:/usr/hadoopfiles/Name

make sure the folder owner is the user which will start hadoop thread. and for Data folder and Name folder, the permission should be 755
chmod 755 /usr/hadoopfiles/Data

11. Set environment variable
vi /etc/profile

then add the line below:

export JAVA_HOME
export PATH

12. Setup SSH
1) generate ssh pub key file on all 3 servers
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/ >> ~/.ssh/authorized_keys

run "ssh localhost" to test if ssh works. make sure the authorized_keys file has correct permission, that's important
chmod 644 authorized_keys

2)Copy the file to other 2 servers with a new file name, for instance
on linux1, copy the to lunix2 and linux3 with name

3) log on other 2 servers, import the new file
cat ~/.ssh/ >> ~/.ssh/authorized_keys

do the 3 steps on all 3 servers, make sure you can ssh log on any remote server without password prompt.

13. Configure Hadoop.
1) Open $HADOOP_HOME/conf/, set the line below
export JAVA_HOME=/usr/jdk1.6.0_37

2) Open $HADOOP_HOME/conf/masters, add line below

3) Open $HADOOP_HOME/conf/slavers, add line below

4) Edit $HADOOP_HOME/conf/core-site.xml

<!--- global properties -->
<description>A base for other temporary directories.</description>
<!-- file system properties -->

5) Edit $HADOOP_HOME/conf/hdfs-site.xml

<!--- global properties -->

6) Edit $HADOOP_HOME/conf/mapred-site.xml


do the same configuration on all 3 servers

13) disable firewall on all 3 servers
service iptables stop
chkconfig iptables off

14) format name node
cd /usr/hadoop-1.0.4/bin
./hadoop namenode -format

15) start hadoop on master(linux1)

16) run "jps" on all 3 servers to check if hadoop is running
or you can open the website below

you can check the log file in logs folder in case any process can not be run.

it is a good start to learn hadoop, even Microsoft is developing data solutions with hadoop on window platform, so it is time to learn new things



  1. I like the helpful hadoop information you provide for your tutorials. I’ll bookmark your weblog and check again here frequently. I am quite sure I’ll learn many new stuff proper here! Best of luck for the following!
    Hadoop Training in hyderabad

  2. Your information excellent i have to useful for your information.Thanks a lot.
    Hadoop Training in Chennai

  3. Thanks so very much for taking your time to create this very useful and informative site. I have learned a lot from your site. Thanks!!

    Hadoop Course in Chennai

  4. Your posts is really helpful for me.Thanks for your wonderful post. I am very happy to read your post. It is really very helpful for us and I have gathered some important information from this blog.

    Salesforce Training

  5. Hi I am Victoria lives in Chennai. I am a technology freak. Recently I did Java Course in Chennai at a leading Java Training Institutes in Chennai. This is really helpful for me to make a bright carrer in IT industry.

  6. Salesforce training in Chennai
    Day by day I am getting new things and learn new concept through your blogs, I am feeling so confidants, and thanks for your informative blog keep your post as updated one...
    Salesforce training

  7. Dot Net Training Chennai

    Thanks for your wonderful post.It is really very helpful for us and I have gathered some important information from this blog.If anyone wants to get Dot Net Training in Chennai reach FITA, rated as No.1 Dot Net Training Institute in Chennai.

    Dot Net Course in Chennai

    Dot Net Training

  8. Automation Training in Chennai

    I have read your blog and i got a very useful and knowledgeable information from your blog.its really a very nice article. I did Loadrunner Training in Chennai. This is really useful for me. Suppose if anyone interested to learn Manual Testing Training in Chennai reach FITA academy located at Chennai Velachery.

  9. PHP Training Chennai

    I get a lot of great information from this blog. Thank you for your sharing this informative blog. Recently I did PHP course at a leading academy. If you are looking for best PHP Training Institute in Chennai visit FITA IT training academy which offer real timePHP Training in Chennai.

    PHP Course in Chennai