Saturday, December 29, 2012

Test Hadoop cluster on vmware

SQL Server MVP Jeremiah Peschka posted 2 articles about Hadoop, which got me interested in NoSQL skills.

I don't have much knowledge of NoSQL or Linux, so I am going to set up a test environment on my laptop over the holidays.

1. Download the CentOS Linux setup ISO file

2. Download Java JDK 1.6

3. Download the Hadoop setup file

I downloaded release 1.0.4

4. Create VMs with VMware Workstation
I created 3 VMs:
linux1 :   ----->master
linux2 :   ----->slave
linux3 :   ----->slave

5. Install the Linux OS

6. Configure the VM IP address
vi /etc/sysconfig/network-scripts/ifcfg-eth0
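For a cluster, each VM needs a fixed address. Below is a minimal sketch, assuming the CentOS 6 style of ifcfg files, that prints a static configuration for eth0. The function name is made up for this example, and the addresses in the usage line are hypothetical placeholders, not the ones used in this setup.

```shell
#!/usr/bin/env bash
# Print a static-IP ifcfg-eth0 file for a CentOS VM.
# Arguments: IP address, netmask, gateway (placeholders, pick your own).
make_ifcfg_eth0() {
  local ip="$1" netmask="$2" gateway="$3"
  cat <<EOF
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=$ip
NETMASK=$netmask
GATEWAY=$gateway
EOF
}
```

For example, `make_ifcfg_eth0 192.168.1.101 255.255.255.0 192.168.1.1 > /etc/sysconfig/network-scripts/ifcfg-eth0`, then `service network restart`. Back up the original file first, since it may hold a HWADDR line worth keeping.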

7. Configure the host name and hosts file
vi /etc/sysconfig/network          --------->set the hostname
vi /etc/hosts                              --------->add the IP-to-hostname mapping for all 3 servers (linux1, linux2, linux3)
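Both files are short, so they can be generated. A sketch, assuming the CentOS 6 format of /etc/sysconfig/network; the helper names are made up for this example and the IPs in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Print /etc/sysconfig/network for one node: networking on, fixed hostname.
make_network_file() {
  local hostname="$1"
  printf 'NETWORKING=yes\nHOSTNAME=%s\n' "$hostname"
}

# Print /etc/hosts entries from "ip hostname" pairs given as arguments.
make_hosts_entries() {
  while [ "$#" -ge 2 ]; do
    printf '%s\t%s\n' "$1" "$2"
    shift 2
  done
}
```

On linux1: `make_network_file linux1 > /etc/sysconfig/network`. On every node: `make_hosts_entries 192.168.1.101 linux1 192.168.1.102 linux2 192.168.1.103 linux3 >> /etc/hosts`.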

8. Install JDK
Copy the JDK install file to the VM through VMware shared folders, and extract it to a local folder. I installed the JDK in /usr/jdk1.6.0_37

9. Install Hadoop
Copy the install file to the VM through VMware shared folders, and extract it to a local folder. I installed the Hadoop files in /usr/hadoop-1.0.4
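With VMware Tools installed, shared folders usually show up under /mnt/hgfs. A sketch for unpacking a .tar.gz release, assuming the archive holds a single top-level folder; the function name and the share name in the usage line are hypothetical, and the JDK 6 download may come as a self-extracting .bin rather than a tarball.

```shell
#!/usr/bin/env bash
# Unpack a .tar.gz package (e.g. the Hadoop release) into a target folder
# and print the top-level directory it created.
# Assumes every entry in the archive sits under one top-level folder.
install_tarball() {
  local archive="$1" dest="$2"
  local top
  # The first entry in the listing names the top-level folder.
  top="$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)"
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"
  printf '%s\n' "$dest/$top"
}
```

For example, `install_tarball /mnt/hgfs/share/hadoop-1.0.4.tar.gz /usr` should print /usr/hadoop-1.0.4.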

10. Create folders for Hadoop
Temp folder: /usr/hadoop-1.0.4/temp
Data folder: /usr/hadoopfiles/Data
Name folder: /usr/hadoopfiles/Name

Make sure the folders are owned by the user that will start the Hadoop processes. The Data and Name folders should have permission 755:
chmod 755 /usr/hadoopfiles/Data
chmod 755 /usr/hadoopfiles/Name
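The folder setup can be scripted as below. The two root paths are parameters so the sketch can be tried in a scratch folder first; the function name is made up, and the "hadoop" account in the usage line is hypothetical.

```shell
#!/usr/bin/env bash
# Create the temp, Data and Name folders, set Data/Name to 755 and,
# optionally, hand ownership to the account that will run Hadoop.
create_hadoop_dirs() {
  local hadoop_home="$1" files_root="$2" owner="${3:-}"
  mkdir -p "$hadoop_home/temp" "$files_root/Data" "$files_root/Name"
  # Hadoop 1.x data nodes expect the data folder to be 755 by default.
  chmod 755 "$files_root/Data" "$files_root/Name"
  if [ -n "$owner" ]; then
    chown -R "$owner" "$hadoop_home/temp" "$files_root"
  fi
}
```

Run as root: `create_hadoop_dirs /usr/hadoop-1.0.4 /usr/hadoopfiles hadoop`.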

11. Set environment variables
vi /etc/profile

then add the lines below:

export JAVA_HOME
export PATH
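The two export lines only take effect once the variables have values. A sketch that prints a complete snippet, assuming the JDK and Hadoop paths used in this post; adding Hadoop's bin folder to PATH is an assumption on my part, and the function name is made up.

```shell
#!/usr/bin/env bash
# Print the lines to append to /etc/profile: assign first, then export,
# in the same style as the two export lines above.
make_profile_lines() {
  local java_home="$1" hadoop_home="$2"
  printf 'JAVA_HOME=%s\n' "$java_home"
  # $JAVA_HOME and $PATH stay literal here; the shell expands them at login.
  printf 'PATH=$JAVA_HOME/bin:%s/bin:$PATH\n' "$hadoop_home"
  printf 'export JAVA_HOME\n'
  printf 'export PATH\n'
}
```

For example, `make_profile_lines /usr/jdk1.6.0_37 /usr/hadoop-1.0.4 >> /etc/profile`, then `source /etc/profile` and `java -version` to check.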

12. Set up SSH
1) Generate an SSH key pair on all 3 servers
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/ >> ~/.ssh/authorized_keys

Run "ssh localhost" to test that SSH works. Make sure the authorized_keys file has the correct permissions; this is important, because sshd ignores the file when it is writable by group or others.
chmod 644 ~/.ssh/authorized_keys

2) Copy the file to the other 2 servers under a new file name. For instance,
on linux1, copy the to linux2 and linux3 with name

3) Log on to the other 2 servers and import the new file
cat ~/.ssh/ >> ~/.ssh/authorized_keys

Do these 3 steps on all 3 servers, and make sure you can log on to any remote server over SSH without a password prompt.
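`ssh-keygen -f ~/.ssh/id_dsa` writes the public key next to the private key as ~/.ssh/id_dsa.pub. The sketch below is a hypothetical helper, not part of any tool: it appends a single-key .pub file to authorized_keys once and fixes the permissions sshd checks. The /tmp/linux1.pub name in the usage line is hypothetical.

```shell
#!/usr/bin/env bash
# Append a public key file to authorized_keys (only if not already there)
# and set the permissions sshd expects. Assumes a single-key .pub file.
authorize_key() {
  local pubkey="$1" ssh_dir="${2:-$HOME/.ssh}"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  # -x: whole-line match, -F: fixed string; a missing file counts as "not found".
  if ! grep -qxF -f "$pubkey" "$ssh_dir/authorized_keys" 2>/dev/null; then
    cat "$pubkey" >> "$ssh_dir/authorized_keys"
  fi
  chmod 644 "$ssh_dir/authorized_keys"
}
```

On linux1: `authorize_key ~/.ssh/id_dsa.pub`, then `scp ~/.ssh/id_dsa.pub linux2:/tmp/linux1.pub`; on linux2: `authorize_key /tmp/linux1.pub`. Where available, ssh-copy-id does the copy-and-append in one command.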

13. Configure Hadoop
1) Open $HADOOP_HOME/conf/, and set the line below
export JAVA_HOME=/usr/jdk1.6.0_37

2) Open $HADOOP_HOME/conf/masters, and add the line below

3) Open $HADOOP_HOME/conf/slaves, and add the lines below

4) Edit $HADOOP_HOME/conf/core-site.xml

<!--- global properties -->
<description>A base for other temporary directories.</description>
<!-- file system properties -->

5) Edit $HADOOP_HOME/conf/hdfs-site.xml

<!--- global properties -->

6) Edit $HADOOP_HOME/conf/mapred-site.xml


Apply the same configuration on all 3 servers.
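The masters, slaves and XML files can be written in one go. A sketch assuming linux1 runs the name node and job tracker; the ports 9000/9001 and the replication factor of 2 (one copy per slave) are assumptions, not values recovered from this post, and the function name is made up. The property names are the Hadoop 1.x ones, and the folders come from step 10.

```shell
#!/usr/bin/env bash
# Write masters, slaves, core-site.xml, hdfs-site.xml and mapred-site.xml.
# Ports 9000/9001 and dfs.replication=2 are assumptions.
write_hadoop_conf() {
  local conf_dir="$1" master="$2"
  shift 2                      # remaining arguments are the slave host names
  mkdir -p "$conf_dir"

  # masters: secondary name node host; slaves: data node / task tracker hosts
  printf '%s\n' "$master" > "$conf_dir/masters"
  printf '%s\n' "$@" > "$conf_dir/slaves"

  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- global properties -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop-1.0.4/temp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <!-- file system properties -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$master:9000</value>
  </property>
</configuration>
EOF

  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/hadoopfiles/Name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/hadoopfiles/Data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

  cat > "$conf_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$master:9001</value>
  </property>
</configuration>
EOF
}
```

For example, `write_hadoop_conf /usr/hadoop-1.0.4/conf linux1 linux2 linux3` on each server. In Hadoop 1.x, conf/masters names the host of the secondary name node, while conf/slaves lists the data node and task tracker hosts.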

14. Disable the firewall on all 3 servers
service iptables stop
chkconfig iptables off

15. Format the name node
cd /usr/hadoop-1.0.4/bin
./hadoop namenode -format

16. Start Hadoop on the master (linux1)

17. Run "jps" on all 3 servers to check that Hadoop is running,
or open the web page below

If any process fails to start, check its log file in the logs folder (/usr/hadoop-1.0.4/logs).
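Reading jps output by eye works, but the check can also be scripted. A sketch assuming the layout above (NameNode, SecondaryNameNode and JobTracker on linux1; DataNode and TaskTracker on the slaves); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the Hadoop 1.x daemons that are missing from jps output.
# Usage: missing_daemons master|slave "<jps output>"
missing_daemons() {
  local role="$1" jps_output="$2" daemon expected
  case "$role" in
    master) expected="NameNode SecondaryNameNode JobTracker" ;;
    slave)  expected="DataNode TaskTracker" ;;
    *) echo "unknown role: $role" >&2; return 2 ;;
  esac
  for daemon in $expected; do
    # jps prints "<pid> <MainClassName>"; match the class name exactly.
    if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon"; then
      echo "$daemon"
    fi
  done
}
```

On linux1: `missing_daemons master "$(jps)"`; on linux2 and linux3: `missing_daemons slave "$(jps)"`. Empty output means every expected daemon is up; any name printed points to the log file to read.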

It is a good start for learning Hadoop. Even Microsoft is developing data solutions with Hadoop on the Windows platform, so it is time to learn new things.


