Saturday, December 29, 2012

Test Hadoop cluster on VMware

SQL Server MVP Jeremiah Peschka posted two articles about Hadoop, which got me interested in NoSQL.

I don't have much knowledge of NoSQL or Linux, so I am going to set up a test environment on my laptop over the holidays.

1. Download the CentOS Linux installation ISO
http://www.centos.org/

2. Download Java JDK 1.6
http://www.oracle.com/technetwork/java/javase/downloads/index.html

3. Download the Hadoop setup file
http://hadoop.apache.org/#Download+Hadoop

I downloaded release 1.0.4

4. Create VMs with VMware Workstation
I created 3 VMs:
linux1 : 192.168.27.29   ----->master
linux2 : 192.168.27.31   ----->slave
linux3 : 192.168.27.32   ----->slave


5. Install the Linux OS

6. Configure the VM IP address
vi /etc/sysconfig/network-scripts/ifcfg-eth0
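
The file typically ends up looking like this on linux1 (a sketch; the NETMASK and GATEWAY values are assumptions based on a default VMware NAT network, so verify them against your own virtual network settings):

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 on linux1
# (change IPADDR per VM; GATEWAY x.x.x.2 is the usual VMware NAT default)
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.27.29
NETMASK=255.255.255.0
GATEWAY=192.168.27.2
```

Restart networking (service network restart) after editing.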

7. Configure the host name and hosts file
vi /etc/sysconfig/network          --------->set the hostname
vi /etc/hosts                      --------->add IP/hostname mappings for all 3 servers, for instance
192.168.27.29 linux1
192.168.27.31 linux2
192.168.27.32 linux3
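
The hostname is set the same way on each VM; for example, on linux1 the /etc/sysconfig/network file would look like this (use linux2/linux3 on the other two):

```shell
NETWORKING=yes
HOSTNAME=linux1
```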

8. Install the JDK
Copy the JDK install file to the VM with VMware shared folders and extract it to a local folder. I installed the JDK in /usr/jdk1.6.0_37

9. Install Hadoop
Copy the install file to the VM with VMware shared folders and extract it to a local folder. I installed the Hadoop files in /usr/hadoop-1.0.4

10. Create folders for Hadoop
tmp folder: /usr/hadoop-1.0.4/tmp
Data folder: /usr/hadoopfiles/Data
Name folder: /usr/hadoopfiles/Name

Make sure the folder owner is the user that will start the Hadoop processes, and the Data and Name folders should have permission 755:
chmod 755 /usr/hadoopfiles/Data
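
The whole step can be done in one go; this sketch assumes the daemons will be started by a user named hadoop (replace with your own account) and is run as root:

```shell
mkdir -p /usr/hadoop-1.0.4/tmp /usr/hadoopfiles/Data /usr/hadoopfiles/Name
chown -R hadoop:hadoop /usr/hadoop-1.0.4/tmp /usr/hadoopfiles
chmod 755 /usr/hadoopfiles/Data /usr/hadoopfiles/Name
```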

11. Set environment variables
vi /etc/profile

then add the lines below:

HADOOP_HOME=/usr/hadoop-1.0.4
JAVA_HOME=/usr/jdk1.6.0_37
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$CLASSPATH
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
export JAVA_HOME
export HADOOP_HOME
export CLASSPATH
export PATH
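
After saving the file, reload it and check the result (java -version and hadoop version only succeed if the paths above match your install locations):

```shell
source /etc/profile
echo $JAVA_HOME $HADOOP_HOME
java -version
hadoop version
```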

12. Set up SSH
1) Generate an SSH public key on all 3 servers
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Run "ssh localhost" to test that SSH works. Make sure the authorized_keys file has the correct permissions; that's important:
chmod 644 ~/.ssh/authorized_keys

2) Copy the file id_dsa.pub to the other 2 servers with a new file name. For instance,
on linux1, copy id_dsa.pub to linux2 and linux3 as linux1_id_dsa.pub
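
The copy can be done with scp from linux1 (assuming the same user account exists on all servers; you will still be prompted for passwords at this point, since the keys are not yet imported):

```shell
scp ~/.ssh/id_dsa.pub linux2:~/.ssh/linux1_id_dsa.pub
scp ~/.ssh/id_dsa.pub linux3:~/.ssh/linux1_id_dsa.pub
```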

3) Log on to the other 2 servers and import the new file
cat ~/.ssh/linux1_id_dsa.pub >> ~/.ssh/authorized_keys

Do the 3 steps on all 3 servers, and make sure you can SSH to any remote server without a password prompt.
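
A quick way to verify all the key exchanges at once from each server; BatchMode=yes makes ssh fail immediately instead of prompting, so a misconfigured key shows up right away:

```shell
for h in linux1 linux2 linux3; do
  ssh -o BatchMode=yes "$h" hostname
done
```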


13. Configure Hadoop.
1) Open $HADOOP_HOME/conf/hadoop-env.sh, set the line below
export JAVA_HOME=/usr/jdk1.6.0_37

2) Open $HADOOP_HOME/conf/masters, add line below
linux1


3) Open $HADOOP_HOME/conf/slaves, add the lines below
linux2
linux3


4) Edit $HADOOP_HOME/conf/core-site.xml

<configuration>
<!-- global properties -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop-1.0.4/tmp</value> <!-- you can configure your own folder path for tmp here -->
<description>A base for other temporary directories.</description>
</property>
<!-- file system properties -->
<property>
<name>fs.default.name</name>
<value>hdfs://linux1:9000</value>
</property>
</configuration>


5) Edit $HADOOP_HOME/conf/hdfs-site.xml


<configuration>
<!-- global properties -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/hadoopfiles/Name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/hadoopfiles/Data</value>
</property>
</configuration>

6) Edit $HADOOP_HOME/conf/mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>linux1:9001</value>
</property>
</configuration>

Do the same configuration on all 3 servers.

14. Disable the firewall on all 3 servers
service iptables stop
chkconfig iptables off

15. Format the name node
cd /usr/hadoop-1.0.4/bin
./hadoop namenode -format

16. Start Hadoop on the master (linux1)
./start-all.sh

17. Run "jps" on all 3 servers to check that Hadoop is running
or open the websites below
http://linux1:50030
http://linux1:50070

You can check the log files in the logs folder in case any process fails to start.
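
As a quick smoke test (the process names below are what Hadoop 1.x typically shows in jps; the dfsadmin output depends on your cluster state):

```shell
jps                      # master: NameNode, SecondaryNameNode, JobTracker
                         # slaves: DataNode, TaskTracker
hadoop dfsadmin -report  # should report 2 live datanodes
hadoop fs -ls /          # basic HDFS sanity check
```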

It is a good start to learn Hadoop. Even Microsoft is developing data solutions with Hadoop on the Windows platform, so it is time to learn new things.

reference:
http://blog.csdn.net/skyering/article/details/6457466
