Setting up Hadoop made easy
The following components were used for this Hadoop installation, all license free:
- Hadoop 1.2.1
- Ubuntu 12.04 LTS, 64-bit (running in a virtual machine)
- Windows 8. (The same thing can be done on a Mac: install a virtual machine on the Mac and follow the procedure below.)
Step 1 Installing Virtual Machine
Step 1.1 Download
A free version of Oracle VirtualBox can be downloaded from:
Download Ubuntu LTS 64-bit from the following link (make sure it is in ISO format and for 64-bit):
Step 1.2 Installation
In the screenshot below, click the ‘+’ sign to add the ISO you have already downloaded, so that it is loaded as a CD drive.
If it throws an error mentioning 64-bit support and VT-x/AMD-V,
- It means that hardware virtualization is not enabled in your BIOS.
- Perform the following steps. These are for my configuration; yours may be a little different:
- Restart your computer and go to BIOS setup
- Go to UEFI Firmware >> Advanced >> CPU Setup >> Intel® Virtualization Technology and enable it.
- Save and exit.
- Now try to start the Ubuntu boot with the ISO image and it should work.
Click on install Ubuntu.
After you press Continue, the whole disk will be formatted. Nope, just joking! (: Only the dynamically allocated virtual disk will be formatted.
(I live in Melbourne. One of the loveliest cities in the world.)
Step 2 Download Hadoop tar.gz
At this point you may want to reopen this document inside Ubuntu, for example by transferring it over the internet.
Most of the following steps are taken from:
- Download a stable release; the file name ends with tar.gz
- Create a new folder /home/hadoop
- Move the file hadoop-x.y.z.tar.gz to the folder /home/hadoop
- Type/Copy/Paste: cd /home/hadoop
- Type/Copy/Paste: tar xzf hadoop*tar.gz
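Later steps refer to $HADOOP_INSTALL, which needs to point at the directory the archive was unpacked into. A minimal sketch of one way to set it, assuming the tarball was extracted under /home/hadoop as above and produced a directory named hadoop-<version>. The helper name find_hadoop_install is hypothetical, not part of Hadoop.

```shell
# Hypothetical helper: print the first directory named hadoop-* under a base directory.
find_hadoop_install() {
    base="$1"
    # The trailing slash restricts the glob to directories, so the tarball itself is skipped.
    for candidate in "$base"/hadoop-*/; do
        # If nothing matches, the pattern is left unexpanded, so check for a real directory.
        if [ -d "$candidate" ]; then
            printf '%s\n' "${candidate%/}"
            return 0
        fi
    done
    echo "no hadoop-* directory found under $base" >&2
    return 1
}

# Assumption: /home/hadoop is where the tarball was extracted.
if HADOOP_INSTALL=$(find_hadoop_install /home/hadoop); then
    export HADOOP_INSTALL
    echo "HADOOP_INSTALL=$HADOOP_INSTALL"
fi
```

Adding the export to ~/.bashrc or /etc/profile would make the variable available in new terminals as well.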
Step 3 Downloading and setting up Java
For more refer: http://www.wikihow.com/Install-Oracle-Java-on-Ubuntu-Linux
- Check whether Java is already present:
Type/Copy/Paste: java -version
- If it is 1.7.*, you can set the JAVA_HOME variable to point to where it is installed.
- If you are confident setting up the JAVA_HOME variable yourself, go ahead to step X. If not, don’t worry; follow these steps:
- First we will purge the installed Java.
Type/Copy/Paste: sudo apt-get purge openjdk-\*
- Make the directory where Java will be installed:
sudo mkdir -p /usr/local/java
- Download the Java JDK and JRE from the link; look for the Linux 64-bit files ending in tar.gz:
- Go to the Downloads folder and copy the files to the folder we created for Java:
Type/Copy/Paste: sudo cp -r jdk-*.tar.gz /usr/local/java
Type/Copy/Paste: sudo cp -r jre-*.tar.gz /usr/local/java
- Extract and install Java:
Type/Copy/Paste: cd /usr/local/java
Type/Copy/Paste: sudo tar xvzf jdk*.tar.gz
Type/Copy/Paste: sudo tar xvzf jre*.tar.gz
- Now put all the variables in the profile.
Type/Copy/Paste: sudo gedit /etc/profile
At the end, paste the following. (Note: change the highlighted paths to match your installation. The version number will probably have changed between the writing of this guide and your installation, so make sure the path you enter actually exists.)
- Do the following so that Linux knows where Java is (note that the highlighted paths may need to be changed to match your installation):
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jre1.7.0_40/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_40/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jre1.7.0_40/bin/javaws" 1
sudo update-alternatives --set java /usr/local/java/jre1.7.0_40/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_40/bin/javac
sudo update-alternatives --set javaws /usr/local/java/jre1.7.0_40/bin/javaws
- Refresh the profile by:
Type/Copy/Paste: . /etc/profile
- Test by typing java -version.
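For the /etc/profile step above, the variables typically look something like the following. This is only a sketch: it assumes the JDK and JRE were unpacked to the same 1.7.0_40 paths used in the update-alternatives commands, and the variable name JRE_HOME is a common convention rather than something this guide prescribes.

```shell
# Hypothetical additions for the end of /etc/profile.
# Assumption: these version-specific paths match what `ls /usr/local/java` shows
# on your machine; change them if your version number differs.
JAVA_HOME=/usr/local/java/jdk1.7.0_40
JRE_HOME=/usr/local/java/jre1.7.0_40
PATH="$PATH:$JAVA_HOME/bin:$JRE_HOME/bin"
export JAVA_HOME JRE_HOME PATH

# Warn early if a path does not exist, since a wrong version number
# is an easy mistake to make at this step.
for dir in "$JAVA_HOME" "$JRE_HOME"; do
    [ -d "$dir" ] || echo "warning: $dir does not exist" >&2
done
```

The same JAVA_HOME value is what conf/hadoop-env.sh expects later in the pseudo-distributed setup.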
Step 4 Stand Alone mode installed! Congratulations!
At this point you should be able to run Hadoop in standalone mode, which is enough to practice most MapReduce development. Test whether you were successful:
Type/Copy/Paste: cd $HADOOP_INSTALL (go to the directory where Hadoop was extracted)
Type/Copy/Paste: mkdir input
Type/Copy/Paste: cp conf/*.xml input (gives the example job some files to search)
Type/Copy/Paste: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Type/Copy/Paste: cat output/*
Step 5 Pseudo Distribution Mode
- Type/Copy/Paste: sudo apt-get install ssh (to install ssh)
- Type/Copy/Paste: sudo apt-get install rsync
- Change conf/core-site.xml to:
- Change conf/hdfs-site.xml to:
- Change conf/mapred-site.xml to:
- Edit conf/hadoop-env.sh: look for JAVA_HOME and set it to your Java installation path
- Set up passwordless ssh:
Type/Copy/Paste: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Type/Copy/Paste: cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
- To confirm that passwordless ssh has been set up, type the following; you should not be prompted for a password.
Type/Copy/Paste: ssh localhost
- Format the name node:
Type/Copy/Paste: bin/hadoop namenode -format
- Start all the daemons:
- Stop all the daemons:
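The contents of the three configuration files for pseudo-distributed mode can be sketched as below. The values (hdfs://localhost:9000, a replication factor of 1, and localhost:9001) are the ones given in the Hadoop 1.x single-node setup documentation and are assumed here rather than taken from this guide. The helper name write_pseudo_conf is hypothetical.

```shell
# Hypothetical helper: write the three pseudo-distributed config files into a conf directory.
write_pseudo_conf() {
    conf_dir="$1"

    # Default filesystem: an HDFS NameNode on this machine.
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # Single node, so keep only one copy of each block.
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

    # JobTracker address for MapReduce jobs.
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}

# Demonstrate against a scratch directory so that no real configuration is overwritten.
scratch_conf=$(mktemp -d)
write_pseudo_conf "$scratch_conf"
ls "$scratch_conf"
```

To apply it for real, the call would be write_pseudo_conf "$HADOOP_INSTALL/conf", after backing up the existing files. In Hadoop 1.x the daemons are then normally started with bin/start-all.sh and stopped with bin/stop-all.sh, both run from the Hadoop directory.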
I hope you made it this far. My heartiest congratulations to you if you managed to install Hadoop!
You are on the road to learning one of the most complex and promising new technologies of the current times!
Feel free to share your views on this.