Setting up Hadoop made easy

Click Here for Setting up Hadoop 2 Made Easy

Version details

The following components were used to install Hadoop, all free of license costs:

  1. Hadoop 1.2.1
  2. Ubuntu 12.04 LTS (running on a virtual machine), 64-bit
  3. Windows 8 as the host OS. (The same can be done on a Mac, i.e., install a virtual machine on the Mac and follow the procedure below.)

Step 1 Installing the Virtual Machine

Step 1.1 Download

The free version of Oracle VirtualBox can be downloaded from:

https://www.virtualbox.org/wiki/Downloads

Download Ubuntu LTS 64-bit from the following link (make sure it is in ISO format and is the 64-bit version):

http://www.ubuntu.com/download/desktop

Step 1.2 Installation

In the VM's storage settings, click on the ‘+’ sign to add the ISO you have already downloaded so that it is loaded as a CD drive.

Press Start.

If it throws an error mentioning 64-bit support and VT-x/AMD-V:

  • It means that virtualization is disabled in your BIOS.
  • Perform the following steps. This is for my configuration; yours may be a little different:
    • Restart your computer and go to the BIOS setup.
    • Go to UEFI Firmware >> Advanced >> CPU Setup >> Intel® Virtualization Technology and enable it.
    • Save and exit.
  • Now try to boot Ubuntu with the ISO image again and it should work.

Click on Install Ubuntu.

And after you press Continue, the whole disk will be formatted. Nope, just joking! (: Only the dynamically allocated virtual disk will be formatted.

(I live in Melbourne. One of the loveliest cities in the world.)

Step 2 Download Hadoop tar.gz

At this point you may want to reopen this document inside Ubuntu; transfer it over the internet.

Most of the following steps are adapted from:

http://hadoop.apache.org/docs/stable/single_node_setup.html

  1. Download a stable release copy ending with tar.gz
  2. Create a new folder /home/hadoop
  3. Move the file hadoop-x.y.z.tar.gz to the folder /home/hadoop
  4. Type/Copy/Paste: cd /home/hadoop
  5. Type/Copy/Paste: tar xzf hadoop*tar.gz
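
If you prefer to do all of Step 2 from the terminal, here is a minimal sketch. The mirror URL is an assumption (release archives move around), so substitute the link of the stable release you actually chose, and note that it assumes you have write access to /home/hadoop:

# Download the release (assumed mirror; adjust to the file you picked)
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
# Create the target folder and move the archive there
mkdir -p /home/hadoop
mv hadoop-1.2.1.tar.gz /home/hadoop
# Unpack it
cd /home/hadoop
tar xzf hadoop*tar.gz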

Step 3 Downloading and setting up Java

For more details, refer to: http://www.wikihow.com/Install-Oracle-Java-on-Ubuntu-Linux

  1. Check if Java is already present:

Type/Copy/Paste: java -version
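
If Java is already installed and you just need to find out where it lives (so you can point JAVA_HOME at it yourself), one quick way is to resolve the symlink behind the java binary:

Type/Copy/Paste: readlink -f $(which java)

This prints the path of the real java executable (e.g. something ending in /bin/java); JAVA_HOME is the directory two levels up from that.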

  2. If it is 1.7.* then you can set up the JAVA_HOME variable according to where it is installed.
  3. If you are confident setting up the JAVA_HOME variable yourself, skip ahead to step 9 (putting the variables in the profile). If not, don't worry and follow these steps:
  4. First we will purge the installed Java:

Type/Copy/Paste : sudo apt-get purge openjdk-\*

  5. Make the directory where Java will be installed:

sudo mkdir -p /usr/local/java

  6. Download the Java JDK and JRE from the link below; look for the Linux, 64-bit files ending in tar.gz:

    http://www.oracle.com/technetwork/java/javase/downloads/index.html

  7. Go to the Downloads folder and then copy the archives to the folder we created for Java:

    Type/Copy/Paste: sudo cp -r jdk-*.tar.gz /usr/local/java

    Type/Copy/Paste: sudo cp -r jre-*.tar.gz /usr/local/java

  8. Extract and install Java:

    Type/Copy/Paste: cd /usr/local/java

    Type/Copy/Paste: sudo tar xvzf jdk*.tar.gz

    Type/Copy/Paste: sudo tar xvzf jre*.tar.gz

  9. Now put all the variables in the profile:

Type/Copy/Paste: sudo gedit /etc/profile

At the end, copy and paste the following. (Note: change the paths according to your installation. The version numbers may have changed between the writing of this guide and your installation, so make sure the paths you enter actually exist.)

JAVA_HOME=/usr/local/java/jdk1.7.0_40
PATH=$PATH:$JAVA_HOME/bin
JRE_HOME=/usr/local/java/jre1.7.0_40
PATH=$PATH:$JRE_HOME/bin
HADOOP_INSTALL=/home/hadoop/hadoop-1.2.1
PATH=$PATH:$HADOOP_INSTALL/bin
export JAVA_HOME
export JRE_HOME
export HADOOP_INSTALL
export PATH

  10. Do the following so that Linux knows where Java is (note: the following paths may need to be changed according to your installation):

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jre1.7.0_40/bin/java" 1

sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_40/bin/javac" 1

sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jre1.7.0_40/bin/javaws" 1

sudo update-alternatives --set java /usr/local/java/jre1.7.0_40/bin/java

sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_40/bin/javac

sudo update-alternatives --set javaws /usr/local/java/jre1.7.0_40/bin/javaws
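
To double-check that the alternatives were registered correctly (purely optional), you can ask update-alternatives to show what it currently points at:

Type/Copy/Paste: update-alternatives --display java

The output should list your /usr/local/java path as the current selection.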

  11. Refresh the profile by:

Type/Copy/Paste: . /etc/profile

  12. Test by typing: java -version
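
As an extra sanity check beyond java -version, you can confirm that the profile variables resolved to real paths:

Type/Copy/Paste: echo $JAVA_HOME

Type/Copy/Paste: echo $HADOOP_INSTALL

Type/Copy/Paste: ls $JAVA_HOME/bin/java

If the ls fails, re-check the version numbers in /etc/profile against what you actually extracted.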


Step 4 Standalone Mode Installed! Congratulations!

At this point you should be able to run Hadoop in standalone mode, which is enough to practice almost any MapReduce development. Test whether you were successful:

Type/Copy/Paste: cd $HADOOP_INSTALL (going to the Hadoop directory)

Type/Copy/Paste: mkdir input

Type/Copy/Paste: cp conf/*.xml input (gives the job some files to search)

Type/Copy/Paste: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Type/Copy/Paste: ls output/*
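
The examples jar contains more than grep. As a further standalone-mode exercise (assuming the examples jar shipped with your release, e.g. hadoop-examples-1.2.1.jar), you can run the classic word count over the same input folder:

Type/Copy/Paste: bin/hadoop jar hadoop-examples-*.jar wordcount input wc_output

Type/Copy/Paste: cat wc_output/*

Each line of the output is a word from the input files followed by its count.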

Step 5 Pseudo-Distributed Mode

  1. Type/Copy/Paste: sudo apt-get install ssh (to install ssh)
  2. Type/Copy/Paste: sudo apt-get install rsync
  3. Change conf/core-site.xml to:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

  4. Change conf/hdfs-site.xml to:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

  5. Change conf/mapred-site.xml to:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

  6. Edit conf/hadoop-env.sh, look for JAVA_HOME and set it up:

    export JAVA_HOME=/usr/local/java/jdk1.7.0_40

  7. Set up passwordless ssh by:

    Type/copy/paste: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa (the -P '' is two single quotes, i.e., an empty passphrase)

    Type/copy/paste: cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

  8. To confirm that passwordless ssh has been set up, type the following and you should not be prompted for a password:

    Type/copy/paste: ssh localhost

  9. Format the name node:

    Type/copy/paste: bin/hadoop namenode -format

  10. Start all the daemons:

    Type/copy/paste: bin/start-all.sh

  11. Web UI for the NameNode: http://localhost:50070/ and for the JobTracker: http://localhost:50030/ (a verification sketch follows this list).
  12. Stop all the daemons:

    Type/copy/paste: bin/stop-all.sh
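
While the daemons are running (i.e., between steps 10 and 12), you can verify that the cluster is really up. A minimal check, using jps (which ships with the JDK) and the same grep example as Step 4, this time reading and writing HDFS:

Type/copy/paste: jps (should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker)

Type/copy/paste: bin/hadoop fs -put conf input (copies the conf folder into HDFS)

Type/copy/paste: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Type/copy/paste: bin/hadoop fs -cat output/* (reads the result directly from HDFS)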


I hope you made it this far. My heartiest congratulations to you if you managed to install Hadoop!

You are on the road to learning one of the most complex and promising new technologies of our time!

Feel free to share your views on this. :)


4 Comments on “Hadoop Tutorials”

  1. Alfred

    Hi,

    Everything goes well until the step where I need to establish passwordless ssh.

    I got this error: “Generating public/private dsa key pair.
    passphrase too short: have 3 bytes, need > 4
    Saving the key failed: /home/alfred/.ssh/id_dsa.”

    Could you explain and advise me on this error please? Thank you.

    Regards,
    Alfred

  2. prateek

    after putting in the following command..
    prateek@prateek-VirtualBox:~/hadoop/hadoop-2.3.0$ bin/hadoop jar hadoop-examples-*.jar grep input output ‘dfs[a-z.]+’

    I am facing the following error..

    bin/hadoop: line 133: /usr/local/java/jdk1.8.0_05/bin/java: No such file or directory

  3. Nitesh

    Hey David,

    These are the instructions for installing hadoop 1.2.1.

    The jar file hadoop-examples-*.jar has been moved to a different folder in Hadoop 2.x.x, so you'll have to follow different steps.

    I am soon going to do a video on Hadoop 2.x.x, so you can wait for it, or else you can go ahead and install Hadoop 1.2.1; for programming purposes they are not any different, as both Hadoop 2.x.x and Hadoop 1.2.1 use the same APIs.

    Hope this helps.

    Best,

    Nitesh

  4. David William

    Hi, I followed every step in the tutorial and video.

    However, when I reached Step 4:

    Type/Copy/Paste: cd $HADOOP_INSTALL (going to the Hadoop directory)

    Type/copy/Paste: mkdir input

    Type/copy/Paste: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

    Type/copy/Paste: ls output/*

    I am having errors that says "Not a valid JAR: /home/david/hadoop/hadoop-2.3.0/hadoop-examples-*.jar
    david@ubuntu:~/hadoop/hadoop-2.3.0$ "

    Did I do anything wrong up to here? 


    And

    I continued to the next steps, and when I entered  Type/copy/paste: bin/start–all.sh

    It says that there is no such file or directory.

    It would be great if you can really help me out with this problem. I am looking forward to your replies.

    Thank you.


    Regards,

    David

