Install Apache Spark on Ubuntu

It's been a long time since I have written a blog post. Meanwhile, I was exploring Apache Spark, so I thought I would share my knowledge on how to install Spark on Ubuntu.

As usual, I am going to share commands for both Ubuntu and Lubuntu; both are very similar. First of all, we have to make sure that Java is installed on the machine. I am using Java 7 for Spark. If you already have Java installed, you don't need to run the following command. If you do not have Java installed, run the following command to install Java 7.

       sudo apt-get install openjdk-7-jdk

This will install JDK 7 on your machine. If you do not know which version of Java is already installed, use the following command:

      java -version

If you have multiple versions of Java installed and want to change the default version, use the following command and select the specific version:

     sudo update-alternatives --config java
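
If you prefer to point Spark at a specific JDK without changing the system default, you can also export JAVA_HOME. A minimal sketch, assuming a typical OpenJDK 7 location on 64-bit Ubuntu (verify the path on your own machine):

      # assumed OpenJDK 7 path on 64-bit Ubuntu; check with: ls /usr/lib/jvm
      export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
      export PATH=$PATH:$JAVA_HOME/bin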

After installing Java, we have to make sure that Scala is installed on our machine. I am using Scala version 2.10.4; you can download it from here.
 
For all the next steps I am assuming that the home directory on your machine is "/home/hduser" and that we are going to install Spark and Scala in this directory. You can install Scala and Spark somewhere else as well, but just to keep things simple I am going to install everything in "/home/hduser".

1. Download Scala and copy it into the /home/hduser directory; you will get a tar-compressed file.

2. Extract the Scala tar and rename the folder from "scala-2.10.4" to "scala", for example as shown below.
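
A rough sketch of steps 1 and 2, assuming the downloaded archive is named scala-2.10.4.tgz and sits in /home/hduser (adjust the file name to match what you actually downloaded):

      cd /home/hduser
      # archive name is an assumption; use the name of the file you downloaded
      tar -xzf scala-2.10.4.tgz
      mv scala-2.10.4 scala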

3. Set the Scala home in the environment variables by editing the /etc/bash.bashrc file.

In Ubuntu

      sudo gedit /etc/bash.bashrc

In Lubuntu

     sudo leafpad /etc/bash.bashrc

4. Copy the following lines at the end of the file:

      export SCALA_HOME=/home/hduser/scala
      export PATH=$PATH:$SCALA_HOME/bin

Save the file and close it.
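
To apply the change in your current terminal without opening a new one, you can re-source the file:

      source /etc/bash.bashrc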

5. Download Spark 1.3.1 from here.

6. Untar Spark and copy it into the "/home/hduser" directory.

7. Change the name of the directory from "spark-1.3.1" to "spark".
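
Steps 6 and 7 together look roughly like this, assuming the downloaded archive is named spark-1.3.1.tgz and was saved to /home/hduser (adjust the file name to match your download):

      cd /home/hduser
      # archive name is an assumption; use the name of the file you downloaded
      tar -xzf spark-1.3.1.tgz
      mv spark-1.3.1 spark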

8. Open a terminal, type scala and press Enter, just to verify that the Scala path is set properly. It will show a shell as below:

       scala>

After this, come out of the Scala shell.
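
While you are in the shell, you can run a one-liner as a quick sanity check (not required for the installation); :quit exits the REPL:

      scala> println("Scala is working")
      Scala is working

      scala> :quit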

9. Change your current folder to the Spark folder with the following command:

      cd /home/hduser/spark

10. Now we have to build Spark against the Hadoop version that we are using. In my case I am using Hadoop 2.4.0, so I will run the following command:

      SPARK_HADOOP_VERSION=2.4.0 sbt/sbt assembly -mem 512

You should change this command for the Hadoop version that you are using. I have assigned 512 MB of memory to the build process because it sometimes throws an error saying it is short of memory.
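
If the sbt build gives you trouble, Spark can also be built with Maven. This is just a sketch, assuming Maven is installed and you want the same Hadoop 2.4.0 profile; check the Spark 1.3.1 build documentation for the exact profiles that apply to your setup:

      # assumes Maven is installed; profile and flags follow the Spark 1.3 build docs
      mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package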

11. You can also get a prebuilt version of Spark for your specific version of Hadoop from here.
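
For example, downloading and unpacking a prebuilt package for Hadoop 2.4 might look like this; the URL and file name below are assumptions, so pick the package that matches your Hadoop version from the downloads page:

      cd /home/hduser
      # URL and file name are assumptions; choose the right package on the downloads page
      wget https://archive.apache.org/dist/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.4.tgz
      tar -xzf spark-1.3.1-bin-hadoop2.4.tgz
      mv spark-1.3.1-bin-hadoop2.4 spark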

12. Once your installation is complete, you can run the following command to check it:

       ./bin/run-example SparkPi 10

This command will run an example that ships with Spark, and if it completes successfully then you can assume that Spark is installed and built successfully.
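
As another quick check, you can open the Spark shell and run a tiny job; this is only a sketch, and the count simply confirms that the SparkContext (sc) can schedule work. You should see a result similar to the one shown:

      ./bin/spark-shell

      scala> sc.parallelize(1 to 100).count()
      res0: Long = 100

      scala> :quit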

Thank you for going through the steps. Please feel free to drop a comment if you face any issues.
