Install Apache Spark on Ubuntu
It's been a long time since I have written a blog post. Meanwhile, I was exploring Apache Spark, so I thought I would share my knowledge on how to install Spark on Ubuntu.
As usual, I am going to share commands for both Ubuntu and Lubuntu; they are very similar. First of all, we have to make sure that Java is installed on our machine. I am using Java 7 for Spark. If you already have Java installed, you can skip the next command. If not, run the following command to install Java 7.
sudo apt-get install openjdk-7-jdk
This will install JDK 7 on your machine. If you do not know which version of Java is already installed, use the following command.
java -version
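For OpenJDK 7, the first line of the output will look something like this (the exact update number will differ on your machine):
java version "1.7.0_79"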
If you have multiple versions of Java and want to change the default version, use the following command and select the specific version.
sudo update-alternatives --config java
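If you are not sure which Java installations are registered on your machine, you can list the available alternatives first:
update-alternatives --list java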
After installing Java, we have to make sure that Scala is installed on our machine. I am using Scala version 2.10.4; you can download it from here.
For all the next steps, I am assuming that the home directory on your machine is "/home/hduser" and that we are going to install Spark and Scala in this directory. You can install Scala and Spark somewhere else too, but just to keep things simple I am going to install everything in "/home/hduser".
1. Download Scala and copy it into the /home/hduser directory; you will get a tar-compressed file.
2. Extract the Scala tar and rename the folder from "scala-2.10.4" to "scala".
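A minimal way to do this from the terminal, assuming the downloaded archive is named scala-2.10.4.tgz:
cd /home/hduser
tar -xzf scala-2.10.4.tgz
mv scala-2.10.4 scala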
3. Set the Scala home in the environment. Edit the /etc/bash.bashrc file.
In Ubuntu
sudo gedit /etc/bash.bashrc
In Lubuntu
sudo leafpad /etc/bash.bashrc
4. Copy the following lines to the end of the file.
export SCALA_HOME=/home/hduser/scala
export PATH=$PATH:$SCALA_HOME/bin
Save the file and close it.
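To make the new variables take effect in your current terminal without opening a new one, reload the file:
source /etc/bash.bashrc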
5. Download Spark 1.3.1 from here
6. Untar Spark and copy it into the "/home/hduser" directory.
7. Change the name of the directory from "spark-1.3.1" to "spark".
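Assuming the downloaded archive is named spark-1.3.1.tgz (the exact name depends on the package you chose), steps 6 and 7 can be done like this:
cd /home/hduser
tar -xzf spark-1.3.1.tgz
mv spark-1.3.1 spark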
8. Open a terminal, type scala and press Enter, just to verify that the Scala path is set properly. It will show a shell prompt as below.
scala>
After this, come out of the Scala shell.
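You can leave the Scala shell with the :quit command (Ctrl+D also works):
scala> :quit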
9. Change your current folder to the spark folder with the following command.
cd /home/hduser/spark
10. Now we have to build Spark against the Hadoop version that we are using. In my case I am using Hadoop 2.4.0, so I will run the following command.
SPARK_HADOOP_VERSION=2.4.0 sbt/sbt assembly -mem 512
You should change this command to match the Hadoop version that you are using. I have assigned 512 MB of memory to the build process because sometimes the build will throw an error saying it is short of memory.
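If the build finishes successfully, the assembled jar should end up under assembly/target/scala-2.10/ (the exact file name depends on your Spark and Hadoop versions); you can confirm this with:
ls assembly/target/scala-2.10/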
11. You can also get a prebuilt version of Spark, built for your specific version of Hadoop, from here.
12. Once your installation is complete, you can run the following command to check it.
./bin/run-example SparkPi 10
This command runs an example that ships with Spark; if it completes successfully, you can assume that Spark has been installed and built successfully.
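Somewhere in the log output you should see a line similar to the following (the trailing digits vary from run to run):
Pi is roughly 3.14...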
Thank you for going through the steps. Please feel free to drop a comment if you face any issues.