Posts

Showing posts from February, 2014

Pig Installation

Today we will learn how to install Pig. Installing Pig is simple and straightforward. Pig requires Hadoop and Java to be installed already; if you have not installed them, follow the link here.

1. Download Pig from the Apache Pig website. Check that the Pig version you are downloading is compatible with the Hadoop version already installed on your machine. I am going to download Pig 0.10.1.

2. After the download is complete, go to the download directory and extract pig-0.10.1.tar.gz.

3. Copy the extracted folder to the $HOME/pig directory.

4. Edit /etc/bash.bashrc and set PIG_HOME. Open the file with:

    sudo leafpad /etc/bash.bashrc

Then add the following at the end of the file:

    export PIG_HOME=$HOME/pig
    export PATH=$PATH:$PIG_HOME/bin

Also set JAVA_HOME if it was not set earlier. In my case Java is installed at /usr/lib/jvm/java-6-jdk-i386:

    export JAVA_HOME=/usr/lib/jvm/java-6-jdk-i386

5. After
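The steps above can be condensed into a short shell sketch. This is a sketch under the post's assumptions (Pig 0.10.1 downloaded from the Apache Pig website into ~/Downloads, install target $HOME/pig, Java at /usr/lib/jvm/java-6-jdk-i386); adjust the paths and version for your machine:

```shell
# Assumes pig-0.10.1.tar.gz was already downloaded from the Apache Pig website
cd ~/Downloads
tar -xzf pig-0.10.1.tar.gz

# Place the extracted folder where the post expects it
mv pig-0.10.1 "$HOME/pig"

# Environment variables (append these lines to /etc/bash.bashrc or ~/.bashrc)
export PIG_HOME="$HOME/pig"
export PATH="$PATH:$PIG_HOME/bin"
export JAVA_HOME=/usr/lib/jvm/java-6-jdk-i386   # adjust to your Java install location

# Quick check that Pig is on the PATH
pig -version
```

Note the `$` in `$PIG_HOME/bin`; without it the shell appends the literal string `PIG_HOME/bin` to PATH and the `pig` command will not be found.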

Hadoop Installation Video

Hi friends! As promised earlier, I have created a few videos for the Hadoop installation tutorial. Currently these videos cover installing Hadoop in pseudo-distributed mode; I will create a few videos for a fully distributed mode installation as well. For now, please find the videos below.

1. Video for installation of Lubuntu on Windows. For this you should download VMware Player from the VMware site and install it on your Windows machine. You will also need the .iso file for Lubuntu, which you can download from the Lubuntu website.

2. After you have installed Lubuntu, you can go through the following video and install Hadoop in pseudo-distributed mode.

Stay tuned for more stuff on this blog. Please share your feedback and let me know if you want a post on any specific topic.

Hadoop Series: Hadoop Distributed File System

In previous posts we learned how to install Hadoop, an introduction to Hadoop, etc. Today we will learn about HDFS (Hadoop Distributed File System). HDFS is a component of Hadoop; it handles the storage part of Hadoop. HDFS follows a master-slave architecture, so let us first discuss what a master-slave architecture is.

In a master-slave architecture we have two kinds of machines: a master and a set of slaves. The master does the following two things:

1. Plan
2. Monitor

The master is like the manager of your team: if there is some work to do, the master plans whom to assign that work to.

The slaves do the following two things:

1. Work
2. Report

A slave is like a developer on your team ( :P please don't feel offended, it is just an analogy). The slaves do the actual work: the master assigns work to the slaves, and the slaves carry it out and complete it. Similarly, a manager who wants to develop some software will plan who is going to develop which component of the software. Main

Hive UDF Example

A UDF (User Defined Function) is a very important feature provided by Hive, and it is very simple to create one. In this tutorial we will learn how to create a UDF and use it with Hive. There are two possible ways to create a UDF:

1. using org.apache.hadoop.hive.ql.exec.UDF
2. using org.apache.hadoop.hive.ql.udf.generic.GenericUDF

If the input and output of your custom function are basic types, e.g. Text, FloatWritable, DoubleWritable, IntWritable, etc., then use org.apache.hadoop.hive.ql.exec.UDF. If your input and output can be map, set, or list data structures, then use org.apache.hadoop.hive.ql.udf.generic.GenericUDF. We will discuss the first type of UDF here; I will write one more post to discuss the second approach.

First of all, let's assume I want to create a Hive function called toUpper which will convert a string to uppercase. Follow these steps to achieve it.

1. Download and install Eclipse from here.
2. Hive should be installed, if
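As a preview of where these steps lead, here is a minimal sketch of the toUpper function using the first approach (org.apache.hadoop.hive.ql.exec.UDF). The class name ToUpper and the null handling are my own choices, and compiling it requires the hive-exec and hadoop-common jars on the classpath:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A simple Hive UDF: converts a string to uppercase.
// Hive calls evaluate() once per input row.
public class ToUpper extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // propagate SQL NULLs unchanged
        }
        return new Text(input.toString().toUpperCase());
    }
}
```

After packaging the class into a jar, you would register and call it from the Hive shell along these lines (the jar, function, and table names here are placeholders): `ADD JAR toupper.jar; CREATE TEMPORARY FUNCTION toUpper AS 'ToUpper'; SELECT toUpper(name) FROM some_table;`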