Posts

Showing posts from February, 2014

Pig Installation

Today we will learn how to install Pig. Installing Pig is simple and straightforward. Pig requires Hadoop and Java to be installed already; if you have not installed them, follow the link here.

1. Download Pig from the Apache Pig website. Check that the Pig version you are downloading is compatible with the Hadoop version already installed on your machine. I am going to download Pig 0.10.1.

2. After the download is complete, go to the download directory and extract pig-0.10.1.tar.gz.

3. Copy the extracted folder to the $HOME/pig directory.

4. Edit /etc/bash.bashrc and set PIG_HOME. Open the file with:

    sudo leafpad /etc/bash.bashrc

Then add the following at the end of the file:

    export PIG_HOME=$HOME/pig
    export PATH=$PATH:$PIG_HOME/bin

Also set JAVA_HOME if it was not set earlier. In my case Java is installed at /usr/lib/jvm/java-6-jdk-i386:

    export JAVA_HOME=/usr/lib/jvm/java-6-jdk-i386

5. After
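The steps above can be condensed into a short shell sketch. This is a sketch under the post's assumptions (Pig 0.10.1 downloaded from the Apache Pig website into ~/Downloads, install target $HOME/pig, Java at /usr/lib/jvm/java-6-jdk-i386); adjust the paths and version for your machine:

```shell
# Assumes pig-0.10.1.tar.gz was already downloaded from the Apache Pig website
cd ~/Downloads
tar -xzf pig-0.10.1.tar.gz

# Place the extracted folder where the post expects it
mv pig-0.10.1 "$HOME/pig"

# Environment variables (append these lines to /etc/bash.bashrc or ~/.bashrc)
export PIG_HOME="$HOME/pig"
export PATH="$PATH:$PIG_HOME/bin"
export JAVA_HOME=/usr/lib/jvm/java-6-jdk-i386   # adjust to your Java install location

# Quick check that Pig is on the PATH
pig -version
```

Note the `$` in `$PIG_HOME/bin`; without it the shell appends the literal string `PIG_HOME/bin` to PATH and the `pig` command will not be found.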

Hadoop Installation Video

Hi friends! As promised earlier, I have created a few videos for the Hadoop installation tutorial. Currently these videos cover installing Hadoop in pseudo-distributed mode; I will create a few videos for a fully distributed mode installation as well. For now, please find the videos below.

1. Video for installation of Lubuntu on Windows. For this you should download VMware Player from the VMware site and install it on your Windows machine. You will also need the .iso file for Lubuntu, which you can download from the Lubuntu website.

2. After you have installed Lubuntu, you can go through the following video and install Hadoop in pseudo-distributed mode.

Stay tuned for more stuff on this blog. Please share your feedback and let me know if you want a post on any specific topic.

Hadoop Series: Hadoop Distributed File System

In previous posts we learned how to install Hadoop, an introduction to Hadoop, etc. Today we will learn about HDFS (Hadoop Distributed File System). HDFS is a component of Hadoop; it handles the storage part of Hadoop. HDFS follows a master-slave architecture, so let us first discuss what a master-slave architecture is.

In a master-slave architecture we have two kinds of machines: a master and a set of slaves. The master does the following two things:

1. Plan
2. Monitor

The master is like the manager of your team: if there is some work to do, the master plans whom to assign that work to.

The slaves do the following two things:

1. Work
2. Report

A slave is like a developer on your team ( :P please don't feel offended, it is just an analogy). The slaves do the actual work: the master assigns work to the slaves, and the slaves carry it out and complete it. Similarly, a manager who wants to develop some software will plan who is going to develop which component of the software. Main

Hive UDF Example

A UDF (User Defined Function) is a very important feature provided by Hive, and it is very simple to create one. In this tutorial we will learn how to create a UDF and use it with Hive. There are two possible ways to create a UDF:

1. using org.apache.hadoop.hive.ql.exec.UDF
2. using org.apache.hadoop.hive.ql.udf.generic.GenericUDF

If the input and output of your custom function are basic types, e.g. Text, FloatWritable, DoubleWritable, IntWritable, etc., then use org.apache.hadoop.hive.ql.exec.UDF. If your input and output can be map, set, or list data structures, then use org.apache.hadoop.hive.ql.udf.generic.GenericUDF. We will discuss the first type of UDF here; I will write one more post to discuss the second approach.

First of all, let's assume I want to create a Hive function called toUpper which will convert a string to uppercase. Follow these steps to achieve it.

1. Download and install Eclipse from here.
2. Hive should be installed, if
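As a preview of where these steps lead, here is a minimal sketch of the toUpper function using the first approach (org.apache.hadoop.hive.ql.exec.UDF). The class name ToUpper and the null handling are my own choices, and compiling it requires the hive-exec and hadoop-common jars on the classpath:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A simple Hive UDF: converts a string to uppercase.
// Hive calls evaluate() once per input row.
public class ToUpper extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // propagate SQL NULLs unchanged
        }
        return new Text(input.toString().toUpperCase());
    }
}
```

After packaging the class into a jar, you would register and call it from the Hive shell along these lines (the jar, function, and table names here are placeholders): `ADD JAR toupper.jar; CREATE TEMPORARY FUNCTION toUpper AS 'ToUpper'; SELECT toUpper(name) FROM some_table;`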