Using the LZO compression codec in Hadoop
When working with Hadoop, the files we handle are usually very large. It is usually necessary to compress such files before using them with Hive or Pig. Hadoop supports various compression formats, each with its own advantages and disadvantages. Let us start with the compression options available to us.

Name     Tool    Splittable
gzip     gzip    No
LZO      lzop    Yes (if indexed)
bzip2    bzip2   Yes
Snappy   N/A     No

Normally, you will want to choose an option that lets you split the file and use the power of MapReduce to process it in parallel. Otherwise, you will be forced to use a single mapper to process the whole file. I normally prefer LZO form