Posts

Showing posts from September, 2016

Enterprise Kafka and Spark: Kerberos-based Integration

In previous posts we have seen how to integrate Kafka with Spark Streaming. In a typical enterprise environment, however, the connection between Kafka and Spark has to be secured. If you are using the Cloudera or Hortonworks distribution of Hadoop/Spark, this will most likely mean a Kerberos-enabled connection. In this post we will look at the configuration needed for a Kerberized Kafka-Spark integration (a minimal configuration sketch follows below). Before getting into the details, I want to make clear that I assume you are working with a Cloudera/Hortonworks/MapR-provided Kafka and Spark installation that already has Kerberos enabled; here we focus on what you, as a developer, have to do on top of that. The first step is to select the security protocol provided by Kafka. If you look at the Kafka documentation, the security.protocol property is used for this, and it accepts the values PLAINTEXT, SASL_PLAINTEXT and SASL_SSL. We will select SASL_PLAINTEXT for Kerberos. This property has to be set while creating…
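The sketch below shows roughly what this looks like from the developer's side, assuming the spark-streaming-kafka-0-10 direct stream integration; the broker addresses, topic name and consumer group are placeholders, and the exact set of properties can vary with your distribution's Kafka version.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KerberizedKafkaStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("KerberizedKafkaStream"), Seconds(10))

    // Consumer properties: SASL_PLAINTEXT tells the Kafka client to
    // authenticate via Kerberos (GSSAPI) over a plain connection.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"          -> "broker1:9092,broker2:9092", // placeholder brokers
      "key.deserializer"           -> classOf[StringDeserializer],
      "value.deserializer"         -> classOf[StringDeserializer],
      "group.id"                   -> "demo-consumer-group",
      "security.protocol"          -> "SASL_PLAINTEXT",
      "sasl.kerberos.service.name" -> "kafka"
    )

    // Direct stream against the Kerberized brokers.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("demo-topic"), kafkaParams))

    stream.map(record => record.value).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Besides the consumer properties, the driver and executor JVMs also need to know how to log in to Kerberos. On vendor-packaged clusters this is typically done by shipping a JAAS file and keytab with the job, for example via spark-submit options such as `--files jaas.conf,user.keytab` together with `-Djava.security.auth.login.config=jaas.conf` in the driver and executor Java options; the file names here are only placeholders.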

Advanced Spark: Custom Receiver for Spark Streaming

Apache Spark has become a very widely used tool in the big data world. It gives us a one-stop shop for working with data: you can do data warehousing, build machine learning applications, build reporting solutions, and create stream processing applications. Spark Streaming provides APIs for connecting to different frameworks, but as the big data world grows quickly there will always be new tools we need to use in our projects and integrate with Spark or Spark Streaming. In this post we will see how we can integrate any system with Spark Streaming using the custom receiver API (see the sketch after this excerpt). There can be many reasons to do this; I am listing a few for your understanding. 1. Spark Streaming does not provide an integration with your data source. 2. The integration Spark Streaming provides uses a different version of the API than your source. 3. You want to handle the lower-level details of how Spark connects to the source. 4. A situation where you are using an older…
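As an illustration, here is a minimal custom receiver sketch in Scala that reads newline-delimited text from a socket. The class name, host and port are made up for the example, but the onStart/onStop/store pattern is what Spark's Receiver API expects.

```scala
import java.net.Socket
import scala.io.Source
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// A toy receiver that pulls text lines from a socket. The same pattern applies
// to any client library your source provides.
class SocketTextReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  // Called once when the receiver starts; must not block, so do the work in a thread.
  override def onStart(): Unit = {
    new Thread("Socket Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  // The receive loop checks isStopped(), so there is nothing extra to clean up here.
  override def onStop(): Unit = {}

  private def receive(): Unit = {
    try {
      val socket = new Socket(host, port)
      val lines = Source.fromInputStream(socket.getInputStream, "UTF-8").getLines()
      while (!isStopped() && lines.hasNext) {
        store(lines.next()) // hand each record to Spark for storage and processing
      }
      socket.close()
      restart("Connection closed, trying to connect again")
    } catch {
      case e: Throwable => restart("Error receiving data", e)
    }
  }
}
```

You would plug it in with something like `ssc.receiverStream(new SocketTextReceiver("localhost", 9999))`, which returns an ordinary DStream[String] that the rest of your streaming job can transform as usual.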