Enterprise Kafka and Spark : Kerberos based Integration

In previous posts we have seen how to integrate Kafka with Spark Streaming. However, in a typical enterprise environment the connection between Kafka and Spark should be secure. If you are using a Cloudera or Hortonworks distribution of Hadoop/Spark, it will most likely be a Kerberos-enabled connection. In this post we will see what configurations we have to make to enable Kerberized Kafka-Spark integration.
Before getting into the details of the integration, I want to make it clear that I assume you are using a Cloudera/Hortonworks/MapR-provided, Kerberos-enabled installation of Kafka and Spark. We will see what additional steps you, as a developer, have to perform. First we have to select the security protocol provided by Kafka. If you look at the Kafka documentation, the security.protocol property is used for this. We can select one of the following values:
1. PLAINTEXT
2. SASL_PLAINTEXT
3. SASL_SSL
We will select SASL_PLAINTEXT for Kerberos. This property has to be set while creating the Kafka consumer object.
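As a sketch of how this property would be set, the consumer parameters below could be passed to a Spark direct stream (e.g. KafkaUtils.createDirectStream). The broker address and group id are placeholders; note that sasl.kerberos.service.name must match the serviceName in the JAAS file shown next.

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaSecurityParams {

    public static Map<String, Object> kerberizedParams() {
        Map<String, Object> kafkaParams = new HashMap<>();
        // Placeholder broker list and group id -- substitute your own.
        kafkaParams.put("bootstrap.servers", "broker1.example.com:9092");
        kafkaParams.put("group.id", "my-consumer-group");
        kafkaParams.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Kerberos authentication over a plaintext transport.
        kafkaParams.put("security.protocol", "SASL_PLAINTEXT");
        // Must match serviceName="kafka" in the JAAS configuration.
        kafkaParams.put("sasl.kerberos.service.name", "kafka");
        return kafkaParams;
    }

    public static void main(String[] args) {
        System.out.println(kerberizedParams().get("security.protocol"));
    }
}
```

In a real application this map is handed to the Kafka consumer (or to KafkaUtils when building the stream); nothing else in application code changes for Kerberos.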
Once this is done, you should create the following JAAS file (name it kafka_jaas.conf).
If you are using a keytab file for authentication:
KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="./keytab"
   storeKey=true
   useTicketCache=false
   serviceName="kafka"
   principal="my_id/_HOST@EXAMPLE.COM";
};
If you are using a Kerberos ticket for authentication, use the following JAAS configuration:

KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=false
   storeKey=true
   useTicketCache=true
   serviceName="kafka"
   principal="my_id/_HOST@EXAMPLE.COM";
};
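Since the JAAS file differs only in its keytab/ticket settings across environments, one option is to generate it from a small script. The sketch below writes the keytab-based variant; the keytab path and principal are placeholders you would replace with your own.

```shell
#!/bin/sh
# Generate kafka_jaas.conf for keytab-based authentication.
# KEYTAB_PATH and PRINCIPAL are placeholders -- substitute your own values.
KEYTAB_PATH="./keytab"
PRINCIPAL="my_id/_HOST@EXAMPLE.COM"

cat > kafka_jaas.conf <<EOF
KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="${KEYTAB_PATH}"
   storeKey=true
   useTicketCache=false
   serviceName="kafka"
   principal="${PRINCIPAL}";
};
EOF

echo "Wrote kafka_jaas.conf"
```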
Once this is done, use the following command to run the Spark code:
spark-submit --master yarn-client \
--files "kafka_jaas.conf,/path/to/keytab" \
--driver-java-options "-Djava.security.auth.login.config=./kafka_jaas.conf" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./kafka_jaas.conf" \
--class <user-main-class> \
<user-application.jar> 

When using Kerberos tickets, make sure you set the SPARK_YARN_USER_ENV property so that every executor knows where to pick the Kerberos ticket from:
export SPARK_YARN_USER_ENV="KRB5CCNAME=/path/to/kerberos/ticket"
Then run the following:
spark-submit --master yarn-client \
--files "kafka_jaas.conf" \
--driver-java-options "-Djava.security.auth.login.config=./kafka_jaas.conf" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./kafka_jaas.conf" \
--class <user-main-class> \
<user-application.jar>
Now let us understand some of the important properties.
--files : passes files to all executors. The files are copied into each application container's working directory, which is why the JAAS file can be referenced as ./kafka_jaas.conf.
--driver-java-options : sets Java system properties for the driver.
spark.executor.extraJavaOptions : sets Java system properties for the executors.
SPARK_YARN_USER_ENV : sets environment variables before any executor is started.
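A common failure mode is the -Djava.security.auth.login.config flag not reaching the driver or an executor. One hedged way to check is to read the system property at runtime, as in this standalone sketch (here the property is set in code only to simulate what the JVM flag would do):

```java
public class JaasCheck {

    public static void main(String[] args) {
        // In a real driver/executor, --driver-java-options or
        // spark.executor.extraJavaOptions sets this property before main runs.
        // We set it here only so the sketch runs standalone.
        if (System.getProperty("java.security.auth.login.config") == null) {
            System.setProperty("java.security.auth.login.config",
                    "./kafka_jaas.conf");
        }
        String jaas = System.getProperty("java.security.auth.login.config");
        System.out.println("JAAS config in use: " + jaas);
    }
}
```

Logging this value early in your Spark job makes it easy to spot a misconfigured submit command.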
If you want to know more about JAAS files, go through this link.
If you want to see how to create a custom receiver, then follow this link.
That's all I had for this post. Please feel free to share your thoughts on this.
