Kafka Spark Integration Issue - Task Not Serializable
When you integrate Kafka with Spark Streaming, there are a few small details that need to be taken care of. These details are often overlooked, and we end up wasting a lot of time resolving the issues they cause. In this post, I will discuss these small issues with you.

1. Task Not Serializable

If you have tried posting messages from Spark Streaming to Kafka, there is a very high probability that you have faced this issue. Let us understand the problem first.

When does it occur? When you create an object of a class on the driver and then use it inside the transformation logic of a DStream or RDD. For example, if the KafkaSink class is not declared as Serializable and you use it as below, you will definitely hit the Task not serializable exception:

kafkaStream.foreachRDD { rdd =>
  rdd.foreachPartition { part =>
    part.foreach(msg => kafkaSink.value.send("kafka_out_topic", msg.toString))
  }
}
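The usual fix is to make the sink wrapper itself Serializable and create the actual KafkaProducer lazily, so the producer is built on each executor instead of being shipped from the driver. Below is a minimal, self-contained sketch of that pattern; the names (`KafkaSink`, `roundTrip`) are illustrative, and a plain function stands in for the real KafkaProducer so the example runs without a Kafka cluster:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// The wrapper is Serializable; the heavyweight resource behind it is not
// shipped, because it is created lazily from a factory function.
class KafkaSink(createProducer: () => (String, String) => String) extends Serializable {
  // lazy: the "producer" is built on first use (on the executor),
  // not on the driver where this object was constructed
  lazy val send: (String, String) => String = createProducer()
}

object KafkaSink {
  def apply(): KafkaSink = new KafkaSink(() => {
    // Real code would build a KafkaProducer[String, String] from props here
    // and return (topic, msg) => { producer.send(new ProducerRecord(topic, msg)); ... }.
    // A pure function stands in so this sketch is self-contained.
    (topic: String, msg: String) => s"sent '$msg' to $topic"
  })
}

object SinkDemo {
  // Round-trip the sink through Java serialization, which is what Spark
  // does when it ships a closure from the driver to the executors.
  def roundTrip(sink: KafkaSink): KafkaSink = {
    val buf = new ByteArrayOutputStream()
    new ObjectOutputStream(buf).writeObject(sink)
    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
      .readObject().asInstanceOf[KafkaSink]
  }

  def main(args: Array[String]): Unit = {
    val shipped = roundTrip(KafkaSink()) // no exception: the wrapper serializes cleanly
    println(shipped.send("kafka_out_topic", "hello"))
  }
}
```

In a real job you would typically broadcast this sink (which is why the original snippet calls `kafkaSink.value.send(...)`), so every executor lazily builds exactly one producer instead of one per record.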