Posts

Spark DataFrame

Spark has evolved over the years and provides multiple ways to write applications. You can develop a Spark application using any of the following APIs: RDD, DataFrame, and Dataset. (You can watch a video on this topic as part of the Free Spark Course.) As of today, these are the three base data structures in Spark for writing applications. In this post we will focus on the DataFrame API. When Spark was first developed, it had only the RDD API for developing applications. The RDD API was completely functional in nature, and people from a Java or SQL background found it difficult to write large applications in Spark using RDDs. Even now, when I conduct interviews and ask candidates to write code with RDDs, very few of them seem confident enough to try solving the problem with the RDD API. So the Spark developers took inspiration from the world of pandas and R and introduced the concept of the DataFrame. A DataFrame helps you imagine your data as a table. You can do your traditional operations on a DataFrame, such as groupBy, cou...

Why Datasets are Type Safe | Spark Interview questions

This is a very famous question in Spark interviews: what is the difference between RDD, DataFrame, and Dataset? If the candidate answers it, the next question is generally: what is type safety in Dataset, and how is it helpful? Even though this is very simple to explain, people do get confused when explaining it. So in the following videos I have tried to explain these concepts in a very simple way. I hope it will be useful for you. Please subscribe to this channel and share the videos. Let us know what other topics you want us to cover.

Introduction to Kafka

Kafka is a well-suited and widely accepted tool for event processing in the tech world today. Most streaming applications use Kafka because of its simplicity, scalability, and ease of use. In this video I am planning to cover the basics of Kafka. I am also going to explain why people prefer Kafka over traditional queue-based systems such as ActiveMQ and RabbitMQ. Then we will discuss the Kafka architecture, and as part of that we will look at what a broker is, what a Kafka producer is, what a Kafka topic is, and what Kafka partitions are. Please subscribe to this YouTube channel and share your feedback.
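
To make the terms topic, partition, and producer a little more concrete, here is a toy in-memory model in plain Python. It is not the Kafka client API: the class `ToyTopic`, its methods, and the CRC32-based key hashing are all assumptions made for illustration (Kafka's own default partitioner uses a different hash). It only tries to show one idea: a topic is split into partitions, each partition is an append-only log, and records with the same key land in the same partition.

```python
import zlib


class ToyTopic:
    """Hypothetical in-memory stand-in for a Kafka topic (not the real client API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.num_partitions = num_partitions
        # Each partition is modelled as an append-only list of (key, value) records.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 modulo the partition count, picked only because
        # it is deterministic across runs.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def send(self, key, value):
        """Append a record and return (partition, offset), as a producer would see."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset


topic = ToyTopic("orders", num_partitions=3)
first = topic.send("customer-42", "created")
second = topic.send("customer-42", "paid")
# Both records share a key, so they share a partition and keep their order.
```

Because ordering is only kept within a partition in this model, choosing the key decides which records stay in order relative to each other.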

Spark Streaming Introduction

Streaming applications are the new hot thing in the data processing world today. Everyone wants to process data as soon as it is available, and a new set of tools has appeared in the market to cater to this need. Spark provides streaming on top of its basic batch processing API, an approach that has been very successful and well accepted in the industry. Spark treats the input stream of messages as a series of mini batches that run as frequently as possible. The batch interval is configurable and is typically around one second; with a one-second interval, a new RDD is generated from the stream of input messages every second and the data is processed in a batch fashion. This gives the client the feeling of using a batch processing system. In the following video I have covered the basic concepts of Spark Streaming. Please watch the video and share your thoughts. Please subscribe to the channel and share what other topics you would like me to cover.
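
The mini-batch idea can be sketched in plain Python without Spark. The function name `micro_batches` and the `(timestamp, message)` tuple layout are hypothetical, chosen only for this illustration; in Spark Streaming each such batch would correspond to an RDD. As a simplification, intervals with no events produce no batch here.

```python
from collections import defaultdict


def micro_batches(events, batch_interval=1.0):
    """Group (timestamp_seconds, message) pairs into fixed-interval batches.

    Returns a list of (batch_start_time, [messages]) sorted by start time.
    """
    buckets = defaultdict(list)
    for timestamp, message in events:
        # Every event falls into the interval that contains its timestamp.
        batch_index = int(timestamp // batch_interval)
        buckets[batch_index].append(message)
    return [(index * batch_interval, buckets[index]) for index in sorted(buckets)]


events = [(0.1, "a"), (0.7, "b"), (1.2, "c"), (2.9, "d")]
batches = micro_batches(events)
# batches == [(0.0, ['a', 'b']), (1.0, ['c']), (2.0, ['d'])]

# Each batch is then handled with ordinary batch logic, for example a count.
counts = [(start, len(messages)) for start, messages in batches]
```

The per-batch step at the end is ordinary batch code, which is the sense in which a streaming job can feel like a batch job to the person writing it.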

Dynamic Resource Allocation in Spark | Spark Interview questions

I am back with one more interesting post on Spark. One of my YouTube channel's viewers requested a video post on dynamic resource allocation in Spark. In my experience, a lot of folks enable dynamic resource allocation in Spark but do not start the external shuffle service, so in effect they are not using dynamic resource allocation at all, because the external shuffle service is mandatory for it. The external shuffle service takes the responsibility of serving shuffle data away from the executors and lets them focus on executing tasks. This helps us make optimum use of resources and is what enables dynamic resource allocation. On YARN, the external shuffle service is implemented using the auxiliary service API that YARN provides. You can enable dynamic resource allocation and the external shuffle service on YARN by setting properties in yarn-site.xml. I have shared the details of all these things in the video. I hope this video was useful. Please subscribe to our channel, and let me know if you have any questions.
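
The misconfiguration described above can be expressed as a small check over a dictionary of Spark properties. The property names `spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled`, and the two yarn-site.xml keys, are real settings; the helper functions `is_enabled` and `check_dynamic_allocation` are hypothetical and simply encode the rule as this post states it. Treat it as a sketch, not a complete validation of a cluster setup.

```python
# Real Spark property names; the checking code around them is a hypothetical helper.
DYNAMIC_ALLOCATION = "spark.dynamicAllocation.enabled"
SHUFFLE_SERVICE = "spark.shuffle.service.enabled"

# Properties that go into yarn-site.xml on each NodeManager so that YARN runs
# Spark's shuffle service as an auxiliary service. If other auxiliary services
# (for example mapreduce_shuffle) are already listed, spark_shuffle is added
# to the same comma-separated value.
YARN_SITE_PROPERTIES = {
    "yarn.nodemanager.aux-services": "spark_shuffle",
    "yarn.nodemanager.aux-services.spark_shuffle.class":
        "org.apache.spark.network.yarn.YarnShuffleService",
}


def is_enabled(conf, key):
    """Treat a missing property as 'false', and compare case-insensitively."""
    return str(conf.get(key, "false")).strip().lower() == "true"


def check_dynamic_allocation(conf):
    """Return a list of warnings for a dict of Spark properties."""
    warnings = []
    if is_enabled(conf, DYNAMIC_ALLOCATION) and not is_enabled(conf, SHUFFLE_SERVICE):
        warnings.append(
            f"{DYNAMIC_ALLOCATION}=true but {SHUFFLE_SERVICE} is not true"
        )
    return warnings


misconfigured = {DYNAMIC_ALLOCATION: "true"}
fixed = {DYNAMIC_ALLOCATION: "true", SHUFFLE_SERVICE: "true"}
```

Calling `check_dynamic_allocation(misconfigured)` returns one warning, while `check_dynamic_allocation(fixed)` returns an empty list.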

Spark Architecture | Spark Execution Model

Hi friends, I am back with one more video. I hope you are enjoying the series of videos on Spark and Hadoop interview questions. In today's video we explain:
1. What the architecture of Spark is
2. What the role of the driver is
3. What the role of the cluster manager is
4. What the role of the Spark context is
5. How a job is executed on Spark
I hope you will enjoy this video. Please share your feedback on it, subscribe to our channel, and share the video with your friends.

Why Spark is Better than Hadoop

I have been creating videos related to Hadoop and Spark for some time now. As part of that series, I am sharing a video about why Spark is better than Hadoop. In this video I explain the following:
1. How Spark avoids creating JVMs again and again by reusing resources
2. Caching in Spark
3. When Spark gives far faster results than Hadoop
4. How compact code in Spark is
It must be noted that there are a lot of cases where Hadoop is better than Spark. I hope you will like the video. Please subscribe to my channel, and let us know what other topics you want a video on.