
Showing posts from 2018

Spark DataFrame

Spark has evolved over the years and now provides multiple ways to write applications. You can develop a Spark application using the following APIs: RDD, DataFrame, and Dataset. You can watch a video on this topic as part of the free Spark course. As of today, these are the three base data structures in Spark for writing applications. In this post we will focus on the DataFrame API. When Spark was first developed, it had only the RDD API for building applications. The RDD API is completely functional in nature, and people from a Java or SQL background used to find it difficult to write large applications in Spark using RDDs. Even now, when I take interviews and ask candidates to write code with RDDs, very few of them seem confident enough to solve the problem using the RDD API. So the Spark developers took inspiration from the world of pandas and R and brought in the concept of the DataFrame. A DataFrame helps you imagine your data as a table. You can do your traditional operations on a DataFrame like groupBy, cou...

Why Datasets are Type Safe | Spark Interview questions

This is a very famous question in Spark interviews: what is the difference between an RDD, a DataFrame, and a Dataset? If the candidate answers this question, the next question is generally what type safety in a Dataset is and how it is helpful. Even though this is very simple to explain, people do get confused when explaining it. So in the following videos I have tried to explain these concepts in a very simple way. I hope they will be useful for you. Please subscribe to this channel and share the videos. Let us know what other topics you want us to cover.

Introduction to Kafka

Kafka is a well-suited and widely accepted tool for event processing in the tech world today. Most streaming applications use Kafka because of its simplicity, scalability, and ease of use. In this video I am planning to cover the basics of Kafka. I am also going to explain why people prefer Kafka over traditional queue-based systems such as ActiveMQ and RabbitMQ. Then we will discuss the Kafka architecture, including what a broker is, what a Kafka producer is, what a Kafka topic is, and what Kafka partitions are. Please subscribe to this YouTube channel and share your feedback.
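To make the idea of topics and partitions a little more concrete, here is a minimal in-memory sketch in plain Python. It is not the Kafka client API: the `Topic` class, the topic name, and the keys are all hypothetical, and CRC32 is used only as a stand-in for the hash that a real Kafka producer applies to record keys. What it illustrates is that records with the same key land in the same partition, and each partition is an append-only log with its own offsets.

```python
import zlib


class Topic:
    """Toy in-memory topic: a fixed number of append-only partitions (illustration only)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 stands in for the producer's real key hash.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        """Append a record and return (partition, offset), like a producer acknowledgement."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1


# Hypothetical topic and keys, used only to exercise the sketch.
topic = Topic("page-clicks", num_partitions=3)
first = topic.send("user-42", "clicked home")
second = topic.send("user-42", "clicked cart")
other = topic.send("user-7", "clicked home")
```

Because ordering is only kept within a partition, choosing the key decides which events stay in order relative to each other.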

Spark Streaming Introduction

Streaming applications are the new hot thing in the data processing world today. Everyone wants to process data as soon as it is available. To cater to this need, there is a new set of tools in the market. Spark tries to provide streaming on top of its basic batch processing API, and it has been very successful and well accepted in the industry for this. Spark treats the input stream of messages as a series of mini batches that run at a short, fixed batch interval. The interval is configurable. Around one second is a common practical lower bound, although the API also accepts sub-second durations. After every interval, a new RDD is generated from the stream of input messages and the data is processed in a batch way. This gives the client the feeling of using a batch processing system. In the following video I have covered the basic concepts of Spark Streaming. Please watch the video and share your thoughts. Please subscribe to the channel and share what other topics you would like me to cover.
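The mini-batch idea can be sketched without Spark at all. The plain-Python example below is an illustration under stated assumptions, not the Spark Streaming API: the function names and the sample messages are hypothetical, and each list of payloads plays the role of the RDD that Spark would create for one batch interval.

```python
from collections import defaultdict


def micro_batches(messages, batch_interval=1.0):
    """Group (timestamp_seconds, payload) pairs into consecutive fixed-width batches."""
    batches = defaultdict(list)
    for timestamp, payload in messages:
        batch_index = int(timestamp // batch_interval)
        batches[batch_index].append(payload)
    # Simplification: intervals that received no messages are skipped here.
    return [batches[index] for index in sorted(batches)]


def word_count(batch):
    """Ordinary batch logic, applied unchanged to every mini batch."""
    counts = defaultdict(int)
    for line in batch:
        for word in line.split():
            counts[word] += 1
    return dict(counts)


# Hypothetical input stream: (arrival time in seconds, message text).
stream = [(0.1, "spark streaming"), (0.7, "spark"), (1.2, "kafka"), (2.5, "spark kafka")]
results = [word_count(batch) for batch in micro_batches(stream)]
```

The point of the design is that `word_count` knows nothing about streaming; it only ever sees one small batch at a time.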

Dynamic Resource Allocation in Spark | Spark Interview questions

I am back with one more interesting post on Spark. One of my YouTube channel's viewers requested a video post on dynamic resource allocation in Spark. In my experience, a lot of folks enable dynamic resource allocation in Spark but don't start the external shuffle service. Because the external shuffle service is mandatory for it, they end up not actually using dynamic resource allocation. The external shuffle service takes over serving shuffle data from the executors, so executors can focus on executing tasks and can be released without losing their shuffle files. This helps us make optimum use of resources and is what enables dynamic resource allocation. On YARN, the external shuffle service is implemented as an auxiliary service using the auxiliary service API that YARN provides. You can enable the external shuffle service on YARN by setting properties in yarn-site.xml, and enable dynamic resource allocation through the Spark configuration. I have shared the details of all these things in the video. Here are the links. I hope this video was useful. Please subscribe to our channel. Please also let me know if you ha...
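As a rough sketch of the settings involved, the Python snippet below checks a pair of configurations for the mistake described above: dynamic allocation switched on without the external shuffle service. The property names are the standard Spark and YARN ones, but the function name is hypothetical and plain dictionaries stand in for spark-defaults.conf and yarn-site.xml, so treat it as an illustration rather than a tool.

```python
SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def dynamic_allocation_problems(spark_conf, yarn_site):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if spark_conf.get("spark.dynamicAllocation.enabled", "false") != "true":
        return problems  # dynamic allocation is off, nothing to check

    if spark_conf.get("spark.shuffle.service.enabled", "false") != "true":
        problems.append("spark.shuffle.service.enabled is not true")

    # yarn-site.xml lists auxiliary services as a comma-separated value.
    aux_services = [
        name.strip()
        for name in yarn_site.get("yarn.nodemanager.aux-services", "").split(",")
    ]
    if "spark_shuffle" not in aux_services:
        problems.append("spark_shuffle missing from yarn.nodemanager.aux-services")

    if yarn_site.get("yarn.nodemanager.aux-services.spark_shuffle.class") != SHUFFLE_CLASS:
        problems.append("spark_shuffle auxiliary service class is not set")
    return problems


# Hypothetical settings, written as dictionaries for the sake of the example.
good_spark = {"spark.dynamicAllocation.enabled": "true", "spark.shuffle.service.enabled": "true"}
good_yarn = {
    "yarn.nodemanager.aux-services": "mapreduce_shuffle,spark_shuffle",
    "yarn.nodemanager.aux-services.spark_shuffle.class": SHUFFLE_CLASS,
}
incomplete_spark = {"spark.dynamicAllocation.enabled": "true"}
```

Calling `dynamic_allocation_problems(incomplete_spark, {})` reports all three gaps, which is the situation where dynamic allocation is enabled on paper but not really in use.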

Spark Architecture | Spark Execution Model

Hi friends, I am back with one more video. I hope you are enjoying the series of videos on Spark and Hadoop interview questions. In today's video we are going to explain:
1. What is the architecture of Spark
2. What is the role of the driver
3. What is the role of the cluster manager
4. What is the role of the Spark context
5. How a job is executed on Spark
I hope you will enjoy this video. Please share your feedback on it. Please subscribe to our channel and share the video with your friends.

Why Spark is Better than Hadoop

I have been creating videos related to Hadoop and Spark for some time now. As part of the series, I am sharing a video about why Spark is better than Hadoop. In this video I have explained the following:
1. How Spark avoids creating JVMs again and again by reusing resources
2. Caching in Spark
3. When Spark gives far faster results than Hadoop
4. How code in Spark is very compact
It must be noted that there are a lot of cases where Hadoop is better than Spark. I hope you will like the video. Please subscribe to my channel. Please also let us know what other topics you want a video on.

What is Binary search tree

This post talks about a way to find out whether a tree is a binary search tree. This is a very common interview question on data structures. The following is the link to the video on what a binary search tree is. A binary search tree is also called a BST. Thank you for watching the video. Please share the video and subscribe to our channel. Please also share your thoughts on what other topics you want us to cover in videos and posts.

Validate If a Tree is Binary Search Tree ( BST )

In this post we are going to find out how to write an algorithm that checks whether a binary tree is a binary search tree or not. This is a very common data structure interview question. I have tried to keep things simple while explaining the solution. Here is the video of the solution. Thank you for watching this video. Please subscribe to and share our channel. Please let us know what other types of videos you want us to make.
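One common way to do this check is to pass down the range of values each subtree is allowed to contain. The sketch below shows that approach in Python; it is not necessarily the solution from the video. The `Node` class is a hypothetical minimal tree node, and the code assumes duplicate values are not allowed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical minimal binary tree node."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, low=None, high=None):
    """Return True if every value in the subtree lies strictly between low and high."""
    if node is None:
        return True
    if low is not None and node.value <= low:
        return False
    if high is not None and node.value >= high:
        return False
    # Left subtree must stay below this value, right subtree above it.
    return is_bst(node.left, low, node.value) and is_bst(node.right, node.value, high)


valid_tree = Node(8, Node(3, Node(1), Node(6)), Node(10))
# 9 is larger than its parent 3, but it sits in the left subtree of 8.
invalid_tree = Node(8, Node(3, Node(1), Node(9)), Node(10))
```

The second tree is the case that trips up the naive check of comparing each node only with its direct children: every parent and child pair looks fine, yet 9 violates the bound set by the root.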

Introduction to Hadoop

In this post, we have covered the introductory concepts of Hadoop. The following topics are covered:
1. What is big data
2. Introduction to Hadoop and its origin
3. Master-slave architecture
4. What is Hive
5. What is Pig
6. What is HDFS
7. What is MapReduce
The following is the link to the video. Thank you for watching the video. Please subscribe to and share our channel. Please also share your thoughts on what other big data and Hadoop concepts we should prepare videos on for you.

Introduction To HDFS

This post discusses the basics of HDFS. The following topics are covered:
1. What is HDFS
2. Architecture of HDFS
3. How data is accessed in HDFS
4. Concept of NameNode, DataNode, and Secondary NameNode
5. Advantages of using HDFS
6. Replication factor in HDFS
7. Block size in HDFS
8. Why HDFS is a better filesystem for big data
The following is the link to the video. Thank you for watching the video. Please subscribe to our channel and share your thoughts. Please also tell us what other big data topics you want videos on.
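To see how block size and replication factor interact, here is a small arithmetic sketch in Python. The function name is hypothetical, and the default values of 128 MB blocks and a replication factor of 3 are assumptions that match common HDFS defaults; a real cluster may be configured differently.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (blocks, block_replicas, raw_bytes_stored) for one file."""
    # Integer ceiling division: a partly filled last block still counts as a block.
    blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    block_replicas = blocks * replication
    # The last block only occupies as much disk as it has data, so raw usage
    # is the file size times the replication factor.
    raw_bytes_stored = file_size_bytes * replication
    return blocks, block_replicas, raw_bytes_stored


one_gb = hdfs_footprint(1024 * MB)   # 8 blocks, 24 block replicas, 3 GB on disk
small = hdfs_footprint(300 * MB)     # 3 blocks (128 + 128 + 44 MB), 9 block replicas
```

A 1 GB file therefore shows up as 8 blocks in the NameNode's metadata, while the DataNodes together hold 24 block replicas.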

Hive Introduction

This post covers the basics of Hive. It covers the following concepts:
1. What is Hive
2. Hive architecture
3. Hive tables
4. Internal tables and external tables
5. How to create an internal table
6. How to create an external table
7. How to load data into a Hive table. Data can be loaded from the local filesystem or from HDFS.
Please watch the following video. Please share your comments and views.

Hadoop Interview Questions | Combiner

As part of this post, I am sharing my thoughts on some Hadoop interview questions. Please share your comments and thoughts. Hadoop Combiner. Thanks for watching. Please subscribe to the channel for more Hadoop interview questions.