Apache Spark: 3 Reasons Why You Should Not Use RDDs

Knoldus Blogs

Apache Spark, whenever we hear these two words, the first thing that comes to our mind is RDDs, i.e., Resilient Distributed Datasets. Now, it has been more than 5 years since Apache Spark came into existence and after its arrival a lot of things got changed in big data industry. But, the major change was dethroning of Hadoop MapReduce. I mean Spark literally replaced MapReduce and this happened because of easy to use API in Spark, lesser operational cost due to efficient use of resources, compatibility with a lot of existing technologies like YARN/Mesos, fault tolerance, and security.

Due to these reasons a lot of organizations have migrated their Big Data applications to Spark and the first thing they do is to learn – how to use RDDs. Which makes sense too as RDD is the building block of Spark and the whole idea of Spark is based on RDD. Also, it is a perfect replacement…

View original post 754 more words

About Himanshu Gupta

Himanshu Gupta is a lead consultant having more than 4 years of experience. He is always keen to learn new technologies. He not only likes programming languages but Data Analytics too. He has sound knowledge of "Machine Learning" and "Pattern Recognition".He believes that best result comes when everyone works as a team. He likes listening to Coding ,music, watch movies, and read science fiction books in his free time.
This entry was posted in Scala. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s