Author Archives: Himanshu Gupta

About Himanshu Gupta

Himanshu Gupta is a lead consultant with more than 4 years of experience. He is always keen to learn new technologies. He enjoys not only programming languages but also data analytics. He has sound knowledge of "Machine Learning" and "Pattern Recognition". He believes that the best results come when everyone works as a team. In his free time, he likes coding, listening to music, watching movies, and reading science fiction books.

Structured Streaming: Philosophy behind it

Knoldus Blogs: In our previous blogs, "Structured Streaming: What is it?" and "Structured Streaming: How it works?", we got to know 2 major points about Structured Streaming: it is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users …

Posted in Scala | 1 Comment

Structured Streaming: What is it?

With the advent of streaming frameworks like Spark Streaming, Flink, and Storm, developers stopped worrying about issues related to a streaming application, such as fault tolerance (i.e., zero data loss) and real-time processing of data, and started focusing only on …

Posted in Scala

KnolX: Understanding Spark Structured Streaming

Hello everyone, Knoldus organized a session on 5th January 2018. The topic was "Understanding Spark Structured Streaming". Many people attended and enjoyed the session. In this blog post, I am going to share the slides and video of the session. Slides: …

Posted in Scala

A Beginner’s Guide to Deploying a Lagom Service Without ConductR

How to deploy a Lagom Service without ConductR? This question has been asked and answered by many on different forums. For example, take a look at this question on StackOverflow: "Lagom without ConductR?" Here the user is trying to …

Posted in Scala

Spark Structured Streaming: A Simple Definition

We hear the term "Structured Streaming" quite a lot in the Apache Spark ecosystem nowadays, as it is being promoted as the next big thing in the scalable big data world. Although we all know that Structured Streaming means a …

Posted in Scala

Apache Spark: 3 Reasons Why You Should Not Use RDDs

Whenever we hear the two words "Apache Spark", the first thing that comes to mind is RDDs, i.e., Resilient Distributed Datasets. It has now been more than 5 years since Apache Spark came into existence, and after its arrival …

Posted in Scala

Partition-Aware Data Loading in Spark SQL

Data loading in Spark SQL means loading data into the memory/cache of Spark worker nodes, for which we used to write the following code:

    val connectionProperties = new Properties()
    connectionProperties.put("user", "username")
    connectionProperties.put("password", "password")
    val jdbcDF = spark.read.jdbc("jdbc:postgresql:dbserver", "schema.table", connectionProperties)

In …

Posted in Scala

2017 – Year of FAST Data

As we approach 2017, there is a strong focus on Fast Data. This is a combination of data at rest and data in motion, and the speed has to be remarkably fast. In the deck that follows, we …

Posted in Scala

Migration From Spark 1.x to Spark 2.x

Hello folks, as we know, the latest release, Spark 2.0, comes with many enhancements and new features. If you are using Spark 1.x and now want to move your application to Spark 2.0 that …

Posted in Scala

Finding the Impact of a Tweet using Spark GraphX

Social Network Analysis (SNA), a process of investigating social structures using networks and graphs, has become a very hot topic nowadays. Using it, we can answer many questions like: How many connections does an individual have? What is the …

Posted in Scala