Himanshu Gupta

Lead Consultant, Knoldus Inc.

I am a Lead Consultant at Knoldus Software LLP. I have been developing reactive products for the last 4 years using Spark/Scala and Akka. I have developed complex solutions for the media and retail industries using machine learning in Scala and Spark. I am a technology enthusiast and blog frequently about the Scala ecosystem. I also train engineers on Spark and Akka.

Past sessions

Summit Europe 2019 Blue Pill/Red Pill: The Matrix of Thousands of Data Streams

October 16, 2019 05:00 PM PT

Designing a streaming application that processes data from one or two streams is easy. Any streaming framework that provides scalability, high throughput, and fault tolerance would work. But when the number of streams grows into the hundreds or thousands, managing them can be daunting. How would you share resources among thousands of streams, all of them running 24x7? How would you manage their state, apply advanced streaming operations, and add or delete streams without restarting? This talk explains common scenarios and shows techniques that can handle thousands of streams using Spark Structured Streaming.

Summit Europe 2018 Smart Searching Through Trillion of Research Papers with Apache Spark ML

October 2, 2018 05:00 PM PT

Every publication has a rich set of documents that contain information about different domains. Mostly, these documents just sit in data warehouses. If used wisely, they can prove to be a gold mine for companies operating in domains such as pharma, medicine, or finance.

For example, today it takes a pharmaceutical company up to 12 years and $2 billion to bring a single new drug to market. Despite the huge spend, scientists in pharma don't have a way to find data on work that has already been done. They just redo the whole thing, wasting money on duplicate work.

The biggest challenge in making those documents searchable is that they need to be tagged with their corresponding topics, which requires subject matter experts (SMEs). An SME has to read each document, extract its topics, and tag the document with them. This way of tagging documents is slow and expensive.

This talk explains how we can apply Spark ML to tag hundreds of thousands of documents. Applying ML not only makes the tagging process faster and less expensive, but can also surface new fields that SMEs overlook.

Session hashtag: #SAISEco3