Streaming Analytics with Spark, Kafka, Cassandra, and Akka

This talk addresses how a new architecture for analytics is emerging, based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK). Popular architectures like Lambda separate the layers of computation and delivery and require many technologies with overlapping functionality. This can result in duplicated code, untyped processes, or high operational overhead, not to mention added cost (e.g. ETL). I will discuss the problem domain and what is needed in terms of strategies, architecture, and application design and code to begin leveraging simpler data flows. We will cover how this particular set of technologies addresses common requirements and how they work together to enrich and reinforce each other.

Learn more

  • Smack Stack and Beyond—Building Fast Data Pipelines
  • Real-Time End-to-End Integration with Apache Kafka in Apache Spark’s Structured Streaming

  • About Helena Edelson

    Helena has worked exclusively with Scala in production since 2010 on large-scale distributed systems in the cloud. As a Senior Cloud Engineer she was on the first Scala team at VMware, building multi-tenant cloud automation systems. She then moved into big data, architecting, building and deploying streaming and batch analytics pipelines for real-time threat analysis in cyber security. Most recently she has worked on streaming analytics and machine learning at scale with Apache Spark, Cassandra, Kafka, Akka and Scala. Helena is a committer to the Spark Cassandra Connector and a contributor to Akka, where she added new features to Akka Cluster such as the initial version of the cluster metrics API and AdaptiveLoadBalancingRouter. While working at SpringSource she contributed to several open source projects, including Spring Integration and Spring AMQP. Helena speaks at international Big Data and Scala conferences such as Spark Summit, QCon, Scala Days, and Philly Emerging Technology. She is currently VP of Product Engineering at Tuplejump.