How Metacog Implemented Agile Apache Spark Application Development to Release New Products Twice as Fast

We are proud to announce that Metacog, the provider of an online learning analytics platform, chose Databricks to implement their entire production Apache Spark environment.

You can read the press release here.

Metacog helps education institutions, corporations, and government entities monitor and analyze how individuals tackle open-ended performance tasks to assess whether learning goals have been met. By using machine learning techniques to score how people interact with the assessment, Metacog’s product can make more accurate and meaningful assessments than traditional multiple-choice tests. Since Metacog’s customers can tailor each assessment to their unique learning needs, Metacog’s ability to satisfy its customers hinges on building a platform that can quickly deploy a large number of machine learning models.

Metacog chose Apache Spark as its big data engine because of its flexibility in performing ETL and developing machine learning algorithms for billions of data points. However, putting Apache Spark into production proved to be very challenging. Building Spark infrastructure directly from open source was impractical because keeping Spark updated to the latest version took too much time. Using a cloud-based Spark provider also failed because the interface it offered was too rudimentary for efficient application development and testing. As a result, Metacog developers were not able to thoroughly test their code on Spark clusters during development, and serious bugs surfaced late in the release cycle, incurring significant delays.

Metacog partnered with Databricks because it offers unparalleled Spark expertise and a complete platform with full-featured APIs and a visual development environment. With Databricks, Metacog automated the entire testing, integration, and delivery of their Spark code from concept to production, allowing developers, data scientists, and the DevOps team to seamlessly:

  • Access the latest version of the product code.
  • Develop and run the code within their preferred toolset using real Spark clusters – allowing the team to integrate IDEs such as IntelliJ with Databricks in addition to using the built-in Databricks notebooks.
  • Merge improvements that then automatically deploy to a shared stage environment that mirrors the production environment.

Figure: Metacog architecture

Databricks enabled Metacog’s developers and data scientists to tune machine learning algorithms on real data using Spark clusters instead of smaller simulated data sets on their laptops. For Metacog’s DevOps team, Databricks simplified version management and server provisioning while optimizing cost with automated scripts.

Using Databricks, Metacog built a high-performance Spark application development and continuous integration system that allowed them to:

  • Double the release cadence from 12 to 24 times a year.
  • Achieve 28% infrastructure savings.
  • Reduce personnel onboarding time by 75%.
  • Reallocate 20% of engineering time from maintenance to product development.

Download this case study to learn more about how Metacog is using Databricks.

To try out Databricks for yourself, sign up for a 14-day free trial today!
