
What is Apache Kylin?

A distributed OLAP engine that pre-calculates multidimensional cubes from Hadoop data, delivering sub-second queries on petabyte-scale datasets


Summary

  • Pre-computes OLAP cubes using MapReduce or Spark, storing results in HBase as key-value pairs that enable millisecond-level query responses on billions of rows
  • Provides an ANSI SQL interface and integrates with BI tools such as Tableau, Power BI, and Excel through JDBC, ODBC, and REST APIs, so analysts can keep their familiar analytics workflows
  • Handles star and snowflake schemas with support for incremental cube builds, approximate distinct counts via HyperLogLog, and compression techniques to optimize storage

What is Apache Kylin?

Apache Kylin is a distributed, open source online analytical processing (OLAP) engine for interactive analytics on Big Data. It is designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Spark, and it integrates with BI tools through an ODBC driver, a JDBC driver, and a REST API. Kylin was created at eBay in 2014, graduated to a Top Level Project of the Apache Software Foundation one year later, in 2015, and won the Best Open Source Big Data Tool award in both 2015 and 2016. It is now used by thousands of companies worldwide as a critical analytics application for Big Data.

Where other OLAP engines struggle with data volume, Kylin delivers sub-second query latency over datasets that scale to petabytes. It gets this speed by precomputing the combinations of dimensions and their measure aggregates through Hive queries and storing the results in HBase.
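The precomputation idea can be illustrated with a small in-memory sketch. This is not Kylin's implementation: the fact rows, the dimension names, and the `build_cube` and `query` helpers are all hypothetical, and a plain dictionary stands in for the key-value store. It only shows why a query against a pre-aggregated cube becomes a lookup instead of a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, country, category, price).
ROWS = [
    (2015, "US", "Toys", 10.0),
    (2015, "US", "Books", 5.0),
    (2015, "DE", "Toys", 7.5),
    (2016, "US", "Toys", 12.0),
    (2016, "DE", "Books", 4.0),
]
DIMENSIONS = ("year", "country", "category")


def build_cube(rows, dimensions):
    """Pre-aggregate SUM(price) and COUNT(*) for every subset of dimensions."""
    cube = {}
    indexes = range(len(dimensions))
    for size in range(len(dimensions) + 1):
        for combo in combinations(indexes, size):
            cuboid = defaultdict(lambda: [0.0, 0])
            for row in rows:
                key = tuple(row[i] for i in combo)
                cell = cuboid[key]
                cell[0] += row[-1]  # running SUM(price)
                cell[1] += 1        # running COUNT(*)
            cube[tuple(dimensions[i] for i in combo)] = dict(cuboid)
    return cube


def query(cube, group_by, key):
    """Answer an aggregate lookup from the matching cuboid, with no row scan."""
    total, count = cube[tuple(group_by)][tuple(key)]
    return total, count


cube = build_cube(ROWS, DIMENSIONS)
print(len(cube))                           # 8 cuboids for 3 dimensions
print(query(cube, ("country",), ("US",)))  # (27.0, 3)
```

With n dimensions there are 2^n such combinations, so the build cost and storage grow quickly as dimensions are added. That is the trade-off a pre-computed cube makes in exchange for fast reads.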


How Does Apache Kylin Work?

The Kylin query engine, which can be accessed through Kylin's web UI, an API, or JDBC, uses the Apache Calcite query processor and HBase for rapid lookups. Kylin relies on the Hadoop ecosystem:

  • Hive – Input source, pre-join star schema during cube building
  • MapReduce – Aggregate metrics during cube building
  • HDFS – Store intermediate files during cube building
  • HBase – Store and query data cubes
  • Calcite – SQL parsing, code generation, optimization

How can Apache Kylin help your organization?

  • Very Fast OLAP Engine at Scale - Kylin is designed to reduce query latency on Hadoop to seconds for datasets of more than 10 billion rows
  • ANSI SQL Interface on Hadoop - Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions. Both analysts and engineers can use it easily because no programming is needed
  • Seamless Integration with BI Tools - Kylin integrates with BI tools such as Tableau through JDBC, ODBC, and the REST API
  • Interactive Query Capability - Users can interact with Hadoop data via Kylin at sub-second latency
  • MOLAP Cube Query Serving on Billions of Rows - Users can define a data model and pre-build it in Kylin even when it covers more than 10 billion raw data records
  • Open-source ODBC Driver - Kylin's ODBC driver is built from scratch and works very well with Tableau
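As a rough illustration of the REST route, the sketch below builds a SQL query request using only the Python standard library. Several details are assumptions rather than facts from this page: the `/kylin/api/query` path and JSON field names follow Kylin's documented REST query API as commonly deployed, while the host, port, `ADMIN`/`KYLIN` credentials, `learn_kylin` project, and `kylin_sales` table are placeholders taken from a typical sample installation. `build_query_request` is a hypothetical helper. The request is built but not sent, so the example runs without a server.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, user, password, project, sql, limit=100):
    """Build (but do not send) a POST request for a Kylin SQL query."""
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # HTTP Basic authentication: base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder connection details; adjust for a real deployment.
request = build_query_request(
    "http://localhost:7070",
    "ADMIN",
    "KYLIN",
    "learn_kylin",
    "SELECT part_dt, SUM(price) AS total FROM kylin_sales GROUP BY part_dt",
)
print(request.full_url)

# To run against a live server, uncomment the lines below. The "results"
# field name is an assumption about the response shape.
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["results"])
```

BI tools reach the same engine through the JDBC and ODBC drivers. The REST route is mainly convenient for scripts and custom applications.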
