Apache Kudu is a free and open source columnar storage system developed for the Apache Hadoop ecosystem. It is an engine intended for structured data that supports low-latency, millisecond-scale random access to individual rows together with efficient analytical access patterns. It is a Big Data engine created to bridge the gap between the widely used Hadoop Distributed File System (HDFS) and the HBase NoSQL database.
Apache Kudu combines the strengths of HBase and Parquet. It is as fast as HBase at ingesting data and almost as quick as Parquet when it comes to analytics queries. It supports multiple query types, ranging from lookups of individual rows by key to analytical scans across many rows and columns.
Apache Kudu uses the Raft consensus algorithm to replicate data across servers, and it can be scaled out or in horizontally as required. In addition, it supports in-place updates.
Apache Kudu is optimized for SSDs and is designed to take advantage of the next generation of persistent memory. It can scale to tens of cores per server and even benefit from SIMD operations for data-parallel computation.
It supports the ‘slowly changing dimension’ pattern, also known as SCD. This capability allows the user to keep track of changes within dimensional reference data.
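To make the SCD idea more concrete, here is a minimal sketch in plain Python of one common variant, often called Type 2, where the old version of a row is closed and a new version is appended. It does not use any Kudu API. The DimensionRow class, its fields, and the apply_change function are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimensionRow:
    # Hypothetical dimension record; field names are illustrative assumptions.
    customer_id: int                  # business key
    city: str                         # attribute whose changes we track
    valid_from: date                  # first day this version applies
    valid_to: Optional[date] = None   # None marks the current version


def apply_change(rows: List[DimensionRow], customer_id: int,
                 new_city: str, change_date: date) -> List[DimensionRow]:
    """Record a new attribute value while keeping the old one as history."""
    for row in rows:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == new_city:
                return rows               # value unchanged, nothing to record
            row.valid_to = change_date    # in-place update closes the old version
    # Append the new current version of the row.
    rows.append(DimensionRow(customer_id, new_city, change_date))
    return rows


history = [DimensionRow(1, "Lisbon", date(2020, 1, 1))]
apply_change(history, 1, "Porto", date(2021, 6, 1))
```

After the call, history holds two versions for customer 1: the closed "Lisbon" row and the current "Porto" row. Closing the old row relies on being able to update a stored record in place, which is why this pattern tends to fit a store with update support.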
Do you want to access data via SQL? Then you’ll be happy to hear that Apache Kudu has tight integration with Apache Impala as well as Spark. As a result, you can use these tools to insert, query, update and delete data in Kudu tablets using their SQL syntax. Moreover, you can use JDBC or ODBC to connect existing or new applications, whatever language or framework they are written in, and even business intelligence tools to your Kudu data, using Impala as the tool to do this.
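As a rough illustration of the insert, query, update and delete operations through Impala, the sketch below keeps a few SQL statements in a Python list and runs them on any DB-API 2.0 style cursor. The customers table, its columns, the partition count, and the run_all helper are assumptions made for this example. The statements follow Impala's syntax for Kudu tables, but they have not been run against a real cluster here.

```python
# Hypothetical table and column names, used only for illustration.
STATEMENTS = [
    # Kudu tables in Impala need a primary key and are declared STORED AS KUDU.
    """CREATE TABLE IF NOT EXISTS customers (
           id BIGINT,
           city STRING,
           PRIMARY KEY (id)
       )
       PARTITION BY HASH (id) PARTITIONS 4
       STORED AS KUDU""",
    "INSERT INTO customers VALUES (1, 'Lisbon')",
    "SELECT id, city FROM customers WHERE id = 1",
    "UPDATE customers SET city = 'Porto' WHERE id = 1",
    "DELETE FROM customers WHERE id = 1",
]


def run_all(cursor, statements=STATEMENTS):
    """Execute each statement in order on a DB-API style cursor.

    Returns the number of statements that were executed.
    """
    for sql in statements:
        cursor.execute(sql)
    return len(statements)
```

With a real cluster, the cursor would come from whichever driver you use to reach Impala, for example a Python DB-API driver such as impyla, or a JDBC or ODBC connection from another language. The driver choice is an assumption and is not prescribed by the text above.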