Databricks on Alibaba

Deeply integrated with Alibaba Cloud services, Databricks DataInsight simplifies big data analytics and AI

Background

Databricks DataInsight

Databricks DataInsight is a fully managed platform for data and analytics based on Apache Spark™. DataInsight is built on Databricks Runtime and Delta Lake. Integrated with Alibaba Cloud services, it ensures data security and allows you to configure monitoring and alert policies, as well as dynamic cluster scaling. It meets the analytics needs of data analysts, data engineers and data scientists.

Better performance

Databricks Runtime provides a 50x performance improvement over open source Apache Spark™

Streaming and batch integration

Databricks Delta Lake provides ACID transaction capabilities for data lake analytics, processing both batch and streaming data sets
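As a rough illustration of batch and streaming workloads sharing one table, the sketch below uses the open source Delta Lake API for PySpark. One function appends a batch to a Delta table and another reads the same table as a stream. The bucket name, table name, path layout and helper functions are hypothetical, the `oss://` path scheme is an assumption, and the code presumes a Spark session with Delta Lake already available, as it would be on a DataInsight cluster.

```python
def delta_table_path(bucket, table):
    """Build a storage path for a Delta table (hypothetical naming scheme)."""
    return f"oss://{bucket}/delta/{table}"


def append_batch(spark, rows, columns, path):
    """Batch write: append rows to the Delta table as a single atomic commit."""
    df = spark.createDataFrame(rows, columns)
    df.write.format("delta").mode("append").save(path)


def stream_to_console(spark, path):
    """Streaming read: pick up new commits to the same table as they arrive."""
    return (
        spark.readStream.format("delta")
        .load(path)
        .writeStream.format("console")
        .outputMode("append")
        .start()
    )


def main():
    # Requires pyspark with Delta Lake configured; not executed on import.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-batch-stream").getOrCreate()
    path = delta_table_path("example-bucket", "events")  # hypothetical names
    append_batch(spark, [(1, "click"), (2, "view")], ["id", "action"], path)
    query = stream_to_console(spark, path)
    query.awaitTermination(30)  # let the stream run for up to 30 seconds
    query.stop()
```

Calling `main()` in a notebook or job on a cluster would write two rows and then print them from the streaming side. Both paths go through the same transaction log, which is what keeps readers from seeing a half-written batch.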

Collaborative analysis

Databricks DataInsight meets the analytics needs of data scientists, data engineers and business analysts, and provides an interactive and collaborative notebook environment

Real-time data insight

Separating compute from storage reduces data redundancy and lets multiple audiences access the same data, lowering storage costs and allowing compute and storage to scale independently

A fully managed analytics platform

Quickly start fully managed clusters and pay only for what you use

Cluster size

Set the number of nodes according to job needs; high-availability clusters are supported

Instance selection

Supports three ECS instance type families: general-purpose, compute-optimized and memory-optimized

A fully collaborative platform for innovation

Multiple users, across teams, can share data and collaborate interactively

Notebook

A collaborative workspace with an interactive job execution mode that supports Apache Spark, PySpark, SparkR and Spark SQL jobs and displays analytics results visually

Unified metadata

Database and table metadata can be shared across clusters without duplication

Fully compatible with the Apache Spark ecosystem

100% compatible with open source Apache Spark

Databricks Runtime

A performance-optimized runtime based on Apache Spark, with I/O optimized for Alibaba Cloud OSS, providing a faster and more efficient analytics engine.

Databricks Delta Lake

An optimized version of Delta Lake integrated with Alibaba Cloud services

Enterprise security

Integrated with Alibaba Cloud RAM to ensure data security by controlling permissions based on users and roles
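To make user- and role-based permission control more concrete, the sketch below builds a RAM policy document in Python that grants read-only access to a single OSS prefix. It is an illustrative example only: the bucket and prefix names are hypothetical, the helper function is not part of any SDK, and the exact actions a DataInsight workload needs should be checked against the RAM and OSS documentation.

```python
import json


def read_only_oss_policy(bucket, prefix):
    """Return a RAM policy document granting read-only access to one OSS prefix."""
    return {
        "Version": "1",
        "Statement": [
            {
                # Allow reading objects under the given prefix only.
                "Effect": "Allow",
                "Action": ["oss:GetObject"],
                "Resource": [f"acs:oss:*:*:{bucket}/{prefix}/*"],
            },
            {
                # Listing is granted on the bucket resource itself.
                "Effect": "Allow",
                "Action": ["oss:ListObjects"],
                "Resource": [f"acs:oss:*:*:{bucket}"],
            },
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
policy = read_only_oss_policy("example-bucket", "warehouse")
print(json.dumps(policy, indent=2))
```

A policy like this would typically be attached to a RAM user or role, so that analysts can read shared data without being able to modify it.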

Big data analysis engine that unifies batch and stream processing

Deeply integrated with Alibaba Cloud services and features, such as the data governance and data lineage of DataWorks and the Machine Learning Platform for AI (PAI), to provide a more comprehensive data solution.

Beijing Jizhi Technology Co., Ltd. 北京基智科技有限公司

Watch this 10-minute video to hear how Beijing Jizhi Technology uses Databricks DataInsight for customer acquisition and management use cases.

Databricks DataInsight typical architecture

Deep integration with Alibaba Cloud products to build real-time and offline data warehouses

Key roles

  • Data collection
    Receive real-time streaming data and batch data on external cloud storage
  • Data ETL
    Continuously and efficiently process incremental data, support data rollback and deletion, and provide ACID transactional guarantees
  • BI data analysis
    Support ad hoc queries, seamlessly integrated with a variety of BI analysis tools
  • AI data exploration
    Provide a complete machine learning platform
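The Data ETL role above can be sketched with the open source Delta Lake Python API: an upsert for incremental data, a predicate-based delete, and a rollback to an earlier table version. This is a minimal sketch rather than a DataInsight-specific recipe. The helper names and the `t`/`s` aliases are hypothetical, the code assumes the `delta-spark` package and an existing Delta table at `path`, and `restoreToVersion` assumes a Delta Lake release that includes it (1.2 or later in open source).

```python
def merge_condition(keys, target="t", source="s"):
    """Build the condition that matches incoming rows to existing rows by key."""
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def upsert(spark, updates_df, path, keys):
    """Incremental load: update matching rows and insert new ones in one transaction."""
    from delta.tables import DeltaTable  # requires the delta-spark package

    table = DeltaTable.forPath(spark, path)
    (
        table.alias("t")
        .merge(updates_df.alias("s"), merge_condition(keys))
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )


def delete_rows(spark, path, predicate):
    """Deletion: remove rows matching a SQL predicate, e.g. "day < '2020-01-01'"."""
    from delta.tables import DeltaTable

    DeltaTable.forPath(spark, path).delete(predicate)


def rollback(spark, path, version):
    """Rollback: restore the table to an earlier version from its transaction log."""
    from delta.tables import DeltaTable

    DeltaTable.forPath(spark, path).restoreToVersion(version)
```

Each function results in a single commit to the table's transaction log, so downstream BI queries see the table either before or after the change, never in between.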