Comparing Apache Spark™ and Databricks


Apache Spark offers speed, ease of use, and breadth of use, with APIs that support a range of use cases:
  • Data integration and ETL
  • Interactive analytics
  • Machine learning and advanced analytics
  • Real-time data processing

Databricks builds on top of Spark and adds:
  • Highly reliable and performant data pipelines
  • Productive data science at scale

Want to learn more? Visit our platform page.

Feature comparison

Databricks

Learn more

Yes No
Run multiple versions of Spark Yes No
Built-in file system optimized for cloud storage access (AWS S3, Redshift, Azure Blob) Yes No
Serverless pools offering auto-configuration of resources for SQL and Python workloads Yes No
Spark-native fine-grained resource sharing for optimum utilization Yes No
Fault isolation of compute resources Yes No
Faster writes to S3 Yes No
Compute optimization during joins and filters Yes No
Rapid release cycles Yes No
Auto-scaling compute Yes No
Auto-scaling local storage Yes No
High availability for clusters Yes No
Multi-user cluster sharing Yes No
Automatic migration between spot and on-demand instances Yes No
Second-level billing Yes No

Yes No

ACID transactions Yes No
Schema management Yes No
Batch and streaming read/write support Yes No
Data versioning Yes No
Performance optimizations Yes No

Yes No
Interactive notebooks with support for multiple languages (SQL, Python, R, and Scala) Yes No
Real-time collaboration Yes No
Notebook revision history and GitHub integration Yes No
One-click visualizations Yes No
Publish notebooks as interactive dashboards Yes No

Yes No
Spark job monitoring alerts Yes No
One-click deployment from notebooks to Spark Jobs Yes No
APIs to build workflows in notebooks Yes No
Production streaming with monitoring Yes No

Learn more

Yes No
Access control for notebooks, clusters, jobs, and structured data Yes No
Audit logs Yes No
SSO with SAML 2.0 support Yes No
Data encryption (at rest and in motion) Yes No
Compliance (HIPAA, SOC 2 Type 2) Yes No

Yes No
Connect other BI tools via authenticated ODBC/JDBC (Tableau, Looker, etc.) Yes No
REST API Yes No
Data source connectors Yes No

Yes No
Help and support from the committers who engineer Spark Yes No
SQL support Yes No

Additional Resources

Benchmarking Big Data SQL Platforms in the Cloud

Blog

How Hotels.com increased data analyzed by 20x without performance issues

Customer reference

Managed Delta Lake: The best of data lakes, warehouses, and streaming systems.

Demo