Databricks Launches New Edition of Its Apache Spark-Based Cloud Platform for Data Engineers

April 12, 2017

San Francisco, CA -- (Marketwired - April 12, 2017) - Databricks, the company founded by the creators of the popular Apache Spark project and provider of the leading Spark-based cloud platform for data science, today announced Databricks for Data Engineering, an edition of its cloud platform optimized specifically for data engineering workloads. The new offering enables more cost-effective data engineering using Spark while empowering data engineers to easily combine SQL, structured streaming, ETL (extract, transform, load), and machine learning workloads running on Spark to rapidly and securely deploy data pipelines into production. Databricks for Data Engineering will complement the company's existing cloud platform by providing all enterprises with a unified data analytics platform that fosters seamless collaboration to accelerate data-driven decisions across the organization.

"Databricks' latest developments for data engineering make it exceedingly easy to get started with Spark -- providing a platform that is apt as both an integrated development environment and deployment pipeline," said Brett Bevers, Engineering Manager, Data Engineering at Dollar Shave Club. "On our first day using Databricks, we were equipped to grapple with an entirely new class of data challenges."

Most organizations today encounter a variety of challenges in building systems on and around Spark to meet the needs of data engineering. Data engineers perform the mission-critical data cleansing, transformations, and manipulations that make business use cases such as real-time dashboards or fraud detection possible. As a result, for companies that set their sights on making data-driven decisions or automating business processes with intelligent algorithms, mastering data engineering is an essential step.

"The expansion of our product portfolio to meet the needs of data engineering workloads is a major step in our journey to make big data simple for very complex data problems," said Ali Ghodsi, CEO and Co-founder at Databricks. "Databricks for Data Engineering will offer organizations a unified environment for data science and data engineering users alike, while optimizing Apache Spark performance -- all with the reliability of an enterprise data and analytics platform at an efficient price."

Databricks for Data Engineering ensures that data engineers can run their workloads at scale on highly optimized infrastructure in a reliable and cost-effective manner. Available starting today, the new Databricks offering provides:

Performance optimization: Databricks I/O technology (DBIO) takes processing speeds to the next level with a tuned and optimized version of Spark for a wide variety of instance types, in addition to an optimized AWS S3 access layer -- accelerating data exploration by up to 10x.
Cost management: Cluster management capabilities such as auto-scaling and AWS Spot instances reduce operational costs by avoiding the time-consuming tasks of building, configuring, and maintaining complex Spark infrastructure.
Optimized integration: Comprehensive REST APIs to programmatically launch clusters and jobs and to integrate tools and services such as Redshift and Kinesis, as well as machine learning frameworks such as TensorFlow, with the Databricks platform. An integrated data sources catalog makes every data source immediately available to all Databricks users without duplicating data ingest work.
Enterprise security: Turnkey security standards including SOC 2 Type 1 certification and HIPAA compliance, data encryption, detailed logs easily accessible in AWS S3 for debugging, and IT admin capabilities such as Single Sign-On with SAML 2.0 support and role-based access controls for clusters, jobs, and notebooks.
Collaboration with data science: Integration with the data science workspaces in Databricks, enabling a seamless transition between data engineering and interactive data science workloads.

This new offering, priced based on data engineering workloads such as ETL and automated jobs ($0.20 per Databricks Unit plus the cost of AWS), helps data and machine learning engineers build and deploy highly optimized and reliable data infrastructure in the cloud.
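As a rough illustration of the pricing model described above, the following Python sketch adds the Databricks charge ($0.20 per Databricks Unit, as stated in this announcement) to the separately billed AWS cost. The function name, the number of Databricks Units consumed, and the AWS cost figure are hypothetical values chosen for the example; actual consumption depends on instance types and run time and is not specified here.

```python
# Rate taken from the announcement: $0.20 per Databricks Unit (DBU).
DBU_RATE_USD = 0.20


def estimate_job_cost(dbus_consumed: float, aws_cost_usd: float,
                      dbu_rate_usd: float = DBU_RATE_USD) -> float:
    """Estimate the total cost of a workload in US dollars.

    total = (DBUs consumed * rate per DBU) + AWS infrastructure cost

    Both inputs are assumptions supplied by the caller; this helper is a
    hypothetical sketch, not part of any Databricks API.
    """
    if dbus_consumed < 0 or aws_cost_usd < 0:
        raise ValueError("usage and cost figures must be non-negative")
    return round(dbus_consumed * dbu_rate_usd + aws_cost_usd, 2)


if __name__ == "__main__":
    # Hypothetical nightly ETL job: 50 DBUs consumed, $12.00 of AWS charges.
    total = estimate_job_cost(dbus_consumed=50, aws_cost_usd=12.00)
    print(f"Estimated total: ${total:.2f}")
```

Under these assumed figures, the Databricks portion is 50 x $0.20 = $10.00, and the AWS charges are added on top of that amount.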

Visit databricks.com/product/pricing for more information.
Contact Databricks to get started: databricks.com/contact-databricks.
Access a trial of Databricks: databricks.com/try-databricks.

About Databricks

Databricks’ vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache® Spark™, a powerful open source data processing engine built for sophisticated analytics, ease of use, and speed. Databricks is the largest contributor to the open source Apache Spark project, providing 10x more code than any other company. The company has also trained over 40,000 users on Apache Spark and has the largest number of customers deploying Spark to date. Databricks provides a virtual analytics platform to simplify data integration, real-time experimentation, and robust deployment of production applications. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, contact <info@databricks.com>.