
Today we are excited to announce the release of a set of APIs on Databricks that enable our users to manage Apache Spark clusters and production jobs via a RESTful interface.

You can read the press release here.

For the impatient, the full documentation of the APIs is here.

API + GUI: The Best of Both Worlds

The graphical user interface in Databricks has already simplified Spark operations for our users when they need to launch a cluster or schedule a job quickly. However, many want something more than a point-and-click interface because they prefer the command line, or they need to automate common operations using scripts or continuous integration tools such as Jenkins. These new APIs expose the core infrastructure functionality of Databricks so that users have complete freedom to choose how they want to manage their clusters and put applications into production.

One Platform For Data Science and Production Spark Applications

To deploy data-driven applications effectively, organizations need a wide variety of capabilities from their data platforms because the teams involved have different skill sets and responsibilities. Spark application developers typically work with the command line and APIs to be efficient; DevOps teams in IT want to automate as much of the process as possible to improve reliability; and data scientists and analysts just want easy access to powerful clusters that work reliably, along with an interactive environment to develop algorithms and visualize data.

Typically, each team pursues different solutions in an uncoordinated fashion. As a result, organizations end up with a complex IT infrastructure or become extremely unproductive as release cycles get bogged down with a sprawl of tools and manual processes.

No platform has been able to meet these disparate needs out of the box. With the release of these APIs, we are proud to say that Databricks is the first company to unify the full spectrum of capabilities in one Spark platform.

What’s Next

The APIs are simple to use: you can try them out in a terminal with the cURL command. A few basic examples are below:

Create a new cluster

curl -u user:pwd -H "Content-Type: application/json" -X POST -d \
  '{ "cluster_name": "flights", "spark_version": "1.6.x-ubuntu15.10", 
  "spark_conf": { "spark.speculation": true }, 
  "aws_attributes": { "availability": "SPOT", "zone_id": "us-west-2c" }, 
  "num_workers": 2 }'
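
For scripted use, for example from a Jenkins job, the same request can be assembled in Python. The following is a minimal sketch that uses only the standard library. The function name `build_create_cluster_request` and its `api_url` argument are illustrative assumptions; the endpoint URL is not shown in this post and must be taken from the API documentation. The sketch only builds the request and does not send it.

```python
import base64
import json
import urllib.request


def build_create_cluster_request(api_url, user, password):
    """Build (but do not send) a POST request mirroring the cURL example.

    api_url is a placeholder supplied by the caller; this sketch does not
    assume any particular host or endpoint path.
    """
    payload = {
        "cluster_name": "flights",
        "spark_version": "1.6.x-ubuntu15.10",
        "spark_conf": {"spark.speculation": True},
        "aws_attributes": {"availability": "SPOT", "zone_id": "us-west-2c"},
        "num_workers": 2,
    }
    # Equivalent of curl's -u user:pwd (HTTP Basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request, which keeps building and sending separate so the payload can be inspected or logged first.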

Delete a cluster

curl -u user:pwd -H "Content-Type: application/json" -X POST -d 

Run a job

curl -u user:pwd -H "Content-Type: application/json" -X POST -d \
  '{ "job_id":2, "jar_params": ["param1", "param2"]}'
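
When a job is triggered from a script or a continuous integration tool, the request body is usually generated rather than typed by hand. Here is a minimal Python sketch, assuming the body has the shape shown in the example above. `run_job_body` is a hypothetical helper name and not part of any Databricks client; converting every parameter to a string is likewise an assumption based on the example.

```python
import json


def run_job_body(job_id, jar_params):
    """Serialize a JSON body shaped like the 'Run a job' cURL example."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(job_id, int) or isinstance(job_id, bool):
        raise TypeError("job_id must be an integer")
    # Assumption: parameters are sent as strings, as in the example.
    return json.dumps({"job_id": job_id, "jar_params": [str(p) for p in jar_params]})


print(run_job_body(2, ["param1", "param2"]))
# prints {"job_id": 2, "jar_params": ["param1", "param2"]}
```

The resulting string can be passed to cURL with `-d`, or used as the body of any HTTP client request.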

We will continue to release more APIs as we add new features to the Databricks platform, so stay tuned. In the meantime, try out these APIs for yourself in Databricks for free.
