Apache Spark™

Apache Spark is an extremely fast distributed processing framework for big data and machine learning. It was originally developed at the University of California, Berkeley in 2009.

The Largest Open-Source Project in Data Processing

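Apache Spark, an open-source distributed processing system well suited to big data analytics, has been adopted rapidly by enterprises across a wide range of industries since its release. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at massive scale, collectively processing multiple petabytes of data on clusters of over 8,000 nodes. Apache Spark has quickly grown into the largest open-source community in big data, with more than 1,000 contributors from over 250 organizations.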

Databricks was founded in 2013 by the team that started the Spark research project at UC Berkeley.

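Apache Spark is 100% open source and is hosted by the vendor-independent Apache Software Foundation. Databricks is fully committed to maintaining this open development model. Working together with the Spark community, Databricks contributes heavily to the Apache Spark project through both development work and community activities.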


What is Apache Spark - Benefits of Apache Spark

Speed

Engineered from the ground up for performance, Spark can be up to 100x faster than Hadoop MapReduce for large-scale data processing by exploiting in-memory computing and other optimizations. Spark is also fast when data is stored on disk, and it has set the world record for large-scale on-disk sorting.

Ease of Use

Spark offers easy-to-use APIs for operating on large datasets, including a collection of over 100 operators for transforming data and familiar DataFrame APIs for manipulating semi-structured data.

A Unified Engine

Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning and graph processing. These standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.

Try Apache Spark for free on the Databricks cloud.

The Databricks Unified Analytics Platform offers up to 5x the performance of open-source Spark, along with interactive notebooks, integrated workflows, and enterprise security, all running on a fully managed cloud platform.


The open-source Apache Spark project is also available for download.