Databricks Labs
Databricks Labs are projects created by the field team to help customers get their use cases into production faster.
DQX
Simplified Data Quality checking at Scale for PySpark Workloads on streaming and standard DataFrames.
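The core pattern behind row-level quality checking can be sketched in plain Python. This is a conceptual illustration only: the rule helpers and `apply_checks` function below are illustrative stand-ins, not DQX's actual API.

```python
# Conceptual sketch of row-level data quality checks: each rule returns
# None on success or an error message on failure, and rows are split into
# valid and quarantined sets. Helper names are hypothetical, not DQX's API.

def is_not_null(field):
    """Rule: fail when `field` is missing or None."""
    def check(row):
        return None if row.get(field) is not None else f"{field} is null"
    return check

def is_positive(field):
    """Rule: fail when `field` is not a positive number."""
    def check(row):
        value = row.get(field)
        ok = isinstance(value, (int, float)) and value > 0
        return None if ok else f"{field} not positive"
    return check

def apply_checks(rows, checks):
    """Split rows into (valid, quarantined); failing rows carry their errors."""
    valid, quarantined = [], []
    for row in rows:
        errors = [msg for check in checks if (msg := check(row))]
        if errors:
            quarantined.append({**row, "_errors": errors})
        else:
            valid.append(row)
    return valid, quarantined

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": -5.0}, {"id": 3}]
good, bad = apply_checks(rows, [is_not_null("amount"), is_positive("amount")])
print(len(good), len(bad))  # 1 2
```

Keeping failed rows in a quarantine set (rather than dropping them) is what lets pipelines surface data issues without losing records.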
Kasal
Kasal is an interactive, low-code way to build and deploy AI Agents on the Databricks platform.
Mosaic
Mosaic is a tool that simplifies the implementation of scalable geospatial data pipelines by binding together common open source geospatial libraries and Apache Spark™️. Mosaic also provides a set of examples and best practices for common geospatial use cases. It provides APIs for ST_ expressions and GRID_ expressions, supporting grid index systems such as H3 and British National Grid.
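The value of a grid index like H3 is that points can be bucketed into cells, turning expensive geometry comparisons into cheap key lookups. A minimal sketch of the idea, using a simple 1-degree lat/lon grid as a stand-in for a real index such as H3:

```python
# Conceptual sketch of grid-index-based point bucketing, the idea behind
# Mosaic's GRID_ expressions. A coarse square grid stands in for H3 or
# British National Grid cells; this is not Mosaic's actual API.

def grid_cell(lat, lon, size=1.0):
    """Map a coordinate to a coarse grid cell id (illustrative, not H3)."""
    return (int(lat // size), int(lon // size))

def index_points(points, size=1.0):
    """Group named points by the grid cell they fall into."""
    index = {}
    for name, lat, lon in points:
        index.setdefault(grid_cell(lat, lon, size), []).append(name)
    return index

points = [("a", 51.5, -0.1), ("b", 51.7, -0.4), ("c", 48.9, 2.3)]
index = index_points(points)
print(index[(51, -1)])  # ['a', 'b'] — these two points share a cell
```

A spatial join then only needs to compare geometries that land in the same (or neighboring) cells, which is what makes the approach scale.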
Other Projects
Databricks MCP
A collection of MCP servers to help AI agents fetch enterprise data from Databricks and automate common developer actions on Databricks.
Conversational Agent App
Application featuring a chat interface powered by Databricks Genie Conversation APIs, built specifically to run as a Databricks App.
Knowledge Assistant Chatbot Application
Example Databricks Knowledge Assistant chatbot application.
Feature Registry Application
The app provides a user-friendly interface for exploring existing features in Unity Catalog. Additionally, users can generate code for creating feature specs and training sets to train machine learning models and deploy features as Feature Serving Endpoints.
DLT-Meta
This framework makes it easy to ingest data using Delta Live Tables and metadata. With DLT-META, a single data engineer can easily manage thousands of tables. Several Databricks customers use DLT-META in production to process 1000+ tables.
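The metadata-driven pattern can be sketched as a single generic pipeline instantiated once per table from a spec. The spec fields and generated SQL below are illustrative, not DLT-META's actual onboarding schema:

```python
# Sketch of metadata-driven ingestion: one spec per table, one generic
# function that turns specs into ingestion statements. This is how a single
# engineer can manage thousands of tables from one codebase.

onboarding = [
    {"table": "orders",    "source_path": "/raw/orders",    "format": "json"},
    {"table": "customers", "source_path": "/raw/customers", "format": "csv"},
]

def build_ingestion_plan(specs):
    """Generate one ingestion statement per table spec (illustrative SQL)."""
    return [
        f"CREATE OR REFRESH STREAMING TABLE {s['table']} "
        f"AS SELECT * FROM read_files('{s['source_path']}', format => '{s['format']}')"
        for s in specs
    ]

for stmt in build_ingestion_plan(onboarding):
    print(stmt)
```

Adding a new table becomes a metadata change rather than a code change.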
Smolder
Smolder provides an Apache Spark™ SQL data source for loading EHR data from HL7v2 message formats. Additionally, Smolder provides helper functions that can be used on a Spark SQL DataFrame to parse HL7 message text, and to extract segments, fields, and subfields from a message.
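HL7v2 messages are delimiter-structured text: segments separated by carriage returns, fields by `|`, and components by `^`. A minimal pure-Python sketch of the parsing Smolder performs on Spark (it ignores MSH's special encoding-character handling for brevity):

```python
# Minimal HL7v2 parsing sketch: split a message into segments (\r),
# fields (|), and components (^). Not Smolder's actual API.

def parse_hl7(message):
    """Return a list of {'id': segment_id, 'fields': [...]} per segment."""
    segments = []
    for line in message.strip().split("\r"):
        fields = line.split("|")
        segments.append({"id": fields[0], "fields": fields[1:]})
    return segments

msg = "MSH|^~\\&|SENDER|FAC|RCVR|FAC|20240101||ADT^A01|123|P|2.3\rPID|1||12345||DOE^JOHN"
parsed = parse_hl7(msg)
print(parsed[1]["id"], parsed[1]["fields"][4].split("^"))  # PID ['DOE', 'JOHN']
```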
Geoscan
An Apache Spark ML Estimator for density-based spatial clustering based on Hexagonal Hierarchical Spatial Indices.
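The density-based-clustering-over-a-grid-index idea can be sketched in plain Python: bucket points into cells, keep cells that meet a density threshold, then merge adjacent dense cells into clusters. Real Geoscan uses H3 hexagons; this sketch uses square cells and is not Geoscan's API:

```python
# Sketch of density-based spatial clustering over a grid index: count
# points per cell, keep dense cells, union side-adjacent dense cells.

def dense_cells(points, size=1.0, min_pts=2):
    """Return the set of grid cells containing at least min_pts points."""
    counts = {}
    for lat, lon in points:
        cell = (int(lat // size), int(lon // size))
        counts[cell] = counts.get(cell, 0) + 1
    return {c for c, n in counts.items() if n >= min_pts}

def cluster_cells(cells):
    """Merge dense cells that share a side into clusters (flood fill)."""
    clusters, seen = [], set()
    for start in cells:
        if start in seen:
            continue
        group, frontier = set(), [start]
        while frontier:
            c = frontier.pop()
            if c in seen:
                continue
            seen.add(c)
            group.add(c)
            x, y = c
            frontier += [n for n in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
                         if n in cells]
        clusters.append(group)
    return clusters

pts = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.4, 0.4), (5.0, 5.0)]
print(len(cluster_cells(dense_cells(pts))))  # 1 — the lone point at (5,5) is noise
```

Working on cell counts instead of raw point distances is what lets this style of clustering distribute cheaply on Spark.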
Data Generator
Generate relevant data quickly for your projects. The Databricks data generator can be used to generate large simulated/synthetic data sets for tests, POCs, and other uses.
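The column-spec-driven generation pattern can be sketched in plain Python. The spec format below is illustrative, not the data generator's actual API, which builds Spark DataFrames at scale:

```python
# Sketch of spec-driven synthetic data generation: each column names a
# generation strategy, and rows are produced from a seeded RNG so runs
# are reproducible. Illustrative only, not dbldatagen's API.

import random

def generate(n_rows, columns, seed=42):
    rng = random.Random(seed)
    makers = {
        "id":      lambda i: i,                                  # sequential key
        "uniform": lambda i: rng.uniform(0, 100),                # numeric column
        "choice":  lambda i: rng.choice(["bronze", "silver", "gold"]),
    }
    return [{name: makers[kind](i) for name, kind in columns} for i in range(n_rows)]

rows = generate(1000, [("customer_id", "id"), ("spend", "uniform"), ("tier", "choice")])
print(len(rows))  # 1000
```

Seeding the generator keeps test datasets deterministic across runs.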
Splunk Integration
Add-on for Splunk, an app that allows Splunk Enterprise and Splunk Cloud users to run queries and execute actions, such as running notebooks and jobs, in Databricks.
DiscoverX
DiscoverX automates administration tasks that require inspecting or applying operations to a large number of Lakehouse assets.
brickster
{brickster} is the R toolkit for Databricks. It includes:
- Wrappers for Databricks APIs (e.g. db_cluster_list, db_volume_read)
- Browsing workspace assets via the RStudio Connections Pane (open_workspace())
- Access to the databricks-sql-connector via {reticulate}
- An interactive Databricks REPL
DBX
This tool simplifies the job launch and deployment process across multiple environments. It also helps package your project and deliver it to your Databricks environment in a versioned fashion. Designed in a CLI-first manner, it is built to be used actively both inside CI/CD pipelines and as part of local tooling for fast prototyping.
Tempo
The purpose of this project is to provide an API for manipulating time series on top of Apache Spark™. Functionality includes featurization using lagged time values, rolling statistics (mean, sum, count, etc.), AS OF joins, and downsampling and interpolation. It has been tested on TB-scale historical data.
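The AS OF join semantics can be sketched in plain Python: for each left-side event, pick the most recent right-side row at or before its timestamp. This is a conceptual illustration, not Tempo's API:

```python
# Pure-Python sketch of an AS OF join: for each trade timestamp, find the
# latest quote at or before it using binary search on sorted quote times.

import bisect

def as_of_join(trades, quotes):
    """quotes: list of (ts, value) sorted by ts; returns (trade_ts, quote)."""
    times = [ts for ts, _ in quotes]
    out = []
    for ts in trades:
        i = bisect.bisect_right(times, ts) - 1   # last quote with time <= ts
        out.append((ts, quotes[i][1] if i >= 0 else None))
    return out

quotes = [(1, 100.0), (5, 101.5), (9, 99.8)]
print(as_of_join([0, 6, 10], quotes))  # [(0, None), (6, 101.5), (10, 99.8)]
```

On Spark, the same semantics are achieved with partitioned range logic rather than per-row binary search, which is where a library like Tempo earns its keep.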
PyLint Plugin
This plugin extends PyLint with checks for common mistakes and issues in Python code, specifically in the Databricks environment.
PyTester
PyTester is a powerful way to manage test setup and teardown in Python. This library provides a set of fixtures to help you write integration tests for Databricks.
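The fixture pattern PyTester builds on pairs setup (yield a resource) with guaranteed teardown. A minimal sketch using a context manager; `make_table` and its cleanup are hypothetical stand-ins for the Databricks resources PyTester provisions:

```python
# Sketch of the setup/teardown fixture pattern: the resource is created
# before the test body runs and always cleaned up afterwards.

from contextlib import contextmanager

created, dropped = [], []

@contextmanager
def make_table(name):
    created.append(name)          # setup: provision the test resource
    try:
        yield name
    finally:
        dropped.append(name)      # teardown: always runs, even on failure

with make_table("tmp_sales") as table:
    assert table in created and table not in dropped

print(created, dropped)  # ['tmp_sales'] ['tmp_sales']
```

In pytest, the same shape is expressed as a `yield` fixture, which is the style this library's fixtures follow.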
Delta Sharing Java Connector
The Java connector follows the Delta Sharing protocol to read shared tables from a Delta Sharing Server. To reduce and limit egress costs on the Data Provider side, we implemented a persistent cache that removes any unnecessary reads.
Overwatch
Analyze all of your jobs and clusters across all of your workspaces to quickly identify where you can make the biggest adjustments for performance gains and cost savings.
UCX
UCX is a toolkit for enabling Unity Catalog (UC) in your Databricks workspace. UCX provides commands and workflows for migrating tables and views to UC. UCX can rewrite dashboards, jobs, and notebooks to use the migrated data assets in UC, and offers many more features.
All projects in the https://github.com/databrickslabs account are provided for exploration only and are not formally supported by Databricks with service-level agreements (SLAs). They are provided as-is, and we make no guarantees of any kind. Please do not submit support tickets relating to issues arising from the use of these projects. Any issues discovered through the use of these projects should be filed as GitHub Issues on the relevant repository. They will be reviewed as time permits, but there are no formal SLAs for support.


