
Databricks at SIGMOD 2026

By Indrajit Roy

  • Learn how Databricks is pioneering Spark Declarative Pipelines (SDP), a next-generation approach to data engineering that simplifies complex ETL and streaming workloads.
  • Get a deep dive into Enzyme, the incremental view maintenance engine whose paper received an honorable mention award at the SIGMOD conference.
  • Meet our engineers at the conference to discuss these innovations.

Databricks continues to push the boundaries of what is possible in data and AI. Our work on Spark Declarative Pipelines will be presented at SIGMOD 2026, and we are pleased to announce that it has received an honorable mention award at the conference. We will be at SIGMOD as a platinum sponsor from June 1 to 5. The conference takes place in Bangalore, India, which is also a major R&D hub for Databricks.

Our latest data engineering papers show how Databricks has simplified incremental data processing for customers. There are two ways to write incremental programs in Spark Declarative Pipelines (SDP), and customers can mix both within a single pipeline:

  • Data engineers can specify Materialized Views for their transformations. The Enzyme engine maintains them incrementally as new data arrives, and the complexity of incremental processing is hidden entirely from the people who create the materialized view. The SIGMOD 2026 paper “Enzyme: Incremental View Maintenance for Data Engineering” covers some of these ideas.
  • Data engineers who are comfortable with stream processing can instead use SDP's streaming engine to process data incrementally. The streaming API offers a range of constructs, from stateful operators to watermarks, that make it easy to express complex business logic (for example, user-defined aggregations). The core ideas behind our streaming product will appear in the VLDB 2026 paper “A Decade of Apache Spark Structured Streaming: How We Evolved the Architecture To Meet Real-world Needs”.

Here is a preview of the Enzyme paper and what the team has been working on:

Enzyme at SIGMOD 2026

Incremental view maintenance

Suppose you are an analyst at a company and want to analyze the total number of orders sold per region. The materialized view below provides the answer.

CREATE MATERIALIZED VIEW order_report AS
SELECT region, SUM(orders)
FROM customer_and_order_table
GROUP BY region;

As new orders are added, you would expect the materialized view to stay up to date. Keeping it up to date is, at its core, the incremental view maintenance problem. Maintaining the simple MV above looks easy, but imagine an MV that has to join data from multiple tables, contains window functions, or calls LLM functions.
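To make the problem concrete, here is a minimal Python sketch of incrementally maintaining the order_report aggregate above. It illustrates the general delta-apply idea only and is not Enzyme's implementation; the function names (`full_recompute`, `apply_delta`, `view_rows`), the (sum, count) state layout, and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict


def full_recompute(rows):
    """Recompute SELECT region, SUM(orders) ... GROUP BY region from scratch."""
    totals = defaultdict(int)
    for region, orders in rows:
        totals[region] += orders
    return dict(totals)


def apply_delta(state, inserted=(), deleted=()):
    """Update the maintained state in place using only the changed rows.

    state maps region -> (running_sum, row_count). The row count is kept
    (an assumption of this sketch) so that a group can be dropped once its
    last row is deleted, which the sum alone cannot reveal.
    """
    for region, orders in inserted:
        total, count = state.get(region, (0, 0))
        state[region] = (total + orders, count + 1)
    for region, orders in deleted:
        total, count = state[region]
        if count == 1:
            del state[region]
        else:
            state[region] = (total - orders, count - 1)
    return state


def view_rows(state):
    """Project the maintained state to the rows the view exposes."""
    return {region: total for region, (total, _count) in state.items()}


# Hypothetical sample data: (region, orders) rows of customer_and_order_table.
state = apply_delta({}, inserted=[("EMEA", 3), ("APAC", 5), ("EMEA", 2)])

# A later batch touches only three rows; untouched rows are never re-read.
apply_delta(state, inserted=[("APAC", 1), ("AMER", 4)], deleted=[("EMEA", 2)])
print(view_rows(state))
```

The work done by `apply_delta` is proportional to the size of the change rather than the size of the table, which is the appeal of incremental maintenance. A plain SUM admits this simple update rule; joins, window functions, and non-deterministic functions do not, which is where the difficulty described above comes from.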

Enzyme's innovations

Materialized views (MVs) are popular for query acceleration, for example to speed up dashboards built on data in a data warehouse. When we created Spark Declarative Pipelines, we decided to go beyond query acceleration and apply materialized views to extract-transform-load (ETL) use cases. Our key observation is that if MVs can be maintained efficiently and incrementally, ETL workloads that would otherwise require complex custom code become much simpler.

Enzyme contributes to the rich literature on incremental maintenance of materialized views and shows how to scale these techniques to production workloads. Some of the innovations the team worked on are:

  • Broad support for MV patterns: Enzyme incrementally maintains complex MVs in production, including joins, window functions, aggregations, and combinations of these. Unlike other industry solutions, Enzyme also supports non-deterministic functions such as current_date() as well as AI-related functions.
  • Multi-language support: Most industry solutions focus only on SQL, but Enzyme also supports MVs specified in Python, which is now the preferred language for most data engineering and AI workloads. Enzyme solves many interesting problems that come with multi-language support, such as accurately detecting changes to an MV definition.
  • Performance optimizations: Enzyme provides several optimizations that reduce the amount of data to be processed. These include techniques that automatically decide whether updates should be applied at the partition level or the row level, which reduces rewrite overhead. Enzyme also selectively caches intermediate results to reduce IO costs, and it uses a cost model that draws on plan information and previous executions to pick the most efficient incrementalization strategy.
Figure 1: Enzyme has significantly better performance than another competing industry solution (name anonymized to CV-IVM due to licensing restrictions).
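
The partition-level versus row-level decision mentioned in the performance bullet above can be pictured with a toy cost comparison. The sketch below is purely hypothetical: the `PartitionStats` fields, the `choose_strategy` function, and the cost constants are invented for illustration and do not describe Enzyme's actual cost model, which per the text also uses plan information and previous executions.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    name: str          # hypothetical partition identifier
    total_rows: int    # rows currently stored in the partition
    changed_rows: int  # rows touched by the incoming update


def choose_strategy(p, row_update_cost=5.0, row_rewrite_cost=1.0):
    """Return 'skip', 'row', or 'partition' for one partition.

    Assumed toy model: updating a single row in place costs more per row
    than a bulk rewrite, so rewriting the whole partition wins once a
    large enough fraction of its rows has changed.
    """
    if p.changed_rows == 0:
        return "skip"
    row_level = p.changed_rows * row_update_cost
    partition_level = p.total_rows * row_rewrite_cost
    return "row" if row_level < partition_level else "partition"


# Hypothetical partitions of a target table.
partitions = [
    PartitionStats("2026-05-01", total_rows=1_000_000, changed_rows=100),
    PartitionStats("2026-05-02", total_rows=1_000, changed_rows=800),
    PartitionStats("2026-05-03", total_rows=5_000, changed_rows=0),
]
for p in partitions:
    print(p.name, choose_strategy(p))
```

With these made-up constants, a partition with a handful of changed rows is patched row by row, a mostly rewritten partition is replaced wholesale, and an untouched partition is skipped.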


Want to learn more? Read the paper and, if you are attending SIGMOD, join our talk for more details.

Meet the team at SIGMOD:

Stop by our booth to meet the team and learn more about the innovations under way at Databricks. And do not miss the chance to hear Ritwik Yadav's SIGMOD talk in person!
