
Using Knowledge Graphs to Power a Semantic Data Layer for Databricks


Published: June 17, 2022

Partners · 6 min read

This is a collaborative post between Databricks and Stardog. We thank Aaron Wallace, Sr. Product Manager at Stardog, for their contribution.

 

Knowledge Graphs have become ubiquitous, even if we rarely notice them. We experience them every day when we search on Google or scroll the feeds in our social media accounts: people we know, companies we follow, content we like. Similarly, Enterprise Knowledge Graphs provide a foundation for structuring your organization's content, data and information assets by extracting, relating and delivering knowledge as answers, recommendations and insights to every data-driven application, from chatbots to recommendation engines, and for supercharging your BI and analytics.

In this blog, you will learn how Databricks and Stardog solve the last-mile challenge in democratizing data and insights. Databricks provides a lakehouse platform for data, analytics and artificial intelligence (AI) workloads across multiple clouds. Stardog provides a knowledge graph platform that can model complex relationships against data that is wide, not just big, to describe people, places, things and how they relate. The Databricks Lakehouse Platform, coupled with Stardog's Knowledge Graph-enabled semantic layer, provides organizations with a foundation for an enterprise data fabric architecture, making it possible for cross-functional, cross-enterprise or cross-organizational teams to ask and answer complex queries across domain silos.

The growing need for a Data Fabric Architecture

Rapid innovation and disruption in the data management space are helping organizations unlock value from data available both inside and outside the enterprise. Organizations operating across physical and digital boundaries are finding new opportunities to serve customers in the way they want to be served.

These organizations have connected all relevant data across the data supply chain to create a complete and accurate picture in the context of their use cases. Most industries that operate and share data across organizational boundaries are adopting open standards in the form of prescribed ontologies to harmonize data and enable data sharing, from FIBO in Financial Services to D3FEND in the cybersecurity domain. These business ontologies (or semantic models) reflect how we think about data with meaning attached, i.e., "things" rather than how data is structured and stored, i.e., "strings", and they make data sharing and reuse possible.

The idea of a semantic layer is not new. It has been around for over 30 years, often promoted by BI vendors helping companies build purpose-built dashboards. Broad adoption, however, has been impeded by the embedded nature of that layer inside a proprietary BI system. Such a layer is often too rigid and complex, suffering from the same limitations as a physical relational database, which models data to optimize for its structured query language rather than for how data is related in the real world: many-to-many. A knowledge graph-powered semantic data layer that operates between your storage and consumption layers provides the glue and the multiplier that connects all data and delivers value in the context of the business use case, reaching citizen data scientists and analysts who otherwise cannot participate and collaborate in data-centric architectures reserved for a handful of specialists.

Enable a use case around insurance

Let's look at a real-world example of a multi-carrier insurance organization to illustrate how Stardog and Databricks work together. Like most large companies, insurance companies struggle with the lack of broad availability of data from internal and external sources for decision-making by critical stakeholders. Everyone from underwriting risk assessment to policy administration to claims management and agencies struggles to leverage the right data and insights to make critical decisions. They all need an enterprise-wide data fabric that brings together the elements of a modern data and analytics architecture to make data FAIR: Findable, Accessible, Interoperable and Reusable.

Most companies start their journey by bringing all data sources into a data lake. The Databricks lakehouse approach provides companies with a great foundation for storing all their analytics data and making it accessible to anyone inside the enterprise. In this data layer, all cleansing, transformation and disambiguation take place. The next step in that journey is data harmonization: connecting data based on its meaning to provide richer context. A semantic layer, delivered by a knowledge graph, shifts the focus to data analysis and processing, providing a connected fabric of cross-domain insights that underwriters, risk analysts, agents and customer service teams can use to manage risk and deliver an exceptional customer experience.

We will examine how this would work with a simplified semantic model as a starting point.

Easily model domain-specific entities and cross-domain relationships

Visually creating a semantic data model through a whiteboard-like experience is the initial step in creating a semantic data layer. Inside the Stardog Designer project, just click to create specific classes (or entities) that are critical in answering your business questions. Once a class is created, you can add all the necessary attributes and data types to describe this new entity. Linking classes (or entities) together is easy. With an entity selected, just click to add a link and drag the point of the new relationship until it snaps to the other entity. Give this new relationship a name that describes the business meaning (e.g., a "Customer" "owns" a "Vehicle").

Add a new class and link it to an existing class to create a relationship
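To make the "Customer owns Vehicle" example concrete, a model like the one drawn in Designer could be serialized roughly as follows in RDF/Turtle. This is a hedged sketch, not what Designer actually emits; the IRIs, class names and property names are illustrative assumptions.

```turtle
@prefix :     <http://example.com/insurance#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Classes (entities) created in the Designer canvas
:Customer a owl:Class .
:Vehicle  a owl:Class .

# The "owns" relationship drawn between the two entities
:owns a owl:ObjectProperty ;
    rdfs:domain :Customer ;
    rdfs:range  :Vehicle .

# An attribute with a datatype, as added on the Customer class
:customerName a owl:DatatypeProperty ;
    rdfs:domain :Customer ;
    rdfs:range  xsd:string .
```

Because the model is plain RDF, the same vocabulary can later be aligned with published ontologies such as FIBO rather than remaining a private schema.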

Map metadata from the Databricks Lakehouse Platform

What's a model without data? Stardog users can connect to a variety of structured, semi-structured and unstructured data sources by persisting data, virtualizing it, or some combination of the two, when and where it makes sense. In Designer, it is easy to connect an existing source like Delta Lake and pull in the metadata from user-specified tables. This provides initial access to that data through Stardog's virtualization layer, without moving or copying it into the knowledge graph. The virtualization layer automatically translates incoming Stardog queries from open-standards-based SPARQL into optimized push-down queries against Databricks SQL.
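As a hedged illustration of that translation, here is the kind of SPARQL query a Stardog user might write, alongside the rough shape of the SQL that could be pushed down to Databricks SQL. The actual SQL is generated by Stardog and depends entirely on your mappings; every table, column and IRI below is an assumption made for the example.

```python
# Illustrative SPARQL a user might run against the insurance graph.
SPARQL = """
PREFIX : <http://example.com/insurance#>
SELECT ?name ?vin
WHERE {
  ?c a :Customer ;
     :customerName ?name ;
     :owns ?v .
  ?v :vin ?vin .
}
"""

# Roughly the shape of the push-down query the virtualization layer
# could generate for Databricks SQL (real output depends on the mappings).
PUSHED_DOWN_SQL = """
SELECT c.customer_name, v.vin
FROM customers c
JOIN vehicles v ON v.owner_id = c.customer_id
"""
```

The point of the rewrite is that the join logic stays in Databricks, close to the data, while the user only ever sees the graph model.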

Add a new data source as a project resource

Click to add a new project resource and select one of the available connections, such as Databricks. This connection leverages the SQL endpoint recently released by Databricks. Define a scope for the data and specify any additional properties, then use the preview pane to quickly glance at the data before adding it to your project.
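Under the hood, a connection like this boils down to a small set of JDBC connection properties. The following is a sketch only: the property names follow Stardog's virtual graph options, but the URL format, token and driver class are placeholders that you would take from your Databricks SQL endpoint's connection details and installed driver version.

```properties
# Illustrative Stardog virtual graph properties for a Databricks SQL endpoint
jdbc.url=jdbc:databricks://<workspace-host>:443/default;transportMode=http;ssl=1;httpPath=<sql-endpoint-http-path>
jdbc.username=token
jdbc.password=<personal-access-token>
jdbc.driver=com.databricks.client.jdbc.Driver
# Limit the scope to the schema(s) you want to expose
sql.schemas=insurance
```

Designer fills these in through its connection dialog, so most users never edit them by hand.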


Incorporate additional data from a variety of locations

Designer makes it simple to incorporate data from other sources and files, such as CSVs, for teams looking to conduct ad-hoc analysis that combines data from Delta with this new information. Once a file is added as a resource, you simply add a link and drag and drop it onto a class to map the data. Give the mapping a meaningful name, then specify a data column for the primary identifier, one for the label, and any other data columns that match the attributes of the entity.

Map data from a project resource to a class
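Behind Designer's drag-and-drop mapping sits a declarative mapping definition. In Stardog's mapping syntax (SMS2), a CSV-to-class mapping like the one above might look roughly like this; the column names (`id`, `name`) and IRIs are illustrative assumptions, not output captured from Designer.

```
MAPPING <urn:customers>
FROM CSV { }
TO {
  ?customer a :Customer ;
      :customerName ?name .
}
WHERE {
  BIND(TEMPLATE("http://example.com/insurance#customer-{id}") AS ?customer)
}
```

With CSV sources, each column header becomes a variable (here `?id` and `?name`), and the `TEMPLATE` function mints a stable IRI per row so the same customer resolves to the same graph node on every load.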

Publish your work

Within Designer, you can publish this project's model and data directly to your Stardog server for use in Stardog Explorer. Designer also lets you publish and consume the output of the knowledge graph in other ways: for example, you can publish the model and mappings as a zipped folder of files to your version control system.

Publish directly to a Stardog database

Once the data is published to Stardog, data analysts can use popular BI tools such as Tableau to connect through Stardog's BI/SQL endpoint and pull data through the semantic layer into reports and dashboards. An auto-generated schema inside any SQL-compatible tool lets users write SQL queries against the Knowledge Graph. Queries arriving through the SQL layer are automatically translated into SPARQL, the Knowledge Graph's query language, and pushed down through the virtualization layer as automatically generated, source-optimized queries that are computed at the source via the Databricks SQL endpoint. The same information is also available to Databricks users in notebooks through pystardog, Stardog's Python API. In addition, Stardog's GraphQL API lets you embed virtual graphs for use directly within applications. A semantic layer on top of the lakehouse gives every type of user a single environment, in the tools they prefer, operating on a consistent set of data.

Explore connected data with the Stardog Explorer application
Visualize connected Knowledge Graph data in Tableau via Stardog's BI/SQL endpoint
A data science notebook in Databricks using pystardog to query the Knowledge Graph
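As a sketch of the notebook path, the snippet below uses pystardog to run a SPARQL query and flatten the results. The endpoint, credentials, database name and query are all placeholders, and the helper assumes the standard SPARQL 1.1 JSON results format that `Connection.select` returns.

```python
SPARQL = """
PREFIX : <http://example.com/insurance#>
SELECT ?customer ?vehicle
WHERE { ?customer :owns ?vehicle }
LIMIT 10
"""

def extract_pairs(sparql_json):
    """Flatten SPARQL 1.1 JSON results into (customer, vehicle) tuples."""
    return [
        (b['customer']['value'], b['vehicle']['value'])
        for b in sparql_json['results']['bindings']
    ]

def query_stardog(database='insurance', endpoint='http://localhost:5820'):
    # Requires `pip install pystardog` and a reachable Stardog server;
    # the database name, endpoint and credentials here are placeholders.
    import stardog
    with stardog.Connection(database, endpoint=endpoint,
                            username='admin', password='admin') as conn:
        return extract_pairs(conn.select(SPARQL))
```

In a Databricks notebook, the returned pairs can be turned into a Spark or pandas DataFrame and joined back against lakehouse tables.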

Improve productivity and develop new insights

By organizing data into a Knowledge Graph, data teams increase productivity, spending less time wrangling data from external sources to support ad-hoc analysis. Data outside Databricks can be federated through Stardog's virtualization layer and connected to data inside Databricks. In addition, techniques such as statistical and/or logical inference can derive new relationships between entities without modeling them explicitly in the knowledge graph. Because Databricks and Stardog work together seamlessly, the combination delivers a true end-to-end experience that simplifies complex cross-domain queries and analytics. The semantic layer also becomes a living, shared, easy-to-use layer at the base of an enterprise data fabric, delivering enterprise-wide knowledge to support new data-driven initiatives.
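To illustrate the kind of logical inference this enables, here is a toy example in plain Python; this is not Stardog's reasoner, and the rule and data are made up. The rule: if a customer owns a vehicle and that vehicle is insured by a policy, derive that the customer holds the policy, without storing that edge explicitly.

```python
# Toy single-rule inference over (subject, predicate, object) triples.
triples = {
    ('alice', 'owns', 'car1'),
    ('car1', 'insuredBy', 'policy9'),
}

def infer_holds_policy(triples):
    """Derive (customer, holdsPolicy, policy) edges from owns + insuredBy."""
    derived = set(triples)
    derived |= {
        (s, 'holdsPolicy', pol)
        for (s, p, v) in triples if p == 'owns'
        for (v2, q, pol) in triples if q == 'insuredBy' and v2 == v
    }
    return derived

print(('alice', 'holdsPolicy', 'policy9') in infer_holds_policy(triples))
# prints True: the edge was inferred, not asserted
```

In a real deployment the rule would be expressed declaratively and evaluated by the reasoner at query time, so inferred edges stay consistent as the underlying data changes.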

Getting started with Databricks and Stardog

This blog provided an overview of how Stardog enables a Knowledge Graph-powered semantic data layer on top of the Databricks Lakehouse Platform. Check out the in-depth demo to learn more. Stardog strengthens analytics and accelerates the value of your data lake investment by delivering timely, critical insights to knowledge workers across a universe of connected data assets. Together, Databricks and Stardog let data and analytics teams quickly build a data fabric that evolves with the organization's growing needs.

Get started with Databricks and Stardog by requesting a free trial below:
https://www.databricks.com/try-databricks
https://cloud.stardog.com/get-started
https://www.stardog.com/learn-stardog/

