Data + AI Summit 2022

The Semantics of Biology—Vaccine and Drug Research with Knowledge Graphs and Logical Inferencing on Apache Spark

On Demand

Type

  • Session

Format

  • In-Person

Track

  • Research

Industry

  • Healthcare and Life Sciences

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 151

Duration

  • 35 min

Overview

From the organization of the tree of life to the tissues and structures of living organisms, trees and graphs are recurring data structures in biology. Because biological entities are related in these tree-like and graph-like ways, Knowledge Graphs are emerging as an ideal way to store and retrieve biological data.

In our first Data + AI Summit talk (https://www.youtube.com/watch?v=Kj5bZ2afWSU), we presented the Bellman open-source library (https://github.com/gsk-aiops/bellman). Bellman was developed to translate SPARQL queries into Apache Spark Dataset operations, so that scientists can submit graph queries from familiar environments such as Jupyter and Databricks notebooks.
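To make the idea of "SPARQL as Dataset operations" concrete, here is a minimal, in-memory sketch in plain Python. It is not Bellman's API or its actual translation strategy; it only assumes that triples are stored as a table of (subject, predicate, object) rows, that each triple pattern in a SPARQL basic graph pattern becomes a filtered scan of that table, and that patterns sharing a variable are combined with a join. On Spark, the scan would roughly correspond to a filter/select on a Dataset and the combination to an inner join. The `ex:` prefix, the `ex:associatedWith` predicate, and the sample triples are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]  # maps a SPARQL variable such as "?gene" to a value


def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match_pattern(triples: List[Triple], pattern: Triple) -> List[Binding]:
    """Scan the triple table once: the analogue of a filter + select on a Dataset."""
    results: List[Binding] = []
    for triple in triples:
        binding: Binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable repeated within one pattern must bind to the same value.
                if term in binding and binding[term] != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                # A constant term must match exactly.
                ok = False
                break
        if ok:
            results.append(binding)
    return results


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Natural join on shared variables: the analogue of an inner join of two Datasets."""
    out: List[Binding] = []
    for lhs in left:
        for rhs in right:
            shared = lhs.keys() & rhs.keys()
            if all(lhs[v] == rhs[v] for v in shared):
                out.append({**lhs, **rhs})
    return out


def evaluate_bgp(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining the matches of each triple pattern."""
    solutions: List[Binding] = [{}]  # one empty solution: the identity for joins
    for pattern in patterns:
        solutions = join(solutions, match_pattern(triples, pattern))
    return solutions


# Hypothetical data and query:
# SELECT ?gene WHERE { ?gene ex:associatedWith ?disease . ?disease rdfs:label "breast cancer" }
triples = [
    ("ex:BRCA1", "ex:associatedWith", "ex:BreastCancer"),
    ("ex:BRCA2", "ex:associatedWith", "ex:BreastCancer"),
    ("ex:TP53", "ex:associatedWith", "ex:LungCancer"),
    ("ex:BreastCancer", "rdfs:label", "breast cancer"),
]
patterns = [
    ("?gene", "ex:associatedWith", "?disease"),
    ("?disease", "rdfs:label", "breast cancer"),
]

genes = sorted(s["?gene"] for s in evaluate_bgp(triples, patterns))
print(genes)
```

The nested-loop join keeps the sketch short; a distributed engine would instead hash- or sort-partition both sides on the shared variable columns.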

In this talk, we present the new logical inferencing capabilities we have built into the Bellman OSS library. We will demonstrate how connections between biological entities that are not explicitly stated in the data can be deduced from ontologies. These inferred connections are returned to the scientist to aid the discovery of new relationships, with the intent of accelerating gene-to-disease research. To demonstrate these capabilities, we will take a deep dive into the "subClassOf" logical entailment, which retrieves all subclasses of a biological entity. We will also compare the performance characteristics of inference algorithms such as forward and backward chaining.
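The two inference strategies mentioned above can be contrasted with a small sketch for the transitive `rdfs:subClassOf` rule (if A is a subclass of B and B is a subclass of C, then A is a subclass of C). This is an illustration of the general techniques, not Bellman's implementation. Forward chaining materializes every entailed triple up front, so a later query is a plain lookup; backward chaining starts from the queried class and derives only the subclasses it needs at query time, which for this single transitive rule amounts to a goal-directed graph traversal. The `ex:` class names are hypothetical, and the reflexive case (every class is a subclass of itself) is deliberately left out.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

SUBCLASS = "rdfs:subClassOf"
Triple = Tuple[str, str, str]


def forward_chain(triples: Set[Triple]) -> Set[Triple]:
    """Forward chaining: apply the transitivity rule until a fixpoint is reached.

    Naive evaluation: every round re-joins all subClassOf pairs, which is simple
    but does redundant work; semi-naive evaluation would only join new triples.
    """
    closure = set(triples)
    while True:
        pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        new = {(a, SUBCLASS, c) for a, b in pairs for b2, c in pairs if b == b2} - closure
        if not new:
            return closure
        closure |= new


def subclasses_forward(materialized: Set[Triple], cls: str) -> Set[str]:
    """Query over the materialized graph: a single scan, no inference at query time."""
    return {s for s, p, o in materialized if p == SUBCLASS and o == cls}


def subclasses_backward(triples: Set[Triple], cls: str) -> Set[str]:
    """Backward chaining: derive subclasses of `cls` on demand by walking edges downward."""
    children: DefaultDict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            children[o].add(s)
    found: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in found:  # the visited set also guards against cycles
                found.add(child)
                stack.append(child)
    return found


# Hypothetical ontology fragment
ontology = {
    ("ex:Carcinoma", SUBCLASS, "ex:Cancer"),
    ("ex:BreastCarcinoma", SUBCLASS, "ex:Carcinoma"),
    ("ex:LungCarcinoma", SUBCLASS, "ex:Carcinoma"),
    ("ex:Sarcoma", SUBCLASS, "ex:Cancer"),
    ("ex:Cancer", SUBCLASS, "ex:Disease"),
}

materialized = forward_chain(ontology)
print(sorted(subclasses_forward(materialized, "ex:Cancer")))
print(sorted(subclasses_backward(ontology, "ex:Cancer")))
```

Both strategies return the same answer; the trade-off is where the cost lands. Forward chaining pays once in computation and storage for the materialized triples (here the 5 asserted triples grow to 11), while backward chaining stores nothing extra but repeats the derivation on every query.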

Session Speakers

John Hunter

Senior Product Director AI/Ops

GSK
