Data + AI Summit 2022

The Semantics of Biology—Vaccine and Drug Research with Knowledge Graphs and Logical Inferencing on Apache Spark

On Demand


  • Session
  • In-Person
  • Research
  • Healthcare and Life Sciences
  • Intermediate
  • Moscone South | Upper Mezzanine | 151
  • 35 min


From the organization of the tree of life to the tissues and structures of living organisms, trees and graphs are recurring data structures in biology. Given the tree-like relationships between biological entities, Knowledge Graphs are emerging as the ideal way to store and retrieve biological data.

In our first Data + AI talk (https://www.youtube.com/watch?v=Kj5bZ2afWSU), we presented the Bellman open source library (https://github.com/gsk-aiops/bellman). Bellman was developed to translate SPARQL queries into Apache Spark Dataset operations so that scientists can submit graph queries in familiar environments like Jupyter and Databricks notebooks.
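To make the idea of translating a graph query into dataset operations concrete, here is a minimal sketch in plain Python. It is not Bellman's API and does not use Spark: the triples, the `associatedWith` predicate, and the helper names `match` and `join_on_shared_variable` are all hypothetical, chosen only for illustration. It shows how a two-pattern graph query can be read as two filters followed by a join on the shared variable.

```python
# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("gene:BRCA1", "associatedWith", "disease:BreastCancer"),
    ("gene:TP53", "associatedWith", "disease:LiFraumeni"),
    ("disease:BreastCancer", "rdfs:subClassOf", "disease:Cancer"),
    ("disease:LiFraumeni", "rdfs:subClassOf", "disease:GeneticDisorder"),
]


def match(triples, predicate):
    """Evaluate one triple pattern `?s <predicate> ?o` as a filter plus projection."""
    return [(s, o) for s, p, o in triples if p == predicate]


def join_on_shared_variable(left, right):
    """Join two pattern results where the object of `left` is the subject of `right`."""
    index = {}
    for s, o in right:
        index.setdefault(s, []).append(o)
    return [(a, b, c) for a, b in left for c in index.get(b, [])]


# Roughly corresponds to the graph query:
#   SELECT ?gene ?disease ?parent WHERE {
#     ?gene associatedWith ?disease .
#     ?disease rdfs:subClassOf ?parent }
rows = join_on_shared_variable(
    match(TRIPLES, "associatedWith"),
    match(TRIPLES, "rdfs:subClassOf"),
)

for row in rows:
    print(row)
```

In a distributed setting the same filter and join steps would presumably be expressed over a partitioned dataset of triples rather than a Python list.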

In this talk, we present the new logical inferencing capabilities we've built into the Bellman OSS library. We will demonstrate how connections between biological entities that are not explicitly connected in the data are deduced from ontologies. These inferred connections are returned to the scientist to aid the discovery of new relationships, with the intent of accelerating gene-to-disease research. To demonstrate these capabilities, we will take a deep dive into the "subclassOf" logical entailment, which retrieves all subclasses of a biological entity. We will also compare the performance characteristics of inference algorithms such as forward and backward chaining.
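The two inference strategies named in the abstract can be illustrated with a small sketch in plain Python. This is an assumption-laden toy, not the algorithm implemented in Bellman: the class hierarchy in `SUBCLASS_OF` is invented, and the functions `forward_chain` and `backward_chain` are hypothetical names. Forward chaining here materializes every entailed subclass pair up front, while the backward-chaining variant derives only what a single query needs.

```python
from collections import defaultdict, deque

# Hypothetical ontology fragment: (subclass, superclass) pairs stated explicitly.
SUBCLASS_OF = [
    ("Melanoma", "SkinCancer"),
    ("SkinCancer", "Cancer"),
    ("Leukemia", "Cancer"),
    ("Cancer", "Disease"),
]


def forward_chain(edges):
    """Materialize all entailed (subclass, superclass) pairs until a fixpoint.

    Applies the transitivity rule (A sub B, B sub C => A sub C) repeatedly.
    """
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        new_pairs = {(a, d) for a, b in closure for c, d in closure if b == c}
        if not new_pairs <= closure:
            closure |= new_pairs
            changed = True
    return closure


def backward_chain(edges, target):
    """Answer one query on demand: all direct and indirect subclasses of `target`."""
    children = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)
    found, queue = set(), deque([target])
    while queue:
        for child in children[queue.popleft()]:
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


materialized = forward_chain(SUBCLASS_OF)
subclasses_fwd = {a for a, b in materialized if b == "Cancer"}
subclasses_bwd = backward_chain(SUBCLASS_OF, "Cancer")

print(sorted(subclasses_fwd))
print(sorted(subclasses_bwd))
```

Both strategies return the same subclasses for a given entity. The trade-off in this sketch is that forward chaining pays the cost once and stores the inferred pairs, whereas backward chaining stores nothing extra but repeats the traversal for each query.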

Session Speakers

John Hunter

Senior Product Director AI/Ops

