Turning Big Biology Data into Insights on Disease – The Power of Circulating Biomarkers
- Moscone South | Upper Mezzanine | 151
- 35 min
Profiling small molecules in human blood across global populations gives rise to a greater understanding of the varied biological pathways and processes that contribute to human health and diseases. Herein, we describe the development of a comprehensive Human Biology Database, derived from nontargeted molecular profiling of over 300,000 human blood samples from individuals across diverse backgrounds, demographics, geographical locations, lifestyles, diseases, and medication regimens, and its applications to inform drug development.
Approximately 11,000 circulating molecules have been captured and measured per sample using Sapient’s high-throughput, high-specificity rapid liquid chromatography-mass spectrometry (rLC-MS) platform. The samples come from cohorts with adjudicated clinical outcomes from prospective studies lasting 10-25 years, as well as data on individuals’ diet, nutrition, physical exercise, and mental health. Genetic information for a subset of subjects is also included and we have added microbiome sequencing data from over 150,000 human samples in diverse diseases.
An efficient data science environment is established to enable effective health insight mining across this vast database. Built on a customized AWS and Databricks “infrastructure-as-code” Terraform configuration, we employ streamlined data ETL and machine learning-based approaches for rapid rLC-MS data extraction. In mining the database, we have been able to identify circulating molecules potentially causal to disease; illuminate the impact of human exposures like diet and environment on disease development, aging, and mortality over decades of time; and support drug development efforts through identification of biomarkers of target engagement, pharmacodynamics, safety, efficacy, and more.