Vishakha Sharma

Principal Data Scientist, Roche

Vishakha Sharma is a principal data scientist for diagnostic information solutions at Roche, where she leads advanced analytics initiatives in natural language processing (NLP) and machine learning (ML) to discover key insights that improve the NAVIFY product portfolio, leading to better and more efficient patient care. Vishakha has authored 40+ peer-reviewed publications and proceedings and has given 15+ invited talks. She serves on the program committees of ACM-W, NeurIPS, AMIA, and ACM-BCB. Her research has been funded by the NIH Big Data to Knowledge (BD2K) initiative to build NLP software for precision medicine. She holds a PhD in computer science.

Past sessions

Unstructured free-text medical notes are the only source for many critical facts in healthcare. As a result, accurate natural language processing is a critical component of many healthcare AI applications, such as clinical decision support, clinical pathway recommendation, cohort selection, and patient risk or abnormality detection. Recent advances in deep learning for NLP have enabled a new level of accuracy and scalability for clinical language understanding, making a broad set of applications possible for the first time.

The first part of this talk will cover the deep learning techniques, explainability features, and NLP pipeline architecture that have been applied. We'll provide a short overview of the key underlying technologies: Spark NLP for Healthcare, BERT embeddings, and healthcare-specific embeddings. Then, we'll describe how these were applied to tackle the challenges of a healthcare setting: understanding clinical terminology, extracting specialty-specific facts of interest, and using transfer learning to minimize the required amount of task-specific annotation. We'll also cover the use of MLflow and its integration with Spark NLP to track experiments and reproduce results.

The second part of the talk will cover automated deep learning: the system's ability to train, tune, and measure models once clinical annotators add or correct labeled data. We will cover the annotation process and guidelines; why automation was required to handle the variety in clinical language across providers, document types, and geographies; and how this works in practice. Providing explainable results, including highlighting evidence in the text for extracted semantic facts, is another critical business requirement, and we'll show how we addressed it. This talk is intended for data scientists, software engineers, architects, and leaders who must design real-world clinical AI applications and are interested in lessons learned from applying the latest advances in NLP and deep learning in this space.
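
As a rough illustration of what highlighting evidence in the text for an extracted fact can look like, here is a minimal Python sketch. It is not the system described in the talk and does not use Spark NLP. The `Fact` type, the `find_fact` and `highlight_evidence` helpers, the labels, and the sample note are all hypothetical, and the "extraction" is a plain substring lookup standing in for a real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A hypothetical extracted fact with character offsets into the note."""
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def find_fact(text: str, phrase: str, label: str) -> Fact:
    """Stand-in for a real extractor: locate the first occurrence of a phrase."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Fact(label=label, start=start, end=start + len(phrase))


def highlight_evidence(text: str, facts: list[Fact]) -> str:
    """Wrap each fact's evidence span as [span](LABEL), left to right."""
    pieces = []
    cursor = 0
    for fact in sorted(facts, key=lambda f: f.start):
        if fact.start < cursor:
            continue  # skip spans that overlap one already emitted
        pieces.append(text[cursor:fact.start])
        pieces.append(f"[{text[fact.start:fact.end]}]({fact.label})")
        cursor = fact.end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Invented sample note and labels, for illustration only.
note = "Patient reports chest pain. Started metformin 500 mg daily."
facts = [
    find_fact(note, "metformin", "DRUG"),
    find_fact(note, "chest pain", "SYMPTOM"),
]
print(highlight_evidence(note, facts))
```

Keeping character offsets alongside each fact is the design choice that matters here: it lets a reviewer see exactly which words in the note support an extracted value, whatever model produced it.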