Overview and extended description: AI is expected to be the engine of technological advancement in the healthcare industry, especially in radiology and image processing. The purpose of this session is to demonstrate how we can build an AI-based radiologist system using Apache Spark and Analytics Zoo to detect pneumonia and other diseases from chest X-ray images. The dataset, released by the NIH, contains around 110,000 X-ray images of around 30,000 unique patients, annotated with up to 14 different thoracic pathology labels. Stanford University developed a state-of-the-art CNN model on this dataset that exceeds average radiologist performance on the F1 metric.
This talk focuses on how we can build a multi-label image classification model on distributed Apache Spark infrastructure, and demonstrates how to build complex image transformations and deep learning pipelines using BigDL and Analytics Zoo with scalability and ease of use. Some practical image pre-processing procedures and evaluation metrics are introduced. We will also discuss runtime configuration, near-linear scalability for training and model serving, and other general performance topics.
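To make the multi-label setting concrete, here is a minimal sketch in plain Python of how such a model is commonly trained and scored: one independent sigmoid output per pathology, binary cross-entropy as the loss, and ROC AUC computed separately for each label. This is an illustration under stated assumptions, not the pipeline presented in the talk. The function names, the three-label toy data, and the choice of AUC as the metric are all hypothetical, and it uses neither BigDL nor Analytics Zoo.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy over every (image, label) pair."""
    eps = 1e-12  # keeps log() away from 0
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y in zip(row_logits, row_targets):
            p = min(max(sigmoid(z), eps), 1.0 - eps)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Returns None when a label has only one class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_label_auc(logits, targets):
    """One AUC per label column, since each pathology is scored independently."""
    n_labels = len(targets[0])
    return [roc_auc([row[j] for row in logits],
                    [row[j] for row in targets])
            for j in range(n_labels)]

# Hypothetical toy batch: 4 images, 3 labels (the real dataset has 14).
example_logits = [[2.0, -1.0, 0.5],
                  [-1.5, 0.3, 1.2],
                  [0.7, 1.8, -0.4],
                  [-0.2, -2.0, -1.0]]
example_targets = [[1, 0, 1],
                   [0, 0, 0],
                   [1, 1, 1],
                   [0, 0, 0]]

example_loss = binary_cross_entropy(example_logits, example_targets)
example_aucs = per_label_auc(example_logits, example_targets)
```

Because an image can carry several pathologies at once, the labels are treated as independent binary problems rather than as one softmax over mutually exclusive classes, which is why both the loss and the metric are computed per label.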
Bala Chandrasekaran is a Technical Staff Engineer at Dell Technologies, where he is responsible for building machine learning and deep learning infrastructure solutions. He has over 15 years of experience in the areas of high performance computing, virtualization infrastructure, cloud computing and big data.
Yuhao Yang is a software engineer at Intel, where he provides implementation, consulting, and tuning advice on the Hadoop ecosystem to industry partners. Yuhao’s area of focus is distributed machine learning, especially large-scale analytical applications and infrastructure on Spark. He’s also an active contributor to Spark MLlib (50+ patches), has delivered the implementation of online LDA, QR decomposition, and several transformers of Spark feature engineering, and has provided improvements on some important algorithms.