Yuhao Yang

Software Engineer, Intel

Yuhao Yang is a software engineer at Intel, where he provides implementation, consulting, and tuning advice on the Hadoop ecosystem to industry partners. Yuhao’s area of focus is distributed machine learning, especially large-scale analytical applications and infrastructure on Spark. He is also an active contributor to Spark MLlib (50+ patches). He implemented online LDA, QR decomposition, and several of Spark's feature-engineering transformers, and he has contributed improvements to a number of important algorithms.

Past sessions

Overview and extended description: AI is expected to be the engine of technological advancement in the healthcare industry, especially in radiology and image processing. This session demonstrates how to build an AI-based radiologist system using Apache Spark and Analytics Zoo to detect pneumonia and other diseases from chest X-ray images. The dataset, released by the NIH, contains around 110,000 X-ray images from around 30,000 unique patients, annotated with up to 14 different thoracic pathology labels. Stanford University developed a state-of-the-art CNN model that exceeds average radiologist performance on the F1 metric.

This talk focuses on building a multi-label image classification model on a distributed Apache Spark infrastructure, and demonstrates how to build complex image transformations and deep learning pipelines with BigDL and Analytics Zoo in a way that is both scalable and easy to use. We introduce some practical image pre-processing procedures and evaluation metrics. We also discuss runtime configuration, near-linear scalability for training and model serving, and other general performance topics.

Summit 2019 Game Playing Using AI on Apache Spark

April 24, 2019 05:00 PM PT

Using AI to play games is often seen as an early step toward general machine intelligence, because the ability to reason and make decisions based on sensed information is an essential part of general intelligence. Games are good playgrounds for experimenting with intelligent agents, since their goals and action rules are often well-defined and abstract. People have been interested in using AI to play games for quite a while. Recent developments in deep neural networks have allowed visual information in games to be processed effectively and used directly in agents' decision making. Deep reinforcement learning and meta-learning are also being explored in this area.

In this presentation we will share experiences from our attempts at using AI on Spark for game playing. The talk will include demos, some details of the experiments, and our learnings: for example, whether Spark is a good fit for implementing game-related AI, which parts need to be improved, and what prospects Spark has in the area of AI game playing.

Deep Reinforcement Learning (DRL) is a thriving area of current AI research. AlphaGo by DeepMind is a very successful application of DRL that has drawn worldwide attention. Besides playing games, DRL also has many practical uses in industry, e.g., autonomous driving, chatbots, financial investment, inventory management, and even recommendation systems. Although DRL applications have something in common with supervised computer vision or natural language processing tasks, they are unique in many ways.

For example, they have to interact with (explore) the environment to obtain training samples during optimization, and the method used to improve the model usually differs from that of common supervised applications. In this talk we will share our experience of building Deep Reinforcement Learning applications on BigDL/Spark. BigDL is a well-developed deep learning library on Spark that is handy for big data users, but it has mostly been used for supervised and unsupervised machine learning. We have made extensions specifically for DRL algorithms (e.g., DQN, PG, TRPO, and PPO), implemented classical DRL algorithms, built applications with them, and tuned their performance. We are happy to share what we have learned during this process.

We hope our experience will help our audience learn how to build an RL application of their own for their production business.

Session hashtag: #DLSAIS10