Reema has received a Masters in Computer Science from George Washington University and has over 4 years of experience as a Software Engineer, with an expertise in full stack development and is passionate about learning something new everyday from new languages to technologies She is currently working on the AI platform team at Microsoft, contributing to the development of Microsoft’s prosperity technology (Cosmos, Azure Machine Learning) and industry leading Open Source technology (Spark, Databricks). She is looking forward to explore her career as Data engineer.
June 25, 2020 05:00 PM PT
We demonstrate how to deploy a PySpark based Multi-class classification model trained on Azure Databricks using Azure Machine Learning (AML) onto Azure Kubernetes (AKS) and associate the model to web services. This presentation covers end-to-end development cycle; from training the model to using it in web application. Machine Learning problem formulation The current solutions to detect semantic types of tabular data mostly rely on dictionary/vocabulary, regular expressions and rule-based look up to identify the semantic types. However, these solutions are 1. Not robust to dirty and complex data and 2. Not generalized to diverse data types. We formulate this into a Machine Learning problem by training a multi-class classifier to automatically predict the semantic type for tabular data. Model Training on Azure Databricks We choose Azure Databricks to perform the featurization and model training using PySpark SQL and Machine Learning Library. To speed up the featurization process, we leverage the PySpark Functions (UDF) to register and distribute the featurization functions into UDFs. For the model training, we pick the Random Forests as the classification algorithms and optimize the model hyperparameters using PySpark MLLib. Model Deployment using Azure Machine Learning Azure Machine Learning provided the reusable and scalable capabilities to manager the lifecycle of Machine Learning models. We developed the E2E deployment pipeline on Azure Machine Learning including model preparation, computing initialization, model registration, and web service deployment. Serving as Web Service on Azure Kubernetes Azure Kubernetes provide the fast response and autoscaling capabilities serving model as web service together with the security authorization. We customized the AKS cluster with PySpark runtime to support PySpark based featurization and model scoring. Our model and scoring service are being deployed onto AKS cluster and served as HTTPS endpoints with both key-based and token-based authentication.