Offering prediction APIs for a fee is a fast-growing industry and an important aspect of machine learning as a service. While many such services are available, the heterogeneity in their price and performance makes it challenging for users to decide which API or combination of APIs to use for their own data and budget. We take a first step towards addressing this challenge by proposing FrugalML, a principled framework that jointly learns the strengths and weaknesses of each API on different data, and performs an efficient optimization to automatically identify the best sequential strategy to adaptively use the available APIs within a budget constraint. Our theoretical analysis shows that natural sparsity in the formulation can be leveraged to make FrugalML efficient. We conduct systematic experiments using ML APIs from Google, Microsoft, Amazon, IBM, Baidu and other providers for tasks including facial emotion recognition, sentiment analysis and speech recognition. Across various tasks, FrugalML can achieve up to 90% cost reduction while matching the accuracy of the best single API, or up to 5% better accuracy while matching the best API’s cost.
Lingjiao Chen: Hi everyone. My name is Lingjiao, and today I’m happy to talk about our work on using ML prediction APIs in a more accurate and cheaper way. This is joint work with Professors Matei Zaharia and James Zou. Today’s talk is divided into three main components. First, we will give an introduction to machine learning as a service. Next, we will talk about our proposed solution, FrugalML, which can save up to 90% of the cost of using machine learning APIs. There we’ll go over the main idea behind the framework, how to use it, and the empirical evaluation on some real-world machine learning APIs. And finally, we will wrap up and talk about what is next.
So first of all, what is machine learning as a service? Over the past few years, people have seen machine learning succeed in many domains and want to extend that success to many others. But people in those other domains are not always machine learning experts. We therefore want to help those people benefit from machine learning breakthroughs without worrying about low-level overheads such as model training, data labeling, et cetera. Nowadays, there are many participants in this market, such as Google, Microsoft, Amazon, IBM, and many other companies.
According to the market research company Mordor Intelligence, the market size of machine learning as a service was already $1.0 billion in 2019 and is expected to reach more than $8 billion in the next five years. Here is an example of machine learning as a service. Suppose one wants to do a facial emotion recognition task. Instead of hiring an ML expert and building an ML model for this task, one can simply call the Google Vision API, and the process is very simple. You just upload the image to the vision API, and the API returns the emotions as well as a confidence score.
For instance here, the Google Vision API labels this image as happy, or joy, with a pretty high confidence score. The cost is tiny, only about $0.0015 per image. [inaudible] good, right? But what is the problem? There are many commercial APIs with the same functionality. For instance, for this facial emotion recognition task, Google, Microsoft, Amazon, and many other companies all provide the same kind of functionality. But there is a big difference in their performance and cost. Therefore, a natural question is which API, or combination of APIs, to use.
To address this problem, we proposed a solution called FrugalML, which essentially optimizes for the best sequential strategy within a budget constraint. Across all the tasks and datasets we evaluated, we observe up to 90% cost savings, or up to 5% better accuracy at the same cost. How does FrugalML work? It works as follows. First, it calls a base service, say a GitHub model. Then it takes the predicted label and quality score from the base service as features to decide whether the prediction is already acceptable or whether an additional API should be invoked, and if so, which one.
Take this figure as an example. Suppose we’re doing a facial emotion recognition task. We get a facial image x and send it to the GitHub model first. If the prediction is happy with a quality score less than 0.9, or surprise with a quality score less than 0.6, then we send the image to the Google API to get a better prediction. If the prediction is sad with a quality score less than 0.8, or anger with a quality score less than 0.7, we pass the image to the Microsoft API to get a better prediction. Otherwise, we believe the GitHub model already gives a good prediction and simply accept it.
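The routing rule in this example can be written down as a small lookup table plus one comparison. The following is only a minimal sketch of that rule, not FrugalML's actual code: the names `ROUTING` and `frugal_predict` are hypothetical, and the base model and add-on APIs are assumed to be plain callables that return a (label, quality score) pair.

```python
from typing import Callable, Dict, Tuple

Prediction = Tuple[str, float]  # (label, quality score)
Predictor = Callable[[object], Prediction]

# Label-dependent thresholds taken from the example in the talk.
# Maps a base-model label to (add-on API name, quality score threshold).
# The dictionary layout itself is an assumption made for illustration.
ROUTING: Dict[str, Tuple[str, float]] = {
    "happy": ("google", 0.9),
    "surprise": ("google", 0.6),
    "sad": ("microsoft", 0.8),
    "anger": ("microsoft", 0.7),
}


def frugal_predict(image, base: Predictor, addons: Dict[str, Predictor]):
    """Return (label, score, source) for one image.

    The base model is always called. An add-on API is called only when the
    base label has a routing entry and the base score is below its threshold.
    """
    label, score = base(image)
    route = ROUTING.get(label)
    if route is not None and score < route[1]:
        addon_name = route[0]
        addon_label, addon_score = addons[addon_name](image)
        return addon_label, addon_score, addon_name
    # Confident enough, or no rule for this label: accept the base prediction.
    return label, score, "base"
```

With stub predictors in place of real clients, a base prediction of happy at 0.85 would be sent on to the Google stub, while happy at 0.95 would be accepted as is.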
To the real user, those things are essentially a black box. Then how should a user actually use this in practice? The usage of FrugalML is very similar to the usage of a real commercial API. Let’s take face detection as an example. If you want to use the Google API for face detection, you basically invoke a Google client and then call the face detection function on a particular image. And that’s it. For FrugalML, the deployment phase is almost the same: you invoke a FrugalML strategy and then call the face detection function to get a response. The only difference is that FrugalML requires an additional training process, where you set your own budget requirement and ask FrugalML to find the best strategy for you. Notice that this training phase is essentially offline.
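To make the "almost the same" point concrete, here is a hedged sketch in which a strategy object exposes the same method as a vendor client, so the calling code does not change. Everything here is hypothetical and for illustration only: `StubVendorClient`, `FrugalStrategy`, `tag_photo`, and the `face_detection` signature are not the real Google client or the released FrugalML interface, and the strategy is reduced to one base service, one add-on, and one threshold.

```python
from typing import Protocol, Tuple


class FaceClient(Protocol):
    """Anything with a face_detection method returning (label, score)."""

    def face_detection(self, image: bytes) -> Tuple[str, float]: ...


class StubVendorClient:
    """Stand-in for a commercial client; always returns a fixed answer."""

    def __init__(self, label: str, score: float):
        self.label, self.score = label, score

    def face_detection(self, image: bytes) -> Tuple[str, float]:
        return self.label, self.score


class FrugalStrategy:
    """Hypothetical trained strategy with the same call surface as a client."""

    def __init__(self, base: FaceClient, addon: FaceClient, threshold: float):
        self.base, self.addon, self.threshold = base, addon, threshold

    def face_detection(self, image: bytes) -> Tuple[str, float]:
        label, score = self.base.face_detection(image)
        if score < self.threshold:
            # Base answer is not trusted; pay for the add-on service.
            return self.addon.face_detection(image)
        return label, score


def tag_photo(client: FaceClient, image: bytes) -> str:
    """Application code: identical for a vendor client or a strategy."""
    label, _ = client.face_detection(image)
    return label
```

In this sketch, `tag_photo(vendor, image)` and `tag_photo(strategy, image)` are the same call. Only the construction of `strategy`, which stands in for the offline training step, differs.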
So how do we actually train a good FrugalML strategy? Notice that there are many parameters in a strategy. For instance, how do we pick the optimal base service? How do we pick the optimal add-on services? How do we set the thresholds, et cetera? This is modeled as a combinatorial optimization problem, and the interesting question is how to develop a provably efficient solver. This involves statistical efficiency, meaning that we want to use as few samples as possible, and also computational efficiency, meaning that we don’t want the training process to take a super long time.
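To see why this is a combinatorial problem, it helps to look at a naive baseline. The sketch below is an exhaustive search over every (base, add-on, threshold) combination on a small annotated sample. It is explicitly not the provably efficient solver from the talk, and it simplifies the strategy to a single add-on with one threshold shared by all labels. The function names, the data layout, and the threshold grid are all assumptions made for illustration.

```python
from itertools import permutations


def evaluate(samples, costs, base, addon, threshold):
    """Return (accuracy, average cost per sample) of one candidate strategy.

    samples: list of (true_label, {api_name: (predicted_label, score)}).
    costs:   {api_name: price per call}.
    """
    correct, spent = 0, 0.0
    for truth, preds in samples:
        label, score = preds[base]
        spent += costs[base]
        if score < threshold:
            # Low quality score: pay for the add-on and use its label.
            label, _ = preds[addon]
            spent += costs[addon]
        correct += label == truth
    n = len(samples)
    return correct / n, spent / n


def train(samples, costs, budget, thresholds=(0.0, 0.25, 0.5, 0.75, 1.01)):
    """Brute-force search for the most accurate strategy within the budget.

    budget is the allowed average cost per sample. A threshold of 0.0 never
    calls the add-on; 1.01 always calls it. Returns
    (accuracy, avg_cost, base, addon, threshold), or None if nothing fits.
    """
    best = None
    for base, addon in permutations(costs, 2):
        for t in thresholds:
            acc, cost = evaluate(samples, costs, base, addon, t)
            if cost <= budget and (best is None or acc > best[0]):
                best = (acc, cost, base, addon, t)
    return best
```

Even in this reduced form the number of candidates grows with the number of API pairs times the number of thresholds, and label-dependent thresholds multiply it further, which is what motivates a smarter solver.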
To tackle the problem, we observe a natural sparsity structure [inaudible] the optimal calling strategy. Based on that observation, we’re able to develop an approximation solver that gives a one over N accuracy loss compared to the best strategy. Here N is the number of samples we need from the APIs, meaning that we need to have those N samples annotated by each of the APIs. The computational cost of this provably efficient solver is proved to be linear in N for relatively large N. So how does FrugalML actually work in practice? This figure gives a case study on a facial emotion recognition dataset, FER+. Here we set the budget to be $5, which is the same as the cost of the cheapest commercial API.
As you can see from this learned FrugalML strategy, first, more than 50% of the time we only need to call the GitHub model. That’s probably why FrugalML can save budget. Second, we use different quality score thresholds for different facial emotions, which tells us the importance of these label-dependent thresholds. How does this actually perform? What is the performance metric? This figure gives the accuracy and cost comparison across different ML APIs as well as FrugalML. First, we notice that the cost of FrugalML is the same as the cost of Face++, which verifies that our budget requirement is satisfied.
Second, notice that the Google API’s cost is actually higher than that of any other individual API. However, its accuracy is lower than Microsoft’s. This tells us that higher cost does not always imply higher accuracy, so it’s very important to decide which API to use, even for users with a high budget. Perhaps most surprisingly, while FrugalML incurs a relatively small cost, its accuracy is even better than that of the best commercial API, Microsoft. This is probably because FrugalML can, to some extent, be viewed as an ensemble of all the individual APIs. It therefore uses information from all of them and should be better than any one of them.
Now let’s zoom out a little bit and try to understand the accuracy and budget trade-offs achieved by FrugalML. From this figure, we can first notice that, compared to any individual API, FrugalML enables users to select their own budget or their own accuracy requirement. This gives a lot of flexibility compared to individual APIs. Second, to match the best commercial API’s accuracy, say the Microsoft API’s accuracy, FrugalML requires only 30% of the cost and therefore could save you quite a bit of money. Also, if you allow FrugalML to use a relatively large budget, then its accuracy can be higher than that of the best commercial API.
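The savings come from paying for an expensive API on only a fraction of the inputs. As a rough illustration, the average cost of a base-then-add-on strategy is the base price plus each add-on price weighted by how often that add-on is called. The helper names below are hypothetical. In the usage shown, the $0.0015 price is the per-image figure quoted earlier in the talk, while the free base model and the 30% escalation rate are made-up inputs, not measured results.

```python
def expected_cost_per_input(base_cost, escalations):
    """Average cost of one input under a base-then-add-on strategy.

    escalations: list of (fraction of inputs sent to an add-on, add-on price).
    """
    return base_cost + sum(fraction * price for fraction, price in escalations)


def savings_vs_always_calling(price, strategy_cost):
    """Fraction of cost saved relative to calling an API at `price` every time."""
    return 1.0 - strategy_cost / price


# Illustrative inputs only: a free base model (assumption) and an add-on
# priced at $0.0015 per image that is called on 30% of inputs (assumption).
cost = expected_cost_per_input(0.0, [(0.30, 0.0015)])
saved = savings_vs_always_calling(0.0015, cost)
```

Under these assumed inputs the strategy pays for the add-on on three inputs out of ten, so the cost is roughly 30% of always calling that API.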
We also compared FrugalML with a simple cascade approach, which is the red line in this figure. As you can see, across different budget requirements FrugalML, which is the blue line, consistently outperforms this simple cascade approach. So, does FrugalML only work for facial emotion recognition? Of course not. Here we studied FrugalML’s performance across 12 different datasets for three different tasks: facial emotion recognition, sentiment analysis, and speech recognition. This figure shows the cost savings achieved by FrugalML while matching the best commercial API’s accuracy. As you can see here, FrugalML typically achieves more than 50% cost savings and sometimes up to 90%.
This figure shows FrugalML’s accuracy improvement while matching the best commercial API’s cost. In fact, FrugalML can reach as much as a 5% accuracy improvement. This shows FrugalML’s other benefit: utilizing the power of all the APIs in the market can actually lead to higher overall accuracy. So let’s recap what we have covered. In this work, we study how to best use machine learning APIs in the marketplace within a budget constraint. For this problem we proposed a solution called FrugalML, which gives provable performance and efficiency guarantees. Empirically, we observed up to 90% cost savings, or up to 5% better accuracy at the same cost.
To stimulate more research in this area, we also released our code as well as a dataset with more than 600,000 samples annotated by those machine learning APIs. Is this the end of the story? No. In fact, this is a really underexplored area and there are many open problems here. For instance, in this work we focus on simple classification tasks. But there are many machine learning APIs focusing on more complicated tasks, for example the segmentation task in computer vision or the language translation task in NLP. How can we allow FrugalML to work for those more complicated tasks?
In addition, in this work we assume that all those APIs remain stable. However, in practice many APIs are updated frequently, so their performance may not be stable. How can we detect such a performance shift and adjust our ML API calling strategy to take it into account? That also remains open. And finally, in this work we focus on accuracy and cost, which are important metrics for users. But there are also other metrics, for instance fairness metrics or latency requirements. How could one take all those different metrics into account and build a holistic approach? That also remains open.
In fact, there are many interesting open problems in this domain, and we’re really excited to keep working on this area. Cool. Thanks for your attention. Our code and data are already available in this GitHub repository. If you’re interested in learning more about the theoretical analysis and the empirical results, feel free to visit our project website, read our full paper, or send us an email. Thank you so much.
Lingjiao Chen is a PhD researcher at Stanford University. He is broadly interested in machine learning, data management and optimization. Working with Matei Zaharia and James Zou, he is currently expl...