MLCommons aims to accelerate machine learning to benefit everyone.
MLCommons will build a common set of tools for ML practitioners including:
Benchmarks to measure progress: MLCommons will leverage MLPerf (built on DAWNBench) to measure speed, but also expand benchmarking to other aspects of ML such as accuracy and algorithmic efficiency. ML models continue to increase in size and consequently cost. Sustaining growth in capability will require learning how to do more (accuracy) with less (efficiency).
Public datasets to fuel research: MLCommons’ new People’s Speech project seeks to develop a public dataset that, in addition to being larger than any other public speech dataset by more than an order of magnitude (86K hours of labeled speech), better reflects diverse languages and accents. Public datasets drive machine learning like nothing else; consider ImageNet’s impact on the field of computer vision.
Best practices to accelerate development: MLCommons will make it easier to develop and deploy machine learning solutions by fostering consistent best practices. For instance, MLCommons’ MLCube project provides a common container interface for machine learning models to make them easier to share, experiment with (including benchmark), develop, and ultimately deploy.
David Kanter: Hello, my name is David Kanter. I’m the executive director of MLCommons and good morning, good afternoon, good evening wherever you are. I want to thank you for taking the time to join me. I’m privileged to be able to give this talk about MLCommons, which is an organization that I helped to found and now run.
The structure of my talk is a bit of a dialogue in a Socratic method as it were and my goal is to answer these six questions about MLCommons in the 20 odd minutes that I have to talk, and then during the Q&A, I’ll get to answer the rest of them. Without further ado, I want to start with the first one, what is MLCommons? Well, MLCommons is an open engineering organization that is focused on making better ML for everyone. I’m going to talk a little bit about the why and the history.
As a motivation, I think if you’ve been paying careful attention for the last decade, we’ve seen the tremendous power that machine learning has to benefit all of society. A couple years ago I went to Beijing and I don’t speak any Mandarin, but using translation services, all powered by machine learning, I was able to not get tremendously lost inside of Beijing and converse with many people. It was fantastic. In the world of health and medicine, we’ve seen a lot of research and even some products where they’re applying machine learning algorithms to imaging, whether it’s x-rays or MRIs to improve or speed up diagnostics.
And then certainly self-driving vehicles have captured everyone’s attention. To me, one of the things that’s really exciting is, every year tens of thousands of people die in car accidents. And one of the promises is that autonomous vehicles can probably cut those deaths down by an order of magnitude or two, saving a huge number of lives, and also be greener and more efficient. I’m probably not the most gas-efficient driver, and I’m sure an AI can beat me at it, so we’ve got a tremendous opportunity to use machine learning for the benefit of everyone across the world.
This doesn’t need to be a purely altruistic endeavor. Now, I’m not a big believer in analyst numbers, but they’re a good indicator of the general direction. So if you look at some of these analyst reports, they’re saying that by the middle of this decade, the market for machine learning systems and components is going to be about $30 billion. And the software and services on top of that will be yet larger again. So there’s both a tremendous potential to help everyone across the world and a tremendous commercial potential.
However, machine learning is really in its infancy. If you look at some of the great industrial revolutions in the last century, things like flight and automobiles and computers were still very much in the early days. If you look back at some of those early images of planes, I recall the ones that looked like someone took a bicycle and attached wings, and it was totally custom built by some guys in a garage. Fortunately, when I am allowed to travel, it will be on something that’s a little bit more robust than that. But that’s kind of what AI is like today.
But we want the industry to grow. We want to get to the point where all those potential benefits that I talked about can be realized by many people. And so, the machine learning world needs a lot of things to help the technology grow. I think about this like a winery, which is that we’re after the grapes, but we’ve got to build the right trellises to support those grapes. So we need more than just technology, we need other things to help the technology grow and thrive. And that brings us to MLCommons.
When we were starting this organization, we saw that need to help drive the whole industry forward, and when we looked around, we actually didn’t even want to make our own organization. The first thing we said is, “Gosh, there must be someone doing this.” So we looked and we saw a lot of engineering organizations like IEEE or USB that do great open engineering, but none of them focus on machine learning. And then we looked and we saw a lot of machine learning organizations, things like the Partnership on AI that are doing fantastic work in AI or machine learning, but none of them really focused on the open engineering aspect. And then we said, “Ah, we know what we’ve got to do.” And thus, the genesis of MLCommons. We’re focused on open engineering to create better ML for everyone.
So this is a bit about who we are. We are a collaboration between industry and academia, our founders hail from leading universities, as well as some of the most prominent companies in machine learning. Folks who make critical hardware systems, cloud providers, and so forth, spread across the whole globe, nearly every continent. Now, what that means is that if you happen to be one of those rare folks in Antarctica who’s working on machine learning, you should give me a call because we’d really like to get that last continent. But it’s a very broad, open and welcoming community.
And it really is a community, it’s powered by individuals. These are some of the people that I’m privileged to work with and they span so many different parts of the world, different backgrounds. We have folks who are medical researchers at hospitals, we have folks who work at tech companies, folks who work at scientific computing labs, on all different continents. And ultimately we’re all pulling together on the open engineering and it’s really wonderful. It’s a warm, open, welcoming community. One of our written rules is we celebrate with cake, so as soon as we can get together, there’s definitely going to be some cake going on.
I described our high level vision, which is better ML for everyone. And realistically, we want to approach that through three pillars, creating benchmarks, creating large open datasets and best practices to reduce friction, and I’m going to dive into those in a little bit. And then our fourth pillar or perhaps the foundation really as the diagram suggests is research, which is there’s so many different ideas because machine learning is still a new and early industry that we constantly have new ideas that feed into these pillars from research.
Why benchmarks? I like to think of benchmarks as a barometer on progress and also a way of agreeing on terminology. When I say that our mission is to make better ML for everyone, what does better really mean? Well, that’s what a benchmark is, it defines what is better. And part of the value, a huge part of it is aligning everyone across different continents, cultures, geographies, companies, and functions, so that whether you’re in sales or academia or marketing or an executive, you can understand what does better mean. In the words of Peter Drucker, once you can measure it, you can improve it. And so by getting everyone rowing in the same direction, we will unlock tremendous possibilities and drive the industry forward.
I want to give an example of that. MLPerf is the industry standard set of benchmarks for machine learning, for training neural networks and doing inference. The MLPerf training benchmarks have been around since 2018 and the chart on the left-hand side here shows the improvement in performance over about 18 months. And you can see it starts out and we were able to accelerate training for neural networks for some key tasks, things like transformer, which is used in translation, and SSD and Mask R-CNN, which are used for object detection. So very core tasks for many different people and they’ve been accelerated by 10X, by 25X, by even maybe 3X on the low end.
And so this is bringing more capabilities into the hands of researchers and folks across the industry and academia. Fortunately, we’ve had very kind coverage from the press. This is an example of how benchmarks can really drive the industry forward and make machine learning better for everyone.
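To make the 10X, 25X, and 3X figures concrete: MLPerf Training measures the time taken to train a model to a target quality, so a speedup is simply the ratio of an older time to a newer one. The sketch below is illustrative only. The task names and timings are hypothetical placeholders chosen to reproduce those ratios, not real MLPerf results.

```python
def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the new run reaches the same quality target."""
    if baseline_minutes <= 0 or new_minutes <= 0:
        raise ValueError("times must be positive")
    return baseline_minutes / new_minutes


# Hypothetical time-to-train figures in minutes: (earlier round, later round).
# These are made-up numbers for illustration, not published results.
runs = {
    "task_a": (100.0, 10.0),
    "task_b": (50.0, 2.0),
    "task_c": (30.0, 10.0),
}

# One speedup factor per task, comparing the later round against the earlier one.
speedups = {name: speedup(old, new) for name, (old, new) in runs.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}X faster")
```

Because every submission trains to the same quality target, the ratio compares like with like, which is what lets people in different roles agree on what "better" means.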
Now, we started with just MLPerf training, and as you can see, we’ve branched out to be much broader. Late last year, we got to run some benchmarks on Fugaku, the world’s largest supercomputer. And this year we’re going to bring this down all the way to the very low end, to embedded microcontrollers that just consume tens of milliwatts or microwatts. So we’re spanning everything from the lowest and most power-efficient and compact systems to 20-megawatt supercomputers and everything in between. We’ve added in new benchmarks for recommendation, for speech to text, for medical imaging.
And then we’ve added in additional capabilities like measuring power consumption, so that not only can we get these workloads to run faster, but more efficiently, and save energy, and help to reduce power consumption and greenhouse gas emissions.
Let me turn to the second pillar of our vision, datasets. Why do we want to build datasets? I think some of this, you can trace back in history and we were very much inspired by ImageNet. Now, for those of you who aren’t machine learning historians, ImageNet was put together by a team of fantastic academics for about $300,000. It’s an open dataset for image classification to try and train computer systems, is it a cat or is it a dog? And when ImageNet came out, it was open, and after a while, eventually researchers discovered that using convolutional neural networks, you can beat humans at image recognition. And that unlocked a revolution in machine learning and got us to where we are today.
And what we want to do is build the next generation of ImageNets. Now, they already did a great job for images, but there are many other areas that could use this kind of transformation. This is critically important because even at the world’s leading machine learning and AI companies, they use public datasets. Folks like Google or Amazon who have massive internal datasets, their researchers still want to use public datasets because it allows sharing tips and techniques with everyone else. It allows reproducing results. You hear the phrase on the shoulders of giants, using public datasets allows you to stand on the shoulders of those who came before, so they’re absolutely critical.
But one of the challenges is a lot of these public datasets are small, and we know that for machine learning, you need big datasets. There are oftentimes restrictions around commercial usage. If the dataset’s not commercially usable, that’s a real hindrance. It may not be redistributable, so you can’t share it with your friends and colleagues. And then there’s a well-known diversity problem in data for machine learning. For folks like myself who have relatively conventional accents, there’s a lot of data. But if you have a unique accent, a lot of AI-based services might not work well. I’m sure many of you have seen the videos of folks with a thick Scottish brogue trying to use voice assistants and it doesn’t turn out very well. Better data, more diverse data would help a lot.
And last, the world is a dynamic place and we want our datasets to reflect that. If you build a speech recognition system that works today but it never gets updated, it’ll never pick up the latest slang or as the language evolves, so we need datasets that are going to be large, commercially usable, diverse, and continuously improving. That’s the charter.
Now, we’re building large open datasets in accordance with our vision and we’re starting with speech to text. Partially because we think it’s going to be a tremendously prolific technology that’ll reach frankly most people by the middle of this decade. Not only is it going to have tremendous breadth, but it’s also tremendously powerful. My mother actually had a stroke three years ago and her eyes don’t work tremendously well, and so she dramatically prefers, rather than typing, to interact with a speech-to-text engine. I also used to have a roommate who lost his sight when he was in his 20s. Again, speech to text is tremendously powerful for him.
So this is a really empowering technology, but as it is today, most of the public datasets are pretty small, especially once you get outside of the common languages like Mandarin and English. If you speak… Sorry, Yoruba or Polish, there just aren’t large public datasets, and that impedes progress. So we’re building a dataset called the People’s Speech. It’s going to be released under a commercial… Sorry, a Creative Commons license, redistributable, commercially usable, and it contains 86,000 hours. That’s roughly 10 years of speech.
Compared to what was available just a year or two ago, it’s I think 30X bigger. We’re currently sharing it with our members so that they can kick the tires and make sure that it’s going to definitely drive the state-of-the-art forward. And as with many things, we’re beta testing, trying to work out all the kinks. Hopefully we’ll have a public release later this year and like a garden we intend to curate it over time so that it evolves and becomes better. On the other side of the slide, you can see some of our ideas for making this dataset better. We’re starting with English, read text.
So we want to branch out into other languages to tackle this diversity problem, number one, but we also want to handle more complex situations. Right now, I’m speaking at a conference with an excellent microphone and it’s just me, but in practice when you really want to do speech to text, it might actually be in the hallway session of this conference with half a dozen people speaking in an acoustically challenging space and many different languages all together constantly interrupting each other. That’s much harder, but we need to… Hopefully we can get the data that will unlock that capability.
The other key part of our vision for datasets is we’re going to start with this, with People’s Speech, but we believe that a lot of the tools we develop and the expertise is actually reusable. And so, again, harking back to ImageNet, one of the things that I thought was fantastic is what they accomplished, but since it was an academic project, everyone went their own separate ways. The graduate students graduated and are now professors or in the industry and the knowledge was really scattered. What we want to do as we build these datasets is to keep that knowledge in-house to maintain that center of expertise, that critical mass, and then begin to apply it to other areas.
Whether it’s outside of speech and in imaging or in something altogether different, but we fundamentally believe that the data engineering that we’re going to build for this is going to be reusable and the expertise is going to help us do much better when it gets to other datasets.
Now I get to turn to best practices, our third pillar, removing friction in machine learning. The key to me about this is that when I look at how ideas trickle throughout the industry, oftentimes it starts with one paper that’s revolutionary, maybe it’s AlexNet or BERT in NLP. So you get a paper with a bunch of key ideas and maybe some code. Maybe you’re lucky and you get a dataset, but when it comes time to take that model and start playing around with it yourself, it’s just headaches from start to finish. You got to deal with the software dependencies, you got to try and get the dataset. It might not even run on the hardware you have.
Suppose you’re using Alibaba Cloud and it was developed internally at Facebook or at Azure, it won’t work, simply not portable. So this is days or weeks of work to just get everything up and running, and then you’re going to train this model. There’ll probably be a bug because that’s how life is and it probably won’t train to the stated accuracy, so now you’re going to have to debug it and figure out what’s going on. And maybe it’s just you pointed it to the wrong portion of the data or whatever. But the whole process is just really error prone. It’s sort of rather than doing assembly line work, it’s doing things by hand, and that just leads to subpar results.
So we have a project called MLCube that was inspired by shipping containers. The core idea here is, in the shipping industry, you can send me whatever you want from the other side of the planet. You don’t have to know about what ship it’s going on. Although hopefully it’s not that one that got stuck in the Suez Canal, and I don’t need to know how you send it there. You just pile stuff in the shipping container, it’s a dead simple interface. There’s not much brains in a shipping container. I get it, I open it up and I’m good to go, so that was our inspiration. We said, “We want to make the shipping container for ML models.”
The core idea here is it’s like Docker with a consistent command line and metadata. Now, in reality, it’s not tied to Docker, it’ll work for any container. We’ve got simple runners for local machines, different clouds, Kubernetes, you can hook up your own infrastructure, but the core idea is, you grab an MLCube and it’s as easy as hitting run. So rather than having two or three weeks of wrangling, it’s minutes hopefully. It’s up on GitHub, you can take a look, you can drop in and say hi. If you have ideas, we’d welcome them. Again, we’re an open community, we’d love to get more folks involved, so by all means, please check it out.
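As a rough illustration of the shipping-container idea, here is a minimal Python sketch of a package that carries its own metadata and exposes every task through one consistent entry point. This is not the MLCube API. The `Cube` class, the `describe` and `run` methods, and the `download` and `train` tasks are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Cube:
    """Hypothetical stand-in for a packaged model: metadata plus named tasks."""

    name: str
    tasks: Dict[str, Callable[..., dict]] = field(default_factory=dict)

    def describe(self) -> dict:
        """Return the metadata a user needs before running anything."""
        return {"name": self.name, "tasks": sorted(self.tasks)}

    def run(self, task: str, **params) -> dict:
        """Single entry point: the caller names a task and passes parameters."""
        if task not in self.tasks:
            raise KeyError(f"unknown task {task!r}; available: {sorted(self.tasks)}")
        return self.tasks[task](**params)


# Placeholder tasks. A real package would fetch data and train a model here.
def download(data_dir: str) -> dict:
    return {"status": "ok", "data_dir": data_dir}


def train(data_dir: str, epochs: int = 1) -> dict:
    return {"status": "ok", "epochs": epochs, "data_dir": data_dir}


cube = Cube(name="toy-model", tasks={"download": download, "train": train})
print(cube.describe())
result = cube.run("train", data_dir="/tmp/data", epochs=3)
print(result)
```

The point of the design is that the caller never needs to know what is inside a task. Swapping the placeholder functions for container invocations would leave the `run` call unchanged, which is the property the shipping-container analogy is after.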
Where are we going next? I shared our initial three pillars, but there was also the foundation, research, where we toy around with new ideas until they’re solid enough that we decide they should graduate and go on to become real projects. And I want to share just three of those projects that I’m excited about now, and maybe you’ll be excited as well. The first is algorithmic benchmarking. This is looking at the core algorithms of machine learning to make them more efficient. How can we train a network using fewer samples, for example? Maybe by doing something other than stochastic gradient descent, just as one example.
The second area that we’re looking at is how can we bring ML into the medical field? And just starting with a small step, which is federated evaluation. One of the big challenges in the medical arena is of course, all the medical data is private, so it’s hard to share and build up this one giant dataset. But when you have a researcher who develops a model, say in Japan, you want to test that out in other places, so maybe a more diverse area like say Chicago. So how do you ship that model out to a hospital in Chicago, evaluate it, protect all the IP, get it up and running and just have it be easy? And the key is, we’re talking about medical researchers who are not ML experts, so how do we make that easy and low friction? And we think that that will unlock a lot of innovation in the medical space.
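The federated evaluation idea can be sketched in a few lines: the model travels to each site, each site scores it on data that never leaves the building, and only summary counts come back. The sketch below is a toy illustration under that assumption, not the project's actual design. The function names, the site names, and the threshold "model" are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, int]  # (feature value, true label)
Model = Callable[[float], int]


def evaluate_locally(model: Model, private_data: List[Example]) -> Dict[str, int]:
    """Runs at a site: score the model and return only aggregate counts."""
    correct = sum(1 for x, y in private_data if model(x) == y)
    return {"correct": correct, "total": len(private_data)}


def federated_accuracy(model: Model, sites: Dict[str, List[Example]]) -> float:
    """Combine per-site counts into one accuracy without pooling raw records."""
    reports = [evaluate_locally(model, data) for data in sites.values()]
    total = sum(r["total"] for r in reports)
    if total == 0:
        raise ValueError("no examples at any site")
    return sum(r["correct"] for r in reports) / total


def threshold_model(x: float) -> int:
    """Toy model standing in for a researcher's trained classifier."""
    return 1 if x >= 0.5 else 0


# Hypothetical private datasets held by two sites.
sites = {
    "site_a": [(0.9, 1), (0.2, 0), (0.7, 0)],
    "site_b": [(0.1, 0)],
}

accuracy = federated_accuracy(threshold_model, sites)
print(f"accuracy across sites: {accuracy:.2f}")
```

In this toy version everything runs in one process. In practice the call to `evaluate_locally` would execute inside each hospital, and protecting the model's IP would need additional machinery that this sketch does not attempt.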
And then the third project that I’m pretty excited about is our scientific research working group, which is focusing on how can we improve machine learning that works with scientific instruments. I like to think of this as, how can we make data analysis at CERN better? But there’s many other things we’re looking at too, but specifically with science data. And I want to mention, when it comes to research, we’re open to any ideas, so if you have a great idea by all means, show up, come get involved. And maybe you’ll find that there’s a half a dozen people who think you have a great idea as well and they want to help out.
How can you get involved in MLCommons? Hopefully I’ve made a compelling pitch and you are interested. Again, we’re a pretty open community. You can join our mailing lists. If you go to our webpage, there’s instructions on how to get involved in that section of the site. We’ve got community events four times a year. Normally, we do them in Silicon Valley and then have one in Asia, one in Europe, but that hasn’t been possible since COVID. But we’re eagerly looking forward to our next in-person community meetings. You can join as a member, if you’re an academic, it’s free and it’s frankly pretty affordable for most companies.
If you want to benchmark your ML solution, you can join some of the MLPerf working groups and submit there. We actually just released the latest round of MLPerf inference, which was the first to debut power measurement. So come and join us at MLCommons, get involved and you can always shoot me a line as well.
That concludes my talk, so I want to thank you for your attention, for your time, choosing to spend it with me. Last, I want to encourage you to leave feedback. That helps this conference be better, it helps me be better as a speaker, so thank you for doing that as well.
David Kanter is a Founder and the Executive Director of MLCommons where he helps lead the MLPerf benchmarks and other initiatives. He has 16+ years of experience in semiconductors, computing, and mach...