Tracking and analyzing how our individual products come together has always been an elusive problem for Steelcase. Our problem can be thought of in the following way: “we know how many Lego pieces we sell, yet we don’t know what Lego set our customers buy.” The Data Science team took over this initiative, which resulted in an evolution of our analytics journey. It is a story of innovation, resilience, agility and grit.
The effects of the COVID-19 pandemic on corporate America put the spotlight on office furniture manufacturers to find ways in which the office can be made safe again. The team would never have imagined how relevant our work on product application analytics would become. Product application analytics became an industry priority overnight.
The proposal presented this year is the story of how data science is helping corporations bring people back to the office and set the path to lead the reinvention of the office space.
After groundbreaking milestones to overcome technical challenges, the most important question is: What do we do with this? How do we scale this? How do we turn this opportunity into a true competitive advantage? The response: stop thinking about this work as a data science project and start to think about this as an analytics-enabled experience.
During our session we will cover the technical challenges that we overcame as a team to set up a pipeline that ingests semi-structured and unstructured data at scale, performs analytics and produces digital experiences for multiple users.
This presentation will be particularly insightful for Data Scientists, Data Engineers and analytics leaders who are seeking to better understand how to augment the value of data for their organization.
Jorge Lozano: Hi, welcome to our session. My name is Jorge Lozano, and today I am joined by my peer, Kevin Tian. We’ll be talking today about this topic that we’ve labeled “Analytics-enabled experiences: The New Secret Weapon.” Thank you so much for joining. We have very limited time so rather than going through the agenda in detail, I’ll just get right into it.
Both Kevin and I work at Steelcase, which is one of the leading office furniture manufacturers in the world. The boring way to think about what we do is to say that we manufacture desks and office chairs, but in reality, we like to think of ourselves as doing a lot more than that.
We like to think of ourselves as designers of workspaces. We unlock human promise by creating great work experiences, wherever work happens. And Steelcase is a very historic company. We were founded in 1912, so over 100 years ago, and it’s a company with a lot of history. We’re not one of those digital natives. We have so much history behind us. In fact, one of the images that you see there is the signing of the Japanese surrender onboard the USS Missouri in World War II. That document happens to have been signed on top of a Steelcase table, so it’s a very rich organization with a lot of history.
And with that, obviously, comes a lot of problems. So what is one of the problems that we’re going to talk about today? The problem that we want to share with you is something that’s been on our mind and that’s been a challenge for years, and it has to do with our ability to understand how our individual products become product applications. Our saying is that, “We know how many Lego pieces we sell, yet we don’t know what Lego sets our customers buy.” Let me elaborate on this a little bit. This speaks to our inability to understand how individual products come together to form a product application.
On one side, you see the breakdown of individual products, which is what we manufacture. On the other side, you see what a customer actually buys, which is that product setting. The blend in between those two is often challenging for us to understand and analyze at scale. We know exactly how many chairs we sold, how many screens we sold, how many caddies we sold, but we don’t really have that rich information of how these came together to form specific settings. So the data science team was tasked to try to solve this problem. It became a very important problem to solve. It became a priority and we were making some really, really good progress.
That is, until overnight the world shut down and our initiative was no longer just important, it became critical. Why? Well, it became critical because a lot of the information that we were using to derive these analytics proved extremely insightful for understanding the current state of the North American office, in terms of the layouts, the division, the density, and the placement of workers within a space.
Our priorities had to change overnight. This story is a story of how that change in priorities, and mindset, and focusing on something broader and bigger, eventually led us to a much more successful state. When we switched the question, now it all became focused or geared towards the idea of, “How is data science helping corporations bring people back into the office, and at the same time, set the path to lead the reinvention of the office space?” How did this happen?
Well, a lot of the data that we were collecting, as I mentioned, became essential to understanding how our current offices were set up and whether that presented a risk for disease transmission. We were able to use this data to understand the distribution of distance between workers and how often you would have a co-worker sitting right in front of you, or next to you, or behind you with no division to protect you from disease transmission. We were also able to really understand core templates, or frameworks, or models of what our customers’ workstations look like.
How do we take this further? Well, we were able to develop industry benchmarks that proved significant and extremely relevant to guide decision-making, even for state and federal government. For example, we developed a framework that would classify workstations based on the level of risk that they present, according to their compliance with standard social distance measures. In this case, we were seeing that 78% of workstations did not have enough distance or division to comply with those standard social distance measures.
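As an editorial illustration of the classification idea described above, here is a minimal sketch in Python. It is not the framework used in the talk: the 72-inch (6-foot) threshold, the `Workstation` fields, and the two-level "low"/"high" labels are all assumptions made for the example. It only mirrors the stated rule that a workstation complies when it has enough distance or a division.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: 72 inches (6 feet) stands in for the social distance threshold.
MIN_DISTANCE_IN = 72.0


@dataclass
class Workstation:
    station_id: str
    nearest_worker_in: float  # distance to the closest neighboring worker, in inches
    has_division: bool        # True if a screen or panel separates the two workers


def classify_risk(ws: Workstation) -> str:
    """Return 'low' when the station has enough distance or a division, else 'high'."""
    if ws.nearest_worker_in >= MIN_DISTANCE_IN or ws.has_division:
        return "low"
    return "high"


def share_high_risk(stations: list[Workstation]) -> float:
    """Fraction of stations classified as high risk (0.0 for an empty list)."""
    if not stations:
        return 0.0
    high = sum(1 for ws in stations if classify_risk(ws) == "high")
    return high / len(stations)


# Hypothetical floor plan with four workstations.
stations = [
    Workstation("A1", 48.0, False),  # too close, no division
    Workstation("A2", 48.0, True),   # too close, but divided
    Workstation("B1", 80.0, False),  # far enough apart
    Workstation("B2", 30.0, False),  # too close, no division
]

print(f"High-risk share: {share_high_risk(stations):.0%}")
```

A benchmark like the 78% figure quoted in the talk would be this kind of share computed over many real floor plans, with whatever thresholds the actual framework defines.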
That is a big deal and that is a big deal when we’re talking about worker safety. What we began thinking is, “How can I fix this? How can we solve this? What happens if I add some division in between workers?” “What happens if I add some distance between workers? Will it solve the problem? Will it even help and to what extent?” To do this, we actually thought that we needed a very solid foundation in science. So for this reason, we decided to partner with MIT to understand the fluid dynamics of disease transmission in an office setting.
We wanted to understand the extent to which our retrofitting strategies would have any sort of effect in mitigating the spread of pathogens amongst coworkers. Here’s where we’re at. We have a data-driven starting point that allows us to understand what the most common settings are and what products from within our portfolio could be used to retrofit them. We put these to the test in a state-of-the-art laboratory that tests the spread of pathogens in an office setting, and we use the results from that to understand how we should be thinking about optimizing our retrofitting strategies and proposing those to our customers.
But that in itself isn’t the end. In fact, this is the most important thing that you need to learn from this session. If you really want to be able to change the game, you really want to be able to turn this into a competitive advantage, you should stop thinking about this as a data science problem and think about this as an analytics-enabled experience. In other words, it’s not about how you take data to feed models. It’s about how you take data to create experiences.
For Steelcase, that experience became something that we call Space Scan. Space Scan is a service that we offer to our current customers, in which we’re able to ingest their current floor plans, analyze them, highlight the workstations or settings that are deemed high risk, and propose retrofitting strategies that have been vetted to improve safety, in terms of reducing the transmission of pathogens in the office. We do that automatically, we can do that at scale, and we can do that in ways that we can optimize, based on what the customer’s looking for.
Are they looking to be able to bring all of their employees back, only a portion of them? Are they more focused on distance? Are they more focused on division? This is the secret weapon. The secret weapon is to be able to translate things into an experience. It’s not until you make it an experience that you change the game. The focus is about how do you turn data into information? How do you turn that information into insights? Then how do you leverage those insights to create an experience?
How do you do this? How do we go about and do this? We’re going to talk a little bit about that. First, I want to say there are two things that come into play. The first one has to do with the organizational structure, and the second has to do with technical competencies. This is particularly important for non-digitally native organizations, as it is often the case that our structure is much more rigid and potentially not conducive to resourcing and executing against these types of initiatives. What I can say though, is that a very important thing that you should keep in mind is that you don’t need a team of data scientists, you need a data science team.
What do I mean by this? What I mean by this is that the breadth of resources and skills that are required to transform data into experiences goes way beyond the traditional data scientist. You need to be mindful of that, and you need to be receptive and set up teams that are conducive to executing against these types of initiatives. I’m now going to hand it over to my peer, Kevin Tian, who is going to talk about the technical competencies that are involved in being able to execute initiatives such as this.
Kevin Tian: Let’s talk about the data engineering and infrastructure piece of this work. Back several years ago, the application mapping work came to the table along with other initiatives. We were all excited about the possibilities. We began to envision how all the information and the insights could empower the business. But when we began to architect the solution, we realized that our choices were very limited. It was like we planned to cook spaghetti with a top-notch, secret recipe, but found that the only thing we could use to cook and eat the spaghetti was a hammer.
Back then, we were still using our legacy analytical system. To avoid [inaudible] legal concern, let’s just call it a legacy system. I can give you some hints. It has been an industry leader since the nineties and a popular name to be replaced in the last three years, but it is still popular in highly regulated industries like pharmaceuticals and finance. Back to our story. The legacy system could still fulfill our regular analytical needs. But whenever we wanted to push the boundary, we began to hit a wall.
The wall consists of two parts. One is that if we want to do that in our legacy system, we need to pay extra money to enable the capability. The other is that some of the capabilities only work in the marketing materials, or in other words, they only work in certain scenarios. And our scenario [inaudible] the next release. That happened with three key features we needed: model deployment, deep learning, and distributed training. So we began to explore the possibilities of modernizing our analytical system. And we finally landed on building a cloud data platform based on Azure Databricks, for both data science and data engineering.
On the application mapping side, we started by thinking big. Our end goal is more about embedding analytics-driven solutions in the business operation than about a standard report or dashboard. But when it comes to implementation, we need to start small and, based on what the data tells us, move from one step to another.
What is our big data problem in application mapping? The data volume and velocity are high, but can be easily handled by our system. We’ve got more challenges on the data variety and data veracity side. For application mapping, we need to ingest data of different formats. We’ve got enterprise core data from relational databases, which is easy to consume. Semi-structured data like JSON and XML can be a little challenging. When they come in good format, that is not a problem at all. The native JSON support in Spark and an open source library like spark-xml can handle them accurately and quickly.
Unstructured data definitely needs special treatment. You already have special software for this. From a data engineering perspective, the challenges will be automating the special software and integrating it into a job scheduler. That may require collaboration with other teams specialized in this area, or support from software vendors. Now let’s delve a little deeper into the data veracity problem in the two semi-structured data formats.
Those are four JSON messages we got from IoT devices. If you take a look at the first three messages, you probably will get the idea. They express the same meaning with different schemas, and if you are familiar with JSON Schema, you may prefer the first message’s implementation, since it uses the JSON native boolean format. But there is nothing wrong in choosing the second or the third solution, as long as it’s consistent. But when all four implementations exist, that becomes a problem.
The first three will trip up most schema inference systems, while the fourth one will be recognized as invalid JSON. Ideally the problem should be fixed on the machine side. Fixing the problem in your data pipeline is possible, but it will also slow down the process. And more importantly, it will complicate your data pipeline. The problem is usually caused by an undefined data schema or weak schema design. Schema on read is a great feature in big data solutions, but schema on read doesn’t necessarily mean no schema.
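The slide with the four messages is not part of this transcript, so the following is only an editorial sketch of the kind of problem described, written in Python with the standard library. The field names (`device`, `occupied`) and the four encodings are hypothetical: the same flag sent as a native JSON value, as a string, as a number, and in a form that is not valid JSON at all. The sketch shows what fixing it in the pipeline can look like: validate each message against an explicit target schema instead of relying on inference, and set aside what cannot be parsed.

```python
import json

# Hypothetical messages: the same flag encoded four different ways.
RAW_MESSAGES = [
    '{"device": "d1", "occupied": true}',    # JSON native boolean
    '{"device": "d2", "occupied": "true"}',  # boolean sent as a string
    '{"device": "d3", "occupied": 1}',       # boolean sent as a number
    "{'device': 'd4', 'occupied': True}",    # Python-style text, not valid JSON
]


def to_bool(value):
    """Coerce the tolerated encodings to a real boolean, or raise ValueError."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    raise ValueError(f"cannot interpret {value!r} as a boolean")


def parse_message(raw):
    """Return (record, None) on success or (None, reason) on failure."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    try:
        record = {"device": str(msg["device"]), "occupied": to_bool(msg["occupied"])}
    except (KeyError, TypeError, ValueError) as exc:
        return None, f"schema violation: {exc}"
    return record, None


good, rejected = [], []
for raw in RAW_MESSAGES:
    record, reason = parse_message(raw)
    if record is not None:
        good.append(record)
    else:
        rejected.append((raw, reason))

print(good)
print(rejected)
```

Every tolerated encoding adds a branch to `to_bool`, which is the pipeline complexity the speaker warns about. Agreeing on one encoding at the device side removes the need for this function entirely.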
Schema design and validation are still important. Let’s take a look at a similar issue on the XML side. In this case, the schema is defined, but not enforced. The orientation is defined as three decimals, X, Y, and Z. But let’s look at the x-value in the data. What does it tell me here? As a human, I can interpret it as almost zero, but a computer will fail to interpret it and treat it as bad data. Because of this small problem in one line, the whole XML file with 100,000 lines will fail to work. On our end, the root cause of this problem is that the data may come from different sources.
Though some data sources have schema validation on the way into the enterprise message queue, several of them don’t. For this particular problem, we ended up fixing it in the data pipeline, since this is a common problem and fixing it at the data source is a human-intensive process.
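The actual x-value is not shown in the transcript. One common way to get a number that a person reads as "almost zero" but a strict decimal parser rejects is scientific notation, which the XML Schema `xs:decimal` type does not allow. Assuming that was the case here, a pipeline-side repair could look like the editorial sketch below. The element names (`item`, `orientation`, `x`, `y`, `z`) and the sample value are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Lexical form accepted by xs:decimal: optional sign, digits, optional fraction.
# Exponents such as "E-17" are not part of it.
XS_DECIMAL = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)$")

# Hypothetical sample: x is written in scientific notation.
SAMPLE = """<item id="chair-1">
  <orientation><x>-6.1E-17</x><y>0.0</y><z>90.0</z></orientation>
</item>"""


def repair_decimal(text):
    """Return text unchanged if it is a valid xs:decimal, else rewrite it as fixed-point."""
    text = text.strip()
    if XS_DECIMAL.match(text):
        return text
    try:
        value = Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}") from None
    if not value.is_finite():
        raise ValueError(f"not a finite number: {text!r}")
    return format(value, "f")  # fixed-point notation, no exponent


def repair_orientation(xml_text):
    """Rewrite every x, y, z element so its text is a valid xs:decimal."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag in ("x", "y", "z"):
            element.text = repair_decimal(element.text or "")
    return ET.tostring(root, encoding="unicode")


fixed = repair_orientation(SAMPLE)
print(fixed)
```

The repair keeps the exact value rather than rounding it to zero, so a downstream consumer can still decide how much precision matters.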
By ingesting the raw data into our cloud data platform, we enabled faster delivery of descriptive analytics. On the particular XML parsing problem, it took us four days to parse the whole data set with [inaudible] Python, but only 10 minutes with spark-xml.
With the new platform we further enable exploratory data analysis by making both raw data and refined data accessible for exploration. The refined data can be a commonly used data table joined from several raw data tables, a machine learning scoring result, or a results table created by another data scientist for a different purpose. And the built-in data visualization capability also makes verifying the data at each step really easy.
The collaborative notebook is a feature we use much more than we thought we would. At first we thought it was just good to have, like a collaborative Jupyter notebook, but we ended up using it a lot. It enabled sharing and collaborating within the same team and across teams. I cannot count how many times a data scientist asked me for a feature and I shared my sample notebooks with him or her. And that definitely helped a lot during COVID time, when everyone is working remotely.
And for a software engineer or data engineer, it also helps eliminate another famous problem: “It works on my machine.” We had that a lot when each of us was running Python on our local machines, but not anymore. When we are doing diagnostic analysis, we want to make sure that the diagnosis can be trusted. Think of seeing a doctor without an X-ray result. How angry would you be if you found out that the diagnosis you heard three days ago was based on someone else’s X-ray? For diagnostic analysis, data quality is the top priority. Trust in our data is hard to get and easy to lose.
Our first version of the model was based on deep learning. By leveraging a big GPU cluster, we accelerated the training from weeks to a day at comparable cost. But on the other hand, more machines may not solve all your problems, especially when your task cannot be parallelized. In that case, more machines will cost you more and may even slow down the process due to the overhead of cluster creation and communication.
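To make the trade-off concrete, here is a toy cost model as an editorial sketch. It is an Amdahl-style estimate with added overhead terms, not anything measured in the talk, and every number in it is made up: a fixed cluster start-up time, a serial part that never speeds up, a parallel part that divides across workers, and a communication cost that grows with the number of workers.

```python
def run_time(n_workers, serial_s, parallel_s, startup_s, comm_s_per_worker):
    """Estimated wall-clock seconds for a job on n_workers machines (toy model)."""
    return (
        startup_s                                # cluster creation
        + serial_s                               # work that cannot be parallelized
        + parallel_s / n_workers                 # work that splits evenly
        + comm_s_per_worker * (n_workers - 1)    # coordination overhead
    )


CANDIDATES = [1, 2, 4, 8, 16, 32]

# Hypothetical job A: mostly parallel work, so adding workers helps up to a point.
best_parallel = min(CANDIDATES, key=lambda n: run_time(n, 600, 3600, 300, 20))

# Hypothetical job B: mostly serial work, so extra workers only add overhead.
best_serial = min(CANDIDATES, key=lambda n: run_time(n, 3600, 30, 300, 20))

for n in CANDIDATES:
    print(n, run_time(n, 600, 3600, 300, 20), run_time(n, 3600, 30, 300, 20))
print("best for job A:", best_parallel, "| best for job B:", best_serial)
```

With these made-up inputs, job A is fastest on 16 workers and gets slower again at 32, while job B is fastest on a single machine, which is the situation described above.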
In the deep learning example, we accelerated our success [inaudible], but acceleration is important for failure as well. This is one example about measuring distance between workers. In this example, we assumed that the distance between chairs is a good representation of the distance between workers. Then we found an interesting result: four percent of workers are spaced within 12 inches. That is very counter-intuitive, since most of our chairs are wider than 12 inches.
And even in a pair programming scenario, two coworkers should not be that close. So we dug further into the scenario and found that headrests were counted as seats. By eliminating them from the measurement, we got a much better result. Back to Space Scan: with the data and the information, we created a machine learning model, which represents our insights in helping customers detect high-risk settings. But how can we embed the model into the current process? How can we go beyond a presentation or a dashboard?
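The headrest issue can be reproduced with a few lines of Python. This is an editorial sketch on made-up coordinates, not the team's data or code: the product types, positions, and the brute-force nearest-neighbor search are all assumptions chosen to keep the example small.

```python
import math

# Hypothetical floor plan items: (product_type, x_in, y_in), positions in inches.
ITEMS = [
    ("chair", 0.0, 0.0),
    ("headrest", 0.0, 6.0),   # accessory that belongs to the first chair
    ("chair", 60.0, 0.0),
    ("headrest", 60.0, 6.0),  # accessory that belongs to the second chair
    ("chair", 60.0, 48.0),
]


def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point (brute force, O(n^2))."""
    if len(points) < 2:
        return []
    distances = []
    for i, (xi, yi) in enumerate(points):
        distances.append(
            min(
                math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points)
                if j != i
            )
        )
    return distances


def share_within(distances, limit_in):
    """Fraction of distances at or below limit_in."""
    if not distances:
        return 0.0
    return sum(1 for d in distances if d <= limit_in) / len(distances)


# Naive run: every item is treated as a seat, headrests included.
naive = nearest_neighbor_distances([(x, y) for _, x, y in ITEMS])

# Corrected run: only real chairs count as seats.
chairs_only = nearest_neighbor_distances(
    [(x, y) for kind, x, y in ITEMS if kind == "chair"]
)

print("within 12 in (naive):      ", share_within(naive, 12.0))
print("within 12 in (chairs only):", share_within(chairs_only, 12.0))
```

In the naive run each headrest sits six inches from its own chair, so most "workers" look implausibly close. Filtering by product type removes the artifact. For real floor plans a spatial index would replace the brute-force loop.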
By collaborating with our smart host team, we embedded the Space Scan experience into the design tool our dealers use. Thus, they do not need to go to a different place to use the tool. Instead, the feature comes in handy in their daily process. We enabled the native integration by containerizing our model. The dealer tool makes an API call to the model in the container and receives the result in time. For customers with stricter data privacy concerns, we make the container available for edge deployment. That way, the sensitive customer information will not be shared with us. The edge deployment will also accelerate the response time, especially for big floor plans. That covers the technical competency part of the project. Let me hand it over to Jorge.
Jorge Lozano: Thank you so much, Kevin. So now the question is where are we headed next? Where do we go from here? Well, there are two important things to highlight here. The first one is that the office is here to stay, right? And some of the leading organizations have been very vocal about the fact that they are looking for ways in which they can bring their employees back because they understand the value in culture and innovation that the physical space brings to them.
However, we understand that employees are not going to want to come back to the office that they left, because people now have new needs and expectations. And this forces a need to rethink the role that the office fulfills for the worker. So new design principles will prevail and we need to be mindful of and adaptive to those. What this is creating is a revolution, and we need to be mindful of that revolution. So how do we stay on top of it? How do we stay connected to it and understand the things that are changing?
I think the most important thing here is to make sure we always remember that we don’t sell what we build. We build what our customers are willing to buy. This slight distinction is very important because understanding that distinction implies that we understand the need to stay connected to our customers and to understand their ever-changing needs. So how do we do this? What we want to be able to do is build a digital thread that can empower the creation of a flywheel.
A flywheel that allows us to stay connected with our customers across different points of the journey. We know that the value generation coming from a digital thread will unlock the potential across our business, not just today, but tomorrow.
Kevin Tian: How can we further enable the office revolution from the data engineering side? First, we need to focus more on near real-time data collection. Currently, most of our pipelines work in a batch fashion. That means insight generation is days or weeks behind. Migrating most of our pipelines to streaming will enable near real-time data collection and faster insight generation.
The second part is about catching the drift. Again and again, customers ask us what the future of the office will be like. As a thought leader in the area, our research team collaborates with top-notch researchers around the world. But from our side, catching the data drift in design will definitely help us understand what is happening in the field at a large scale.
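The talk does not say how drift in design would be measured. As one simple possibility, offered here only as an editorial sketch, the mix of setting types in two periods can be compared with the total variation distance between their frequency distributions. The setting labels, the counts, and the 0.1 alert threshold below are all hypothetical.

```python
from collections import Counter


def distribution(labels):
    """Relative frequency of each label (empty dict for no labels)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions, in [0, 1]."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


# Hypothetical setting types observed in floor plans from two periods.
last_year = ["bench"] * 70 + ["private_office"] * 30
this_year = ["bench"] * 50 + ["private_office"] * 30 + ["lounge"] * 20

DRIFT_THRESHOLD = 0.1  # assumption: anything above this is worth a closer look

drift = total_variation(distribution(last_year), distribution(this_year))
print(f"drift = {drift:.2f}, flagged = {drift > DRIFT_THRESHOLD}")
```

A distance of 0 means the two periods have the same mix of settings and 1 means they share none. Here a fifth of the mix has moved from benches to a setting type that did not exist the year before.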
Also, data could further turn several qualitative research ideas into quantitative measurements. The last part is to broaden self-service analytics to dealers so they can generate insights based on their own needs. Of course, this needs to be done right, since the data governance and security issues would be different from our current model. Thank you all for attending this session.
"With a background in Economics and Actuarial Science, Jorge has been involved in data science initiatives at Steelcase for the past 10 years.
During this time, he has been part of the evolution o...
Originally from China, Kevin has an educational background in Computer Science and Electrical Engineering from The University of Alabama in Huntsville. He is currently the lead Data Engineer for Steel...