The Executive’s Guide to Data, Analytics and AI Transformation, Part 5: Make informed build vs. buy decisions

Chris D’Agostino
Mimi Park
Usman Zubair
A key piece of your data and AI transformation strategy is deciding which components of the data ecosystem are built by the in-house engineering team and which are purchased through a vendor relationship. There is increased emphasis within engineering teams on taking a "builder" approach. In other words, the engineering teams prefer to develop their own solutions in-house rather than rely on vendor products.

Competitive advantage

This "roll your own" approach has some advantages, including the ability to set the overall product vision, prioritize features and directly allocate the resources to build the software. However, it is important to keep in mind which aspects of your development effort give you the most competitive advantage.

Spend some time working with the data transformation steering committee and other stakeholders to debate the pros and cons of building out various pieces of the data ecosystem. The primary factor should be whether a given solution offers true competitive advantage for the organization. Does building this piece of software make it harder for your competitors to compete with you? If the answer is no, then it is better to focus your engineering and data science resources on deriving insights from your data.

Beware: becoming your own software vendor

As many engineering leaders know, building your own software is an exciting challenge. However, it comes with added responsibility: managing the overall project timeline and costs, and owning the design, implementation, testing, documentation, training, and ongoing maintenance and updates. In effect, you become your own software vendor for every component of the ecosystem that you build yourself. When you consider the cost of a standard-sized team, it is not uncommon to spend several million dollars per year building out individual components of the new data system. This doesn't include the cost to operate and maintain the software once it is in production.

To offset the anticipated development costs, engineering teams will often argue that they are starting with open source software and extending it to meet the "unique requirements" of the organization. It's worth pressure testing this approach and making sure that a) the requirements truly are unique and b) the development offers the competitive advantage that you need.

Even software built on top of open source still requires significant investment in integration and testing. The integration work is particularly challenging because of the large number of open source libraries that are required in the data science space. The question becomes, "Is this really the area that you want your engineering teams focused on?" Or would it be better to "outsource" this component to a third party?

How long will it take? Can the organization afford to wait?

Even if you decide the software component provides a competitive advantage and is worth building in-house, the next question you should ask is, "How long will it take?" There is a real time-to-market consideration, and the build vs. buy decision also needs to account for the impact to the business of the anticipated delivery schedule. Keep in mind that software development projects usually take longer and cost more than initially planned.

The organization should understand the impact on the overall performance and capabilities of the data ecosystem for any features tied to the in-house development effort. Your business partners likely do not care how the data ecosystem is implemented as long as it works, meets their needs, is performant, is reliable and is delivered on time. Carefully weigh the trade-offs among competitive advantage, cost, features and schedule.

Don't forget about the data

Perhaps the single most important feature of a modern data stack is its ability to make data sets and "data assets" consumable by end users or systems. Data insights, model training and model execution cannot happen reliably unless the data they depend on can be trusted and is of good quality. In large organizations, revenue opportunities and the ability to reduce risk often depend on merging data sets from multiple lines of business or departments. Focusing your data engineering and data science efforts on curating data and creating robust and reliable pipelines likely provides the best chance at creating true competitive advantage.

The amount of work required to properly catalog data, enforce schemas, check quality, partition, secure and serve it up for analysis should not be underestimated. The value of this work is equally important to the business. The ability to curate data to enable game-changing insights should be the focus of the work led by the CDO and CIO. This has much more to do with the data than it does with having your engineers innovate on components that don't bring true competitive advantage.
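To make the curation work above concrete, here is a minimal, hypothetical sketch of the kind of schema enforcement and quality checking a data pipeline performs before data is served for analysis. The schema, field names and `validate_records` function are illustrative assumptions, not part of any specific product or the eBook itself.

```python
# Illustrative only: a toy schema-enforcement and quality-check step.
# Real pipelines would use a data quality framework and run at scale.

EXPECTED_SCHEMA = {
    "customer_id": int,
    "region": str,
    "revenue": float,
}

def validate_records(records):
    """Split records into (valid, rejected) against the expected schema."""
    valid, rejected = [], []
    for record in records:
        ok = (
            set(record) == set(EXPECTED_SCHEMA)  # no missing/extra columns
            and all(isinstance(record[col], typ)
                    for col, typ in EXPECTED_SCHEMA.items())  # type checks
            and record["revenue"] >= 0  # a simple business-rule quality check
        )
        (valid if ok else rejected).append(record)
    return valid, rejected

raw = [
    {"customer_id": 1, "region": "EMEA", "revenue": 1250.0},
    {"customer_id": 2, "region": "AMER", "revenue": -50.0},   # fails quality rule
    {"customer_id": "3", "region": "APAC", "revenue": 900.0}, # fails type check
]

valid, rejected = validate_records(raw)
```

Even this toy example shows why the effort is easy to underestimate: every data set needs its own schema, its own quality rules, and a decision about what to do with rejected records.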

To learn how you can establish a centralized and cohesive data management, data science and data governance platform for your enterprise, please contact us today.

This blog post, part of a multi-part series for senior executives, has been adapted from the Databricks eBook Transform and Scale Your Organization With Data and AI. Access the full content here.
