Delivering trusted insights using alternative data
SaaS data sources ingested into Databricks via Fivetran
Amazon Redshift clusters eliminated
Company growth since implementing Databricks
“By working directly with their raw data, our stakeholders can understand every decision that’s made about it. Fivetran makes it easy to feed data into Databricks, and Databricks empowers our stakeholders to do their own analysis without involving engineers.”
YipitData is the trusted source of insights using alternative data. The company’s product teams analyze billions of data points each day to provide the granular insights that drive the successful decision-making of hundreds of investment funds and highly innovative corporations. When YipitData began to grow rapidly, the company struggled to scale its analytics activities, which were running on dozens of Amazon Redshift clusters. That’s when YipitData migrated to the Databricks Lakehouse Platform to create a self-service data platform for its product analysts. This approach worked so well that the company began onboarding operational teams such as Revenue Operations and Finance to Databricks. For this persona, YipitData uses Fivetran to ingest data from 20 SaaS platforms into Databricks so that operational teams can access raw data without involving engineers. As YipitData continues to grow, it will become a data-centric organization by empowering everyone to use Databricks and Fivetran to answer questions with data.
Data cluster maintenance slows down analytics
Decision-makers who are under pressure rely on YipitData’s products to help them make the right choices. That’s why the company must remain a step ahead of its market. YipitData aims to know what information its customers need before they even need it, which is only possible by analyzing how they’ve interacted with data products in the past.
“Our core business is to sell very valuable data products to investment funds and corporations to help them make better business decisions,” said Bobby Muldoon, Co-Head of Data Engineering at YipitData. “We maintain deep relationships with our clients, and we aim to understand their key questions, what challenges they face and what kinds of products they care about most. Much of that information is captured in our Salesforce solution, so we need to move it to an analytics platform to allow our Revenue Operations team to analyze and report on it.”
Before adopting Databricks, YipitData’s analytics activities revolved around dozens of Amazon Redshift clusters. Each team within the company maintained its own clusters, writing code in Airflow and using a homemade framework to manage SQL queries. As the company grew, this arrangement became cumbersome — especially when YipitData needed to share common data sets across teams.
“It was really difficult to scale our business with Redshift clusters isolated to each product team,” recalled Muldoon. “When we started ingesting email and credit card data from partners, suddenly we needed to share that data across every team, which meant copying it into about 50 Redshift clusters. That’s when we realized we needed a proper data lake.”
Connecting stakeholders to raw data leads to better decisions
YipitData evaluated several solutions before choosing the Databricks Lakehouse Platform as the foundation of its new data stack. “We knew we didn’t want a data warehouse because it would require a team of hardcore data engineers to maintain it,” said Muldoon. “We bought into the idea of a lakehouse because we needed to get raw data into the hands of our business users with as little friction and engineering involvement as possible.”
YipitData decided to support its Databricks solution with Fivetran, a solution the company had previously used to move large amounts of raw Salesforce data into Redshift for analysis and reporting by its Revenue Operations team. Soon after YipitData switched to Databricks, the company exploded from less than 100 employees to more than 500. As YipitData continues to add new SaaS solutions to its technology footprint, the company uses Fivetran connectors to simplify data ingestion. The list of integrated solutions includes Salesforce, Marketo, NetSuite and Greenhouse.
“We’re ingesting more than 20 SaaS platforms with Fivetran, along with many internal application databases,” said Muldoon. “It makes life much easier for our operational teams who are running internal reporting. Whenever we need to ingest a consistent data set that doesn’t require custom ETL, Fivetran is our solution.”
Every operational team at YipitData must generate analysis and reports. The company not only uses Fivetran connectors to feed teams’ data into Databricks, but also gives stakeholders access to raw data in Databricks so that they can perform their own analysis.
“We believe our business decisions should be made entirely by business users,” explained Muldoon. “So we put raw data in their hands without requiring involvement from engineers. Our Finance team writes SQL queries and PySpark in Databricks to analyze their data. Fivetran makes it easy to get that data, and Databricks makes it easy to analyze it.”
By integrating NetSuite with Databricks, YipitData helped its Finance team overcome reporting limitations. Finance now gets better information with less manual effort.
“Before Databricks, when my reports were beyond a certain length, I had to download them in chunks and manipulate the data to come up with the report I needed,” recalled Emily Frey, Senior Director of Finance at YipitData. “Today, I use basic SQL commands to create my own tables based on the data that’s coming into Databricks through Fivetran. It’s set up as a recurring job that saves me six to eight hours per month. We also automated our cash model reporting with Databricks, which gave us access to more metrics and is saving us four or five man-hours per week.”
Self-service data helps stakeholders meet client needs
Across YipitData, self-service data is now a reality. Product and operational teams can answer their own questions about their data. If they need to build a data product or reporting workflow, they can control the entire ELT process.
“Even our users who aren’t very technical are trained how to derive tables in Databricks to avoid rerunning the same queries constantly, and they’re using multitask jobs to orchestrate complex workflows,” said Muldoon. “It’s been an empowering experience for these teams, and they’ve been very successful at it.”
With Databricks, YipitData’s Finance department works far more productively while dealing with tens of thousands of rows of data in spreadsheets. Rather than enduring frequent spreadsheet crashes and wondering about the integrity of data, the team now uses queries to produce reports in minutes.
“Populating Google Sheets with massive data sets used to be a slow, unreliable process,” said Frey. “Today, I can run a job in Databricks and it automatically refreshes my Google Sheet in about three minutes with no crashing. That’s saving me about three hours during every monthly closing period.”
YipitData’s journey with Databricks and Fivetran has been about exploring new opportunities rather than simply cutting costs. “I can’t say we’re saving money on integrating our SaaS platforms with Databricks because if it had taken us a team of 10 engineers to do it, we wouldn’t have bothered,” said Muldoon. “We never would have gotten budget approval.”
YipitData’s business teams can now collaborate and iterate as they build new types of reports. Having eliminated their biggest technical barriers with Databricks and Fivetran, teams can apply their business knowledge to the toughest data challenges and come up with the insights clients need. “We know our business users won’t build the best data pipelines in the world, but that’s not essential for their use case,” explained Muldoon. “What’s important is that they can take full ownership of the quality of their reporting. They’re the ones who will know if raw values aren’t matching up with expectations, and what to do in each situation. This involves critical business judgment that should not be ceded to an engineer that doesn’t have domain expertise. With Databricks, they can access the most raw form of the data and iterate quickly on their analysis and reporting needs.”
As YipitData grows, the company will continue to give its business users access to ever-increasing volumes of raw data. Muldoon sees this as a key to YipitData’s success. “I don’t think we could have built this business without Databricks,” he concluded. “We couldn’t have enabled this level of self-service on Snowflake, Amazon Web Services or Amazon EMR. With Databricks and Fivetran, we’ve unlocked business value for nearly every team at the company.”