Helping 30 million people and 50,000 teams communicate more effectively
110% faster querying than a data warehouse, at 10% of the cost to ingest
Daily events available for analytics in under 15 minutes
“Databricks Lakehouse has given us the flexibility to unleash our data without compromise. That flexibility has allowed us to speed up analytics to a pace we’ve never achieved before.”
– Chris Locklin, Engineering Manager, Data Platforms, Grammarly
Grammarly’s mission is to improve lives by improving communication. The company’s trusted AI-powered communication assistance provides real-time suggestions to help individuals and teams write more confidently and achieve better results. Its comprehensive offerings — Grammarly Premium, Grammarly Business, Grammarly for Education and Grammarly for Developers — deliver leading communication support wherever writing happens. As the company grew over the years, its legacy, homegrown analytics system made it challenging to evaluate large data sets quickly and cost-effectively. By migrating to the Databricks Lakehouse Platform, Grammarly is now able to sustain a flexible, scalable and highly secure analytics platform that helps 30 million people and 50,000 teams worldwide write more effectively every day.
Harnessing data to improve communications for millions of users and thousands of teams
When people use Grammarly’s AI communication assistance, they receive suggestions to help them improve multiple dimensions of communication, including spelling and grammar correctness, clarity and conciseness, word choice, style, and tone. When users accept, reject or ignore those suggestions, Grammarly’s apps record that feedback as events, which total about 5 billion per day.
Historically, Grammarly relied on a homegrown legacy analytics platform and an in-house SQL-like language that was time-intensive to learn and made it challenging to onboard new hires. As the company grew, Grammarly data analysts found that the platform did not sufficiently meet the needs of essential business functions, especially marketing, sales and customer success. Analysts found themselves copying and pasting data from spreadsheets because the existing system couldn’t effectively ingest the external data needed to answer questions such as, “Which marketing channel delivers the highest ROI?” Reporting was also difficult because the existing system didn’t support Tableau dashboards, even though company leaders and analysts needed to make decisions quickly and confidently.
Grammarly also sought to unify its data warehouses in order to scale and improve data storage and query capabilities. As it stood, large Amazon EMR clusters ran 24/7 and drove up costs. With the various data sources, the team also needed to maintain access control. “Access control in a distributed file system is difficult, and it only gets more complicated as you ingest more data sources,” says Chris Locklin, Engineering Manager, Data Platforms at Grammarly. Meanwhile, reliance on a single streaming workflow made collaboration among teams challenging. Data silos emerged as different business areas implemented analytics tools individually. “Every team decided to solve their analytics needs in the best way they saw fit,” says Locklin. “That created challenges in consistency and knowing which data set was correct.”
As its data strategy was evolving, Grammarly’s priority was to get the most out of analytical data while keeping it secure. This was crucial because security is Grammarly’s number-one priority and most important feature, both in how it protects its users’ data and how it ensures its own company data remains secure. To accomplish that, Grammarly’s data platform team sought to consolidate data and unify the company on a single platform. That meant sustaining a highly secure infrastructure that could scale alongside the company’s growth, improving ingestion flexibility, reducing costs and fueling collaboration.
Improving analytics, visualization and decision-making with the lakehouse
After conducting several proofs of concept to enhance its infrastructure, Grammarly migrated to the Databricks Lakehouse Platform. Bringing all the analytical data into the lakehouse created a central hub for all data producers and consumers across Grammarly, with Delta Lake at the core.
Using the lakehouse architecture, data analysts within Grammarly now have a consolidated interface for analytics, which provides a single source of truth and confidence in the accuracy and availability of all data managed by the data platform team. Across the organization, teams use Databricks SQL to query, within the platform, both internally generated product data and external data from digital advertising platform partners. They can also easily connect to Tableau to create dashboards and visualizations for executives and key stakeholders.
“Security is of utmost importance at Grammarly, and our team’s number-one objective is to own and protect our analytical data,” says Locklin. “Other companies ask for your data, hold it for you, and then let you perform analytics on it. Just as Grammarly ensures our users’ data always remains theirs, we wanted to ensure our company data remained ours. Grammarly’s data stays inside of Grammarly.”
With its data consolidated in the lakehouse, different areas of Grammarly’s business can now analyze data more thoroughly and effectively. For example, Grammarly’s marketing team uses advertising to attract new business. Using Databricks, the team can consolidate data from various sources to project a user’s lifetime value, compare it with customer acquisition costs and get rapid feedback on campaigns. Elsewhere, data captured from user interactions flows into a set of tables that analysts use for ad hoc analysis to inform and improve the user experience.
By consolidating data onto one unified platform, Grammarly has eliminated data silos. “The ability to bring all these capabilities, data processing and analysis under the same platform using Databricks is extremely valuable,” says Sergey Blanket, Head of Business Intelligence at Grammarly. “Doing everything from ETL and engineering to analytics and ML under the same umbrella removes barriers and makes it easy for everyone to work with the data and each other.”
To manage access control, enable end-to-end observability and monitor data quality, Grammarly relies on the data lineage capabilities within Unity Catalog. “Data lineage allows us to effectively monitor usage of our data and ensure it upholds the standards we set as a data platform team,” says Locklin. “Lineage is the last crucial piece for access control. It allows analysts to leverage data to do their jobs while adhering to all usage standards and access controls, even when recreating tables and data sets in another environment.”
Faster time to insight drives more intelligent business decisions
Using the Databricks Lakehouse Platform, Grammarly’s engineering teams now have a tailored, centralized platform and a consistent data source across the company, resulting in greater speed and efficiency and reduced costs. Compared with a data warehouse, the lakehouse architecture has delivered 110% faster querying at 10% of the cost to ingest. Grammarly can now make its 5 billion daily events available for analytics in under 15 minutes rather than 4 hours, enabling low-latency data aggregation and query optimization. This allows the team to quickly receive feedback on new features as they roll out and understand whether they are being adopted as expected. In turn, it helps them understand how groups of users engage with the UX, improving the experience and ensuring features and product releases bring the most value to users. “Everything my team does is focused on creating a rich, personalized experience that empowers people to communicate more effectively and achieve their potential,” says Locklin.
Moving to the lakehouse architecture also solved the challenge of access control over distributed file systems, while Unity Catalog enabled fine-grained, role-based access controls and real-time data lineage. “Unity Catalog gives us the ability to manage file permissions with more flexibility than a database would allow,” says Locklin. “It solved a problem my team couldn’t solve at scale. While using Databricks allows us to keep analytical data in-house, Unity Catalog helps us continue to uphold the highest standards of data protection by controlling access paradigms inside our data. That opens a whole new world of things that we can do.”
Ultimately, migrating to the Databricks Lakehouse Platform has helped Grammarly foster a data-driven culture where employees get fast access to analytics without having to write complex queries, all while maintaining Grammarly’s enterprise-grade security practices. “Our team’s mission is to help Grammarly make better, faster business decisions,” adds Blanket. “My team would not be able to effectively execute on that mission if we did not have a platform like Databricks available to us.” Perhaps most critically, migrating off its rigid legacy infrastructure gives Grammarly the adaptability to do more while knowing the platform will evolve as its needs evolve. “Databricks has given us the flexibility to unleash our data without compromise,” says Locklin. “That flexibility has allowed us to speed up analytics to a pace we’ve never achieved before.”