An integrated platform for rapidly evolving e-commerce data management


4x improvement in performance compared to the existing Hadoop MapReduce code


1/4 the resource usage of the existing Hadoop code

PLATFORM USE CASE: Delta Lake, Databricks SQL
CLOUD: Azure

“Gmarket is a platform that enables end-to-end data science, and Databricks is accompanying Gmarket on its challenge.”

— Daehong Seo, Platform Technology Manager, Gmarket

The fast-changing e-commerce landscape and data requirements

E-commerce platforms around the world are seeking data solutions utilizing big data and artificial intelligence technologies to keep pace with the rapidly changing e-commerce landscape and the rapidly increasing demand for digital shopping. In line with this trend, Gmarket, as a leader in the e-commerce field with millions of customers in various markets at home and abroad, provides optimized product recommendations based on data analytics, personalized services, and convenient purchasing routes to increase customer satisfaction. Gmarket sought to lay the foundation for growth with the mission of analyzing the large amount of data generated in the open market with speed and accuracy to realize “every transaction imaginable.” However, the legacy Hadoop on-premises system showed many shortcomings in meeting the data requirements, which were becoming more and more massive over time. Thus, Gmarket recognized the need for a next-generation big data platform and implemented Databricks to solve the problem of managing large amounts of data. As a result, Gmarket has experienced cost reduction and increased efficiency through the integrated lakehouse.

The search for a flexible, safe and efficient system

Gmarket operates the largest e-commerce platform in the domestic market, supporting online transactions in various categories in domestic and international consumer markets. As an open market powerhouse in the e-commerce field, it focuses on building connections between sellers and buyers, and provides the best business environment for sellers and differentiated customer experiences for buyers, by utilizing data-based technology as the core.

However, Gmarket struggled to keep up with the fast-changing e-commerce business environment and its big data requirements on the existing on-premises Hadoop system. Hadoop was becoming a burdensome legacy: the team could not respond adequately to new technologies, and technical debt gradually took up a growing share of overall productivity. Daehong Seo, Platform Technology Manager at Gmarket, described the situation before the implementation of Databricks as “a situation where technical debt was creating debt.”

Before using Databricks, managing large amounts of data was the biggest challenge for the Gmarket Technology Team. The legacy system required a lot of time and money to add equipment and build infrastructure for the ever-increasing demand for large-volume data analysis and storage. In addition, because the Hadoop ecosystem is composed of several dedicated solutions, building a data pipeline involved a steep learning curve. Whenever new employees joined or existing employees were replaced, a lot of time and resources went into combining several existing solutions to build a pipeline. This also increased system complexity and maintenance costs, which in turn hindered the company’s efforts to reduce time-to-market, a key success factor for e-commerce businesses.

Manager Daehong Seo emphasized, “We need a solution that is more flexible, safer and more efficient than Hadoop.” Gmarket saw an urgent need for an integrated platform that could meet several requirements: flexible infrastructure that can expand quickly, processing and storage resources that can be expanded independently, and fast general-purpose distributed processing. Recognizing that its technological requirements had changed, Gmarket decided to replace the legacy system and began looking for a cloud data platform that would address these needs.

Implementation of Databricks Data Intelligence Platform, the next-generation big data platform

After identifying the technologies needed for the next-generation big data platform, Gmarket named the ideal platform the “cloud-native data lakehouse” and started looking for a product that matched it. Gmarket considered the following conditions most important. First, the platform should take full advantage of the strengths of the cloud, expanding rapidly through automatic scale-in/out. Second, it should scale processing and storage independently through an architecture that separates the two. Third, it should use an in-memory distributed data pipeline processing engine to achieve faster processing than Hadoop. Finally, it should process data on one platform through a data lakehouse that combines a data lake and a data warehouse.

Gmarket chose the Databricks Data Intelligence Platform as the next-generation big data platform that meets these conditions. Because the creators of Apache Spark™, the open source distributed processing engine, took part in founding Databricks, Gmarket valued the company’s strong performance and its influence on the open source ecosystem. The scalability and elasticity of Databricks also weighed in its favor. Scale-up/down based on demand allows the selection of appropriate hardware, and the automatic scale-in/out function adjusts resources according to load; both were highly valued. Gmarket also placed a high value on the efficiency of expanding and reducing each resource as needed, including when a cluster is not in use. Convenience was another factor. Unlike the existing systems, the Databricks Data Intelligence Platform can use notebook source directly when running a job. The convenience of developing on a single integrated platform was also cited as an important factor in the decision to adopt Databricks.
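The scaling behavior described above can be made concrete with a small sketch. The Python snippet below builds a cluster specification in the shape accepted by the Databricks Clusters API, whose `autoscale` (`min_workers`, `max_workers`) and `autotermination_minutes` fields correspond to automatic scale-in/out and to releasing an idle cluster, while `node_type_id` is where a larger or smaller machine type would be chosen for scale-up/down. The helper name, cluster name, runtime version, VM size and worker counts are illustrative assumptions, not Gmarket’s actual settings.

```python
import json


def autoscaling_cluster_spec(min_workers: int, max_workers: int, idle_minutes: int) -> dict:
    """Build a cluster spec dict (hypothetical helper, illustrative values)."""
    if not 1 <= min_workers <= max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "cluster_name": "example-etl-cluster",    # hypothetical name
        "spark_version": "13.3.x-scala2.12",      # illustrative runtime version
        "node_type_id": "Standard_DS3_v2",        # illustrative Azure VM size (scale-up/down)
        "autoscale": {                            # scale-in/out between these bounds
            "min_workers": min_workers,
            "max_workers": max_workers,
        },
        "autotermination_minutes": idle_minutes,  # terminate the cluster after this idle time
    }


# Example: 2 to 8 workers, terminated after 30 idle minutes.
spec = autoscaling_cluster_spec(min_workers=2, max_workers=8, idle_minutes=30)
print(json.dumps(spec, indent=2))
```

The sketch only constructs and prints the specification; submitting it to a workspace would require credentials and an API client, which are out of scope here.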

Increased performance and cost-effectiveness of data pipelines with Databricks

Gmarket has achieved significant results through the implementation of Databricks. The MapReduce code previously developed on Hadoop has been redeveloped on Databricks, resulting in a fourfold improvement in performance and a reduction in resource usage to one-fourth. This has significantly improved the performance and cost-effectiveness of the data pipeline.
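As a rough illustration of what moving away from MapReduce-style code involves, the plain-Python sketch below contrasts an aggregation written as explicit map, shuffle and reduce phases with the same aggregation written as a single pass that keeps running totals in memory. It does not use Hadoop or Spark, it is not Gmarket’s code, and it makes no attempt to reproduce the reported performance figures; the order records and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical order records: (category, amount)
orders = [("fashion", 30), ("food", 12), ("fashion", 20), ("digital", 99), ("food", 8)]


def mapreduce_totals(records):
    """Aggregate in three explicit phases, mirroring the MapReduce model."""
    # Map phase: emit one (key, value) pair per record.
    mapped = [(category, amount) for category, amount in records]
    # Shuffle phase: group all values by key before any reduction starts.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce phase: collapse each group to a single total.
    return {key: sum(values) for key, values in grouped.items()}


def single_pass_totals(records):
    """Aggregate in one pass, keeping only running totals in memory."""
    totals = defaultdict(int)
    for category, amount in records:
        totals[category] += amount
    return dict(totals)


# Both formulations produce the same per-category totals.
assert mapreduce_totals(orders) == single_pass_totals(orders)
print(single_pass_totals(orders))
```

The first function materializes intermediate results between phases, while the second expresses the aggregation as one step; a distributed in-memory engine aims to offer the second style of programming at cluster scale.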

On the legacy system, Gmarket had difficulty processing streaming data because of complex data pipeline requirements, but it can now develop and operate data pipelines simply by using the integrated Databricks platform. The decision to adopt Databricks has shortened project lead time by more than 50%, greatly improving development productivity, speed and convenience.

Gmarket can now perform comprehensive data analytics by integrating data in one place through Databricks. The insights gained this way have been very helpful in maximizing the value of its data. Furthermore, Databricks makes it possible to separate and allocate resources in a cloud-native environment, so data analytics runs smoothly even during peak times. This has increased the flexibility of resource management, enabled efficient data processing and analysis, and contributed significantly to improving business performance.

In particular, Databricks’ job orchestration function and the ability to quickly deploy and manage pipelines from notebooks have increased the efficiency of pipeline development and management. This has helped significantly in achieving high productivity in data processing and analysis and in realizing business goals. At the same time, the lead time required for infrastructure preparation and platform learning has been reduced, and development speed and work efficiency have both improved through integrated development based on the database.

Realization of data science

Gmarket has increased efficiency by using Databricks’ fast and efficient scaling and processing power, and it has significantly reduced both time and cost. The integrated platform has improved collaboration not only across data teams but also across departments, resulting in innovative business outcomes. In this way, Gmarket is overcoming the difficulties it experienced with the existing system and reaching greater data maturity with the help of Databricks. Daehong Seo, manager of the Gmarket Platform Technology Team, said, “Gmarket is a platform capable of end-to-end data science, and Databricks is accompanying Gmarket on its challenge,” emphasizing the importance of the collaboration with Databricks in Gmarket’s future data lakehouse roadmap.