SESSION

Bring Text-to-SQL to BI Production in Large Enterprise

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Science and Machine Learning
INDUSTRY: Enterprise Technology, Media and Entertainment, Professional Services
TECHNOLOGIES: GenAI/LLMs, MLflow, SQL Analytics / BI / Visualizations
SKILL LEVEL: Advanced
DURATION: 40 min

Text-to-SQL is available in many frameworks and tools, typically with fine-tuning and prompt engineering under the hood. This talk aims to bridge the gap between those tools and everyday natural-language querying, for data engineers and data scientists as well as non-technical users, in a large enterprise such as Tencent. After a comprehensive quantitative comparison, we chose WizardCoder-34B as the foundation model for fine-tuning. We developed a training instruction set with two primary goals: (1) covering query patterns and syntax, and (2) representing how business context is referenced, especially in multi-table queries.

 

We also pay special attention to optimizing the performance and cost of inference. We evaluate our final model against GPT-4 on the BIRD benchmark and on a test set of complex real-life queries at Tencent. While the specialized model handles ambiguously expressed query intent less well, it is more accurate than GPT-4 on queries that require table joins.
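
For readers unfamiliar with how Text-to-SQL output is typically scored, the sketch below illustrates execution-based comparison, the style of metric used by benchmarks such as BIRD: a predicted query counts as correct when it returns the same rows as the reference query. This is a minimal illustration, not the evaluation harness used in the talk. The `users` and `orders` tables, the sample queries, and the helper names are all hypothetical, and comparing rows as an order-insensitive multiset is an assumption made here.

```python
import sqlite3
from collections import Counter


def build_demo_db():
    """Create a tiny in-memory database (hypothetical two-table schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
        INSERT INTO users VALUES (1, 'north'), (2, 'south');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.0);
        """
    )
    return conn


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same rows, ignoring row order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to run counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


# Reference query for "total order amount per region" (requires a join).
GOLD = (
    "SELECT u.region, SUM(o.amount) FROM orders o "
    "JOIN users u ON u.id = o.user_id GROUP BY u.region"
)
# Different syntax, same result: counted as correct.
PRED_JOIN = (
    "SELECT region, SUM(amount) FROM users, orders "
    "WHERE users.id = orders.user_id GROUP BY region"
)
# Skips the join and groups by user_id instead: counted as wrong.
PRED_NO_JOIN = "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id"
```

Calling `execution_accuracy(build_demo_db(), [(PRED_JOIN, GOLD), (PRED_NO_JOIN, GOLD)])` scores the first prediction as a match and the second as a miss, which is the kind of join-sensitive difference the comparison above is concerned with.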

SESSION SPEAKERS

Kun Cheng

Product Manager
Tencent

Hehuan Liu

Algorithm and Data Scientist
Tencent