Bring Text-to-SQL to BI Production in Large Enterprise
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Breakout |
TRACK | Data Science and Machine Learning |
INDUSTRY | Enterprise Technology, Media and Entertainment, Professional Services |
TECHNOLOGIES | GenAI/LLMs, MLflow, SQL Analytics / BI / Visualizations |
SKILL LEVEL | Advanced |
DURATION | 40 min |
Text-to-SQL is available in many frameworks and tools, typically built on fine-tuning and prompt engineering. This talk aims to bridge the gap between those tools and practical natural-language querying for data engineers, data scientists, and non-technical users in large enterprises such as Tencent. After a comprehensive quantitative comparison, we chose WizardCoder-34B as the foundation model for fine-tuning. We developed a training instruction set with two primary goals: (1) covering query patterns and SQL syntax, and (2) representing how business context is referenced, especially in multi-table queries.
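To make the two goals of the instruction set concrete, here is a minimal sketch of what a single training record could look like. The record layout, the section headers, the `build_instruction_example` helper, and the `users`/`orders` tables are all hypothetical illustrations; the abstract does not describe the actual format used at Tencent.

```python
import json


def build_instruction_example(schema_ddl, glossary, question, sql):
    """Assemble one instruction-tuning record (hypothetical format).

    schema_ddl: list of CREATE TABLE statements for the tables in scope
    glossary:   mapping of business term -> how it maps onto the schema
    question:   the natural-language request
    sql:        the target query the model should learn to produce
    """
    glossary_lines = [f"- {term}: {meaning}" for term, meaning in glossary.items()]
    prompt = "\n".join(
        ["### Schema", *schema_ddl,
         "### Business terms", *glossary_lines,
         "### Question", question]
    )
    return {"instruction": prompt, "output": sql}


# Illustrative multi-table example: the business term "revenue" is not a
# column name, so the record spells out how it is referenced in the schema.
example = build_instruction_example(
    schema_ddl=[
        "CREATE TABLE users (user_id INT, region TEXT);",
        "CREATE TABLE orders (order_id INT, user_id INT, amount REAL);",
    ],
    glossary={"revenue": "SUM(orders.amount)"},
    question="What is the revenue per region?",
    sql=(
        "SELECT u.region, SUM(o.amount) AS revenue "
        "FROM orders o JOIN users u ON o.user_id = u.user_id "
        "GROUP BY u.region;"
    ),
)

print(json.dumps(example, indent=2))
```

Putting the schema and the business glossary in the same prompt is one plausible way to teach a model which tables a business term spans; other layouts would serve the same goals.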
We also pay special attention to optimizing the performance and cost of inference. Our final model is evaluated against GPT-4 on the BIRD benchmark and on a test set of complex real-life queries at Tencent. While the specialized model is less capable of handling ambiguously expressed query intent, it is more accurate than GPT-4 on queries that require table joins.
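The abstract does not say which metric the comparison uses. As an illustration only, the sketch below computes execution accuracy, one of the metrics BIRD reports: a predicted query counts as correct when running it returns the same rows as the reference query. The helper names, the in-memory SQLite tables, and the sample queries are assumptions made for this example, and comparing rows as a multiset is a simplification of the official scoring script.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs with matching results."""
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        # A failing prediction yields None and never matches the gold rows.
        if gold_rows is not None and pred_rows == gold_rows:
            correct += 1
    return correct / len(pairs) if pairs else 0.0


# Tiny hypothetical database with two tables that must be joined.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, region TEXT);
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
INSERT INTO users VALUES (1, 'north'), (2, 'south');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.0), (12, 2, 3.0);
""")

gold = (
    "SELECT u.region, SUM(o.amount) FROM orders o "
    "JOIN users u ON o.user_id = u.user_id GROUP BY u.region"
)
pairs = [
    # Same result via a differently written join: counted as correct.
    ("SELECT u.region, SUM(o.amount) FROM users u "
     "JOIN orders o ON u.user_id = o.user_id GROUP BY u.region", gold),
    # Runs, but skips the join and returns wrong totals: incorrect.
    ("SELECT region, 0 FROM users", gold),
    # References a missing table: fails to execute, incorrect.
    ("SELECT amount FROM sales", gold),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.3f}")
```

Scoring on results rather than on SQL text means a model is not penalized for writing a join differently from the reference, which matters when the comparison focuses on multi-table queries.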
SESSION SPEAKERS
Kun Cheng
/Product Manager
Tencent
Hehuan Liu
/Applied Scientist
Tencent