A Case Study in Rearchitecting an On-Premises Pipeline in the Cloud
- Data Engineering
- Public Sector
- Moscone South | Level 3 | 314
- 35 min
We replicated the streaming portion of the pipeline in Azure using a combination of Logstash deployed on Kubernetes, Azure Event Hubs, Azure Functions, and Azure Blob Storage. We then replicated the aggregations with batch jobs written in Python with pandas and orchestrated by Prefect. Finally, we made the data available to our analysts in the cloud through Azure Databricks.
In this talk, I will discuss the practical design decisions behind these choices and the technical challenges we encountered while recreating the pipeline. I will also share the many lessons we learned on the way to a successful migration, and how we applied them to similar projects that came up later.