Real-Time Data Warehousing: A Journey from Batch to Streaming with Faust
Short Talk (INTERMEDIATE level)
Room B
Faust is a Python library for building real-time data processing applications with stream-based architectures. Discover how we used it to transform one of our data processing workflows to integrate real-time events into the CERN Business Computing group's data warehouse.
In this short talk, we will see how Faust was used to build an application capable of handling streaming events. We will explore Faust’s components such as pages and agents, and show the ease of creating distributed pipelines with the library. Finally, we will walk through the architecture, from the data source to the final storage database.
Manon Charvet
CERN
Manon began working at CERN as a Data Engineer in January 2022. With six years of experience, she has contributed to several projects focused on creating data platforms, including processing security logs in near real-time and administering a data warehouse to support analytics. Her expertise lies in using distributed systems such as Kafka and the Elastic Stack and developing data pipelines. She also enjoys working on architecture and systems design, web development, and acquiring diverse knowledge in data science.