Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille
Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable.
In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features.
Key Takeaways:
00:00 Introduction.
02:13 Overview of the company’s operations and global presence.
04:00 The tech stack and structure of the data engineering team.
04:24 Running nearly 2,000 DAGs in production using Airflow.
05:42 How Airflow’s UI empowers stakeholders to self-serve and troubleshoot.
07:05 Details on the Kubernetes-based Airflow setup using Helm charts.
09:31 Transition from GitSync to NFS for DAG syncing due to performance issues.
14:11 Making every team member Airflow-literate through local installation.
17:56 Using custom libraries and plugins to extend Airflow functionality.
Resources Mentioned:
Sébastien Crocquevieille | LinkedIn
https://www.linkedin.com/in/scroc/
Numberly | LinkedIn
https://www.linkedin.com/company/numberly/
Numberly | Website
https://numberly.com/
Apache Airflow
https://airflow.apache.org/
Grafana
https://grafana.com/
Apache Kafka
https://kafka.apache.org/
Helm Chart for Apache Airflow
https://airflow.apache.org/docs/helm-chart/stable/index.html
Kubernetes
https://kubernetes.io/
GitLab
https://about.gitlab.com/
KubernetesPodOperator – Airflow
https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
https://astronomer.io/beyond/dataflowcast
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning