Time Zone: UTC

21 May 17:30 – 21 May 18:00 in Talks

Capital One uses Dask!

Dan Kerrigan

Audience level:
Novice

Description

Capital One uses Dask and its ecosystem to great effect and regularly gains new internal users. This survey talk outlines who is using Dask, why they use it, how they deploy it, and the challenges they encounter. We will also consider the future of Dask and its usage within Capital One.

Abstract

Capital One processes tremendous volumes of data every day, which drive everything from credit decisions to fraud detection to call transcription. By enabling distributed computing in the PyData ecosystem, Dask allows us to handle more data in less time and more efficiently, resulting in more experiments, faster and better decisions, and expedient product delivery to our customers. The rich ecosystem of libraries, including Dask-ML, RAPIDS, XGBoost, internal libraries, and more, makes this possible. Capital One is Cloud First, and we deploy Dask to a variety of cloud-based platforms with many different user experiences. We also encounter challenges including algorithm availability, infrastructure, distributed computing, and more. Efficient data processing is becoming a higher priority every day, and tools like Dask are crucial for future success. During this talk, we'll explore these statements and answer questions for those curious about how Capital One uses Dask.