Dask is a popular library for scalable computing. However, using Dask effectively for large workloads, particularly when running computations in the cloud, requires attention to additional nuances. In this talk, we'll walk through our attempts to scale up data engineering workloads to process a petabyte of data in the cloud, describe the pain points we encountered, and explain how we got around them.