Dask is a Python package that provides advanced parallelism for analytics, enabling performance at scale for the tools you love. People often think it’s magic: drop it in and your code scales. That will mostly work, but it will not scale well!
We would like to share what we’ve learned about using Dask to scale dataframes and computations, so that you can avoid making the same mistakes.
In particular, this talk will cover:
how to explore your code's current performance
how to find performance bottlenecks
how configuring Dask can help improve your performance
how to contribute fixes and improvements back to Dask when you find something missing or incorrect