21 May 15:00 – 21 May 15:30 in Talks

Dask DataFrame groupby. Why it can fail and how to compensate.

Hugo Shi

Audience level:


Dask DataFrame groupby operations are common and powerful. However, due to the distributed nature of Dask DataFrames, a groupby can fail in unexpected ways. This talk covers mitigation strategies for these problems, including using set_index to optimize data layout and using the split_out and split_every parameters to optimize computation.