Time Zone: UTC

21 May 14:00 – 21 May 16:00 in Tutorials / Workshops 2

High-Performance Data Access for Dask

Martin Durant, Joris Van den Bossche, Richard Zamora

Audience level:
Intermediate

Description

Dask provides many functions for data IO with arrays and dataframes. In this workshop, we will discuss the current status of Dask's integrations with various data formats and, more generally, the landscape of parallel- and cloud-friendly data storage.

Abstract

Detailed agenda

Tabular data

Array data

Catalogs

Topics

Following the talks, there will be ample time for discussion, in particular around the changing APIs of target IO engines such as pyarrow.