20 May 16:00 – 20 May 17:30 in Tutorials / Workshops 1

Scale Machine Learning Code with Dask

Andrew Mshar, Ryan Soley

Audience level:


Do you use the scikit-learn library to build machine learning models? In this tutorial, we'll discuss how to avoid the traps that lead to hard-to-maintain code when you customize its algorithms. We will show how building your own estimators makes it easy to scale model training with additional libraries like Dask and dask-ml, using much less code than you might think!
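To make "traps" concrete, here is one illustrative example of our own (not taken from the tutorial materials): computing derived state in `__init__`. scikit-learn's `set_params()`, which `GridSearchCV` calls on a `clone()` of your estimator, only resets the constructor arguments, so anything computed from them in `__init__` silently goes stale. The hypothetical `BadScaler` and `GoodScaler` classes below contrast that anti-pattern with deriving state in `fit()`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class BadScaler(BaseEstimator, TransformerMixin):
    """Hypothetical anti-pattern: derived state computed in __init__."""

    def __init__(self, power=1):
        self.power = power
        # Not a constructor argument, so set_params() never updates it.
        self.multiplier = 10 ** power

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) * self.multiplier


class GoodScaler(BaseEstimator, TransformerMixin):
    """Hypothetical fix: store parameters as-is, derive state in fit()."""

    def __init__(self, power=1):
        self.power = power

    def fit(self, X, y=None):
        # Learned/derived state gets a trailing underscore and lives in fit().
        self.multiplier_ = 10 ** self.power
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) * self.multiplier_


X = np.ones((2, 1))
# Mimic what a hyperparameter search does: change a parameter, then fit.
bad_out = BadScaler(power=1).set_params(power=2).fit_transform(X)    # still x10
good_out = GoodScaler(power=1).set_params(power=2).fit_transform(X)  # x100
```

With `BadScaler`, a grid search over `power` would silently evaluate the same model for every candidate; the bug produces no error, which is what makes it hard to maintain.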


In this workshop, we will introduce attendees to scaling machine learning code with dask-ml by walking through a few key features of the library. Attendees will test and explore what they have learned by completing exercises throughout the tutorial. We will then cover how to go from using dask-ml out of the box to developing their own custom estimators and integrating them into a pipeline, using software development patterns that keep their code scalable and maintainable.
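A custom estimator that follows scikit-learn's conventions (parameters stored unchanged in `__init__`, learned state with a trailing underscore set in `fit()`, `fit()` returning `self`) composes with `Pipeline` and `GridSearchCV` like any built-in estimator. The sketch below uses a hypothetical `TopVarianceSelector` step and tunes its `k` parameter through the pipeline's `step__param` naming. It uses only scikit-learn; dask-ml ships `dask_ml.model_selection.GridSearchCV` as a drop-in replacement for scikit-learn's version, which is one way a pipeline like this could be scaled out.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class TopVarianceSelector(BaseEstimator, TransformerMixin):
    """Hypothetical custom step: keep the k highest-variance columns."""

    def __init__(self, k=2):
        self.k = k  # stored unchanged so get_params()/set_params() work

    def fit(self, X, y=None):
        X = np.asarray(X)
        # Indices of the k highest-variance columns, in original column order.
        self.columns_ = np.sort(np.argsort(X.var(axis=0))[-self.k:])
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.columns_]


X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("select", TopVarianceSelector()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# "select__k" addresses the k parameter of the step named "select".
search = GridSearchCV(pipe, {"select__k": [2, 5, 10]}, cv=3)
search.fit(X, y)
best_k = search.best_params_["select__k"]
```

Because the custom step exposes its parameters the standard way, the search code does not need to know anything about it.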

Quick intro to Python ML and the Dask ML notebook from dask-tutorial, plus more examples adapted for a tutorial format:
- Incrementally Train Large Datasets
- Scale Scikit-Learn for Small Data Problems
- Hyperparameter Optimization with Dask
- Scale XGBoost
- Score and Predict Large Datasets

Patterns for adding custom functionality:
- What can go wrong, and why is it important to get this right?
- A look under the hood at dask-ml
- A look at the dask-ml contrib docs
- Extending dask-ml using a real-world example
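The outline includes incrementally training on large datasets. The core idea can be sketched with plain `dask.array` and scikit-learn: materialize one chunk at a time and feed it to an estimator that supports `partial_fit`. The synthetic data, the chunk size, and the choice of `SGDClassifier` are assumptions for illustration. dask-ml's `dask_ml.wrappers.Incremental` wraps this block-by-block `partial_fit` pattern so you can call `fit` on a Dask array directly.

```python
import dask.array as da
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic data standing in for a dataset too large to fit in memory.
rng = np.random.default_rng(0)
X_np = rng.normal(size=(10_000, 5))
y_np = (X_np[:, 0] + X_np[:, 1] > 0).astype(int)

# Chunk rows into blocks of 1,000; all 5 columns stay in one chunk.
X = da.from_array(X_np, chunks=(1_000, 5))
y = da.from_array(y_np, chunks=1_000)

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # partial_fit must see every class label up front

for i in range(X.numblocks[0]):
    # Compute one block at a time, so only one chunk is in memory per step.
    X_block = X.blocks[i, 0].compute()
    y_block = y.blocks[i].compute()
    clf.partial_fit(X_block, y_block, classes=classes)

accuracy = clf.score(X_np, y_np)
```

Each block is visited once here (a single pass over the data); more passes would simply repeat the loop.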