Keep the Conversation Going

On Slack

Join the Dask Community Slack Workspace

During the event, the Dask Slack will be a forum for announcements, discussions, and sharing of resources. Following the event, the Dask Slack will remain open to the community as a forum for continued communication with the Dask developers and fellow users.

On Gather

Following each keynote plenary session,
join the speaker in Gather for a fireside chat.

Gather is a video chat platform where you control your avatar on a virtual map. As you get close to other avatars, your videos will pop up and you will be able to chat.

Schedule


Interactive visualization and near real-time analysis on out-of-core satellite images

Draga Doncila Pop  |  2021/05/19 01:00:00 UTC - 2021/05/19 01:30:00

We demonstrate a practical case study: loading and interacting with hundreds of Sentinel satellite images and the results of their analysis in near real-time. The project was part of research on interactive visualization of out-of-core images, in the context of the Monash VegMap land cover study.

Zoom: https://zoom.us/j/95688944722?pwd=eUNVMlRVTnZNVzk2N0F3NjBPREkwQT09

Video Recording: https://zoom.us/rec/share/p2Wdhl7COnIBJh3qaBfKtvb7h8XCnQuZHlenYe8YCAFeLNy_dQrIUMC550cC_Etr._DVaRX39-Yiu4VXM?startTime=1621386157000


Dask Down Under: Dask DevOps for Remote Sensing

Tisham Dhar  |  2021/05/19 01:30:00 UTC - 2021/05/19 02:00:00

This talk covers our DevOps journey with Dask at Geoscience Australia, the architecture of our processing cluster, as well as some of the very expensive lessons we learnt along the way.

Zoom: https://zoom.us/j/95688944722?pwd=eUNVMlRVTnZNVzk2N0F3NjBPREkwQT09

Video Recording: https://zoom.us/rec/share/p2Wdhl7COnIBJh3qaBfKtvb7h8XCnQuZHlenYe8YCAFeLNy_dQrIUMC550cC_Etr._DVaRX39-Yiu4VXM?startTime=1621387868000


Dask Down Under: Patterns for large scale temporal processing of geo-spatial data using Dask

Kirill Kouzoubov  |  2021/05/19 02:00:00 UTC - 2021/05/19 02:30:00

We describe an approach for efficiently embedding non-Dask algorithms into a Dask processing pipeline. By constructing a large contiguous in-memory array incrementally from a Dask graph, we were able to achieve significant reductions in peak memory. We have used this approach to generate cloud-free Sentinel-2 Geometric Median and Median Absolute Deviations mosaics over Africa at 10 m resolution.
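The idea of filling one preallocated contiguous buffer block by block, rather than holding every block until a final concatenation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the speakers' implementation: `produce_blocks` is a hypothetical stand-in for blocks computed by a Dask graph, and the standard-library `array` type stands in for a NumPy array.

```python
from array import array
from typing import Iterator, Tuple


def produce_blocks(n: int, block: int) -> Iterator[Tuple[int, array]]:
    """Hypothetical stand-in for blocks computed by a Dask graph.

    Yields (offset, values) pairs; each block is created on demand,
    so only one block needs to exist at a time.
    """
    for start in range(0, n, block):
        stop = min(start + block, n)
        yield start, array("d", (2.0 * i for i in range(start, stop)))


def assemble_incrementally(n: int, block: int) -> array:
    # Allocate the full contiguous output exactly once (n zeroed doubles).
    out = array("d", bytes(8 * n))
    for start, values in produce_blocks(n, block):
        # Copy the block into place; it can be freed straight afterwards,
        # so peak memory is roughly the output plus one block.
        out[start:start + len(values)] = values
    return out


def assemble_by_concatenation(n: int, block: int) -> array:
    # For contrast: every block stays alive until the final join,
    # so peak memory is roughly the output plus all of the blocks.
    blocks = [values for _, values in produce_blocks(n, block)]
    out = array("d")
    for values in blocks:
        out.extend(values)
    return out
```

The design choice is the point: with incremental assembly, peak memory is roughly the output plus the blocks currently in flight, whereas concatenation keeps every block alive alongside the result.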

Zoom: https://zoom.us/j/95688944722?pwd=eUNVMlRVTnZNVzk2N0F3NjBPREkwQT09

Video Recording: https://zoom.us/rec/share/p2Wdhl7COnIBJh3qaBfKtvb7h8XCnQuZHlenYe8YCAFeLNy_dQrIUMC550cC_Etr._DVaRX39-Yiu4VXM?startTime=1621389492000


Dask Down Under: Panel Discussion

Genevieve Buckley, Ben Leighton, Draga Doncila Pop, Hugo Bowne-Anderson, Tisham Dhar  |  2021/05/19 03:30:00 UTC - 2021/05/19 04:30:00

We hope this panel discussion will start a conversation about using Dask in Australia, how we build our community, contribute and stay in touch with the rest of the world.

Zoom: https://zoom.us/j/95688944722?pwd=eUNVMlRVTnZNVzk2N0F3NjBPREkwQT09

Video Recording: https://zoom.us/rec/share/p2Wdhl7COnIBJh3qaBfKtvb7h8XCnQuZHlenYe8YCAFeLNy_dQrIUMC550cC_Etr._DVaRX39-Yiu4VXM?startTime=1621395255000


Dask Down Under: Introduction to xarray and Dask (Tutorial)

Nick Mortimer  |  2021/05/19 05:30:00 UTC - 2021/05/19 07:30:00

Dask Down Under is a chance for everyone in Oceania to forge links and build community here in our backyard. At Dask Down Under we feature talks, tutorials and panel discussions on using Dask to accelerate research. All levels, from beginner to expert, are encouraged to attend.

Zoom: https://zoom.us/j/95688944722?pwd=eUNVMlRVTnZNVzk2N0F3NjBPREkwQT09

Video Recording: https://zoom.us/rec/share/3AnbFtOiRIARD3A6MdM1F0PrMvpASQxJQNWbt6pppYpRrx33EFGQYy-wLQWVQZ-H.bl2gwN915ju9aidF?startTime=1621402478000


Keynote: Clusters of Clusters: Using Dask Distributed to Scale Enterprise Machine Learning Systems

Grant Gelven  |  2021/05/19 13:00:00 UTC - 2021/05/19 14:00:00

The past decade has shown there is a steep learning curve for organizations trying to scale and productionalize ML systems quickly. At Walmart, we have developed several principles over the years that allow us to address this challenge. In this talk, I will discuss these principles and the open-source tools that enable us today.

Zoom: https://zoom.us/j/98736677197?pwd=ZkFmY3pORm1YbFBPaWdmR3pNYUVHZz09

Video Recording: https://zoom.us/rec/share/PfWZiEfZCpFLsdIyhSj5KiDNn4N40e8If8dEBgYOKlohtTH0RvIrAKPa1Qy7-KE.iF9wnXU-mOUL8r7z?startTime=1621429232000


Hacking Dask: Diving Into Dask’s Internals

Julia Signell, James Bourbeau  |  2021/05/19 14:00:00 UTC - 2021/05/19 17:00:00

This tutorial is intended for working and aspiring data professionals. A working knowledge of the basics of Dask and/or distributed computing is required, though knowledge of Dask’s internals is not. Tutorial attendees should walk away with a deeper understanding of Dask’s internals, an introduction to more advanced features, and ideas for how to apply these features effectively to their own work.

Zoom: https://zoom.us/j/96686816162?pwd=cmhzMXQ2TDhsWDFrdjlVMjB6VE11QT09

Video Recording: https://zoom.us/rec/share/WDj9pYZ_LQKNfM1gQNEfSbb5wY60UFJE1ffIGLXidzmqGR5z3QIJMo4ieem3HBO3.IPJxdRQDU_9GmQuZ?startTime=1621432838000


Dask SQL Query Engines

Nils Braun, Han Wang, Mike Klaczynski, Miles Adkins, Tom Drabas  |  2021/05/19 14:00:00 UTC - 2021/05/19 16:00:00

In this workshop, we will discuss the different ways to run SQL queries on and with Dask using CPUs and GPUs. Being able to write SQL commands to query and transform data allows users to integrate the vast Dask and RAPIDS ecosystem into their BI workflows. We will discuss the current state of PyData SQL query engines and SQL integrations, and together find out where to head next.

Zoom: https://zoom.us/j/94693106912?pwd=NlpFbVBISjlIRytpL3YySDZGSUdjZz09

Video Recording: https://zoom.us/rec/share/nNBf5l_BihmkUgMzO4xJ8up6PN4fpzmWhWs3OzBJXvEuAVMIcqYRZAHC7igQEZz9.bNSav9g8_S2j0N5I?startTime=1621432869000


Challenges Designing Interactive Analysis Facilities with Dask

Oksana Shadura  |  2021/05/19 14:00:00 UTC - 2021/05/19 14:30:00

We share challenges encountered deploying Dask with JupyterHub for HPC for the needs of the high-energy physics community at the University of Nebraska-Lincoln. We describe how we combine multiple ways of launching resources, allowing Dask to submit workers directly to a batch system, and investigate improvements in network connectivity that allow scaling to large numbers of simultaneous facility users.

Zoom: https://zoom.us/j/96201769159?pwd=c0pTckQrT3lWU2lTNlNzRUs3U3Y0dz09

Video Recording: https://zoom.us/rec/share/g9S2QIeFd7gZ5bF7vqUZkxp6ciyzO6b9yf5hOT7PCgEmBO_fGWF-qm7ZYd5zPB09.txV07riikza9E7Tq?startTime=1621432905000


(Memorable) Lessons of running an always-on Dask Cluster

Tatiana Statsenko, Alexander Hirner  |  2021/05/19 14:30:00 UTC - 2021/05/19 15:00:00

In this talk, we describe the lessons learned from running modern computer-vision tasks in an always available Dask cluster.

Zoom: https://zoom.us/j/96201769159?pwd=c0pTckQrT3lWU2lTNlNzRUs3U3Y0dz09

Video Recording: https://zoom.us/rec/share/g9S2QIeFd7gZ5bF7vqUZkxp6ciyzO6b9yf5hOT7PCgEmBO_fGWF-qm7ZYd5zPB09.txV07riikza9E7Tq?startTime=1621434754000


Transforming Terabytes of healthcare data with Dask and Kubeflow pipeline

Michael Sebbah, Anthony Dubois, David Krief, Yoann Janvier  |  2021/05/19 15:00:00 UTC - 2021/05/19 15:30:00

As part of the data science team at Ipsen, we study the effects of medical products on patients. Computing on terabytes of data is a challenge; we will explain why we opted for Dask along with Kubeflow to meet it. We will share the issues we faced and how we fixed them. Come and spend some time with us and delve into a real Dask use case.

Zoom: https://zoom.us/j/96201769159?pwd=c0pTckQrT3lWU2lTNlNzRUs3U3Y0dz09

Video Recording: https://zoom.us/rec/share/g9S2QIeFd7gZ5bF7vqUZkxp6ciyzO6b9yf5hOT7PCgEmBO_fGWF-qm7ZYd5zPB09.txV07riikza9E7Tq?startTime=1621436384000


Advanced data transformation pipeline for AI portfolio management

Michael Sebbah, Noureddine Boumlaik, Philippe Cotte  |  2021/05/19 15:30:00 UTC - 2021/05/19 16:00:00

AI algorithms for financial portfolio management require distributed computation based on complex workflows. We are happy to share how we use Dask to transform large amounts of financial data to reach this goal. Dask distributed on Kubernetes is a simple but powerful solution for that. Furthermore, we will present a thin wrapper on top of Dask futures that allows lazy and partial computations.

Zoom: https://zoom.us/j/96201769159?pwd=c0pTckQrT3lWU2lTNlNzRUs3U3Y0dz09

Video Recording: https://zoom.us/rec/share/g9S2QIeFd7gZ5bF7vqUZkxp6ciyzO6b9yf5hOT7PCgEmBO_fGWF-qm7ZYd5zPB09.txV07riikza9E7Tq?startTime=1621438215000


Deploying Dask

Jacob Tomlinson, Adam Lewis, Amit Kumar, Anderson Banihirwe, Brad Miro, Nanthini Balasubramanian  |  2021/05/19 16:00:00 UTC - 2021/05/19 18:00:00

This workshop will cover the most common methods for deploying Dask today. Starting with an overview of all the moving pieces within a Dask cluster (client, cluster, scheduler, workers), we will then talk through various platforms and the tools used to deploy onto them along with benefits, common challenges, and pitfalls.

Zoom: https://zoom.us/j/95024741930?pwd=Z3RTSUZRTmVxY0dWNGxROFU5Z2xaZz09

Video Recording: https://zoom.us/rec/share/leSavBFcTI304aH3FighyDnhyUz9qda_-sQurRmUXAeciJi-qXYy49vvRsOJzHg.wsSFSp9pSq7lU70w?startTime=1621440209000


An Introduction to Memory Spilling

Mads R. B. Kristensen  |  2021/05/19 16:00:00 UTC - 2021/05/19 16:30:00

Memory spilling is an important feature that makes it possible to run Dask applications that would otherwise run out of memory. When low on memory, Dask moves data from GPU memory to main memory and/or data from main memory to disk automatically. In this talk, we will walk through how spilling works in general, its shortcomings, and introduce a new Dask-CUDA approach to overcome these shortcomings.
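To make the main-memory-to-disk tier of spilling concrete, here is a toy Python store that keeps values in memory until a byte budget is exceeded, then writes the least recently used values to disk and reads them back on access. It is a sketch under stated assumptions, not Dask's implementation: the class name `ToySpillStore`, the pickle-based size estimate and the one-file-per-value layout are all hypothetical, and the GPU-to-host tier described above is not modelled.

```python
import os
import pickle
import tempfile
from collections import OrderedDict
from typing import Any, Dict


class ToySpillStore:
    """Hypothetical key-value store that spills LRU values from memory to disk."""

    def __init__(self, memory_limit: int, directory: str) -> None:
        self.memory_limit = memory_limit  # byte budget for in-memory values
        self.directory = directory        # where spilled values are written
        self.fast: "OrderedDict[str, Any]" = OrderedDict()  # in memory, oldest first
        self.slow: Dict[str, str] = {}    # key -> path of the spilled file
        self.sizes: Dict[str, int] = {}   # estimated size of each value
        self._counter = 0                 # used to build unique file names

    def _in_memory_bytes(self) -> int:
        return sum(self.sizes[key] for key in self.fast)

    def _evict(self) -> None:
        # Spill least recently used values until we are within budget,
        # always keeping the most recently used value in memory.
        while self._in_memory_bytes() > self.memory_limit and len(self.fast) > 1:
            key, value = self.fast.popitem(last=False)
            self._counter += 1
            path = os.path.join(self.directory, f"spill-{self._counter}.pkl")
            with open(path, "wb") as f:
                pickle.dump(value, f)
            self.slow[key] = path

    def __setitem__(self, key: str, value: Any) -> None:
        # Discard any previous copy of this key, in memory or on disk.
        self.fast.pop(key, None)
        old_path = self.slow.pop(key, None)
        if old_path is not None:
            os.remove(old_path)
        self.fast[key] = value
        # Crude size estimate: the length of the pickled value.
        self.sizes[key] = len(pickle.dumps(value))
        self._evict()

    def __getitem__(self, key: str) -> Any:
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key]
        path = self.slow.pop(key)  # KeyError if the key was never stored
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        # "Unspill": bring the value back into memory, which may push
        # other values out to disk.
        self.fast[key] = value
        self._evict()
        return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ToySpillStore(memory_limit=300, directory=tmp)
        store["a"] = b"x" * 200
        store["b"] = b"y" * 200  # over budget, so "a" is written to disk
        print(sorted(store.fast), sorted(store.slow))
```

Reading a spilled key transparently brings it back and may evict something else, which is the behaviour the abstract describes as automatic; the shortcomings the talk addresses (for example, the cost of these round trips) are outside this sketch.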

Zoom: https://zoom.us/j/96201769159?pwd=c0pTckQrT3lWU2lTNlNzRUs3U3Y0dz09

Video Recording: https://www.youtube.com/watch?v=mHWk7y2p-NM


Logging for scientific computing: debugging, performance, Dask

Itamar Turner-Trauring  |  2021/05/19 17:00:00 UTC - 2021/05/19 17:30:00

When it takes hours or days to run your computation, it can take a long time before you notice something has gone wrong. This means your feedback loop for fixes can be very slow. Learn how logging, and in particular the causal tracing library Eliot, can help debug inconsistent calculations and spot input-specific performance problems in your Dask application.

Zoom: https://zoom.us/j/96201769159?pwd=c0pTckQrT3lWU2lTNlNzRUs3U3Y0dz09

Video Recording: https://zoom.us/rec/share/z60xsuVG0Dy6zWqeXlNK4SQg3rPqUpSskZdeE_Q1QV_1KASVR-D0Ns8NEon-qXes.eGa8sSblNYC_U_KD?startTime=1621443637000


Pangeo

Tom Augspurger, Anderson Banihirwe, Paige E. Martin  |  2021/05/19 18:00:00 UTC - 2021/05/19 20:00:00

A group of geospatial experts from the Pangeo community share their experiences using Dask, xarray and other tools, including their best practices and the pain points they’ve run into.

Zoom: https://zoom.us/j/92413783445?pwd=NThUMk51Mm0rVUhGNFVFS0lMbGJ1QT09

Video Recording: https://zoom.us/rec/share/L0L2G6jimgHnx1dYERXuM4mtBthwPLRRRTjskAAw45b9h994fXuGdGlJ9x4KXnK-.TqZ87pGSZhXtx7Jv?startTime=1621447260000


Using Dask for real time feedback in a data wrangling web application

Luis Aguirre, Argenis Leon  |  2021/05/19 18:30:00 UTC - 2021/05/19 19:00:00

In this talk, we will learn how Dask helps Bumblebee, an open-source data wrangling web app, provide the user with data insights and data transformation feedback in real time using Dask’s sync and async task handling. We will also talk about our experience with Apache Spark, the shortcomings we found when creating Bumblebee, and how Dask helps achieve the user experience we envision.

Zoom: https://zoom.us/j/93636527572?pwd=b0k4M0l4TllJZW1nd1dTekcwUXcyQT09

Video Recording: https://zoom.us/rec/share/U77aw65fF4JMj24LmJrE62FLAb_MqfiJF6hByeeROGgoovsqLhUGOg-plOFxH29_.3RgWOhxwsx9WTI6K?startTime=1621449036000


Analyzing particle physics data in the scientific python ecosystem

Nicholas Smith  |  2021/05/19 19:00:00 UTC - 2021/05/19 19:30:00

After a brief introduction to particle physics datasets, I will discuss how we adapted the scientific python ecosystem to efficiently and conveniently process these data, from ingestion, through manipulation with novel array programming techniques, to reduction using histograms. Then, I will describe how we currently scale our processing with dask and how we would like to improve our solution.

Zoom: https://zoom.us/j/93636527572?pwd=b0k4M0l4TllJZW1nd1dTekcwUXcyQT09

Video Recording: https://zoom.us/rec/share/U77aw65fF4JMj24LmJrE62FLAb_MqfiJF6hByeeROGgoovsqLhUGOg-plOFxH29_.3RgWOhxwsx9WTI6K?startTime=1621450694000


NVTabular: Building a Dask-based Library for Recommender-System Data Pipelines

Richard Zamora  |  2021/05/19 19:30:00 UTC - 2021/05/19 20:00:00

NVTabular is a recommender-system focused feature-engineering and preprocessing library for tabular data. This talk will describe how NVTabular was built entirely on Dask-Dataframe to both simplify and accelerate model-training pipelines. The primary goals are to (1) present a successful example of Dask integration and to (2) communicate important lessons learned.

Zoom: https://zoom.us/j/93636527572?pwd=b0k4M0l4TllJZW1nd1dTekcwUXcyQT09

Video Recording: https://zoom.us/rec/share/U77aw65fF4JMj24LmJrE62FLAb_MqfiJF6hByeeROGgoovsqLhUGOg-plOFxH29_.3RgWOhxwsx9WTI6K?startTime=1621452552000


Keynote: Reimagining Science

Chelle Gentemann  |  2021/05/19 20:00:00 UTC - 2021/05/19 21:00:00

Everything changed this year, including how we work together. In the last decade, the explosion of Python open-source tools has fundamentally changed science. Emerging cloud computing collaborative workspaces are opening up opportunities to realize a new vision of how science advances.

Zoom: https://zoom.us/j/97626591336?pwd=cW83b0c1S2lXdWNnT0VLWVNTMDVUQT09

Video Recording: https://zoom.us/rec/share/HSh9CXoGW4_LZzA9lp7d53D_F8otdO6VHycCLm_hBVv8QvRrz29dMtYuyFhO4o1S.Ml-80sFqEs63RvrH?startTime=1621454432000


Hyperparameter Optimization using Dask with Oríon

Xavier Bouthillier  |  2021/05/19 21:00:00 UTC - 2021/05/19 21:30:00

Oríon is a framework for asynchronous hyperparameter optimization (HPO) built around two main principles: (1) HPO should be effortless to execute in common machine learning workflows, and (2) new HPO algorithms should be readily available to practitioners. In this talk we will present Oríon and its core design principles, followed by an integration example with Dask demonstrating its simplicity of use.

Zoom: https://zoom.us/j/93212239346?pwd=RWpCRmVFRDJ0MGZpeHZub2FtTGliZz09

Video Recording: https://zoom.us/rec/share/88DPZedHJaAUbhYyi1KbhX0Xp8l6CE4tvyRwzuI3xEY5gCqVm58zaeEH8iAPwLcC.uaAvEOfZVA3a4-qj?startTime=1621458039000


Metagraph: An Adventure in Types, Heterogeneous Hardware, and Compilers in Dask

Stanley Seibert  |  2021/05/19 21:30:00 UTC - 2021/05/19 22:00:00

Metagraph is an experimental library designed to glue together a fragmented world of graph libraries. However, Metagraph extends Dask in ways that have broader potential. This talk will explore the components of Metagraph: a multiple dispatch system, a data translation system, and a plugin-based DAG compiler. These ideas will motivate a wishlist of possible enhancements to the Dask core.

Zoom: https://zoom.us/j/93212239346?pwd=RWpCRmVFRDJ0MGZpeHZub2FtTGliZz09

Video Recording: https://zoom.us/rec/share/88DPZedHJaAUbhYyi1KbhX0Xp8l6CE4tvyRwzuI3xEY5gCqVm58zaeEH8iAPwLcC.uaAvEOfZVA3a4-qj?startTime=1621459800000


Dask for Everyone with Coiled

Hugo Bowne-Anderson  |  2021/05/19 22:00:00 UTC - 2021/05/19 22:30:00

Dask has transformed what is possible with Python and Data Science.

However, while Dask has solved many of the technical challenges of parallelism, there remain challenges for institutional adoption. How does this get deployed? Is it supported? Is it secure?

This talk describes Coiled, a company based around Dask, and how it strives to enable the use of Dask by everyone, everywhere.

Zoom: https://zoom.us/j/93212239346?pwd=RWpCRmVFRDJ0MGZpeHZub2FtTGliZz09

Video Recording: https://zoom.us/rec/share/88DPZedHJaAUbhYyi1KbhX0Xp8l6CE4tvyRwzuI3xEY5gCqVm58zaeEH8iAPwLcC.uaAvEOfZVA3a4-qj?startTime=1621461720000


Record linkage on a SLURM cluster with Dask

Sultan Orazbayev  |  2021/05/19 22:30:00 UTC - 2021/05/19 23:00:00

An overview of our experience using Dask in an HPC environment (SLURM) for an academic project.

Zoom: https://zoom.us/j/93212239346?pwd=RWpCRmVFRDJ0MGZpeHZub2FtTGliZz09

Video Recording: https://zoom.us/rec/share/88DPZedHJaAUbhYyi1KbhX0Xp8l6CE4tvyRwzuI3xEY5gCqVm58zaeEH8iAPwLcC.uaAvEOfZVA3a4-qj?startTime=1621463497000


Characterizing vegetation patch proximity and size across Australia with skimage and dask map overlay

Ben Leighton, Kimberley Opie  |  2021/05/20 01:00:00 UTC - 2021/05/20 01:30:00

Pixel-independent image processing tasks are embarrassingly parallel. In our pipeline, however, labelling continuous patches requires calculations across neighbourhoods of pixels, which makes parallelization more complex. Buffered tiles solve this by allowing non-parallel pixel operations within each tile to run simultaneously across many tiles, providing tile-level parallelization.
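A minimal sketch of the buffered-tile idea, in plain Python so it runs anywhere: each tile is padded with a buffer of neighbouring rows, processed independently, then trimmed and stitched back together. The 3x3 maximum filter is a hypothetical stand-in for the speakers' neighbourhood step, and tiles are horizontal strips only for brevity. In Dask itself, this overlap-process-trim pattern is what `dask.array.map_overlap` (with its `depth` argument) provides; the sketch is not the speakers' pipeline.

```python
from typing import Callable, List

Grid = List[List[int]]


def max_filter_3x3(grid: Grid) -> Grid:
    """Neighbourhood operation: each cell becomes the max of its 3x3 window,
    with windows clipped at the grid edge."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = max(
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            )
    return out


def apply_buffered_tiles(
    grid: Grid, func: Callable[[Grid], Grid], tile_rows: int, depth: int
) -> Grid:
    """Run `func` on horizontal strips of `grid`, each padded with `depth`
    buffer rows on both sides, then trim the buffers and stitch the strips."""
    rows = len(grid)
    result: Grid = []
    for start in range(0, rows, tile_rows):
        stop = min(start + tile_rows, rows)
        lo = max(start - depth, 0)      # top of the buffered tile
        hi = min(stop + depth, rows)    # bottom of the buffered tile
        # Each buffered tile is independent, so these calls could run in parallel.
        processed = func([row[:] for row in grid[lo:hi]])
        # Keep only the tile's own rows; discard the buffer.
        result.extend(processed[start - lo:start - lo + (stop - start)])
    return result
```

The buffer makes the tiled result exact only when the operation's reach is no larger than `depth` (one row for a 3x3 window). Labelling patches that extend across many tiles needs an extra step to reconcile labels at tile edges, which this sketch does not attempt.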

Zoom: https://zoom.us/j/97179558778?pwd=dDF1VmRta1plbC8vWi9Oc1o4QTAxdz09

Video Recording: https://zoom.us/rec/share/5TA3atKA9GaUEvI4P8iGRooXZd2WasnxM5tgqpcoCaSUNUsKdEgHhv9lFAHLPFeP.mXYHn0JQIlo4Wsis?startTime=1621472512000


Making the most of your schedule: From HPC to Local Cluster

Nick Mortimer, Paul Branson  |  2021/05/20 01:30:00 UTC - 2021/05/20 02:00:00

Dask offers a level of simplicity that makes distributed computing accessible to a broad scientific community. Because of this simplicity, users often overlook some key features of the various Dask schedulers. We will present an outline of the types of schedulers and their uses. We will then focus on AARNet’s SWAN environment, which provides a user with a single large node with 36 cores with 256 Gb

Zoom: https://zoom.us/j/97179558778?pwd=dDF1VmRta1plbC8vWi9Oc1o4QTAxdz09

Video Recording: https://zoom.us/rec/share/5TA3atKA9GaUEvI4P8iGRooXZd2WasnxM5tgqpcoCaSUNUsKdEgHhv9lFAHLPFeP.mXYHn0JQIlo4Wsis?startTime=1621474466000


Tutorial: Marine Heatwaves code

Nick Mortimer  |  2021/05/20 05:30:00 UTC - 2021/05/20 06:00:00

Dask Down Under is a chance for everyone in Oceania to forge links and build community here in our backyard. At Dask Down Under we feature talks, tutorials and panel discussions on using Dask to accelerate research. All levels, from beginner to expert, are encouraged to attend.

Zoom: https://zoom.us/j/97179558778?pwd=dDF1VmRta1plbC8vWi9Oc1o4QTAxdz09

Video Recording: https://zoom.us/rec/share/h17UHEn1XjiTgHYNhzVA-M9t8YQ0s-VigchtGYSU_fa9cycCJjSoRdgovStYUQZR.SMwWWnIj7IaFIRsj?startTime=1621489025000


Scaling Geospatial Vector Data

Joris Van den Bossche, Julia Signell, Martin Fleischmann  |  2021/05/20 11:00:00 UTC - 2021/05/20 13:00:00

The Python ecosystem provides a nice set of tools for working with geospatial vector data, including Shapely and GeoPandas. Efforts are popping up to improve the performance and scalability of those workflows, such as dask-geopandas and spatialpandas. This workshop will give an overview of work by the community, and foster discussion on improvements and interoperability between the libraries.

Zoom: https://zoom.us/j/98654378161?pwd=eTMvRCtoV3hIaEFyd1Y3Ulh6QVh5dz09

Video Recording: https://zoom.us/rec/share/WFRbJZNlsdg49Oh1XxYmtheG2KTbL4-5AOj-_GuyBMjlLSQpabb0k2pbWPDgczmw.-F5hnexY3fEZ77_Z?startTime=1621508432000


Active Memory Management on Dask.distributed

Guido Imperiale  |  2021/05/20 11:30:00 UTC - 2021/05/20 12:00:00

This talk illustrates recent and ongoing work that rethinks how Dask.distributed manages memory across the cluster. Data should be automatically and transparently moved around workers to optimize memory occupancy, prevent workers from hanging, and increase robustness all around.

Zoom: https://zoom.us/j/98879950901?pwd=M056a2gwTHVJeERFWEEycVFMMDNXUT09

Video Recording: https://zoom.us/rec/share/ppUZ0OX73xgf17rFRQJjQ79wFRjsPxtd6bnzXfpsNrdS2QrlO_kpDFpXdIqp3BJH.oKLsesiFjpqQEAOx?startTime=1621510195000


Scaling Pandas using Dask: How to avoid all my mistakes

Krishan Bhasin  |  2021/05/20 12:00:00 UTC - 2021/05/20 12:30:00

Dask is a Python package that provides advanced parallelism for analytics, enabling performance at scale for the tools you love. People think it’s magic – drop it in and it scales. This will mostly work, but it will not scale well!

We would like to share what we’ve learned about using Dask to scale dataframes and computations, so that you can avoid making the same mistakes.

Zoom: https://zoom.us/j/98879950901?pwd=M056a2gwTHVJeERFWEEycVFMMDNXUT09

Video Recording: https://zoom.us/rec/share/M_BanH4yXBLTiZq6HS0BHU3iMdElpDfY9bv_IF_0fWah9JCn3wSTedL2lrU6hhLF.gZNIyeTVeFhLDLzB?startTime=1621512178000


Standardizing the Model Development and Approval Process

Joe Wolfe, Ryan Soley  |  2021/05/20 12:30:00 UTC - 2021/05/20 13:00:00

Rubicon is an open source data science tool that captures and stores model training and execution information, like parameters and outcomes, in a repeatable and searchable way to ensure full auditability and reproducibility for developers and stakeholders alike. It was built for the PyData ecosystem and works well with Dask.

Zoom: https://zoom.us/j/98879950901?pwd=M056a2gwTHVJeERFWEEycVFMMDNXUT09

Video Recording: https://zoom.us/rec/share/M_BanH4yXBLTiZq6HS0BHU3iMdElpDfY9bv_IF_0fWah9JCn3wSTedL2lrU6hhLF.gZNIyeTVeFhLDLzB?startTime=1621514113000

Finance Workshop, Presentation: Deploying and adapting Dask at Two Sigma, Using dask for large systems of financial models

Mike McCarty, Cindy Ge, Elaine Jiang, Jonathan Moore, Menghan Chen  |  2021/05/20 14:00:00 UTC - 2021/05/20 16:00:00

The Finance workshop aims to bring together Dask users and developers working in the financial industry to learn from each other’s experiences and find ways to collaborate. We will start with lightning talks to share experiences using Dask and generate discussion. We will then form breakout groups to focus on common topics and identify ways to work together going forward.

Zoom: https://zoom.us/j/97343824192?pwd=Z1kydWxhbnpDOFR0K0gyclpNeHVXUT09

Video Recording: https://zoom.us/rec/share/LpPiwH---f6X5dJhliJkpoVU0YsAu6_rj7QRscY_0nBh4ocrAE9VVGkC6jabINeH.SLwgV_UmFiwQzDqd?startTime=1621519220000


Dask in HPC

Anderson Banihirwe, Guillaume Eynard-Bontemps  |  2021/05/20 14:00:00 UTC - 2021/05/20 16:00:00

The goal of this workshop is to bring together scientists, software developers and HPC center administrators to share their experiences with interactive supercomputing, using Dask in High Performance Computing (HPC) settings.

Zoom: https://zoom.us/j/91597434303?pwd=NC9UUllBYXc1a2pHMXVpTGU1UHFJUT09

Video Recording: https://zoom.us/rec/share/25oqsPdh_ZeZSFmvvbOLzJ0Sa8Q7py6GiLrVra8UHJmi0rZ1YLm69Jpt9NUiN4Cm.1MS7-WLlIPa3Ewku?startTime=1621519269000


How Distributed LightGBM on Dask Works

James Lamb  |  2021/05/20 14:30:00 UTC - 2021/05/20 15:00:00

In this talk, attendees will learn about LightGBM, a popular gradient boosting library. The talk offers details on distributed LightGBM training, and describes the main implementation of it using Dask. Attendees will learn which pieces of the Dask ecosystem LightGBM relies on, and what challenges LightGBM faces in using Dask to wrap existing distributed training code written in C++.

Zoom: https://zoom.us/j/97698388581?pwd=cGlPLzljUlNCWm9ZQ3U5T0JoYnNEZz09

Video Recording: https://zoom.us/rec/share/2MDNheUjidMT7EOcVuD0qnCph3OGnk9Wjf6QZo-8YLO95bzCEHaiDH6I5LmeqXE.Y87S6St0o2DuG29G?startTime=1621521150000


mlforecast: Scalable machine learning based time series forecasting

José Morales  |  2021/05/20 15:00:00 UTC - 2021/05/20 15:30:00

mlforecast is a framework to perform scalable machine learning based time series forecasting. It performs every step of the process in a distributed way, allowing you to scale to massive amounts of data. Dask is used for the parallelism, so you can use it either on a single machine or on remote clusters.

Zoom: https://zoom.us/j/97698388581?pwd=cGlPLzljUlNCWm9ZQ3U5T0JoYnNEZz09

Video Recording: https://zoom.us/rec/share/2MDNheUjidMT7EOcVuD0qnCph3OGnk9Wjf6QZo-8YLO95bzCEHaiDH6I5LmeqXE.Y87S6St0o2DuG29G?startTime=1621522979000


Scalable geospatial data analysis with Dask

Tom Augspurger  |  2021/05/20 15:30:00 UTC - 2021/05/20 16:00:00

We’ll use a fundamental conservation workload, land use / land cover change detection, to demonstrate how Dask can scale geospatial workloads. We’ll use tools like STAC and GDAL to efficiently query a geospatial dataset and mosaic many images into a single xarray DataArray. We’ll then apply a PyTorch model to do the actual land use classification, and xarray to analyze the change.

Zoom: https://zoom.us/s/97698388581?pwd=cGlPLzljUlNCWm9ZQ3U5T0JoYnNEZz09

Video Recording: https://zoom.us/rec/share/2MDNheUjidMT7EOcVuD0qnCph3OGnk9Wjf6QZo-8YLO95bzCEHaiDH6I5LmeqXE.Y87S6St0o2DuG29G?startTime=1621524734000


Architectures for Scalable Analytic Dashboards in Python with Dask Distributed and Dash

Jon Mease  |  2021/05/20 16:00:00 UTC - 2021/05/20 16:30:00

Dash is a framework for developing analytic web apps in Python. This talk will describe Dash’s design, and how it enables efficient scaling to support large numbers of simultaneous users. Then, several architectures will be presented that can be used to combine the strengths of Dash with the strengths of Dask Distributed to create apps that scale to support large datasets and many users.

Zoom: https://zoom.us/s/97698388581?pwd=cGlPLzljUlNCWm9ZQ3U5T0JoYnNEZz09

Video Recording: https://zoom.us/rec/share/2MDNheUjidMT7EOcVuD0qnCph3OGnk9Wjf6QZo-8YLO95bzCEHaiDH6I5LmeqXE.Y87S6St0o2DuG29G?startTime=1621526599000


Scale Machine Learning Code with Dask

Andrew Mshar, Ryan Soley  |  2021/05/20 16:00:00 UTC - 2021/05/20 17:30:00

Do you use the scikit-learn library to build machine learning models? In this tutorial, we’ll discuss how to avoid the traps that lead to hard-to-maintain code when implementing customizations to these algorithms. We will cover how building your own estimators can make it easy to scale your model training with additional libraries like Dask and Dask-ML, with much less code than you might think!

Zoom: https://zoom.us/j/91216667381?pwd=T201dnR0NVQ0Nms0WDJXK3JDRFhiQT09

Video Recording: https://zoom.us/rec/share/p_Pqd6IJ_YnMEGGsmnnGTxH4_w8FQpG4H_v_4BcsjEE5iZANhjZq_5qDNau5oUiV.IBwADS6g6LDF-ORg?startTime=1621526617000


Using GPUs to Accelerate Data Science with Dask + RAPIDS

Jacob Schmitt  |  2021/05/20 16:00:00 UTC - 2021/05/20 18:00:00

RAPIDS supercharges data science with NVIDIA accelerated compute. Paired with Dask, data professionals can build highly-performant, distributed workloads with a comfortable toolset similar to favorites like pandas or scikit-learn. In this workshop, we’ll discuss how Dask + RAPIDS empower practitioners, how to start with these tools quickly, and how they’re used to solve common challenges.

Zoom: https://zoom.us/j/95362663568?pwd=N2tyU29RLzcycjFsWld0R2tYdmpVQT09

Video Recording: https://zoom.us/rec/share/X5h8Hp1Z2revdkiVjYcYImp2iV9d-abgyxB7ItrvRNIepAwgmlJnl3TINtAajyM6.P9lfQS35xyprfZv_?startTime=1621526701000

Bringing Dask Workloads to GPUs with RAPIDS

Benjamin Zaitlen, Nick Becker  |  2021/05/20 18:30:00 UTC - 2021/05/20 20:00:00

Data volumes and the computational complexity of analysis techniques have increased, while the need to quickly explore data and develop models is more important than ever. One of the key ways to achieve this has been through GPU acceleration. This workshop introduces RAPIDS, and illustrates how to use Dask and RAPIDS to accelerate ETL/ML workloads, increasing performance and decreasing total cost.

Zoom: https://zoom.us/j/94954946323?pwd=ZXZpRGxRMjFkanlJYlRSbXQ1cGpoQT09

Video Recording: https://zoom.us/rec/share/ecFv8qlWZPyfiGv4zjFNTYErrA5H79mluVWBRegCpeATQhrWQUWj0cTKex4MITQ.fTnVW38TuJ7gmwwz?startTime=1621535703000


Bringing Dask to High Performance Computing (HPC) Clusters and Grids

Jenna Lau-Caruso, Michael Feiman  |  2021/05/20 18:30:00 UTC - 2021/05/20 19:00:00

This talk presents a blueprint for bringing Dask workloads to HPC grids. We implement an architecture which allows dynamic sharing of compute resources within and between multi-tenant environments where Dask clusters are defined in secure, flexible, and repeatable templates.

Zoom: https://zoom.us/j/96380136685?pwd=aXM5b0pUOUFCY01QTnFWRGcwOWMrdz09

Video Recording: https://zoom.us/rec/share/EaWoBAe_FGV0iTf_ma_qzbLDn5kEnD5rInMGLqg9bKT_ieNRn2JV1B49n7iPbUb4.IC6rqtc_bpEJneGi?startTime=1621535446000


Dask-on-Ray: Using Dask for Large-scale Data Processing on Ray

Clark Zinzow  |  2021/05/20 19:00:00 UTC - 2021/05/20 19:30:00

Ray is a distributed task execution system that provides a simple API for building distributed applications, and has a large ecosystem of libraries for training and serving machine learning models. As part of a recent effort to expand support for Ray-based data processing and data analytics, Dask-on-Ray was developed to allow users to run Dask workloads on Ray.

Zoom: https://zoom.us/j/96380136685?pwd=aXM5b0pUOUFCY01QTnFWRGcwOWMrdz09

Video Recording: https://zoom.us/rec/share/EaWoBAe_FGV0iTf_ma_qzbLDn5kEnD5rInMGLqg9bKT_ieNRn2JV1B49n7iPbUb4.IC6rqtc_bpEJneGi?startTime=1621537347000


Keynote: Design Principles of Distributed Systems

Holden Karau  |  2021/05/20 20:00:00 UTC - 2021/05/20 21:00:00

What makes a distributed system framework different than other libraries? How does one’s mental model need to change when thinking about writing code for distributed systems? What makes one distributed system framework different from another? What are some common trade-offs made amongst Dask, Spark, and Ray and how are these three different from other classes of distributed systems?

This talk will answer all of these questions, and include pictures of my amazing puppy dog for when you zone out.

Zoom: https://zoom.us/j/99885256199?pwd=UDJXeGp5S09JYjZNUUYwMTIzdVJVUT09

Video Recording: https://zoom.us/rec/share/lM49KwUGKdkDofST3Xn2R00UlzF5H0osg6nzwN2k8K8kdvlvY8dh1Tnthv2GMBvR.KgrRqPQNKS7_36xG?startTime=1621540826000


Dark Energy with Dask: Analyzing data from the Next Generation of Large Astronomical Surveys

Michael Wood-Vasey  |  2021/05/21 12:30:00 UTC - 2021/05/21 13:00:00

Astronomers are learning to use Dask to analyze terabyte to petabyte scale data from the upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) to unlock the mysteries of dark energy leading to the accelerated expansion of the Universe. Anticipating and balancing memory usage during interactive analyses across a high-performance computing center is proving challenging.

Zoom: https://zoom.us/j/91901424359?pwd=YjF1MlQreWNRNnRnMU9GVjhyK1VMUT09

Video Recording: https://zoom.us/rec/share/FqQC3hrNop_S4txcjls2JSd42qjh6TT0Ecz-RGdNvP_ehdJAdeLmnKCtwZmVj6ag.7BsUwkF2xss0Jyx-?startTime=1621600204000


Keynote: Dask & Prefect for Healthcare Machine Learning on AWS

Joe Schmid, Jie Lou  |  2021/05/21 13:00:00 UTC - 2021/05/21 14:00:00

In early 2019, SymphonyRM started a very ambitious R&D phase to develop machine learning models across many disease areas on clinical healthcare data. To achieve our vision, we needed infrastructure to give our data scientists superpowers. The combination of Dask and Prefect turned out to be incredibly powerful and productive, allowing us to achieve excellent results.

Zoom: https://zoom.us/j/97590283314?pwd=czU3SURreUVZNEExdkRPV3VkRy9oUT09

Video Recording: https://zoom.us/rec/share/esxiBx7J6_0WlHox7o940oUpORfoGJmuaaYftPLXWY8NFAWojgqRw-Me0AzNPXxL.bsgoJLNypUBkEHOC?startTime=1621602077000


Radio Astronomy Applications with Dask

Simon Perkins, Jan-Willem Steeb, Jonathan Kenyon, Landman Bester, Lexy A. L. Andati, Ruby Van Rooyen, Tim Cornwell, SKA project  |  2021/05/21 14:00:00 UTC - 2021/05/21 16:00:00

Next-generation radio telescopes generate vast and ever-increasing quantities of data, but current software is not designed to operate in a parallel, distributed paradigm. This workshop brings together three strands of distributed Dask radio astronomy development, by SARAO, NRAO, and SKAO, to provide a forum for addressing this challenge and to serve as a platform for future developments.

Zoom: https://zoom.us/j/99981786966?pwd=TGhRK2JRaStCeFpjc1EveGdrdlpxUT09

Video Recording: https://zoom.us/rec/share/yix_kJ3WHO1uGZdRl-x8aXrJmaO8YQ52vXw8aauBw40WD-Tk3A2zxY4gZmhlAEPT.9kCqC6d1gWrB-ODT?startTime=1621605674000


High-Performance Data Access for Dask

Martin Durant, Joris Van den Bossche, Richard Zamora  |  2021/05/21 14:00:00 UTC - 2021/05/21 16:00:00

Dask provides many functions for array and dataframe I/O. In this workshop, we will discuss the current status of Dask's integrations with various data formats and, more generally, the landscape of parallel/cloud-friendly data storage.

Zoom: https://zoom.us/j/94963743297?pwd=TFBZanBwRGtxZHpMMVZjRURaS3VSUT09

Video Recording: https://zoom.us/rec/share/ICh-jVQ0BqFEAK4eNHM4_5ask04qALFz1N7AvEPFKD6heapvK2L9dE5QwEG0_lfd.MzvvmByibednS33f?startTime=1621605833000


Training PyTorch models faster with Dask

Scott Sievert  |  2021/05/21 14:00:00 UTC - 2021/05/21 14:30:00

Training machine learning (ML) models can easily take hours. A common parameter with big data is the “batch size,” the number of training examples used to approximate the gradient. Training time can be minimized by increasing the batch size with certain distributed systems. Our experimental results show that using our wrapper for a deep network requires less wall-clock time than standard SGD.

Zoom: https://zoom.us/j/93640713876?pwd=dm5zQm96S3NEZEpWc1ZLR0xVY1lUdz09

Video Recording: https://zoom.us/rec/share/FaL6Rd8HMmkzcbxUdm4gymKEUG2z1gtJHJy6HI109g8EdVj637JxG2fZyncckd85.Vw2veiLe45QKGsay?startTime=1621605605000


Using Dask to Finetune Geo-Raster Analysis

Brendan Collins  |  2021/05/21 14:30:00 UTC - 2021/05/21 15:00:00

Raster analysis sits at the core of work done in Geo domains, specifically GIS and Environmental Studies. Every domain, from finance to manufacturing, has a Geo component, so being able to wrangle large amounts of raster data is an asset. This talk will show you how Dask beefs up the geo-raster Python stack and how users from different domains can look to GIS to solve interesting problems.

Zoom: https://zoom.us/j/93640713876?pwd=dm5zQm96S3NEZEpWc1ZLR0xVY1lUdz09

Video Recording: https://zoom.us/rec/share/FaL6Rd8HMmkzcbxUdm4gymKEUG2z1gtJHJy6HI109g8EdVj637JxG2fZyncckd85.Vw2veiLe45QKGsay?startTime=1621607440000


Dask DataFrame groupby. Why it can fail and how to compensate.

Hugo Shi  |  2021/05/21 15:00:00 UTC - 2021/05/21 15:30:00

Dask DataFrame groupby operations are very common and very powerful. However, due to the distributed nature of Dask DataFrames, they can fail in unexpected ways. This talk covers mitigation strategies for these problems, including using set_index to optimize data layout, and using split_out and split_every parameters to optimize computation.
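To make the `split_every` idea above more concrete, here is a conceptual sketch in plain Python, not Dask's actual implementation: each partition computes a partial groupby-sum, and the partials are then combined `split_every` at a time, level by level, in a tree until a single result remains. The helper names `partition_sums`, `merge_sums`, and `tree_reduce` are hypothetical and chosen only for illustration; `split_out` (spreading the output across several partitions) is not modeled here.

```python
from typing import Dict, Iterable, List, Tuple

Partial = Dict[str, int]

def partition_sums(rows: Iterable[Tuple[str, int]]) -> Partial:
    """Groupby-sum within a single partition (hypothetical helper)."""
    acc: Partial = {}
    for key, value in rows:
        acc[key] = acc.get(key, 0) + value
    return acc

def merge_sums(parts: List[Partial]) -> Partial:
    """Combine several partial groupby-sums into one."""
    out: Partial = {}
    for part in parts:
        for key, value in part.items():
            out[key] = out.get(key, 0) + value
    return out

def tree_reduce(partials: List[Partial], split_every: int) -> Partial:
    """Merge partials split_every at a time, level by level, until one remains.

    A smaller split_every means more levels but less data per merge step,
    mirroring the trade-off that Dask's split_every parameter controls.
    """
    if split_every < 2:
        raise ValueError("split_every must be at least 2")
    if not partials:
        return {}
    while len(partials) > 1:
        partials = [
            merge_sums(partials[i:i + split_every])
            for i in range(0, len(partials), split_every)
        ]
    return partials[0]

# Four toy partitions of (key, value) rows.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3)],
    [("b", -2), ("c", 5)],
    [("a", 1)],
]
partials = [partition_sums(p) for p in partitions]
result = tree_reduce(partials, split_every=2)
print(result)  # {'a': 5, 'b': 0, 'c': 5}
```

With `split_every=2` the four partials are merged in two levels; a larger value would merge them in one step, at the cost of a single task holding more intermediate data.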

Zoom: https://zoom.us/j/93640713876?pwd=dm5zQm96S3NEZEpWc1ZLR0xVY1lUdz09

Video Recording: https://zoom.us/rec/share/FaL6Rd8HMmkzcbxUdm4gymKEUG2z1gtJHJy6HI109g8EdVj637JxG2fZyncckd85.Vw2veiLe45QKGsay?startTime=1621609177000


Using Dask and many GPUs to train a neural network with PyTorch

Jacqueline Nolis  |  2021/05/21 15:30:00 UTC - 2021/05/21 16:00:00

I took a neural network I had trained on a single CPU to generate pet names and tried retraining it with tons of connected GPUs using Dask, PyTorch, and the package dask-pytorch-ddp. I learned a lot about when it is the right time to use multiple GPUs and what the pitfalls can be. In this talk I’ll discuss what these lessons mean for training with GPUs and Dask.

Zoom: https://zoom.us/j/93640713876?pwd=dm5zQm96S3NEZEpWc1ZLR0xVY1lUdz09

Video Recording: https://zoom.us/rec/share/FaL6Rd8HMmkzcbxUdm4gymKEUG2z1gtJHJy6HI109g8EdVj637JxG2fZyncckd85.Vw2veiLe45QKGsay?startTime=1621610908000


Dask in the High-Energy Physics Community

Oksana Shadura, Lukas Heinrich  |  2021/05/21 16:00:00 UTC - 2021/05/21 18:00:00

During this workshop, we would like to discuss upcoming challenges and different Dask use cases in the High-Energy Physics community, as well as deployment methods across HPC, on-premise high-throughput clusters, and cloud providers. A key challenge is the integration of interactive scale-out systems within the existing federated scientific grid computing infrastructure.

Zoom: https://zoom.us/j/94873155498?pwd=RUZFZjUyQWhXSWNGYXIzOVhraXk0Zz09

Video Recording: https://zoom.us/rec/share/nWFRmpyzXqhOuI4fRgv3csOKbaEVzgA3yq1pTYCyGCCK0xjUQ3FBTLVioqcHRW7S.9-vMMxrI_ZEvHP7f?startTime=1621612811000


Xarray User Forum

Deepak Cherian, Anderson Banihirwe  |  2021/05/21 16:00:00 UTC - 2021/05/21 18:00:00

Xarray provides metadata-rich data structures that wrap array-like objects such as Dask arrays. This two-part session will highlight recent exciting advances in Xarray’s capabilities, and present user stories of Xarray+Dask usage across a wide variety of domains.

Zoom: https://zoom.us/j/95161275110?pwd=eWhMTzFtd1NuMTAvSWNOOW5qVnIyQT09

Video Recording: https://zoom.us/rec/share/BREqZXbqFf3d4B3J9QHAlWCIOuHdRgTq8hBtXSNl6UaSJPneitf_ZLU3vhF5Cjoh.sKTiTOE2DW6qXHlF?startTime=1621612831000


100GB/s GPU log analytics at Graphistry: A case study and production lessons on tuning dask-cudf

Leo Meyerovich  |  2021/05/21 16:00:00 UTC - 2021/05/21 16:30:00

Going from pandas or cudf to dask-cudf can unlock big and latency-sensitive analytics workloads… if done right. However, dask-cudf is quite new and multi-GPU computing faces NUMA hazards. This talk shares our experience with dask-cudf from two perspectives: A case study in tackling 100 GB/s for extracting an identity graph from big logs, and our top lessons in going to production.

Zoom: https://zoom.us/j/93640713876?pwd=dm5zQm96S3NEZEpWc1ZLR0xVY1lUdz09

Video Recording: https://zoom.us/rec/share/FaL6Rd8HMmkzcbxUdm4gymKEUG2z1gtJHJy6HI109g8EdVj637JxG2fZyncckd85.Vw2veiLe45QKGsay?startTime=1621612748000


GPU-accelerated Streaming @ Scale using Dask

Chinmay Chandak  |  2021/05/21 16:30:00 UTC - 2021/05/21 17:00:00

Stream processing is experiencing exponential growth with businesses and services relying heavily on real-time analytics, inferencing, monitoring, and more. Reliable, cost-effective streaming at scale is paramount, but auto-scaling has hit cost-efficiency limits with CPUs. This talk will be about how NVIDIA is leveraging Dask to GPU-accelerate big data stream processing at scale in production.

Zoom: https://zoom.us/j/93640713876?pwd=dm5zQm96S3NEZEpWc1ZLR0xVY1lUdz09

Video Recording: https://zoom.us/rec/share/FaL6Rd8HMmkzcbxUdm4gymKEUG2z1gtJHJy6HI109g8EdVj637JxG2fZyncckd85.Vw2veiLe45QKGsay?startTime=1621614790000


The Dask JupyterLab extension

Ian Rose  |  2021/05/21 17:00:00 UTC - 2021/05/21 17:30:00

The Dask JupyterLab extension provides integration between Dask and JupyterLab. I will show how to use the JupyterLab panel system to create custom layouts for the distributed dashboards, as well as how to use the integrated Dask cluster manager to start, stop, and customize your Dask clusters. I will finish with ideas for how people with either Dask or frontend experience could contribute.

Zoom: https://zoom.us/j/93640713876?pwd=dm5zQm96S3NEZEpWc1ZLR0xVY1lUdz09

Video Recording: https://zoom.us/rec/share/FaL6Rd8HMmkzcbxUdm4gymKEUG2z1gtJHJy6HI109g8EdVj637JxG2fZyncckd85.Vw2veiLe45QKGsay?startTime=1621616392000


Capital One uses Dask!

Dan Kerrigan  |  2021/05/21 17:30:00 UTC - 2021/05/21 18:00:00

Capital One uses Dask and its ecosystem to great effect and gains more internal users regularly. This survey talk outlines who is using Dask, why they use Dask, how they deploy Dask, and the challenges they encounter. We will also consider the future of Dask and its usage within Capital One.

Zoom: https://zoom.us/j/93640713876?pwd=dm5zQm96S3NEZEpWc1ZLR0xVY1lUdz09

Video Recording: https://zoom.us/rec/share/FaL6Rd8HMmkzcbxUdm4gymKEUG2z1gtJHJy6HI109g8EdVj637JxG2fZyncckd85.Vw2veiLe45QKGsay?startTime=1621618150000


Doing Nothing Poorly: Accelerating Dask Scheduling

Benjamin Zaitlen, Gil Forsyth, James Bourbeau, John Kirkham, Mads R. B. Kristensen, Matt Rocklin, Richard Zamora  |  2021/05/21 18:00:00 UTC - 2021/05/21 20:00:00

As workloads scale, the overhead of processing tasks itself becomes a bottleneck. Improved scalability requires not only a faster scheduler, but a coordinated effort across the entire Dask ecosystem. High performance computing is not about doing one thing well; it’s about doing nothing poorly. In this workshop, we’ll cover an ongoing multi-institutional effort to accelerate Dask scheduling.

Zoom: https://zoom.us/j/98534272079?pwd=U2d6MEh6TWF3cGcwY0JzMnpROG5qZz09

Video Recording: https://zoom.us/rec/share/vO7Xx5HAupQHDEjTgmspp6VZoCY_zabi1nouOHCwl4mBSg2hVQpIFqTTpVQUNZA4.8cE5K7SjaMIUNINW?startTime=1621620212000


An Intro to Workflow Management with Prefect

Kevin Kho  |  2021/05/21 18:00:00 UTC - 2021/05/21 18:30:00

As data pipelines become increasingly complex and interconnected, workflow management systems are being used to schedule and monitor tasks. Prefect is an open-source workflow management system designed for large-scale data processes. We’ll show how to get started with Prefect and also cover how to run Prefect on top of Dask on the cloud to parallelize workflows.

Zoom: https://zoom.us/j/93640713876?pwd=dm5zQm96S3NEZEpWc1ZLR0xVY1lUdz09

Video Recording: https://zoom.us/rec/share/FaL6Rd8HMmkzcbxUdm4gymKEUG2z1gtJHJy6HI109g8EdVj637JxG2fZyncckd85.Vw2veiLe45QKGsay?startTime=1621620075000


Scale model training in minutes with Dask + Rapids on GCP AI Platform

Remy Welch  |  2021/05/21 18:30:00 UTC - 2021/05/21 19:00:00

Dask allows users to scale their Python code; however, it is usually not easy to provision the machines necessary to run that code. Google Cloud has a wide range of machine sizes (200+ CPUs, 600+ GB of memory) and types that can be provisioned in minutes. Add to that a wide range of GPUs, including a single-node shape with 16 A100 GPUs, and you can use Dask on the cluster of your dreams.

Zoom: https://zoom.us/j/93640713876?pwd=dm5zQm96S3NEZEpWc1ZLR0xVY1lUdz09

Video Recording: https://zoom.us/rec/share/FaL6Rd8HMmkzcbxUdm4gymKEUG2z1gtJHJy6HI109g8EdVj637JxG2fZyncckd85.Vw2veiLe45QKGsay?startTime=1621621817000
