Time Zone: UTC

20 May 14:30 – 20 May 15:00 in Talks

How Distributed LightGBM on Dask Works

James Lamb

Audience level:
Intermediate

Description

In this talk, attendees will learn about LightGBM, a popular gradient boosting library. The talk explains how distributed LightGBM training works and describes its main Dask-based implementation. Attendees will learn which pieces of the Dask ecosystem LightGBM relies on, and what challenges LightGBM faces in using Dask to wrap existing distributed training code written in C++.

Abstract

In this talk, attendees will learn about LightGBM, a popular gradient boosting library from Microsoft. After a high-level overview of the LightGBM algorithm, the talk will describe strategies for distributed training of gradient boosted decision tree (GBDT) models generally, and distributed training of LightGBM models specifically.

With this base established, the bulk of the talk will cover the current state of LightGBM's Dask integration. Attendees will learn how responsibilities are divided between Dask and LightGBM's existing distributed training framework, which is written in C++. The talk will also cover the specific components of the Dask ecosystem that LightGBM relies on.
