19 May 18:30 – 19 May 19:00 in Talks

Using Dask for real time feedback in a data wrangling web application

Luis Aguirre, Argenis Leon

Audience level:
Intermediate

Description

In this talk, we will show how Dask helps Bumblebee, an open-source data-wrangling web app, give users data insights and feedback on data transformations in real time, using Dask's synchronous and asynchronous task handling. We will also discuss our experience with Apache Spark, the shortcomings we found while creating Bumblebee, and how Dask helps us achieve the user experience we envision.

Abstract

Dask capabilities can be extended to web applications. In this talk we’ll take a quick tour of how we integrated Dask into Bumblebee, a spreadsheet-like web application, and how it helped us achieve the user experience we envisioned.

We’ll talk about how we use Dask in Bumblebee, taking advantage of what Dask offers behind a user interface focused on data wrangling. Topics to be discussed include: Optimus, a Python library used in Bumblebee that unifies various engines under one API; how we improved our processing times by adopting Dask; how we used futures for asynchronous operations; and how we used actors for remote-only processing with Dask-cuDF.
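As a rough illustration of the futures-based asynchronous pattern mentioned above, here is a minimal sketch using `dask.distributed`. It is not Bumblebee's or Optimus's actual code: `column_summary` and `run_demo` are hypothetical names, and the in-process, thread-only cluster is an assumption made to keep the example self-contained.

```python
from dask.distributed import Client


def column_summary(values):
    """Hypothetical profiling step: basic statistics for one column."""
    return {"count": len(values), "min": min(values), "max": max(values)}


def run_demo():
    # Assumption: a small in-process cluster (threads only, no dashboard),
    # chosen so the sketch runs without any external scheduler.
    client = Client(
        processes=False,
        n_workers=1,
        threads_per_worker=2,
        dashboard_address=None,
    )
    try:
        # submit() returns a Future immediately; the work runs in the background.
        future = client.submit(column_summary, [3, 1, 4, 1, 5])
        # A web backend could respond to the user here and check future.done()
        # later; for the demo we simply block until the result is ready.
        return future.result()
    finally:
        client.close()


if __name__ == "__main__":
    print(run_demo())
```

Because `submit` returns right away, a web application could keep its interface responsive and deliver the summary once the future completes.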