Metagraph is an experimental library designed to glue together a fragmented world of graph libraries. However, Metagraph extends Dask in ways that have broader potential. This talk will explore the components of Metagraph: a multiple dispatch system, a data translation system, and a plugin-based DAG compiler. These ideas will motivate a wishlist of possible enhancements to the Dask core.
Metagraph is part of the larger DARPA HIVE project to accelerate graph analytics at scale. Most of the HIVE project is focused on a novel hardware accelerator, but Metagraph is designed to address the high-level obstacles to accelerated graph analytics:
These issues present a significant usability obstacle to user adoption of new software and hardware in the graph analytics space. Metagraph is an experiment to mitigate these obstacles, by allowing a coherent API to be layered on top of a fragmented ecosystem of libraries with overlapping functionality and incompatible data representations.
In this talk, we will discuss why Metagraph built its API on top of Dask, and how we combined Dask with a type system to support automatic data translation and multiple dispatch across implementations that span libraries and different hardware targets. Metagraph's plugin-based type system makes it easy for 3rd parties to write the glue code needed to incorporate new libraries into the dispatcher without needing to modify those libraries directly.
Additionally, the Metagraph dispatcher also can select "compilable" task implementations that can be eventually inlined together to create fused tasks with a JIT compiler. Metagraph contains a special Dask optimization pass that extracts compilable subgraphs from the overall task graph, passes them to the relevant compiler plugin (currently supporting Numba and MLIR) and then replaces the subgraphs with newly JIT-compiled task nodes. This opens up a world of possible optimization approaches, especially when dealing with accelerator hardware (like GPUs), where fusing tasks together could have significant benefits.
We will conclude with some general ideas for how the Dask core could evolve to make optimization approaches, like those in Metagraph, easier to implement. In particular, we will talk about the benefits to having "task metadata" as well as the need for a more universal and extensible system for DAG optimization passes in Dask.