
Implementing a DAG in Python

I am implementing a DAG in Python, using a dictionary. Each key represents a node in the graph, and the value associated with a key is the set of nodes that depend on that node.

Is it necessary to use an OrderedDict instead of a dict to implement the DAG? An OrderedDict preserves the insertion order of its keys. Why would one want to preserve the insertion order of the nodes when the value at each key is already the set of nodes dependent on the node at that key?

Sharu Gupta asked Feb 27 '19 10:02



2 Answers

Suppose you have the following DAG:

[Image: example DAG]

You could represent this DAG as a dictionary:

graph = {
    'root': ['a'],
    'a': ['b', 'e'],
    'b': ['c', 'd'],
    'd': ['e']}

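You could also represent this DAG as an ordered dictionary, but that would be unnecessary. The ordering of the key/value pairs does not matter, because the structure of the graph is defined entirely by the edges stored in the values. (Since Python 3.7, a plain dict preserves insertion order anyway.) There's a buggy, incomplete Python DAG library that uses ordered dictionaries, but that library isn't a good example to follow.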
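To illustrate why key order is irrelevant, here is a minimal sketch that walks the same graph stored with two different insertion orders and gets the same result. The helper name `descendants` is made up for this example and is not part of any library.

```python
from collections import OrderedDict


def descendants(graph, start):
    """Return every node reachable from `start` (iterative depth-first search)."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        # Leaf nodes such as 'c' and 'e' have no key, so default to no children.
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


graph = {
    'root': ['a'],
    'a': ['b', 'e'],
    'b': ['c', 'd'],
    'd': ['e'],
}

# Same edges, with the keys inserted in the opposite order.
reversed_graph = OrderedDict(reversed(list(graph.items())))

# The traversal depends only on the edges, not on the order of the keys.
print(descendants(graph, 'a') == descendants(reversed_graph, 'a'))
```

Insertion order would only matter if some other part of your code relied on iterating over the nodes in the order they were added.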

networkx is the gold standard for Python DAGs (and other graphs). You can create a networkx directed graph with a list of tuples that represent the graph edges:

import networkx as nx

graph = nx.DiGraph()
graph.add_edges_from([("root", "a"), ("a", "b"), ("a", "e"), ("b", "c"), ("b", "d"), ("d", "e")])

See here for more information about Python DAGs.

Powers answered Oct 19 '22 19:10


graphlib is the module in the Python standard library for working with directed acyclic graphs. It was added in Python 3.9.

It seems a bit redundant to copy/paste an example from the documentation, but here's a very short one:

>>> from graphlib import TopologicalSorter
>>> graph = {"D": {"B", "C"}, "C": {"A"}, "B": {"A"}}
>>> ts = TopologicalSorter(graph)
>>> tuple(ts.static_order())
('A', 'C', 'B', 'D')
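One thing to watch for: TopologicalSorter expects each key to map to that node's predecessors, which is the reverse of the layout in the question (node to dependents). A rough sketch of inverting such a dictionary before sorting, using the graph from the other answer, might look like this. The helper name `to_predecessors` is invented for this example.

```python
from graphlib import TopologicalSorter

# Layout from the question: node -> nodes that depend on it.
dependents = {
    "root": ["a"],
    "a": ["b", "e"],
    "b": ["c", "d"],
    "d": ["e"],
}


def to_predecessors(dependents):
    """Invert node -> dependents into node -> set of predecessors."""
    predecessors = {node: set() for node in dependents}
    for node, children in dependents.items():
        for child in children:
            # setdefault also covers leaf nodes that never appear as keys.
            predecessors.setdefault(child, set()).add(node)
    return predecessors


predecessors = to_predecessors(dependents)
order = list(TopologicalSorter(predecessors).static_order())
print(order)  # one valid order; 'root' always comes first
```

Several orders can be valid for the same graph, so only the relative position of connected nodes is guaranteed.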

For earlier versions of Python there is a backport: pip install graphlib_backport or put this in your requirements.txt file:

graphlib_backport; python_version < "3.9.0"
Ian Goldby answered Oct 19 '22 19:10