Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursive Operation in Pandas

I have a DataFrame like this:

vals = {"operator": [1, 1, 1, 2, 3, 5], "nextval": [2, 3, 6, 4, 5, 6]}
df = pd.DataFrame(vals)

   operator  nextval
0         1        2
1         1        3
2         1        6
3         2        4
4         3        5
5         5        6

What I'm trying to do is get a list of all the possible paths from a starting point, like 1, and an ending point, like 6, using the operators and nextvals, not strictly the shortest path. The output can be flexible, but I'm looking for something like this or that communicates this:

1 -> 6
1 -> 2 -> 4 
1 -> 3 -> 5 -> 6

I'm able to get it close, but not sure how to get the recursion right since the dict can't handle 2 same keys:

import pandas as pd

vals = {"operator": [1, 1, 1, 2, 3, 5], "nextval": [2, 3, 6, 4, 5, 6]}
df = pd.DataFrame(vals)

df1 = df.set_index("operator")

dictvals = {}
for x in df1.index.unique():
    dictvals[x] = []
    df2 = df1.loc[x]
    if isinstance(df2, pd.DataFrame):
        for idx, rowdata in df2.iterrows():
            dictvals[x].append(rowdata["nextval"])
    else:
        dictvals[x] = df2[0]

print(dictvals) 

{1: [2, 3, 6], 2: 4, 3: 5, 5: 6}
like image 557
dfundako Avatar asked Sep 04 '20 02:09

dfundako


People also ask

What is recursive function in Python?

Python also accepts function recursion, which means a defined function can call itself. Recursion is a common mathematical and programming concept. It means that a function calls itself. This has the benefit of meaning that you can loop through data to reach a result.

What is recursion operation in Python and give an example?

In the above example, factorial() is a recursive function as it calls itself. When we call this function with a positive integer, it will recursively call itself by decreasing the number. Each function multiplies the number with the factorial of the number below it until it is equal to one.

What is recursion with example?

Recursion is the process of defining a problem (or the solution to a problem) in terms of (a simpler version of) itself. For example, we can define the operation "find your way home" as: If you are at home, stop moving. Take one step toward home.

What are the types of recursive function in Python?

They are of two types: indirect and direct recursion.


2 Answers

Check with networkx , you need a direction graph with 'root' to 'leaf' path

import networkx as nx
G=nx.from_pandas_edgelist(df,source='operator',target='nextval', edge_attr=None, create_using=nx.DiGraph())
road=[]
for n in G:
       if G.out_degree(n)==0: #leaf
           road.append(nx.shortest_path(G, 1, n))
           
road
Out[82]: [[1, 2, 4], [1, 3, 5, 6]]

Update

import networkx as nx
G=nx.from_pandas_edgelist(df,source='operator',target='nextval', edge_attr=None, create_using=nx.DiGraph())
road=[]
for n in G:
       if G.out_degree(n)==0: #leaf
           road.append(list(nx.all_simple_paths(G, 1, n)))
           
road
Out[509]: [[[1, 3, 5, 6], [1, 6]], [[1, 2, 4]]]
like image 199
BENY Avatar answered Sep 23 '22 07:09

BENY


Let's try to hand-roll a solution, because thinking about this kind of recursive algorithm is educational. (Of course it is appropriate to just use an existing library in the real world; it will probably be much more fault-tolerant.)

The code you show builds a recognizable representation of the graph itself, but it would be better to use lists (or sets, or tuple) for the values even when there is only one successor node, for consistency. I would argue that sets make the most sense here, since if there are duplicate entries in the input then we should discard the duplicate graph nodes. So let us suppose we start with:

graph = {1: {2, 3}, 2: {4}, 3: {5}, 5: {6}}

We have agreed to restrict ourselves to considering directed acyclic graphs. I propose that the paths from our root node can be found recursively as follows: recursively check each path from each successor node; accumulate these results, and prepend each with the link from the root to the corresponding successor.

Of course, when we write recursive code we like to avoid side effects, since they make it harder to reason. So let us instead say: the accumulation of all paths, defined as (link from node to successor) + (path from successor to end), for each successor, for each pat from that successor. Of course, the way we represent the "link from node to successor" is just the current node name and an arrow; we get the rest of the path from the recursion, including the successor name.

And then we need a base case: if there are no successors, then we have a single path from here to the end (since we are at an end), which is just that node name by itself. It would be simpler for our code if the dead ends in our graph were represented with empty sets; but it's clearly easier to generate the graph just omitting those keys. So we'll lean on dict.get rather than indexing when we make the check.

Well, that first part sounds an awful lot like a list comprehension to me (with two for clauses`. For the base case, to match that, we need a list containing one path. That gives us:

def paths(graph, root):
    successors = graph.get(root, set())
    if not successors:
        return [str(root)] # a single path starting here and going nowhere.
    return [
        f'{root} -> {path}'
        for successor in successors
        for path in paths(graph, successor)
    ]

Let's try it:

>>> paths({1: {2, 3}, 2: {4}, 3: {5}, 5: {6}}, 1)
['1 -> 2 -> 4', '1 -> 3 -> 5 -> 6']

Alternatively, you could use generator expressions instead of list comprehensions, or even write it as a recursive generator (using yield and yield from).

(If we are feeling cheeky enough, we can continue the functional-programming theme by using a conditional expression:)

def paths(graph, root):
    successors = graph.get(root, set())
    return [
        f'{root} -> {path}'
        for successor in successors
        for path in paths(graph, successor)
    ] if successors else [str(root)]
like image 21
Karl Knechtel Avatar answered Sep 24 '22 07:09

Karl Knechtel