Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Longest path finding with condition

I'm trying to solve a problem in Python/Pandas which I think is closely related to the longest path algorithm. The DataFrame I'm working with has the following structure:

import numpy as np 
import pandas as pd

data = {
    "cusID": ["001", "001", "001", "001", "001", "001", "002", "002", "002"],
    "start": ["A", "B", "C", "D", "A", "E", "B", "C", "D"],
    "end": ["B", "C", "D", "A", "E", "A", "C", "D", "E"]
}
df = pd.DataFrame(data)
print(df)

  cusID start end
0   001     A   B
1   001     B   C
2   001     C   D
3   001     D   A
4   001     A   E
5   001     E   A
6   002     B   C
7   002     C   D
8   002     D   E

For each customer, I want to find the longest sequence which does not contain A. For instance, for customer 001, the sequence can be viewed as follows:

A -> B -> C -> D -> A -> E -> A

where B -> C -> D is the longest sequence of size 3.

The resulting DataFrame I'm looking for is the following:

  cusID longestSeq
0   001          3
1   002          4

Although I wasn't able to write much code to solve this problem, here are some of my thoughts: First, it's obvious that I need to group the original DataFrame by cusID to analyse each of the two sequences separately. One idea I had was to apply some function to transform the DataFrame into this format:

  cusID                       seq
0   001     [A, B, C, D, A, E, A]
1   002              [B, C, D, E]

and then working on each list separately, and use some sort of counter to take the maximum of the length of the paths that exclude A. My problem is transcribing that logic into code (if correct). Any help would be appreciated.

like image 843
glpsx Avatar asked Nov 05 '20 11:11

glpsx


People also ask

How do you find the longest path?

Acyclic graphsA longest path between two given vertices s and t in a weighted graph G is the same thing as a shortest path in a graph −G derived from G by changing every weight to its negation. Therefore, if shortest paths can be found in −G, then longest paths can also be found in G.

How do you find the longest path in DFS?

We can call the DFS function from every node and traverse for all its children. The recursive formula will be: dp[node] = max(dp[node], 1 + max(dp[child1], dp[child2], dp[child3]..)) At the end check for the maximum value in dp[] array, which will be the longest path in the DAG.

Can BFS be used to find longest path?

We can find the longest path using two BFSs. The idea is based on the following fact: If we start BFS from any node x and find a node with the longest distance from x, it must be an endpoint of the longest path.


1 Answers

First step is to normalize the sequences.

seqs = pd.concat([
    df.drop(columns="end").rename(columns={"start":"node"}),
    df.groupby("cusID").tail(1).drop(columns="start").rename(columns={"end":"node"})
])
seqs = seqs.sort_values("cusID", kind="mergesort").reset_index(drop=True)

>>> seqs
   cusID node
0    001    A
1    001    B
2    001    C
3    001    D
4    001    A
5    001    E
6    001    A
7    002    B
8    002    C
9    002    D
10   002    E

Then, using zero_runs we define:

def longest_non_a(seq):
    eqa = seq == "A"
    runs = zero_runs(eqa)
    return (runs[:,1] - runs[:,0]).max()
    
result = seqs.groupby("cusID")["node"].apply(longest_non_a)

>>> result
cusID
001    3
002    4
Name: node, dtype: int64
like image 69
orlp Avatar answered Oct 17 '22 13:10

orlp