Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python pandas, a function will be applied to the combinations of the elements in one row based on a condition on the other row

It seems like there are similar questions, but I couldn't find a proper answer. Let's say this is my dataframe which has different observations for a different brand of cars:

df = pandas.DataFrame({'Car' : ['BMW_1', 'BMW_2', 'BMW_3', 'WW_1','WW_2','Fiat_1', 'Fiat_2'],
                       'distance'   : [10,25,22,24,37,33,49]})

For simplicity, let's assume that I have a function multiples first element by two and second by three:

def my_func(x,y):
   z = 2x + 3y
   return z

I want to get pairwise combinations of the distances covered by the cars and use them in my_func. But there are two conditions are that x and y can not be same brands and combinations should not be duplicated. Desired output is something like this:

  Car      Distance   Combinations                                
0  BMW_1   10         (BMW_1,WW_1),(BMW_1,WW_2),(BMW_1,Fiat_1),(BMW_1,Fiat_1)
1  BMW_2   25         (BMW_2,WW_1),(BMW_2,WW_2),(BMW_2,Fiat_1),(BMW_2,Fiat_1)
2  BMW_3   22         (BMW_3,WW_1),(BMW_3,WW_2),(BMW_3,Fiat_1),(BMW_3,Fiat_1)
3  WW_1    24         (WW_1, Fiat_1),(WW_1, Fiat_2)
4  WW_2    37         (WW_2, Fiat_1),(WW_2, Fiat_2)
5  Fiat_1  33         None
6  Fiat_2  49         None

//Output
[120, 134, 156, 178]
[113, 145, 134, 132]
[114, 123, 145, 182]
[153, 123] 
[120, 134] 
None 
None 

Note: I made up the numbers for output.

Next Step I want to get maximum numbers from the arrays of 'output' row for each brand. And the final data should look like

  Car  Max_Distance
0 BMW  178
1 WW   153
2 Fiat None

I will be grateful if someone could help me

like image 355
edyvedy13 Avatar asked Mar 09 '23 12:03

edyvedy13


2 Answers

UPDATE:

In [49]: x = pd.DataFrame(np.triu(squareform(pdist(df[['distance']], my_func))),
    ...:                  columns=df.Car.str.split('_').str[0],
    ...:                  index=df.Car.str.split('_').str[0]).replace(0, np.nan)
    ...:

In [50]: x[x.apply(lambda col: col.index != col.name)].max(1).max(level=0)
Out[50]:
Car
BMW     197.0
Fiat      NaN
WW      221.0
dtype: float64

OLD answer:

IIUC you can do something like the following:

from scipy.spatial.distance import pdist, squareform

def my_func(x,y):
    return 2*x + 3*y

x = pd.DataFrame(
    squareform(pdist(df[['distance']], my_func)),
    columns=df.Car.str.split('_').str[0],
    index=df.Car.str.split('_').str[0])

it produced:

In [269]: x
Out[269]:
Car     BMW    BMW    BMW     WW     WW   Fiat   Fiat
Car
BMW     0.0   95.0   86.0   92.0  131.0  119.0  167.0
BMW    95.0    0.0  116.0  122.0  161.0  149.0  197.0
BMW    86.0  116.0    0.0  116.0  155.0  143.0  191.0
WW     92.0  122.0  116.0    0.0  159.0  147.0  195.0
WW    131.0  161.0  155.0  159.0    0.0  173.0  221.0
Fiat  119.0  149.0  143.0  147.0  173.0    0.0  213.0
Fiat  167.0  197.0  191.0  195.0  221.0  213.0    0.0

exluding the same brand:

In [270]: x.apply(lambda col: col.index != col.name)
Out[270]:
Car     BMW    BMW    BMW     WW     WW   Fiat   Fiat
Car
BMW   False  False  False   True   True   True   True
BMW   False  False  False   True   True   True   True
BMW   False  False  False   True   True   True   True
WW     True   True   True  False  False   True   True
WW     True   True   True  False  False   True   True
Fiat   True   True   True   True   True  False  False
Fiat   True   True   True   True   True  False  False

In [273]: x[x.apply(lambda col: col.index != col.name)]
Out[273]:
Car     BMW    BMW    BMW     WW     WW   Fiat   Fiat
Car
BMW     NaN    NaN    NaN   92.0  131.0  119.0  167.0
BMW     NaN    NaN    NaN  122.0  161.0  149.0  197.0
BMW     NaN    NaN    NaN  116.0  155.0  143.0  191.0
WW     92.0  122.0  116.0    NaN    NaN  147.0  195.0
WW    131.0  161.0  155.0    NaN    NaN  173.0  221.0
Fiat  119.0  149.0  143.0  147.0  173.0    NaN    NaN
Fiat  167.0  197.0  191.0  195.0  221.0    NaN    NaN

selecting maximum per row:

In [271]: x[x.apply(lambda col: col.index != col.name)].max(1)
Out[271]:
Car
BMW     167.0
BMW     197.0
BMW     191.0
WW      195.0
WW      221.0
Fiat    173.0
Fiat    221.0
dtype: float64

max per brand:

In [276]: x[x.apply(lambda col: col.index != col.name)].max(1).max(level=0)
Out[276]:
Car
BMW     197.0
Fiat    221.0
WW      221.0
dtype: float64
like image 72
MaxU - stop WAR against UA Avatar answered Apr 05 '23 22:04

MaxU - stop WAR against UA


i, j = np.tril_indices(len(df), 1)

def my_func(x,y):
    z = 2 * x + 3 * y
    return z

d = df.distance.values
c = df.Car.values
s = pd.Series(my_func(d[i], d[j]), [c[i], c[j]])

def test_name(df):
    name = df.index[0]
    n1, n2 = map(lambda x: x.split('_')[0], name)
    return n1 != n2

s.groupby(level=[0, 1]).filter(test_name).groupby(level=1).apply(list)

BMW_1       [78, 104, 96, 128]
BMW_2     [123, 149, 141, 173]
BMW_3     [114, 140, 132, 164]
Fiat_1                   [173]
WW_1           [116, 138, 170]
WW_2                [177, 209]
dtype: object
like image 35
piRSquared Avatar answered Apr 05 '23 22:04

piRSquared