How to efficiently fill an incomplete pandas dataframe consisting of pairwise combinations of values in a list?

Let's say I have a list of values,

lst=['orange','apple','banana', 'grape', 'lemon']

I also have a pandas dataframe, df, of the form:

Source     Destination     Weight
orange     apple           0.4
banana     orange          0.67
grape      lemon           0.1
grape      banana          0.5

The rows are a subset of all pairwise combinations in lst. Note that every combination appears at most once.
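With five items there are C(5, 2) = 10 unordered pairs, so the 4 rows in df leave 6 pairs to be filled in. A minimal sketch that rebuilds the example inputs and lists the missing pairs, treating ('orange', 'apple') and ('apple', 'orange') as the same pair (the frozenset normalisation is an assumption about what "combination" means here):

```python
from itertools import combinations

import pandas as pd

lst = ['orange', 'apple', 'banana', 'grape', 'lemon']
df = pd.DataFrame({
    'Source': ['orange', 'banana', 'grape', 'grape'],
    'Destination': ['apple', 'orange', 'lemon', 'banana'],
    'Weight': [0.4, 0.67, 0.1, 0.5],
})

# Every unordered pair of distinct items: C(5, 2) = 10 pairs.
all_pairs = list(combinations(lst, 2))

# Pairs already in df; frozenset ignores the Source/Destination order.
present = {frozenset(p) for p in zip(df['Source'], df['Destination'])}

# Pairs that still need a row with Weight 0.
missing = [p for p in all_pairs if frozenset(p) not in present]
print(len(all_pairs), len(missing))  # 10 6
print(missing)
```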

What I want is a new dataframe where the remaining combinations are filled in with a value of 0.

For example, new_df:

Source     Destination     Weight
orange     apple           0.4
banana     orange          0.67
grape      lemon           0.1
grape      banana          0.5
orange     grape           0.0
orange     lemon           0.0
apple      banana          0.0
apple      grape           0.0
apple      lemon           0.0
banana     lemon           0.0

The order does not make a difference.
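Since neither the row order nor the order within a pair matters, two results can be compared by keying each row on a frozenset of its endpoints. A minimal sketch; `as_edge_dict` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd


def as_edge_dict(frame):
    """Hypothetical helper: map each unordered {Source, Destination} pair to its Weight.

    Row order and the order of the two endpoints are both ignored.
    """
    return {
        frozenset((s, d)): w
        for s, d, w in zip(frame['Source'], frame['Destination'], frame['Weight'])
    }


a = pd.DataFrame({'Source': ['orange', 'grape'],
                  'Destination': ['apple', 'lemon'],
                  'Weight': [0.4, 0.0]})
# Same edges as `a`, with the rows reordered and the endpoints swapped.
b = pd.DataFrame({'Source': ['lemon', 'apple'],
                  'Destination': ['grape', 'orange'],
                  'Weight': [0.0, 0.4]})

print(as_edge_dict(a) == as_edge_dict(b))  # True
```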

What is a fast way to do this?

asked Aug 02 '17 18:08 by Melsauce


2 Answers

  • I create an array of sets, one for each pairwise combination in lst
  • Then I do the same thing for the combinations that already exist in df
  • I use np.isin (np.in1d in older NumPy) to find the ones that don't exist
  • Then I concatenate a new dataframe holding the missing ones, with a Weight of 0.

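import numpy as np
import pandas as pd
from itertools import combinations

# lst and df are the list and dataframe from the question
# one set per unordered pair, so {'orange', 'apple'} == {'apple', 'orange'}
comb = np.array([set(x) for x in combinations(lst, 2)])
# the pairs already present in df, also as sets
exst = df[['Source', 'Destination']].apply(set, axis=1).values
# keep the pairs not yet in df (np.isin replaces np.in1d, deprecated in NumPy 2.0)
new = comb[~np.isin(comb, exst)]

# list() of a set has no guaranteed order, so Source/Destination may come out swapped
d1 = pd.DataFrame(
    [list(x) for x in new],
    columns=['Source', 'Destination']
).assign(Weight=0.)

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
pd.concat([df, d1], ignore_index=True)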

   Source Destination  Weight
0  orange       apple    0.40
1  banana      orange    0.67
2   grape       lemon    0.10
3   grape      banana    0.50
4   grape      orange    0.00
5  orange       lemon    0.00
6   apple      banana    0.00
7   grape       apple    0.00
8   apple       lemon    0.00
9  banana       lemon    0.00
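A variant of the same idea (not piRSquared's code) that skips NumPy object arrays: a plain Python set of frozensets handles the membership test with hash lookups, and keeping the tuples produced by combinations(), rather than calling list() on a set, makes the Source/Destination order of the new rows deterministic. A sketch that rebuilds the question's inputs:

```python
from itertools import combinations

import pandas as pd

lst = ['orange', 'apple', 'banana', 'grape', 'lemon']
df = pd.DataFrame({
    'Source': ['orange', 'banana', 'grape', 'grape'],
    'Destination': ['apple', 'orange', 'lemon', 'banana'],
    'Weight': [0.4, 0.67, 0.1, 0.5],
})

# Existing pairs as a hash set of frozensets (order within a pair ignored).
exst = {frozenset(p) for p in zip(df['Source'], df['Destination'])}

# Keep the original tuples so each new row's Source/Destination order is stable.
new = [p for p in combinations(lst, 2) if frozenset(p) not in exst]

d1 = pd.DataFrame(new, columns=['Source', 'Destination']).assign(Weight=0.0)
new_df = pd.concat([df, d1], ignore_index=True)
print(new_df)
```

The original rows keep their positions at the top, and the filled-in pairs follow in the order combinations() generates them.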
answered Nov 14 '22 23:11 by piRSquared


Step 1: Convert each Source/Destination pair in your source dataframe to a frozenset, keeping only the Combinations and Weight columns

In [350]: df = df.assign(Combinations=df.apply(lambda x: frozenset(x[:-1]), axis=1)).loc[:, ['Combinations', 'Weight']]
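Step 1 relies on two properties of frozenset: equality ignores element order, and, unlike a plain set, a frozenset is hashable, so it can act as a key (for example in the merge in Step 3). A quick check in plain Python:

```python
pair_a = frozenset(['banana', 'orange'])
pair_b = frozenset(['orange', 'banana'])

# Equality ignores element order...
print(pair_a == pair_b)  # True

# ...and frozensets are hashable, so they work as dict keys.
weights = {pair_a: 0.67}
print(weights[pair_b])  # 0.67

# A plain set is not hashable.
try:
    hash({'banana', 'orange'})
except TypeError as exc:
    print(exc)  # unhashable type: 'set'
```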

Step 2: Generate all pairwise combinations of the items in lst (import itertools first)

In [352]: new_df = pd.DataFrame(list(itertools.combinations(lst, 2)), columns=['Source', 'Destination'])

Step 3: Turn those pairs into frozensets as well, outer-merge them with df on Combinations, and fill the missing weights with 0

In [358]: new_df = new_df.iloc[:, :2].apply(lambda x: frozenset(x), axis=1)\
                        .to_frame().rename(columns={0 : "Combinations"})\
                        .merge(df, how='outer').fillna(0)

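Step 4: Split each frozenset back into Source and Destination columns and join the weights back on. Since a frozenset has no order, a pair may come back reversed (banana/orange from the input appears as orange/banana below)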

In [365]: new_df.apply(lambda x: pd.Series(list(x['Combinations'])), axis=1)\
                .rename(columns={0 : 'Source', 1 : 'Destination'})\
                .join(new_df['Weight'])
Out[365]: 
   Source Destination  Weight
0  orange       apple    0.40
1  orange      banana    0.67
2   grape      orange    0.00
3  orange       lemon    0.00
4   apple      banana    0.00
5   grape       apple    0.00
6   apple       lemon    0.00
7   grape      banana    0.50
8   lemon      banana    0.00
9   grape       lemon    0.10
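A different technique, not taken from either answer, that avoids row-wise apply: sort each pair into a canonical (alphabetical) order, index the weights by that pair, and reindex against every pair from combinations(sorted(lst), 2) with fill_value=0. A sketch assuming the items are mutually comparable strings and each unordered pair appears at most once, as the question states:

```python
from itertools import combinations

import numpy as np
import pandas as pd

lst = ['orange', 'apple', 'banana', 'grape', 'lemon']
df = pd.DataFrame({
    'Source': ['orange', 'banana', 'grape', 'grape'],
    'Destination': ['apple', 'orange', 'lemon', 'banana'],
    'Weight': [0.4, 0.67, 0.1, 0.5],
})

# Sort each (Source, Destination) row so a pair and its reverse share one key,
# e.g. ('orange', 'apple') becomes ('apple', 'orange').
pairs = np.sort(df[['Source', 'Destination']].to_numpy(), axis=1)
weights = pd.Series(
    df['Weight'].to_numpy(),
    index=pd.MultiIndex.from_arrays([pairs[:, 0], pairs[:, 1]],
                                    names=['Source', 'Destination']),
    name='Weight',
)

# Every unordered pair, built from the sorted list so it uses the same ordering.
full_index = pd.MultiIndex.from_tuples(list(combinations(sorted(lst), 2)),
                                       names=['Source', 'Destination'])

# reindex keeps the known weights and inserts 0.0 for every missing pair.
new_df = weights.reindex(full_index, fill_value=0.0).reset_index()
print(new_df)
```

Every row comes out with Source alphabetically before Destination, which is fine here because the order does not matter.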
answered Nov 15 '22 00:11 by cs95