pandas pivot and join in two dataframes

I have two DataFrames:

df1
   mag   cat
0  101   A1
1  256   A2  
2  760   A2
3  888   A3  
...

df2
   A1    A2    A3    ...
0  E50R  AZ33  REZ3 
1  T605  YYU6  YHG5
2  IR50  P0O9  BF53
3  NaN   YY9I  NaN

I would like to create a final DataFrame that looks like this:

df
   101   256   760   888  ...
0  E50R  AZ33  AZ33  REZ3
1  T605  YYU6  YYU6  YHG5
2  IR50  P0O9  P0O9  BF53
3  NaN   YY9I  YY9I  NaN

I tried something with pivot, but it doesn't seem to do the job. Could you help me?

Matthieu Veron asked Jul 31 '18 12:07


4 Answers

If I understand correctly, you can reindex the columns of df2 and then rename them:

newdf=df2.reindex(columns=df1.cat)
newdf.columns=df1.mag
newdf
Out[519]: 
mag   101   256   760   888
0    E50R  AZ33  AZ33  REZ3
1    T605  YYU6  YYU6  YHG5
2    IR50  P0O9  P0O9  BF53
3     NaN  YY9I  YY9I   NaN
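
For a self-contained reproduction, here is a minimal sketch of the same reindex-then-rename idea. It assumes df1 and df2 contain only the rows and columns shown in the question (the elided "..." parts are left out), and it builds the frames by hand.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the rows and columns shown there).
df1 = pd.DataFrame({"mag": [101, 256, 760, 888],
                    "cat": ["A1", "A2", "A2", "A3"]})
df2 = pd.DataFrame({"A1": ["E50R", "T605", "IR50", np.nan],
                    "A2": ["AZ33", "YYU6", "P0O9", "YY9I"],
                    "A3": ["REZ3", "YHG5", "BF53", np.nan]})

# Select df2's columns in the order given by df1.cat; a label that
# appears twice (A2 here) yields the same column twice.
newdf = df2.reindex(columns=df1["cat"])

# Relabel the selected columns with the matching mag values.
newdf.columns = df1["mag"]

print(newdf)
```

Note that reindex does not raise on labels missing from df2: a cat value with no matching column in df2 would simply produce an all-NaN column.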

BENY answered Oct 14 '22 07:10


You can use a combination of GroupBy, numpy.repeat, and itertools.chain:

from itertools import chain

import numpy as np

# map cat to list of mag
s = df1.groupby('cat')['mag'].apply(list)

# calculate indices for columns, including repeats
cols_idx = np.repeat(range(len(df2.columns)), s.map(len))

# apply indexing
res = df2.iloc[:, cols_idx]

# rename columns
res.columns = list(chain.from_iterable(df2.columns.map(s.get)))

print(res)

    101   256   760   888
0  E50R  AZ33  AZ33  REZ3
1  T605  YYU6  YYU6  YHG5
2  IR50  P0O9  P0O9  BF53
3   NaN  YY9I  YY9I   NaN
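
To make the intermediate steps easier to follow, here is a small sketch on the question's sample data (hand-built, elided parts omitted). It is a slight variation rather than the exact code above: the repeat counts and new labels are looked up per df2 column, so the result does not depend on df2's columns being in the same sorted order as the groupby keys. It still assumes every df2 column has at least one entry in df1.cat.

```python
from itertools import chain

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"mag": [101, 256, 760, 888],
                    "cat": ["A1", "A2", "A2", "A3"]})
df2 = pd.DataFrame({"A1": ["E50R", "T605", "IR50", np.nan],
                    "A2": ["AZ33", "YYU6", "P0O9", "YY9I"],
                    "A3": ["REZ3", "YHG5", "BF53", np.nan]})

# Map each cat to the list of mag values that belong to it.
s = df1.groupby("cat")["mag"].apply(list)

# How many times each df2 column must appear, in df2's own column order.
counts = [len(s[c]) for c in df2.columns]

# Positional column indices, with repeats.
cols_idx = np.repeat(np.arange(len(df2.columns)), counts)
print(cols_idx.tolist())  # [0, 1, 1, 2]

# Select the columns by position, then flatten the mag lists into labels.
res = df2.iloc[:, cols_idx]
res.columns = list(chain.from_iterable(s[c] for c in df2.columns))

print(res)
```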

Performance benchmarking

There are several good and distinct solutions here, so you may be interested in how they perform. Wen's reindex solution (wen in the code below) is the clear winner, about four times faster than the next fastest.

%timeit wen(df1, df2)   # 632 µs per loop
%timeit jpp(df1, df2)   # 2.55 ms per loop
%timeit scb(df1, df2)   # 7.98 ms per loop
%timeit abhi(df1, df2)  # 4.52 ms per loop

Code:

def jpp(df1, df2):
    s = df1.groupby('cat')['mag'].apply(list)
    cols_idx = np.repeat(range(len(df2.columns)), s.map(len))
    res = df2.iloc[:, cols_idx]
    res.columns = list(chain.from_iterable(df2.columns.map(s.get)))    
    return res

def scb(df1, df2):
    df_out = (df2.stack().reset_index()
                 .merge(df1, left_on='level_1', right_on='cat')[['level_0','mag',0]])
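    # pivot arguments are passed by keyword; recent pandas versions reject positional ones
    return df_out.pivot(index='level_0', columns='mag', values=0).reset_index(drop=True)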

def abhi(df1, df2):
    return df2.T.merge(df1, left_index=True, right_on='cat').drop('cat', axis=1).set_index('mag').T

def wen(df1, df2):
    newdf=df2.reindex(columns=df1.cat)
    newdf.columns=df1.mag
    return newdf
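
The %timeit lines above only work inside IPython or Jupyter. As a rough sketch of how similar measurements could be taken in a plain script, the standard-library timeit module can be used. The input here is the question's tiny sample, which is an assumption (the size of the data behind the figures above is not stated), so the numbers will differ; only two of the four functions are repeated to keep the example short.

```python
import timeit

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"mag": [101, 256, 760, 888],
                    "cat": ["A1", "A2", "A2", "A3"]})
df2 = pd.DataFrame({"A1": ["E50R", "T605", "IR50", np.nan],
                    "A2": ["AZ33", "YYU6", "P0O9", "YY9I"],
                    "A3": ["REZ3", "YHG5", "BF53", np.nan]})

def wen(df1, df2):
    newdf = df2.reindex(columns=df1["cat"])
    newdf.columns = df1["mag"]
    return newdf

def abhi(df1, df2):
    return (df2.T.merge(df1, left_index=True, right_on="cat")
               .drop("cat", axis=1).set_index("mag").T)

timings = {}
for func in (wen, abhi):
    # Best of 3 repeats of 100 calls each, converted to seconds per call.
    best = min(timeit.repeat(lambda: func(df1, df2), number=100, repeat=3))
    timings[func.__name__] = best / 100
    print(f"{func.__name__}: {timings[func.__name__] * 1e6:.0f} µs per call")
```

Taking the minimum over repeats is the usual way to reduce noise from other processes on the machine.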

jpp answered Oct 14 '22 06:10


Another way to do it is with stack, merge, and pivot:

df_out = (df2.stack().reset_index()
             .merge(df1, left_on='level_1', right_on='cat')[['level_0','mag',0]])

df_out.pivot(index='level_0', columns='mag', values=0).reset_index(drop=True)

Output:

mag   101   256   760   888
0    E50R  AZ33  AZ33  REZ3
1    T605  YYU6  YYU6  YHG5
2    IR50  P0O9  P0O9  BF53
3     NaN  YY9I  YY9I   NaN
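
The same pipeline can be sketched step by step with named columns, which may make the reshaping easier to follow. The sample frames are built by hand from the question, and the column names row and value are hypothetical labels chosen for this illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"mag": [101, 256, 760, 888],
                    "cat": ["A1", "A2", "A2", "A3"]})
df2 = pd.DataFrame({"A1": ["E50R", "T605", "IR50", np.nan],
                    "A2": ["AZ33", "YYU6", "P0O9", "YY9I"],
                    "A3": ["REZ3", "YHG5", "BF53", np.nan]})

# Wide -> long: one record per (row, column) cell of df2.
stacked = df2.stack().reset_index()
stacked.columns = ["row", "cat", "value"]

# Attach every mag that belongs to each cat; A2 matches two mag values,
# so its cells are duplicated.
merged = stacked.merge(df1, on="cat")

# Long -> wide again, this time with mag as the column labels.
out = (merged.pivot(index="row", columns="mag", values="value")
             .reset_index(drop=True))

print(out)
```

Depending on the pandas version, stack may drop the NaN cells; pivot puts NaN back for any missing (row, mag) pair, so the result is the same either way. pivot also sorts the columns by mag, which matches the order in df1 only because mag happens to be increasing there.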

Scott Boston answered Oct 14 '22 08:10


You can do this by transposing df2, merging it with df1, and setting the 'mag' column as the index, then transposing the result back.

df2_transposed = df2.T
res = df2_transposed.merge(df1,how = "left",left_index=True,right_on='cat')
del res['cat']
res.set_index('mag', inplace=True)
res.T

mag   101   256   760   888
0    E50R  AZ33  AZ33  REZ3
1    T605  YYU6  YYU6  YHG5
2    IR50  P0O9  P0O9  BF53
3     NaN  YY9I  YY9I   NaN
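
One design choice worth noting is how="left". The sketch below wraps the steps in a helper (transpose_merge is a hypothetical name used only for this illustration) and adds an extra column A4 to df2 that has no entry in df1, which is an assumption not present in the question. It compares the left merge with the default inner merge.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"mag": [101, 256, 760, 888],
                    "cat": ["A1", "A2", "A2", "A3"]})
df2 = pd.DataFrame({"A1": ["E50R", "T605", "IR50", np.nan],
                    "A2": ["AZ33", "YYU6", "P0O9", "YY9I"],
                    "A3": ["REZ3", "YHG5", "BF53", np.nan],
                    "A4": ["X1", "X2", "X3", "X4"]})  # no match in df1.cat

def transpose_merge(df1, df2, how):
    # Rows of df2.T are df2's columns, so they can be matched against df1.cat.
    res = df2.T.merge(df1, how=how, left_index=True, right_on="cat")
    return res.drop(columns="cat").set_index("mag").T

left = transpose_merge(df1, df2, how="left")
inner = transpose_merge(df1, df2, how="inner")

# The left merge keeps A4, but under a NaN column label.
print(left.columns.tolist())
# The inner merge drops A4 entirely.
print(inner.columns.tolist())
```

With the question's data every column of df2 has a match, so both variants give the same values; the difference only shows up when df2 has columns that df1 does not mention.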

anarchy answered Oct 14 '22 07:10