 

How to join multiple date columns into one in pandas?

I have the below DataFrame with multiple date columns and their values:

date         value_1      date        value_2    date         value_3
01-01-1990   1            01-01-1990  2          02-01-1990   4
02-01-1990   3            03-01-1990  20         
                          04-01-1990  30  

Desired output: combine all date columns into one superset date column and show the values accordingly.

date         value_1        value_2    value_3
01-01-1990   1              2          
02-01-1990   3                         4
03-01-1990                  20
04-01-1990                  30 
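For reference, the input above can be reproduced with duplicated `date` column names like so (a sketch; the shorter columns are assumed to be padded with NaN):

```python
import pandas as pd

# Build the sample frame with three columns all named 'date',
# each paired with its own value column; uneven lengths are
# padded with None (which pandas stores as NaN).
df = pd.DataFrame(
    [["01-01-1990", 1, "01-01-1990", 2, "02-01-1990", 4],
     ["02-01-1990", 3, "03-01-1990", 20, None, None],
     [None, None, "04-01-1990", 30, None, None]],
    columns=["date", "value_1", "date", "value_2", "date", "value_3"],
)
print(df)
```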
asked Oct 27 '25 11:10 by user3222101


1 Answer

First, it is necessary to deduplicate the repeated column names, so that each date column is paired with its value column:

import numpy as np
import pandas as pd

s = df.columns.to_series()
mask = df.columns.duplicated(keep=False)
c = np.where(mask, s + '_'  + (s.groupby(s).cumcount() + 1).astype(str) , s)
df.columns = c
print (df)
       date_1  value_1      date_2  value_2      date_3  value_3
0  01-01-1990      1.0  01-01-1990        2  02-01-1990      4.0
1  02-01-1990      3.0  03-01-1990       20         NaN      NaN
2         NaN      NaN  04-01-1990       30         NaN      NaN

Then loop by groupby with a lambda function that splits each column name into its pair suffix, set the date column as index, remove missing values, and last concat everything together:

dfs = [x.set_index(x.columns[0]).dropna() 
       for i, x in df.groupby(lambda x: x.split('_')[1], axis=1)]
#print (dfs)

df2 = pd.concat(dfs, axis=1)
print (df2)
            value_1  value_2  value_3
01-01-1990      1.0      2.0      NaN
02-01-1990      3.0      NaN      4.0
03-01-1990      NaN     20.0      NaN
04-01-1990      NaN     30.0      NaN
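Note that recent pandas versions deprecate `axis=1` in `groupby`, so an equivalent variant that slices each (date, value) pair out by position may be more future-proof. A minimal sketch, assuming the same sample input as in the question:

```python
import pandas as pd

df = pd.DataFrame(
    [["01-01-1990", 1, "01-01-1990", 2, "02-01-1990", 4],
     ["02-01-1990", 3, "03-01-1990", 20, None, None],
     [None, None, "04-01-1990", 30, None, None]],
    columns=["date", "value_1", "date", "value_2", "date", "value_3"],
)

# Slice out each (date, value) pair by column position, index each
# pair by its date, drop the padding rows, then align everything on
# the union of all dates.
pairs = [df.iloc[:, i:i + 2] for i in range(0, df.shape[1], 2)]
dfs = [p.set_index(p.columns[0]).dropna() for p in pairs]
df2 = pd.concat(dfs, axis=1).sort_index()
print(df2)
```

This avoids renaming the duplicated columns entirely, because `iloc` slicing is purely positional.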

EDIT:

The answer was adapted for date columns each followed by 2 value columns:

print (df)
  date_security  GH_LAST_PRICE Val  GH_VOLUME_PRICE Val date_security  \
0    01-01-1990                1.0                  7.0    01-01-1990   
1    01-02-1990                3.0                  8.0    03-01-1990   
2           NaN                NaN                  NaN    04-01-1990   

   DG_LAST_PRICE Val  DG_VOLUME_PRICE Val  
0                  2                 10.0  
1                 20                  NaN  
2                 30                  1.0 

Create MultiIndex:

df.columns = [(np.arange(len(df.columns)) // 3).astype(str), df.columns]
print (df)
#              0                                                   1  \
  date_security GH_LAST_PRICE Val GH_VOLUME_PRICE Val date_security   
0    01-01-1990               1.0                 7.0    01-01-1990   
1    01-02-1990               3.0                 8.0    03-01-1990   
2           NaN               NaN                 NaN    04-01-1990   


  DG_LAST_PRICE Val DG_VOLUME_PRICE Val  
0                 2                10.0  
1                20                 NaN  
2                30                 1.0  

dfs = [x.set_index(x.columns[0]).dropna()
       for i, x in df.groupby(level=0, axis=1)]


df2 = pd.concat(dfs, axis=1)
#flatten MultiIndex
df2.columns = df2.columns.map('_'.join)
print (df2)
            0_GH_LAST_PRICE Val  0_GH_VOLUME_PRICE Val  1_DG_LAST_PRICE Val  \
01-01-1990                  1.0                    7.0                  2.0   
01-02-1990                  3.0                    8.0                  NaN   
04-01-1990                  NaN                    NaN                 30.0   

            1_DG_VOLUME_PRICE Val  
01-01-1990                   10.0  
01-02-1990                    NaN  
04-01-1990                    1.0  
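The same positional-slicing idea works for the three-column groups too, without the MultiIndex or `axis=1` groupby (which newer pandas versions deprecate). A sketch, rebuilding the EDIT frame shown above:

```python
import pandas as pd

df = pd.DataFrame(
    [["01-01-1990", 1.0, 7.0, "01-01-1990", 2, 10.0],
     ["01-02-1990", 3.0, 8.0, "03-01-1990", 20, None],
     [None, None, None, "04-01-1990", 30, 1.0]],
    columns=["date_security", "GH_LAST_PRICE Val", "GH_VOLUME_PRICE Val",
             "date_security", "DG_LAST_PRICE Val", "DG_VOLUME_PRICE Val"],
)

# Slice out each 3-column group (date + 2 values) by position, index
# by the date column, drop rows with any missing value (matching the
# dropna() behaviour above), then align on the union of dates.
groups = [df.iloc[:, i:i + 3] for i in range(0, df.shape[1], 3)]
dfs = [g.set_index(g.columns[0]).dropna() for g in groups]
df2 = pd.concat(dfs, axis=1)
print(df2)
```

Because the value column names are already unique here, no MultiIndex flattening is needed afterwards.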
answered Oct 29 '25 08:10 by jezrael


