
Pandas - add NaN for missing values when using pd.merge

I am concatenating all files in a directory into one, but some files have a different number of entries. How do I place a NaN when a file has no value for a given key?

For example:

file1.cs

 NUM, NAME, ORG, DATA
 1,AAA,10,123.4
 1,AAB,20,176.5
 1,AAC,30,133.5

file2.cs

 NUM, NAME, ORG, DATA
 1,AAA,10,111.4
 1,AAC,30,122.5
 2,BBA,12,156.7

Desired Output

 NUM, NAME, ORG, File1, File2 ....
 1, AAA, 10, 123.4, 111.4
 1, AAB, 20, 176.5, NaN
 1, AAC, 30, 133.5, 122.5
 2, BBA, 12, NaN,   156.7
 .....

This is what I've tried:

import pandas as pd
import glob

writer = pd.ExcelWriter('analysis.xlsx', engine='xlsxwriter')
data = []
df1 = pd.read_csv("file1.cs", sep = ',', header = 'infer')    

for infile in glob.glob("*.cs"):
    df = pd.read_csv(infile, sep = ',', header = 'infer')
    name = infile[13:-7]
    df['filename'] = name
    data.append(df)
result = pd.merge(df1, data.to_frame(), on= 'NAME')
result.to_excel(writer, sheet_name=sheetname)
writer.save()

I also tried pd.concat(data, axis=1, ignore_index=False), but this doesn't add the NaN values because it aligns rows on the index (here the default row numbers) rather than on the key columns.

EA00 asked Feb 22 '18 17:02


1 Answer

Use merge with how='outer'. An outer merge keeps every key that appears in either file and fills the values missing from one side with NaN:

df1.merge(df2, on=['NUM','NAME','ORG'], how='outer')

Output:

   NUM NAME  ORG  DATA_x  DATA_y
0    1  AAA   10   123.4   111.4
1    1  AAB   20   176.5     NaN
2    1  AAC   30   133.5   122.5
3    2  BBA   12     NaN   156.7
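
To check this without the files on disk, here is a small self-contained sketch that rebuilds the two sample tables from the question. It assumes the header really contains a blank after each comma, as shown in the question, which is why it passes skipinitialspace=True; without it the columns would be named ' NAME', ' ORG' and so on, and the merge keys would not be found.

```python
import io
import pandas as pd

# Sample data copied from the question; StringIO stands in for the files on disk.
FILE1 = """NUM, NAME, ORG, DATA
1,AAA,10,123.4
1,AAB,20,176.5
1,AAC,30,133.5
"""
FILE2 = """NUM, NAME, ORG, DATA
1,AAA,10,111.4
1,AAC,30,122.5
2,BBA,12,156.7
"""

# skipinitialspace=True drops the blank after each comma in the header,
# so the columns are named 'NAME', not ' NAME'.
df1 = pd.read_csv(io.StringIO(FILE1), skipinitialspace=True)
df2 = pd.read_csv(io.StringIO(FILE2), skipinitialspace=True)

# Outer merge on the three key columns; unmatched keys get NaN on the other side.
merged = df1.merge(df2, on=['NUM', 'NAME', 'ORG'], how='outer')
print(merged)
```

The two DATA columns collide, so pandas appends its default suffixes _x and _y.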

To get your exact output use:

df1.rename(columns={'DATA':'FILE'})\
   .merge(df2.rename(columns={'DATA':'FILE'}), 
         on=['NUM','NAME','ORG'],
         how='outer', 
         suffixes=('1','2'))

Output:

   NUM NAME  ORG  FILE1  FILE2
0    1  AAA   10  123.4  111.4
1    1  AAB   20  176.5    NaN
2    1  AAC   30  133.5  122.5
3    2  BBA   12    NaN  156.7
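
The question mentions a whole directory of files, so one possible way to extend the two-file merge is to fold the same outer merge over a list of frames with functools.reduce. This is only a sketch under assumptions: merge_files and KEYS are hypothetical names, every file is assumed to have the same three key columns plus a single DATA column, and each DATA column is labelled with the file's base name rather than the slice used in the question.

```python
from functools import reduce
import glob
import os
import pandas as pd

KEYS = ['NUM', 'NAME', 'ORG']  # assumed to be present in every file

def merge_files(paths):
    """Outer-merge the DATA column of every CSV in paths on KEYS."""
    frames = []
    for path in paths:
        # Label each DATA column after its file, e.g. 'file1.cs' -> 'file1'.
        label = os.path.splitext(os.path.basename(path))[0]
        df = pd.read_csv(path, skipinitialspace=True)
        frames.append(df.rename(columns={'DATA': label}))
    # Fold the list pairwise; how='outer' keeps keys missing from some files.
    return reduce(lambda left, right: left.merge(right, on=KEYS, how='outer'), frames)

# Hypothetical usage for the directory in the question:
# result = merge_files(sorted(glob.glob('*.cs')))
# result.to_excel('analysis.xlsx', index=False)
```

Because every DATA column is renamed before merging, no suffixes are needed and the result has one column per file. Note that reduce raises a TypeError on an empty list, so the glob pattern has to match at least one file.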
Scott Boston answered Sep 22 '22 06:09