I have two CSV files that I would like to merge. With pandas I would use:
pd.merge(df1, df2, how='left', left_on='ST_LOGINID', right_on='LOGINID')
However, pandas runs out of memory performing this operation ("MemoryError"), even though my RAM usage only goes from 1.9 GB to 2.2 GB out of 4 GB before the error is raised.
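A MemoryError while system RAM usage stays near 2 GB is consistent with, though not proof of, a 32-bit Python build: a 32-bit process can address at most 4 GB, and on Windows usually only 2 GB by default, no matter how much RAM is installed. Assuming that may be the cause (the question does not say which build is in use), this sketch checks the interpreter's pointer width; `python_bits` is a hypothetical helper name:

```python
import struct
import sys


def python_bits() -> int:
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    # "P" is the native void* size in bytes: 4 on 32-bit builds, 8 on 64-bit builds.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    bits = python_bits()
    print(f"{bits}-bit Python on {sys.platform}")
    if bits == 32:
        print("This process cannot address more than 4 GB; "
              "a 64-bit Python build lifts that limit.")
```

If it reports 32, installing a 64-bit Python (with a matching 64-bit pandas) is the usual way to let pandas use more of the machine's RAM.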
I am therefore looking for either of these solutions:
1) a way to perform such a merge/join operation without loading the files into memory, or
2) a way to allow pandas to use more RAM, since there seems to be plenty of memory available.
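For option 1 within pandas, one approach is to stream the left-hand file in chunks with read_csv(..., chunksize=...) and merge each chunk against the right-hand file, appending the results to an output CSV. This is a sketch that assumes the right-hand file (keyed on LOGINID) fits in memory on its own; the function name, file paths, and chunk size are placeholders. Because each left row's matches depend only on the right-hand table, merging chunk by chunk yields the same rows, in the same order, as one big left merge.

```python
import pandas as pd


def chunked_left_merge(left_path, right_path, out_path,
                       left_on="ST_LOGINID", right_on="LOGINID",
                       chunksize=100_000):
    """Left-join two CSV files while holding only one chunk of the left file in memory.

    Assumes the right-hand file fits in memory. All columns are read as strings
    so key types match across chunks and values are written back unchanged.
    """
    # keep_default_na=False keeps strings such as "NA" as-is instead of turning them into NaN.
    right = pd.read_csv(right_path, dtype=str, keep_default_na=False)
    first = True
    for chunk in pd.read_csv(left_path, dtype=str, keep_default_na=False,
                             chunksize=chunksize):
        merged = chunk.merge(right, how="left", left_on=left_on, right_on=right_on)
        # Write the header with the first chunk, then append the remaining chunks.
        merged.to_csv(out_path, mode="w" if first else "a", header=first, index=False)
        first = False
```

Unmatched left rows get empty right-hand fields in the output. Reading everything as strings avoids per-chunk type inference, which could otherwise format the same column differently from one chunk to the next.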
Try csvkit:
First install with:
pip install csvkit
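Then, to match the how='left' in the pandas call, run a left outer join (--outer would perform a full outer join instead):
csvjoin -c "ST_LOGINID,LOGINID" --left file1.csv file2.csv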
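csvkit's documentation warns that csvjoin reads all input files into memory, so it may run into the same limit on very large files. If that happens, the following standard-library sketch holds only the right-hand file in a dictionary and streams the left-hand file row by row. The function name and paths are hypothetical, and it assumes both files have a header row:

```python
import csv


def streaming_left_join(left_path, right_path, out_path,
                        left_on="ST_LOGINID", right_on="LOGINID"):
    """Left-join two CSV files, keeping only the right-hand file in memory.

    Each left row is written once per matching right row, or once with empty
    right-hand fields when there is no match.
    """
    # Index the right-hand file by its key column: key -> list of rows.
    with open(right_path, newline="") as f:
        reader = csv.reader(f)
        right_header = next(reader)
        key_idx = right_header.index(right_on)
        lookup = {}
        for row in reader:
            lookup.setdefault(row[key_idx], []).append(row)

    empty = [""] * len(right_header)
    with open(left_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        left_header = next(reader)
        left_idx = left_header.index(left_on)
        writer.writerow(left_header + right_header)
        # Stream the left-hand file one row at a time; memory use does not grow with it.
        for row in reader:
            for match in lookup.get(row[left_idx], [empty]):
                writer.writerow(row + match)
```

Memory use is then bounded by the size of the right-hand file, which is the same trade-off as the chunked pandas approach but without pandas' per-row overhead.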