 

How to merge multiple CSV files to the right of each other in time order? (Python)

Tags: python, csv, excel

I have downloaded 120 files of CSV data (10 years, month by month).

I'm using the code below to merge all of them into one document in time order, e.g. from 1/1/09 to 1/1/19.

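from glob import glob
# Sorting the file names puts the monthly files in time order, provided
# the names sort chronologically (e.g. a YYYYMM date in each name).
# Skip the output file in case it is left over from a previous run.
files = sorted(f for f in glob('*.csv') if f != 'cat.csv')
with open('cat.csv', 'w') as fi_out:
    for i, fname_in in enumerate(files):
        with open(fname_in, 'r') as fi_in:
            for i_line, line in enumerate(fi_in):
                # Keep the header row only from the first file;
                # all other lines are data and are copied through.
                if i_line > 0 or i == 0:
                    fi_out.write(line)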
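If the file names do not sort chronologically, pandas can sort on the data itself instead. This is a minimal sketch, assuming each monthly file has a date column that `pd.to_datetime` can parse (called SETTLEMENTDATE here, borrowed from the traceback further down). The helper name `concat_in_time_order` and the sample rows are hypothetical; with real files, `frames` would be `[pd.read_csv(f) for f in files]`.

```python
import io

import pandas as pd


def concat_in_time_order(frames, date_col):
    """Stack DataFrames vertically, then sort all rows by a date column.

    Hypothetical helper: date_col is assumed to hold parseable timestamps.
    """
    combined = pd.concat(frames, ignore_index=True)
    combined[date_col] = pd.to_datetime(combined[date_col])
    return combined.sort_values(date_col).reset_index(drop=True)


# Hypothetical monthly extracts, deliberately supplied out of order.
feb = pd.read_csv(io.StringIO("SETTLEMENTDATE,RRP\n2009-02-01 00:30,21.5\n"))
jan = pd.read_csv(io.StringIO("SETTLEMENTDATE,RRP\n2009-01-01 00:30,19.0\n"))

result = concat_in_time_order([feb, jan], "SETTLEMENTDATE")
print(result)
```

Because `read_csv` consumes each file's header, there is no need to skip header lines by hand as in the loop above.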

This all works fine. However, I have now also downloaded the same type of data for a different product. I want to put this new data in time order as well, but have it side by side with the old set of data.
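One way to get the two products side by side is to concatenate each product's files separately (as with the code above) and then merge the two results on their shared date column. This sketch assumes both tables have a DATE column with one row per date; the sample rows are made up, and the suffixes `_product1`/`_product2` are chosen purely for illustration.

```python
import io

import pandas as pd

# Hypothetical, already-concatenated data for two products (one row per date).
product1 = pd.read_csv(io.StringIO(
    "DATE,REGION,PRICE\n2009-01-01,VIC1,19.0\n2009-01-02,VIC1,20.5\n"))
product2 = pd.read_csv(io.StringIO(
    "DATE,REGION,PRICE\n2009-01-01,QLD1,18.2\n2009-01-03,QLD1,17.9\n"))

# An outer merge keeps dates that appear in only one product (filled with NaN).
# Columns present in both tables get suffixes instead of clashing.
side_by_side = pd.merge(product1, product2, on="DATE", how="outer",
                        suffixes=("_product1", "_product2"))

# Put the rows back in time order.
side_by_side["DATE"] = pd.to_datetime(side_by_side["DATE"])
side_by_side = side_by_side.sort_values("DATE").reset_index(drop=True)
print(side_by_side)
```

`how="outer"` keeps dates present in only one product; the default `how="inner"` keeps only the dates both products share.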

I receive an error like so:

Any help would be appreciated.

EDIT1:

Traceback (most recent call last):
  File "/Users/myname/Desktop/collate/asdas.py", line 4, in <module>
    result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP']], data2, on='SETTLEMENTDATE')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 551, in __init__
    self.join_names) = self._get_merge_keys()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 871, in _get_merge_keys
    lk, stacklevel=stacklevel))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 1382, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'SETTLEMENTDATE'
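The `KeyError` arises because `data1[['REGION', 'TOTALDEMAND', 'RRP']]` leaves out the SETTLEMENTDATE column, so the left-hand DataFrame no longer has the key that `on='SETTLEMENTDATE'` asks for. A minimal sketch of the failure and the fix, using hypothetical one-row tables with the column names from the traceback:

```python
import io

import pandas as pd

# Hypothetical sample rows; column names follow the traceback above.
data1 = pd.read_csv(io.StringIO(
    "SETTLEMENTDATE,REGION,TOTALDEMAND,RRP\n2009-01-01 00:30,VIC1,5000,19.0\n"))
data2 = pd.read_csv(io.StringIO(
    "SETTLEMENTDATE,REGION,TOTALDEMAND,RRP\n2009-01-01 00:30,QLD1,6000,18.2\n"))

# Selecting columns without the merge key reproduces the KeyError.
missing_key_error = None
try:
    pd.merge(data1[["REGION", "TOTALDEMAND", "RRP"]], data2, on="SETTLEMENTDATE")
except KeyError as err:
    missing_key_error = err
print("KeyError:", missing_key_error)

# Keeping the key in the selection lets the merge find it.
left = data1[["SETTLEMENTDATE", "REGION", "TOTALDEMAND", "RRP"]]
result = pd.merge(left, data2, on="SETTLEMENTDATE")
print(result.columns.tolist())
```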

EDIT2:

import pandas as pd
df1 = pd.read_csv("product1.csv") 
df2 = pd.read_csv("product2.csv") 
combine = pd.merge(df1, df2, on='DATE', how='outer')
combine.columns = ['product1_price', 'REGION1', 'DATE', 'product2_price', 'REGION2']
combine[['DATE','product1_price','product2_price']]
combine.to_csv("combine.csv",index=False)

Error:

Traceback (most recent call last):
  File "/Users/george/Desktop/collate/asdas.py", line 5, in <module>
    combine.columns = ['VICRRP', 'REGION1', 'SETTLEMENTDATE', 'QLD1RRP', 'REGION2']
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 4389, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 646, in _set_axis
    self._data.set_axis(axis, labels)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 3323, in set_axis
    'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 9 elements, new values have 5 elements
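The `ValueError` comes from assigning a 5-item list to `combine.columns` when the merged frame has 9 columns: the merge keeps every non-key column from both files, and clashing names get `_x`/`_y` suffixes. Renaming only the columns you need avoids having to match the full width. Note also that the line `combine[['DATE','product1_price','product2_price']]` builds a new DataFrame and discards it, because the result is not assigned. The sketch below assumes each input has five columns (REGION, DATE, TOTALDEMAND, RRP, PERIODTYPE, chosen for illustration) and that the price is in RRP; the rows are hypothetical.

```python
import io

import pandas as pd

# Hypothetical inputs with five columns each.
df1 = pd.read_csv(io.StringIO(
    "REGION,DATE,TOTALDEMAND,RRP,PERIODTYPE\nVIC1,2009-01-01,5000,19.0,TRADE\n"))
df2 = pd.read_csv(io.StringIO(
    "REGION,DATE,TOTALDEMAND,RRP,PERIODTYPE\nQLD1,2009-01-01,6000,18.2,TRADE\n"))

combine = pd.merge(df1, df2, on="DATE", how="outer")
# DATE appears once, plus four suffixed columns from each side.
n_merged_columns = len(combine.columns)
print(n_merged_columns)  # 9

# Rename just the price columns, then assign the selection back.
combine = combine.rename(columns={"RRP_x": "product1_price",
                                  "RRP_y": "product2_price"})
combine = combine[["DATE", "product1_price", "product2_price"]]
print(combine)
```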
Asked Mar 05 '23 16:03 by user8261831


1 Answer

Load your data into DataFrames:

import pandas as pd
data1 = pd.read_csv("filename1.csv") 
data2 = pd.read_csv("filename2.csv") 

Merge the two DataFrames on SETTLEMENTDATE:

result = pd.merge(data1, data2, on='SETTLEMENTDATE')

This assumes a one-to-one relationship between the SETTLEMENTDATE values in the two DataFrames. If there isn't one, the result will contain duplicate rows.
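If you want pandas to check that assumption instead of silently duplicating rows, `pd.merge` accepts a `validate` argument. A sketch with hypothetical data in which one SETTLEMENTDATE appears twice in `data1`:

```python
import io

import pandas as pd

# Hypothetical data: the same SETTLEMENTDATE appears twice on the left.
data1 = pd.read_csv(io.StringIO(
    "SETTLEMENTDATE,RRP\n2009-01-01 00:30,19.0\n2009-01-01 00:30,19.5\n"))
data2 = pd.read_csv(io.StringIO(
    "SETTLEMENTDATE,RRP\n2009-01-01 00:30,18.2\n"))

# Without validation, each matching pair of rows becomes its own output row.
dupes = pd.merge(data1, data2, on="SETTLEMENTDATE")
print(len(dupes))  # 2

# validate="one_to_one" raises MergeError instead of duplicating rows.
validation_error = None
try:
    pd.merge(data1, data2, on="SETTLEMENTDATE", validate="one_to_one")
except pd.errors.MergeError as err:
    validation_error = err
print("MergeError:", validation_error)
```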

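EDIT: To remove the "PERIOD TYPE" column, select only the columns you want to keep before merging, making sure the selection still includes the merge key SETTLEMENTDATE:

result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP', 'SETTLEMENTDATE']], data2, on='SETTLEMENTDATE')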
Answered Apr 26 '23 23:04 by Filipe