I have downloaded 120 CSV files of data (10 years, one file per month).
I'm using the code below to merge them into one file in time order, e.g. from 1/1/09 to 1/1/19:
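from glob import glob
# Assumes the file names sort chronologically (e.g. a YYYYMM date in each name);
# skip the output file so re-running the script does not read it back in
files = sorted(f for f in glob('*.csv') if f != 'cat.csv')
with open('cat.csv', 'w') as fi_out:
    for i, fname_in in enumerate(files):
        with open(fname_in, 'r') as fi_in:
            for i_line, line in enumerate(fi_in):
                # Write the header row only from the first file
                if i_line > 0 or i == 0:
                    fi_out.write(line)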
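Since pandas is used later in this thread, the same concatenation can also be done by loading each monthly file and sorting on the timestamp column, which does not depend on the file names sorting chronologically. This is a minimal sketch, assuming every file has the same header and a SETTLEMENTDATE column (the column name is taken from later in the thread); the `concat_monthly` function name and the `product1/` folder in the usage line are hypothetical.

```python
from glob import glob

import pandas as pd


def concat_monthly(pattern, date_col="SETTLEMENTDATE", out_path=None):
    """Stack all CSV files matching `pattern` into one time-ordered DataFrame."""
    files = sorted(glob(pattern))
    if not files:
        raise ValueError("no files match %r" % pattern)
    # Parse the date column so that sorting is chronological, not textual
    frames = [pd.read_csv(f, parse_dates=[date_col]) for f in files]
    combined = pd.concat(frames, ignore_index=True)
    # Sort by timestamp instead of trusting the file-name order
    combined = combined.sort_values(date_col).reset_index(drop=True)
    if out_path is not None:
        combined.to_csv(out_path, index=False)
    return combined
```

For example, `concat_monthly("product1/*.csv", out_path="product1_all.csv")`. Keeping the inputs in their own folder avoids the output file being matched by the pattern on a later run.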
This works fine. However, I have now also downloaded the same type of data for a different product. I would like to put this new data in time order as well, but have it side by side with the old data set.
I receive the following error:
Any help would be appreciated.
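One way to get the two products side by side is to build one time-ordered frame per product and then join them on the timestamp. This is a minimal sketch, assuming both products share a SETTLEMENTDATE column and an RRP price column (names taken from later in this thread); the `side_by_side` helper and the `product1`/`product2` suffix labels are hypothetical.

```python
import pandas as pd


def side_by_side(df1, df2, date_col="SETTLEMENTDATE", labels=("product1", "product2")):
    """Align two per-product frames on their timestamp column."""
    merged = pd.merge(
        df1, df2,
        on=date_col,
        how="outer",  # keep timestamps that appear in only one product
        suffixes=("_" + labels[0], "_" + labels[1]),  # label clashing columns
    )
    # Sort explicitly so the result is in time order regardless of merge behaviour
    return merged.sort_values(date_col).reset_index(drop=True)
```

Timestamps missing from one product show up as NaN in that product's columns rather than being dropped.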
EDIT1:
Traceback (most recent call last):
  File "/Users/myname/Desktop/collate/asdas.py", line 4, in <module>
    result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP']], data2, on='SETTLEMENTDATE')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 551, in __init__
    self.join_names) = self._get_merge_keys()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 871, in _get_merge_keys
    lk, stacklevel=stacklevel))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 1382, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'SETTLEMENTDATE'
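The KeyError arises because `data1[['REGION', 'TOTALDEMAND', 'RRP']]` keeps only those three columns, so the left frame handed to `pd.merge` no longer contains SETTLEMENTDATE. The toy example below (made-up values, column names from the traceback) reproduces the error and shows that keeping the key column in the selection avoids it.

```python
import pandas as pd

# Made-up rows using the column names from the traceback
data1 = pd.DataFrame({
    "SETTLEMENTDATE": ["2009/01/01 00:30:00", "2009/01/01 01:00:00"],
    "REGION": ["VIC1", "VIC1"],
    "TOTALDEMAND": [4500.0, 4400.0],
    "RRP": [20.1, 19.8],
})
data2 = pd.DataFrame({
    "SETTLEMENTDATE": ["2009/01/01 00:30:00", "2009/01/01 01:00:00"],
    "RRP": [22.0, 21.5],
})

try:
    # The key column was not selected, so the left frame has no SETTLEMENTDATE
    pd.merge(data1[["REGION", "TOTALDEMAND", "RRP"]], data2, on="SETTLEMENTDATE")
except KeyError as exc:
    print("KeyError:", exc)

# Including the key in the column selection lets the merge succeed
cols = ["SETTLEMENTDATE", "REGION", "TOTALDEMAND", "RRP"]
result = pd.merge(data1[cols], data2, on="SETTLEMENTDATE")
print(result.columns.tolist())
```

Because both frames have an RRP column, pandas adds its default `_x`/`_y` suffixes to tell them apart.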
EDIT2:
import pandas as pd
df1 = pd.read_csv("product1.csv")
df2 = pd.read_csv("product2.csv")
combine = pd.merge(df1, df2, on='DATE', how='outer')
combine.columns = ['product1_price', 'REGION1', 'DATE', 'product2_price', 'REGION2']
combine[['DATE','product1_price','product2_price']]
combine.to_csv("combine.csv",index=False)
Error:
Traceback (most recent call last):
  File "/Users/george/Desktop/collate/asdas.py", line 5, in <module>
    combine.columns = ['VICRRP', 'REGION1', 'SETTLEMENTDATE', 'QLD1RRP', 'REGION2']
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 4389, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 646, in _set_axis
    self._data.set_axis(axis, labels)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 3323, in set_axis
    'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 9 elements, new values have 5 elements
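The ValueError says the merged frame has 9 columns while only 5 names were supplied: merging on one shared key keeps every other column from both files (for example, two files of 5 columns each give 5 + 5 - 1 = 9). Note also that `combine[['DATE','product1_price','product2_price']]` on its own line is discarded because it is never assigned. Rather than overwriting all the names, one option is to label the overlapping columns with `suffixes` and keep only the ones needed. This is a sketch assuming each file has a DATE column and a price column called RRP; both names are assumptions, so substitute the real header. The `combine_prices` helper is hypothetical.

```python
import pandas as pd


def combine_prices(df1, df2, date_col="DATE", price_col="RRP"):
    """Outer-merge two products on date and keep only date plus both prices."""
    combine = pd.merge(
        df1, df2, on=date_col, how="outer",
        suffixes=("_product1", "_product2"),  # disambiguate shared column names
    )
    p1, p2 = price_col + "_product1", price_col + "_product2"
    # Select the subset and return it; a bare combine[[...]] expression is thrown away
    return combine[[date_col, p1, p2]].rename(
        columns={p1: "product1_price", p2: "product2_price"}
    )
```

Usage would then be along the lines of `combine_prices(pd.read_csv("product1.csv"), pd.read_csv("product2.csv")).to_csv("combine.csv", index=False)`.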
Answer: load your data into DataFrames:
import pandas as pd
data1 = pd.read_csv("filename1.csv")
data2 = pd.read_csv("filename2.csv")
Then merge the two DataFrames on SETTLEMENTDATE:
result = pd.merge(data1, data2, on='SETTLEMENTDATE')
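This assumes a one-to-one relationship between the SETTLEMENTDATE values in the two DataFrames. If a date appears more than once in either frame, the merge produces one row per matching pair, i.e. duplicates. Note also that the default `how='inner'` drops dates that appear in only one of the frames.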
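Since pandas 0.21, `pd.merge` accepts a `validate` argument that checks this assumption instead of silently producing duplicates (the first traceback in the question passes `validate=validate` inside `merge.py`, so the poster's version supports it). A small sketch with made-up data; dropping duplicate timestamps, which keeps the first row for each, is only one possible way to resolve them.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"SETTLEMENTDATE": ["t1", "t2", "t2"], "RRP": [1.0, 2.0, 2.5]})
right = pd.DataFrame({"SETTLEMENTDATE": ["t1", "t2"], "RRP": [10.0, 20.0]})

# Without validation, the duplicated key "t2" silently yields an extra row
loose = pd.merge(left, right, on="SETTLEMENTDATE")
print(len(loose))  # 3

try:
    # validate="one_to_one" raises if either side has a repeated key
    pd.merge(left, right, on="SETTLEMENTDATE", validate="one_to_one")
except MergeError as exc:
    print("MergeError:", exc)

# Keeping only the first row per timestamp makes the 1-to-1 merge pass
strict = pd.merge(left.drop_duplicates("SETTLEMENTDATE"), right,
                  on="SETTLEMENTDATE", validate="one_to_one")
print(len(strict))  # 2
```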
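EDIT: To leave out the "PERIOD TYPE" column, select only the columns you want from data1, making sure SETTLEMENTDATE (the merge key) is among them:
result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP', 'SETTLEMENTDATE']], data2, on='SETTLEMENTDATE')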
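Alternatively, instead of listing the columns to keep, `DataFrame.drop(columns=...)` removes only the unwanted one and leaves everything else, including the merge key. This sketch assumes the column is literally named "PERIOD TYPE" as written above; the real header may be spelled differently (e.g. without the space), which is why `errors="ignore"` is used. The `merge_without` helper is hypothetical.

```python
import pandas as pd


def merge_without(data1, data2, drop_cols=("PERIOD TYPE",), key="SETTLEMENTDATE"):
    """Merge on `key` after dropping unwanted columns from the left frame."""
    # errors="ignore" skips names that are not present instead of raising KeyError
    trimmed = data1.drop(columns=list(drop_cols), errors="ignore")
    return pd.merge(trimmed, data2, on=key)
```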