python pandas merge multiple csv files

Tags:

python

datetime

pandas

csv

I have around 600 CSV datasets, all with exactly the same column names [‘DateTime’, ‘Actual’, ‘Consensus’, ‘Previous’, ‘Revised’]. They are all economic indicators, and all are time series.

The aim is to merge them all into one CSV file, with ‘DateTime’ as the index.

I want this file to be ordered chronologically. For example, say the first event in the first CSV is dated 12/18/2017 10:00:00, the first event in the second CSV is dated 12/29/2017 09:00:00, and the first event in the third CSV is dated 12/20/2017 09:00:00.

I want the rows ordered with the earlier event first and the later one after it, and so on, regardless of which source CSV each row came from.

I tried merging just 3 of them as an experiment, and the problem is ‘DateTime’: it prints the three values together like this: ('12/18/2017 10:00:00', '12/29/2017 09:00:00', '12/20/2017 09:00:00'). Here is the code:

import pandas as pd


df1 = pd.read_csv("E:\Business\Economic Indicators\Consumer Price Index - Core (YoY) - European Monetary Union.csv")
df2 = pd.read_csv("E:\Business\Economic Indicators\Private loans (YoY) - European Monetary Union.csv")
df3 = pd.read_csv("E:\Business\Economic Indicators\Current Account s.a - European Monetary Union.csv")

df = pd.concat([df1, df2, df3], axis=1, join='inner')
df.set_index('DateTime', inplace=True)

print(df.head())
df.to_csv('df.csv')


asked Jan 01 '18 15:01

Sayed Gouda

2 Answers

Consider using the read_csv() arguments index_col and parse_dates to create the index during import and parse it as datetime. Then run the horizontal merge you need. The code below assumes the date is in the first column of each CSV. At the end, call sort_index() on the final dataframe to sort by datetime.

df1 = pd.read_csv(r"E:\Business\Economic Indicators\Consumer Price Index - Core (YoY) - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])
df2 = pd.read_csv(r"E:\Business\Economic Indicators\Private loans (YoY) - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])
df3 = pd.read_csv(r"E:\Business\Economic Indicators\Current Account s.a - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])

finaldf = pd.concat([df1, df2, df3], axis=1, join='inner').sort_index()
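To illustrate why parse_dates matters before sort_index(), here is a minimal sketch on made-up in-memory data. The values and the io.StringIO stand-in for a file are assumptions; only the column names come from the question. Left as text, the dates sort alphabetically, so a January 2018 row lands ahead of December 2017; parsed as datetimes, they sort chronologically.

```python
import io

import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
csv_text = """DateTime,Actual,Consensus,Previous,Revised
12/18/2017 10:00:00,1.1,1.0,0.9,
01/05/2018 09:00:00,1.2,1.1,1.1,
12/29/2017 09:00:00,1.0,1.0,1.1,
"""

# Without parse_dates the index holds plain strings, so sorting is alphabetical.
as_text = pd.read_csv(io.StringIO(csv_text), index_col=[0]).sort_index()

# With parse_dates the index is a DatetimeIndex, so sorting is chronological.
as_dates = pd.read_csv(
    io.StringIO(csv_text), index_col=[0], parse_dates=[0]
).sort_index()

print(list(as_text.index))
print(list(as_dates.index))
```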

For a DRY-er approach, especially across hundreds of CSV files, use a list comprehension:

import os
...
os.chdir('E:\\Business\\Economic Indicators')

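# Read every CSV in the folder, using the first column as a parsed datetime index
dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
       for f in os.listdir(os.getcwd()) if f.endswith('.csv')]

# Join side by side on the index, keeping only timestamps present in every file
finaldf = pd.concat(dfs, axis=1, join='inner').sort_index()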


answered Sep 22 '22 18:09

Parfait

You're trying to build one large dataframe out of the rows of many dataframes that all have the same column names. axis should be 0 (the default), not 1. You also don't need to specify a join type: it has no effect here, since the column names are the same in every dataframe.

df = pd.concat([df1, df2, df3])

should be enough to concatenate the datasets.

(see https://pandas.pydata.org/pandas-docs/stable/merging.html )

Your call to set_index to define an index using the values in the DateTime column should then work.
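Putting that together, here is a minimal sketch of the row-wise approach using the three timestamps from the question. The numeric values are made up, and the explicit to_datetime format and the final sort_index() are my assumptions about how to get the chronological order the question asks for.

```python
import pandas as pd

columns = ["DateTime", "Actual", "Consensus", "Previous", "Revised"]

# Made-up values; the timestamps are the three mentioned in the question.
df1 = pd.DataFrame([["12/18/2017 10:00:00", 1.1, 1.0, 0.9, 1.0]], columns=columns)
df2 = pd.DataFrame([["12/29/2017 09:00:00", 2.9, 3.0, 2.8, 2.9]], columns=columns)
df3 = pd.DataFrame([["12/20/2017 09:00:00", 30.8, 31.0, 37.8, 35.9]], columns=columns)

# axis=0 (the default) stacks the rows; the five columns stay as they are.
df = pd.concat([df1, df2, df3], ignore_index=True)

# Parse the text dates so that sorting is chronological rather than alphabetical.
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%m/%d/%Y %H:%M:%S")
df = df.set_index("DateTime").sort_index()

print(df)
```

The result has one row per event, ordered by date regardless of which dataframe the row came from, and df.to_csv('df.csv') writes it out with DateTime as the first column.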

answered Sep 21 '22 18:09

John Smith Optional
