Pandas: import multiple csv files into dataframe using a loop and hierarchical indexing

Tags: python, pandas, csv, hierarchical-data

I would like to read multiple CSV files (with a different number of columns) from a target directory into a single Python Pandas DataFrame to efficiently search and extract data.

Example file:

Events 
1,0.32,0.20,0.67
2,0.94,0.19,0.14,0.21,0.94
3,0.32,0.20,0.64,0.32
4,0.87,0.13,0.61,0.54,0.25,0.43 
5,0.62,0.21,0.77,0.44,0.16

Here is what I have so far:

import glob
import os

import pandas as pd

# get a list of all csv files in target directory
my_dir = "C:\\Data\\"
os.chdir(my_dir)
filelist = glob.glob("*.csv")

# read each csv file into a single dataframe and add a filename reference column
# (i.e. file1, file2, file3) for each file read
df = pd.DataFrame()
columns = range(1,100)
for c, f in enumerate(filelist) :
    key = "file%i" % c
    frame = pd.read_csv( (my_dir + f), skiprows = 1, index_col=0, names=columns )
    frame['key'] = key
    df = df.append(frame,ignore_index=True)

(the indexing isn't working properly)

Essentially, the script below is exactly what I want (tried and tested) but needs to be looped through 10 or more csv files:

columns = range(1, 100)
df1 = pd.read_csv("C:\\Data\\Currambene_001y09h00m_events.csv",
                  skiprows=1, index_col=0, names=columns)
df2 = pd.read_csv("C:\\Data\\Currambene_001y12h00m_events.csv",
                  skiprows=1, index_col=0, names=columns)
keys = ['file1', 'file2']
df = pd.concat([df1, df2], keys=keys, names=['fileno'])

I have found many related questions; however, I am still not able to get this to work:

Reading Multiple CSV Files into Python Pandas Dataframe
Merge of multiple data frames of different number of columns into one big data frame
Import multiple csv files into pandas and concatenate into one DataFrame


asked Jan 15 '14 22:01

mellover

1 Answer

You need to decide along which axis you want to append your files. Pandas will always try to do the right thing by:

Assuming that each column from each file is different, and appending digits to columns with similar names across files if necessary, so that they don't get mixed;
Placing items that belong to the same row index across files side by side, under their respective columns.
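
To make the choice of axis concrete, here is a minimal sketch with two small in-memory frames standing in for two files. The frame names, column names, and values are made up for illustration; only pd.concat and its axis argument come from pandas itself.

```python
import pandas as pd

# Two small frames standing in for two files (hypothetical data).
a = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]}, index=[1, 2])
b = pd.DataFrame({"z": [5.0, 6.0]}, index=[2, 3])

# axis=0 stacks rows; columns are aligned by name and gaps become NaN.
stacked = pd.concat([a, b], axis=0)

# axis=1 places the frames side by side; rows are aligned by index label.
side_by_side = pd.concat([a, b], axis=1)

print(stacked)
print(side_by_side)
```

With axis=0 the result has one row per input row, while with axis=1 it has one row per distinct index label.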

The trick to appending efficiently is to tip the files sideways, so you get the desired behaviour to match what pandas.concat will be doing. This is my recipe:

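from glob import glob
from pandas import concat, read_csv

files = sorted(glob("*.csv"))  # plain-Python alternative to the IPython magic `!ls *.csv`
d = concat([read_csv(f, index_col=0, header=None) for f in files], axis=1, keys=files)

Notice that axis=1 is passed to concat (read_csv has no axis argument), so the frames are concatenated along the column axis, with each file name kept as the outer level of the column index. If needed, you can transpose the resulting DataFrame back with d.T.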
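
As a small illustration of column-wise concatenation with keys, the sketch below reads in-memory text instead of files, so it can be run anywhere. The keys "file1" and "file2" and the numbers are placeholders, and axis=1 is passed to pd.concat.

```python
from io import StringIO

import pandas as pd

# In-memory stand-ins for two headerless CSV files (contents are made up).
csv_texts = {
    "file1": "1,0.32,0.20\n2,0.94,0.19\n",
    "file2": "1,0.87,0.13\n2,0.62,0.21\n",
}

# The first field of each row becomes the row index.
frames = [
    pd.read_csv(StringIO(text), index_col=0, header=None)
    for text in csv_texts.values()
]

# Column-wise concatenation: each key becomes the outer column level.
d = pd.concat(frames, axis=1, keys=list(csv_texts))

print(d["file2"])  # the block of columns that came from file2
print(d.T)         # transposed: (key, column) pairs become the row index
```

Selecting d["file2"] returns only the columns that came from the second input, which is what makes the keys useful for later lookups.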

EDIT:

If the source files have different numbers of columns, you'll need to supply a header. I understand you don't have a header in your source files, so let's create one with a simple function:

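def reader(f):
    # headerless file; the first field of each row becomes the row index
    d = read_csv(f, index_col=0, header=None)
    # label the remaining columns 0..n-1 so they line up across files
    d.columns = range(d.shape[1])
    return d

# default axis=0: rows are stacked under a (file, row) hierarchical index
df = concat([reader(f) for f in files], keys=files)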
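
Because the rows in the question's example vary in length within a single file, a fixed list of column names (as in the question's own names=columns) may be needed for read_csv to parse them. The following is a sketch under that assumption, not a definitive implementation: read_events, load_directory, and MAX_COLS are hypothetical names, skiprows=1 assumes each file starts with an "Events" title line, and the demo files are made up and written to a temporary directory.

```python
import glob
import os
import tempfile

import pandas as pd

MAX_COLS = 100  # assumed upper bound on fields per row, as in the question


def read_events(path, max_cols=MAX_COLS):
    """Read one ragged events file; short rows are padded with NaN."""
    frame = pd.read_csv(
        path,
        skiprows=1,                    # skip the "Events" title line
        header=None,
        names=list(range(max_cols)),   # fixed width so ragged rows parse
        index_col=0,
    )
    frame.index.name = "event"
    return frame.dropna(axis=1, how="all")  # drop columns that are entirely empty


def load_directory(directory):
    """Stack every CSV in `directory` under a (fileno, event) row index."""
    paths = sorted(glob.glob(os.path.join(directory, "*.csv")))
    keys = ["file%d" % i for i, _ in enumerate(paths, start=1)]
    return pd.concat([read_events(p) for p in paths], keys=keys, names=["fileno"])


# Demo with two made-up files.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "a_events.csv": "Events\n1,0.32,0.20,0.67\n2,0.94,0.19,0.14,0.21,0.94\n",
        "b_events.csv": "Events\n1,0.87,0.13\n2,0.62,0.21,0.77,0.44\n",
    }
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(text)
    df = load_directory(tmp)

print(df.loc["file2"])       # every event from the second file
print(df.loc[("file1", 2)])  # one event row from the first file
```

The outer index level is the file key and the inner level is the event number, so df.loc["file2"] and df.loc[("file1", 2)] give the per-file and per-event lookups the question asks for.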


answered Oct 16 '22 16:10

dmvianna

