Load all csv/txt files from one directory and merge them via python

Question

I have a folder which contains hundreds (possibly over 1,000) of CSV files of chronological data. Ideally this data would be in one CSV, so that I can analyse it all in one go. Is there a way to append all the files to one another using Python?

My files exist in folder locations like so:

C:\Users\folder\Database Files\1st September
C:\Users\folder\Database Files\1st October
C:\Users\folder\Database Files\1st November
C:\Users\folder\Database Files\1st December
etc

Inside each of the folders there are 3 CSVs (I am using the term CSV loosely, since these files are actually saved as .txt files containing values separated by pipes |).

Let's say these files are called:

MonthNamOne.txt
MonthNamTwo.txt
MonthNameOneTwoMurged.txt

Is it possible to write something that goes through all of the folders in this directory and merges together all the OneTwoMurged.txt files, and if so, how?

Dennis Sylvian · Accepted Answer

To get all files in a folder with the .csv suffix:

import glob
import os

filelist = []

# Work relative to the folder that holds the files
os.chdir("folderwithcsvs/")
for counter, filename in enumerate(glob.glob("*.csv")):
    filelist.append(filename)
    print "do stuff with file:", filename, counter

print filelist

for fileitem in filelist:
    print fileitem

Obviously the "do stuff" part depends on what you want done with the files; this only gets you the list of files.
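As one possible "do stuff" step for the layout in the question, here is a minimal sketch that globs one level of subfolders for a filename pattern and appends each match to a single output file. It is written for Python 3, the function name `merge_matching_files` and its parameters are made up for illustration, and it assumes the files have no header row that would need skipping. Note that `sorted()` orders the paths alphabetically, not chronologically.

```python
import glob
import os


def merge_matching_files(root, pattern, output_path):
    """Append every file matching root/<subfolder>/<pattern> to output_path.

    Returns the list of files merged, in the order they were written.
    (Hypothetical helper, not part of any library.)
    """
    # One level of subfolders only, e.g. "Database Files/1st September/...".
    matches = sorted(glob.glob(os.path.join(root, "*", pattern)))
    with open(output_path, "w") as fout:
        for name in matches:
            with open(name) as fin:
                for line in fin:
                    # Guard against a final line with no trailing newline,
                    # so rows from consecutive files never run together.
                    fout.write(line if line.endswith("\n") else line + "\n")
    return matches


# Example call for the question's layout (paths taken from the question):
# merge_matching_files(r"C:\Users\folder\Database Files",
#                      "*OneTwoMurged.txt", "merged.txt")
```

Keep the output file outside the subfolders being searched, otherwise a later run could pick it up as an input.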

If you want to do something with the files on a monthly basis, you could use datetime to generate each possible month; the same approach works for daily or yearly data.

For example, for monthly files named "Month Year.csv", this checks for each possible file in turn:

import datetime, os

start_year, start_month = "2001", "January"

# First day of the current month and of the starting month
current_month = datetime.date.today().replace(day=1)
possible_month = datetime.datetime.strptime('%s %s' % (start_month, start_year), '%B %Y').date()
while possible_month <= current_month:
    csv_filename = possible_month.strftime('%B %Y') + '.csv'
    # Month name and year are available separately if you need them
    month = possible_month.strftime('%B')
    year = possible_month.strftime('%Y')
    if os.path.exists("folder/" + csv_filename):
        print csv_filename
    # Adding 31 days to the 1st always lands in the next month; snap back to the 1st
    possible_month = (possible_month + datetime.timedelta(days=31)).replace(day=1)

You can adapt that however you see fit. Let me know if you need more or if this suffices.
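The loop above moves to the next month by adding 31 days and snapping back to day 1. If that feels too implicit, a rough alternative is to do the month arithmetic explicitly. This is only a sketch for Python 3; `month_starts` and `expected_filenames` are hypothetical names, and `%B` yields English month names only under the default locale.

```python
import datetime


def month_starts(start, end):
    """Yield the first day of each month from start to end, inclusive."""
    current = start.replace(day=1)
    last = end.replace(day=1)
    while current <= last:
        yield current
        # Step to the first day of the next month, rolling over the year.
        if current.month == 12:
            current = current.replace(year=current.year + 1, month=1)
        else:
            current = current.replace(month=current.month + 1)


def expected_filenames(start, end, suffix=".csv"):
    """Build "Month Year.csv" style names for each month in the range."""
    return [d.strftime("%B %Y") + suffix for d in month_starts(start, end)]
```

You would then check each name with `os.path.exists`, as in the loop above.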

Mark Tolonen · Answer

This will recursively process a directory, match a specific file pattern, and append the contents of each matching file to one output file. It also parses the CSVs, so you could do per-line analysis and processing too. Modify as needed :)

#!python2
import os
import fnmatch
import csv
from datetime import datetime as dt

# Open result file
with open('output.txt','wb') as fout:
    wout = csv.writer(fout,delimiter='|')

    # Recursively process a directory
    for path,dirs,files in os.walk('files'):

        # Sort directories for processing.
        # In this case, sorting directories named "Month Year" chronologically.
        dirs.sort(key=lambda d: dt.strptime(d,'%B %Y'))
        interesting_files = fnmatch.filter(files,'*.txt')

        # Example for sorting filenames with a custom chronological sort "Month Year.txt"
        for filename in sorted(interesting_files,key=lambda f: dt.strptime(f,'%B %Y.txt')):

            # Generate the full path to the file.
            fullname = os.path.join(path,filename)
            print 'Processing',fullname

            # Open and process file
            with open(fullname,'rb') as fin:
                for line in csv.reader(fin,delimiter='|'):
                    wout.writerow(line)
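The script above targets Python 2 (binary file modes, `print` statement). For anyone on Python 3, a rough equivalent might look like the sketch below. The function name `merge_tree` and its parameters are made up here, files are sorted by plain name instead of by date, and, as in the original, it assumes every subdirectory is named "Month Year" (any other name makes `strptime` raise `ValueError`).

```python
import csv
import fnmatch
import os
from datetime import datetime as dt


def merge_tree(root, output_path, pattern="*.txt"):
    """Walk root and append rows of matching pipe-delimited files to output_path.

    Returns the list of files processed, in order. (Hypothetical helper.)
    """
    processed = []
    # newline="" lets the csv module control line endings in Python 3.
    with open(output_path, "w", newline="") as fout:
        wout = csv.writer(fout, delimiter="|")
        for path, dirs, files in os.walk(root):
            # Sorting dirs in place makes os.walk visit them in that order.
            dirs.sort(key=lambda d: dt.strptime(d, "%B %Y"))
            for filename in sorted(fnmatch.filter(files, pattern)):
                fullname = os.path.join(path, filename)
                processed.append(fullname)
                with open(fullname, newline="") as fin:
                    for row in csv.reader(fin, delimiter="|"):
                        wout.writerow(row)
    return processed
```

Write the output somewhere outside `root`, otherwise the walk may find the output file and try to read it while it is being written.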

muon · Answer

You can read the files into a pandas DataFrame (the choice of axis depends on your application). My example joins the files column-wise, assuming their columns all have the same length:

import glob
import pandas as pd


df=pd.DataFrame()
for files in glob.glob("*.csv"):
    print files 
    df = pd.concat([df,pd.read_csv(files).iloc[:,1:]],axis=1)

Using axis=0 instead would append the files row-wise.
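For the pipe-separated .txt files in the question, the row-wise variant could be sketched roughly as follows (Python 3, pandas required). `concat_rows` is a hypothetical name, and the sketch assumes every file starts with a header row carrying the same column names. `pd.concat` raises `ValueError` if no files match.

```python
import glob
import os

import pandas as pd


def concat_rows(folder, pattern="*.txt", sep="|"):
    """Stack all matching files in folder row-wise into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    # Assumes every file has a header row with the same column names.
    frames = [pd.read_csv(path, sep=sep) for path in paths]
    # A single concat call avoids re-copying the data on every iteration.
    return pd.concat(frames, axis=0, ignore_index=True)
```

Collecting the frames in a list and concatenating once is usually cheaper than growing a DataFrame inside the loop.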

Tags: python, file, directory, csv, python-2.7

AEA
