
Pandas reading csv files with partial wildcard

Tags: python, pandas

I'm trying to write a script that imports a file, then does something with the file and outputs the result into another file.

df = pd.read_csv('somefile2018.csv')

The above code works perfectly fine. However, I'd like to avoid hardcoding the file name in the code.

The script will be run in a folder (directory) that contains the script.py and several csv files.

I've tried the following:

somefile_path = glob.glob('somefile*.csv')

df = pd.read_csv(somefile_path)

But I get the following error:

ValueError: Invalid file path or buffer object type: <class 'list'>

Kvothe asked Apr 18 '18 11:04


3 Answers

glob.glob returns a list of matching paths, not a single string, and read_csv expects one path (or a file-like object) per call. Loop over the matches instead:

import pandas as pd
from glob import glob

for f in glob('somefile*.csv'):
    df = pd.read_csv(f)
    ...
    # the rest of your script
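
If the pattern is only ever supposed to match one file, as in the question, it may be safer to check that before reading anything. Here is a minimal sketch of that idea; the helper name resolve_single_file is made up for illustration and is not part of glob or pandas:

```python
import glob


def resolve_single_file(pattern):
    """Return the one path matching `pattern`; raise if there are zero or several."""
    # resolve_single_file is a hypothetical helper, not a library function
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one file matching {pattern!r}, "
            f"found {len(matches)}: {matches}"
        )
    return matches[0]
```

You would then call pd.read_csv(resolve_single_file('somefile*.csv')), which fails loudly when a second somefile*.csv shows up in the folder instead of silently picking one.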

James answered Nov 02 '22 21:11


To read all of the files that follow a certain pattern, so long as they share the same schema, use this function:

import glob
import pandas as pd

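def pd_read_pattern(pattern):
    files = sorted(glob.glob(pattern))
    if not files:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()

    # DataFrame.append was removed in pandas 2.0; concatenate all files at once
    return pd.concat([pd.read_csv(f) for f in files], ignore_index=True)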

df = pd_read_pattern('somefile*.csv')

This will work with either an absolute or relative path.

pleicht17 answered Nov 02 '22 21:11


You can list the CSV files in the current working directory and loop over them.

import os
from os import listdir
from os.path import isfile, join

import pandas as pd

mypath = os.getcwd()

# keep only regular files whose name ends in .csv
csvfiles = [f for f in listdir(mypath) if isfile(join(mypath, f)) and f.endswith('.csv')]

for f in csvfiles:
    df = pd.read_csv(join(mypath, f))
    # the rest of your script
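
The same listing can probably be written a bit more compactly with pathlib. This is only a sketch; csv_files_in is a name I made up, and matching the suffix case-insensitively is my assumption about what you want:

```python
from pathlib import Path


def csv_files_in(directory):
    """Return the regular files in `directory` whose suffix is .csv, sorted by name."""
    # csv_files_in is a hypothetical helper; .lower() also accepts e.g. "DATA.CSV"
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )
```

Each returned Path can be passed straight to pd.read_csv. If the files sit next to script.py rather than in the working directory, Path(__file__).resolve().parent should give you that folder.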

iDrwish answered Nov 02 '22 19:11