
Extract file name from read_csv - Python

I have a script that currently reads raw data from a .csv file and runs some pandas analysis on it. The .csv file name is hardcoded and is read in like this:

data = pd.read_csv('test.csv',sep="|", names=col)

I want to change 2 things:

  1. I want to turn this into a loop that goes through a directory of .csv files and runs the pandas analysis further down in the script on each one.

  2. I want to take each .csv file name, strip the '.csv', and store the result in another list variable, let's call it 'new_table_list'.

I think I need something like the code below, at least for the first point (though I know it isn't completely correct). I am not sure how to address the second point.

Any help is appreciated.

import os 

path = '\test\test\csvfiles'
table_list = []

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(file)
data = pd.read_csv(table_list,sep="|", names=col)
JD2775 asked May 14 '18 19:05



2 Answers

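There are many ways to do it. One is to split the file name on the dot:

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # os.listdir returns bare names, so join with path before reading
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        # keep the part of the name before the first dot
        new_table_list.append(filename.split(".")[0])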

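Another is to slice off the last four characters:

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # os.listdir returns bare names, so join with path before reading
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        # drop the trailing '.csv' (4 characters)
        new_table_list.append(filename[:-4])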

and many more

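As @barmar pointed out, it is better to join path to each file name before passing it to pd.read_csv, for example os.path.join(path, filename). os.listdir returns bare file names, so reading them directly only works when the script runs from that same directory.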
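
Here is a minimal sketch of the path and name handling only, using just the standard library. The function name list_csv_tables is made up for illustration, and os.path.splitext is swapped in for the split and slice shown above so that a name like 'a.b.csv' keeps its inner dot.

```python
import os


def list_csv_tables(path):
    """Return (csv_paths, new_table_list) for the .csv files in `path`.

    `list_csv_tables` is a hypothetical helper name, not from the question.
    """
    csv_paths = []       # full paths, ready to hand to pd.read_csv
    new_table_list = []  # file names with the '.csv' extension removed
    for filename in sorted(os.listdir(path)):
        if filename.endswith('.csv'):
            # os.listdir returns bare names, so build the full path first
            csv_paths.append(os.path.join(path, filename))
            # splitext removes only the final extension: 'a.b.csv' -> 'a.b'
            new_table_list.append(os.path.splitext(filename)[0])
    return csv_paths, new_table_list
```

Each entry of csv_paths can then go through the existing call, e.g. pd.read_csv(full_path, sep="|", names=col), inside a loop that runs the analysis once per file.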

Yuvraj Jaiswal answered Sep 18 '22 07:09


You can try something like this:

import glob

data = {}
for filename in glob.glob('/path/to/csvfiles/*.csv'):
    data[filename[:-4]] = pd.read_csv(filename, sep="|", names=col)

Then data.keys() holds each file's path without the ".csv" part (glob returns the path it matched, so wrap filename in os.path.basename first if you want only the file name) and data.values() holds one pandas DataFrame per file.
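
If the keys should be bare table names rather than paths, one possible variation is sketched below. It uses only the standard library; csv_paths_by_name is a hypothetical name, and the directory argument stands in for the '/path/to/csvfiles' placeholder above.

```python
import glob
import os
from pathlib import Path


def csv_paths_by_name(directory):
    """Map each file name without '.csv' to the path glob matched.

    `csv_paths_by_name` is an illustrative name, not part of the answer above.
    """
    # glob.escape guards against special characters in the directory name
    pattern = os.path.join(glob.escape(directory), '*.csv')
    # Path(...).stem drops both the directory part and the final extension
    return {Path(filename).stem: filename for filename in sorted(glob.glob(pattern))}
```

The DataFrames can then be built from the mapping with something like {name: pd.read_csv(f, sep="|", names=col) for name, f in csv_paths_by_name('/path/to/csvfiles').items()}.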

Paulo Scardine answered Sep 20 '22 07:09