I have a script that currently reads raw data from a .csv file and runs some pandas analysis on it. At the moment the .csv file is hardcoded and is read in like this:
```python
data = pd.read_csv('test.csv', sep="|", names=col)
```
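For reference, here is a minimal sketch of what `sep="|"` and `names=col` do in that call. It uses an in-memory string in place of `test.csv`, and the `col` list and sample rows are made up for illustration. The real script defines its own `col`.

```python
import io
import pandas as pd

# Hypothetical column names; the real script defines its own `col` list.
col = ["id", "name", "score"]

# Stand-in for the contents of test.csv: pipe-delimited, no header row.
raw = io.StringIO("1|alice|90\n2|bob|85\n")

# sep="|" splits each line on pipes; passing names=col supplies the labels,
# so the first line is treated as data rather than as a header row.
data = pd.read_csv(raw, sep="|", names=col)
print(data)
```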
I want to change two things:
1. Turn this into a loop that goes through a directory of .csv files and runs the pandas analysis in the script on each one.
2. Take each .csv filename, strip the '.csv' extension, and store the result in another list variable, let's call it 'new_table_list'.
I think I need something like the code below, at least for the first point (though I know it isn't completely correct). I am not sure how to address the second point.
Any help is appreciated.
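```python
import os
import pandas as pd

# Raw string, so that "\t" is not interpreted as a tab character.
path = r'\test\test\csvfiles'
table_list = []
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(filename)
# This is the part that isn't right: read_csv expects one file, not a list.
data = pd.read_csv(table_list, sep="|", names=col)
```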
Here, pandas read_csv() returns a new DataFrame containing the data and column labels from the file given as the first argument. That argument can be any valid path, including a URL. The index_col parameter specifies which column of the CSV file holds the row labels.
index_col: int, str, sequence of int/str, or False (default None). Column(s) to use as the row labels of the DataFrame, given either as a column name or a column index. If a sequence of int/str is given, a MultiIndex is used.
usecols: list-like or callable. Return a subset of the columns.
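A small sketch of `index_col` and `usecols` in action, using made-up in-memory CSV text rather than any of the files from the question:

```python
import io
import pandas as pd

# Made-up sample data with a header row.
csv_text = "name,year,value\na,2020,1\nb,2021,2\n"

# index_col="name": that column becomes the row labels.
df = pd.read_csv(io.StringIO(csv_text), index_col="name")

# usecols: keep only the listed columns.
df_small = pd.read_csv(io.StringIO(csv_text), usecols=["name", "value"])

# A sequence for index_col produces a MultiIndex.
df_multi = pd.read_csv(io.StringIO(csv_text), index_col=["name", "year"])

print(df)
print(df_small)
print(df_multi)
```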
There are many ways to do it:
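```python
import os
import pandas as pd

table_list, new_table_list = [], []
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        new_table_list.append(filename.split(".")[0])  # text before the first dot
```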
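One caveat with `split(".")[0]`: it cuts at the first dot, so a name like `sales.2021.csv` would become `sales`. `os.path.splitext` splits only at the last dot. A minimal sketch (the helper name `strip_csv_extension` is hypothetical):

```python
import os

def strip_csv_extension(filename):
    # splitext splits at the last dot only:
    # "sales.2021.csv" -> ("sales.2021", ".csv")
    root, _ = os.path.splitext(filename)
    return root

names = ["test.csv", "sales.2021.csv"]
print([strip_csv_extension(n) for n in names])  # ['test', 'sales.2021']
```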
Another option:
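```python
import os
import pandas as pd

table_list, new_table_list = [], []
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        new_table_list.append(filename[:-4])  # drop the 4-character ".csv"
```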
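If you are on Python 3.9 or later, `str.removesuffix` removes `.csv` without hardcoding its length, and leaves the string unchanged if the suffix is absent. A sketch with made-up file names:

```python
# Requires Python 3.9+ for str.removesuffix.
filenames = ["test.csv", "data_2021.csv", "notes.txt"]

new_table_list = []
for filename in filenames:
    if filename.endswith(".csv"):
        # Strips ".csv" only when it is actually at the end of the name.
        new_table_list.append(filename.removesuffix(".csv"))

print(new_table_list)  # ['test', 'data_2021']
```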
And there are many more.
As @barmar pointed out, it is better to join the directory path onto each filename before reading it, to avoid problems when the script runs from a different location than the files.
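Putting both points together, here is a sketch of a reusable function that joins the path, reads each file, and collects the stripped names. The name `load_csv_dir` is hypothetical, and a temporary directory stands in for the real CSV folder. On Windows, pass the directory as a raw string (e.g. `r'\test\test\csvfiles'`), because `\t` in a normal string literal is a tab character.

```python
import os
import tempfile
import pandas as pd

def load_csv_dir(path, sep="|"):
    """Read every .csv file in `path`; return (frames, names) as parallel lists."""
    table_list, new_table_list = [], []
    for filename in sorted(os.listdir(path)):
        if filename.endswith(".csv"):
            # Joining the directory makes this work from any working directory.
            full_path = os.path.join(path, filename)
            table_list.append(pd.read_csv(full_path, sep=sep))
            new_table_list.append(os.path.splitext(filename)[0])
    return table_list, new_table_list

# Demo with a throwaway directory; readme.txt should be skipped.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv", "readme.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("x|y\n1|2\n")
    frames, names = load_csv_dir(tmp)

print(names)  # ['a', 'b']
```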
You can try something like this:
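```python
import glob
import os
import pandas as pd

data = {}
for filename in glob.glob('/path/to/csvfiles/*.csv'):
    # Key on the bare file name: directory and ".csv" removed.
    data[os.path.basename(filename)[:-4]] = pd.read_csv(filename, sep="|", names=col)
```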
Then data.keys() holds the filenames without the ".csv" part, and data.values() holds one pandas DataFrame per file (wrap either in list() if you need an actual list).
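To run the analysis on each file, iterate over `data.items()`. In this sketch, `analyze` is a hypothetical stand-in for the question's per-file pandas analysis, and the small dict stands in for the one built by the glob loop. `pd.concat` can also stack all tables into one DataFrame labelled by file name.

```python
import pandas as pd

# Hypothetical stand-in for the per-file analysis.
def analyze(df):
    return int(df["value"].sum())

# Stand-in for the dict built by the glob loop: name -> DataFrame.
data = {
    "jan": pd.DataFrame({"value": [1, 2, 3]}),
    "feb": pd.DataFrame({"value": [10, 20]}),
}

results = {name: analyze(df) for name, df in data.items()}
print(results)  # {'jan': 6, 'feb': 30}

new_table_list = list(data.keys())

# Optional: one combined frame whose outer index level is the file name.
combined = pd.concat(data, names=["table", "row"])
```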