Extract file name from read_csv - Python

Tags: python, string, pandas

I have a script that currently reads raw data from a .csv file and performs some pandas data analysis on it. The .csv file name is hardcoded and is read in like this:

data = pd.read_csv('test.csv',sep="|", names=col)

I want to change 2 things:

1. I want to turn this into a loop that goes through a directory of .csv files and runs the pandas analysis further down in the script on each one.
2. I want to take each .csv file name, strip the '.csv', and store the result in another list variable, let's call it 'new_table_list'.

I think I need something like the code below, at least for the 1st point (though I know this isn't completely correct). I am not sure how to address the 2nd point.

Any help is appreciated

import os 

path = '\test\test\csvfiles'
table_list = []

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(file)
data = pd.read_csv(table_list,sep="|", names=col)


asked May 14 '18 19:05

JD2775

2 Answers

There are many ways to do it. One is to split the file name at the dot:

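table_list = []      # one DataFrame per file
new_table_list = []  # file names without the '.csv' extension
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # os.listdir returns bare names, so join the directory back on
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|", names=col))
        new_table_list.append(filename.split(".")[0])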

Another is to slice off the last four characters:

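for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # os.listdir returns bare names, so join the directory back on
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|", names=col))
        new_table_list.append(filename[:-4])  # drop the 4-character '.csv' suffix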

and many more

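As @barmar pointed out, os.listdir returns bare file names, so the directory path needs to be joined to each name before reading it. Otherwise read_csv looks for the file in the current working directory, which fails whenever the script is run from somewhere other than the directory holding the files.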
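
To make the path point concrete, here is a minimal sketch of the directory walk on its own, using only the standard library. The helper name csv_paths_and_names is hypothetical (it is not from the question or from pandas). It uses os.path.splitext rather than split(".")[0], which is a swapped-in choice: splitext removes only the final extension, so a name like 'b.v2.csv' becomes 'b.v2' instead of 'b'.

```python
import os


def csv_paths_and_names(path):
    """Return (full_path, table_name) pairs for every .csv file in `path`.

    Hypothetical helper for illustration only.
    """
    pairs = []
    # sorted() just makes the order predictable; os.listdir order is arbitrary
    for filename in sorted(os.listdir(path)):
        if filename.endswith('.csv'):
            # os.listdir gives bare names, so rebuild the full path
            full_path = os.path.join(path, filename)
            # splitext strips only the last extension: 'b.v2.csv' -> 'b.v2'
            table_name = os.path.splitext(filename)[0]
            pairs.append((full_path, table_name))
    return pairs
```

Each full_path can then be handed to pd.read_csv(full_path, sep="|", names=col) inside the loop, and each table_name appended to new_table_list.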

answered Sep 18 '22 07:09

Yuvraj Jaiswal

You can try something like this:

import glob

data = {}
for filename in glob.glob('/path/to/csvfiles/*.csv'):
    data[filename[:-4]] = pd.read_csv(filename, sep="|", names=col)

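Then data.keys() gives each file's path without the ".csv" part (glob.glob returns the directory along with the name, so wrap the key in os.path.basename if you want only the name) and data.values() gives one pandas DataFrame per file.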
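
If the keys should be bare table names rather than paths, one possible variation is sketched below. The names table_key and load_tables are hypothetical, and the read argument is an assumption made so the sketch stays independent of pandas: it is any callable that takes a file path, for example a small wrapper around pd.read_csv.

```python
import glob
from pathlib import Path


def table_key(filename):
    """Return the file name without its directory or final extension."""
    return Path(filename).stem


def load_tables(pattern, read):
    """Map each matching file's bare name to whatever `read` returns for it.

    `read` is a caller-supplied function taking a file path (assumption:
    in the question's setting it would wrap pd.read_csv).
    """
    return {table_key(f): read(f) for f in sorted(glob.glob(pattern))}
```

With pandas this could be called as load_tables('/path/to/csvfiles/*.csv', lambda f: pd.read_csv(f, sep="|", names=col)). Note that two files with the same name in different directories would collide on the same key, which is not an issue for a single-directory pattern like the one above.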

answered Sep 20 '22 07:09

Paulo Scardine
