I have a directory containing many CSV files, which I have loaded into a dictionary of dataframes. Here are three small sample CSV files to illustrate:
import os
import csv
import pandas as pd
#create 3 small csv files for test purposes
os.chdir('c:/test')
with open('dat1990.csv', 'w', newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    data = [['Stock', 'Sales', 'Year'],
            ['100', '24', '1990'],
            ['120', '33', '1990'],
            ['23', '5', '1990']]
    a.writerows(data)
with open('dat1991.csv', 'w', newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    data = [['Stock', 'Sales', 'Year'],
            ['400', '35', '1991'],
            ['450', '55', '1991'],
            ['34', '6', '1991']]
    a.writerows(data)
with open('other1991.csv', 'w', newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    data = [['Stock', 'Sales', 'Year'],
            ['500', '56', '1991'],
            ['600', '44', '1991'],
            ['56', '55', '1991']]
    a.writerows(data)
Create a dictionary for processing the CSV files into dataframes:
dfcsv_dict = {'dat1990': 'dat1990.csv', 'dat1991': 'dat1991.csv',
'other1991': 'other1991.csv'}
Create a simple import function for reading a CSV into pandas:

def myimport(csvfile):
    return pd.read_csv(csvfile)
Iterate through the dictionary to import all the CSV files into pandas dataframes:

df_dict = {}
for k, v in dfcsv_dict.items():
    df_dict[k] = myimport(v)
Given that I may now have thousands of dataframes within this single dictionary object, how can I select a few and "extract" them out of the dictionary?
So for example, how would I extract just two of these three dataframes nested in the dictionary, something like
dat1990 = df_dict['dat1990']
dat1991 = df_dict['dat1991']
but without using literal assignments. Perhaps some sort of looping structure over the dictionary, ideally with a way to select a subgroup based on a string sequence in the dictionary key, e.g. all dataframes whose names contain 'dat' or '1991'.
I don't want another "sub-dictionary"; I want to extract them as named, "standalone" dataframes, as the code above illustrates.
I am using python 3.5.
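To show the kind of thing I mean, here is a rough, self-contained sketch of the loop I imagine (globals() is just a guess at a mechanism, and the substring 'dat' is only an example):

```python
import pandas as pd

# stand-ins for a few entries of the df_dict built above
df_dict = {
    'dat1990': pd.DataFrame({'Stock': [100], 'Year': [1990]}),
    'dat1991': pd.DataFrame({'Stock': [400], 'Year': [1991]}),
    'other1991': pd.DataFrame({'Stock': [500], 'Year': [1991]}),
}

# keep only the entries whose key contains the string sequence 'dat'
selected = {k: v for k, v in df_dict.items() if 'dat' in k}

# inject the selected dataframes as standalone top-level names,
# so dat1990 and dat1991 become ordinary variables
globals().update(selected)
```

I don't know whether globals() is considered good practice here, so treat this only as an illustration of the intent.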
This is an old question from Jan 2016, but since no one answered, here is an answer from Oct 2019; it might be useful for future reference.
I think you can skip the step of creating a dictionary of dataframes. I previously wrote an answer on how to create a single master dataframe from multiple CSV files, adding a column to the master dataframe with a string extracted from each CSV filename. I think you can do essentially the same thing here:
Create a dataframe of csv files based on timestamp intervals
Steps:
import pandas as pd
import os
# Step 1: create a path to the folder, syntax for Windows OS
path_test_folder = 'C:\\test\\'
# Step 2: create a list of CSV files in the folder
files_in_folder = os.listdir(path_test_folder)
files_in_folder = [x for x in files_in_folder if '.csv' in x]
# Step 3: create empty master dataframe to store CSV files
df_master = pd.DataFrame()
# Step 4: loop through the files in folder
for each_csv in files_in_folder:
    # temporary dataframe for this CSV
    path_csv = os.path.join(path_test_folder, each_csv)
    temp_df = pd.read_csv(path_csv)
    # add a column with the source filename
    temp_df['str_filename'] = str(each_csv)
    # append to the master dataframe
    df_master = pd.concat([df_master, temp_df])
# then filter on your filenames
mask_filter = df_master['str_filename'].isin(['dat1990.csv', 'dat1991.csv'])
df_filter = df_master.loc[mask_filter]
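If you want to select by a substring rather than exact filenames (e.g. everything whose name contains 'dat' or '1991', as the question mentions), pandas' str.contains works on the filename column. A small sketch, using a stand-in for the df_master built above:

```python
import pandas as pd

# stand-in for the df_master assembled in the loop above
df_master = pd.DataFrame({
    'Stock': [100, 400, 500],
    'Year': [1990, 1991, 1991],
    'str_filename': ['dat1990.csv', 'dat1991.csv', 'other1991.csv'],
})

# rows whose source filename contains 'dat'
df_dat = df_master.loc[df_master['str_filename'].str.contains('dat')]

# rows whose source filename contains '1991'
df_1991 = df_master.loc[df_master['str_filename'].str.contains('1991')]
```

This keeps everything in one dataframe, so you can slice out any subgroup on demand instead of maintaining thousands of separate variables.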