I am practicing Pandas and have the following task:
Create a list whose elements are the number of columns of each .csv file.
The .csv file paths are stored in a dictionary directory, keyed by year.
I use a dictionary comprehension to build a second dictionary, dataframes
(again keyed by year), that stores the .csv files as pandas DataFrames:
directory = {2009: 'path_to_file/data_2009.csv', ... , 2018: 'path_to_file/data_2018.csv'}
dataframes = {year: pandas.read_csv(file) for year, file in directory.items()}
# My Approach 1
columns = [df.shape[1] for year, df in dataframes.items()]
# My Approach 2
columns = [dataframes[year].shape[1] for year in dataframes]
Which way is more "Pythonic"? Or is there a better way to approach this?
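As a quick sketch (using hypothetical in-memory CSVs via io.StringIO in place of the real file paths), a third option is to iterate over .values(), which avoids both the unused year in Approach 1 and the re-lookup in Approach 2:

```python
import io
import pandas

# Hypothetical in-memory CSVs standing in for the files in `directory`.
directory = {
    2009: io.StringIO("a,b,c\n1,2,3\n"),
    2010: io.StringIO("a,b,c,d\n1,2,3,4\n"),
}
dataframes = {year: pandas.read_csv(f) for year, f in directory.items()}

# When only the dict values are needed, .values() reads more clearly
# than unpacking .items() or indexing back in by key.
columns = [df.shape[1] for df in dataframes.values()]
print(columns)  # [3, 4]
```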
Your method will get it done... but I don't like reading the entire file and creating a DataFrame just to count the columns. You can do the same thing by reading only the first line of each file and counting the commas. Note that I add 1 because there is always one less comma than there are columns.
columns = [open(f).readline().count(',') + 1 for f in directory.values()]
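One caveat, shown with a small hypothetical header: counting raw commas overcounts when a quoted field itself contains a comma. The standard-library csv module parses quoting correctly, so reading just the header row through csv.reader stays cheap while handling that case:

```python
import csv
import io

# Hypothetical header with a quoted field containing a comma:
# naive comma-counting reports 4 columns instead of 3.
header = 'name,"city, state",zip\n'

naive = header.count(',') + 1                         # overcounts: 4
parsed = len(next(csv.reader(io.StringIO(header))))   # correct: 3
print(naive, parsed)
```

If the files are plain CSVs with no quoted fields, the comma-counting one-liner above is fine.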