The general use case behind the question is to read multiple CSV log files from a target directory into a single Python Pandas DataFrame for quick turnaround statistical analysis & charting. The idea for utilizing Pandas vs MySQL is to conduct this data import or append + stat analysis periodically throughout the day.
The script below attempts to read all of the CSV (same file layout) files into a single Pandas dataframe & adds a year column associated with each file read.
# Assemble all of the data files into a single DataFrame & add a year field
# 2010 is the last available year
years = range(1880, 2011)
for year in years:
    path ='C:\\Documents and Settings\\Foo\\My Documents\\pydata-book\\pydata-book-master`\\ch02\\names\\yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    pieces.append(frame)
# Concatenates everything into a single Dataframe
names = pd.concat(pieces, ignore_index=True)
# Expected row total should be 1690784
names
<class 'pandas.core.frame.DataFrame'>
Int64Index: 33838 entries, 0 to 33837
Data columns:
name      33838  non-null values
sex       33838  non-null values
births    33838  non-null values
year      33838  non-null values
dtypes: int64(2), object(2)
# Start aggregating the data at the year & gender level using groupby or pivot
total_births = names.pivot_table('births', rows='year', cols='sex', aggfunc=sum)
# Prints pivot table
total_births.tail()
Out[35]:
sex     F   M
year        
2010    1759010     1898382
                In order to read multiple CSV files or all files from a folder in R, use data. table package. data. table is a third-party library hence, in order to use data.
The append method on an instance of a DataFrame does not function the same as the append method on an instance of a list.  Dataframe.append() does not occur in-place and instead returns a new object.
years = range(1880, 2011)
names = pd.DataFrame()
for year in years:
    path ='C:\\Documents and Settings\\Foo\\My Documents\\pydata-book\\pydata-book-master`\\ch02\\names\\yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    names = names.append(frame, ignore_index=True)
or you can use concat:
years = range(1880, 2011)
names = pd.DataFrame()
for year in years:
    path ='C:\\Documents and Settings\\Foo\\My Documents\\pydata-book\\pydata-book-master`\\ch02\\names\\yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    names = pd.concat(names, frame, ignore_index=True)
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With