I came across this dataset:
http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
and I couldn't find a simple way of getting it into a pandas DataFrame. I manually parsed it into a list of lists and then called the DataFrame constructor, but is there an easier way to do this? Thanks!
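For comparison, the manual route described in the question might look roughly like the sketch below. It is not the asker's code. Using `shlex.split` to tokenize each line and the column names (taken from the UCI attribute list, with underscores added) are assumptions. The sample line is laid out like the file's first row, with approximated spacing.

```python
import shlex

import pandas as pd

# Attribute names from the UCI auto-mpg description (underscores are an assumption).
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model_year", "origin", "car_name"]


def parse_auto_mpg_lines(lines):
    """Manually parse auto-mpg style lines into a DataFrame (hypothetical helper)."""
    # shlex.split splits on runs of whitespace (spaces or tabs) but keeps a
    # "double-quoted string" together as one token, with the quotes removed.
    rows = [shlex.split(line) for line in lines if line.strip()]
    df = pd.DataFrame(rows, columns=COLUMNS)
    numeric = COLUMNS[:-1]
    # Every token is still a str; errors="coerce" turns '?' (missing) into NaN.
    df[numeric] = df[numeric].apply(pd.to_numeric, errors="coerce")
    return df


sample = ['18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"']
print(parse_auto_mpg_lines(sample))
```

This works, but every type conversion has to be done by hand, which is exactly what the `read_*` functions handle for you.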
pandas provides the read_csv() function to read data stored as a CSV file into a pandas DataFrame. pandas supports many other file formats and data sources out of the box (CSV, Excel, SQL, JSON, Parquet, …), each read by a function with the prefix read_*.
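A minimal sketch of read_csv applied to this file, assuming the fields are separated by runs of spaces and tabs and the car name is enclosed in double quotes. `sep=r"\s+"` and the column names are choices made here, not part of the original answer. The check runs offline on a line laid out like the file's first row (spacing approximated).

```python
import io

import pandas as pd

URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
# Attribute names from the UCI auto-mpg description (underscores are an assumption).
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model_year", "origin", "car_name"]


def read_auto_mpg(source=URL):
    # sep=r"\s+" treats any run of spaces or tabs as a single delimiter.
    # The default quotechar ('"') keeps the quoted car name, which contains
    # spaces, together as one field; '?' entries are read as NaN.
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMNS, na_values=["?"])


sample = '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
print(read_auto_mpg(io.StringIO(sample)))
# With network access: df = read_auto_mpg()
```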
Benchmark results show that apply substantially outperforms iterrows. As mentioned previously, this is because apply is better optimized for looping through DataFrame rows than iterrows is. itertuples is slower than apply but faster than iterrows, so if an explicit loop is required, try itertuples instead.
iterrows() is slower than itertuples() because it constructs a new pandas Series for every row, with all the type handling that involves, whereas itertuples() yields lightweight namedtuples.
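The comparison above can be reproduced with a small timing sketch. The toy DataFrame and its columns `a` and `b` are hypothetical, and the absolute numbers (and even the gaps between methods) will vary with machine, data size and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical toy frame; columns "a" and "b" exist only for this illustration.
df = pd.DataFrame({"a": np.arange(2_000), "b": np.arange(2_000) * 2.0})


def with_iterrows(frame):
    # Yields (index, Series) pairs: a new Series is built for every row.
    return [row["a"] + row["b"] for _, row in frame.iterrows()]


def with_itertuples(frame):
    # Yields lightweight namedtuples; fields are accessed as attributes.
    return [row.a + row.b for row in frame.itertuples(index=False)]


def with_apply(frame):
    # axis=1 calls the function once per row.
    return frame.apply(lambda row: row["a"] + row["b"], axis=1).tolist()


for fn in (with_iterrows, with_itertuples, with_apply):
    seconds = timeit.timeit(lambda: fn(df), number=3)
    print(f"{fn.__name__:>16}: {seconds:.3f} s")
```

All three functions compute the same per-row sums, so only the iteration mechanism differs.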
Try using pandas.read_fwf and specify a list of column widths (including whitespace):
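import pandas as pd

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
widths = [7, 4, 10, 10, 11, 7, 4, 4, 30]  # width of each field, padding included
# '?' marks missing values in the file, so map it to NaN
df = pd.read_fwf(url, widths=widths, header=None, na_values=['?'])
df.iloc[0]  # first row; older pandas spelled this df.irow(0), which has since been removed

With the pandas version this answer was written for, the first row printed as follows (current versions name the columns 0 to 8 instead of X0 to X8):

X0                             18
X1                              8
X2                            307
X3                            130
X4                           3504
X5                             12
X6                             70
X7                              1
X8    "chevrolet chevelle malibu"
Name: 0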
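A slightly fuller sketch of the same read_fwf call, assuming you also want named columns and the car name without the surrounding double quotes that the output above still shows. The column names and the helper `load_auto_mpg_fwf` are hypothetical. The offline check pads a sample row to exactly the widths from this answer.

```python
import io

import pandas as pd

URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
WIDTHS = [7, 4, 10, 10, 11, 7, 4, 4, 30]  # field widths from the answer
# Attribute names from the UCI auto-mpg description (underscores are an assumption).
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model_year", "origin", "car_name"]


def load_auto_mpg_fwf(source=URL):
    df = pd.read_fwf(source, widths=WIDTHS, header=None, names=COLUMNS, na_values=["?"])
    # Strip any double quotes left around the car name (a no-op if none remain).
    df["car_name"] = df["car_name"].str.strip('"')
    return df


# Build one sample row padded to exactly the widths above.
fields = ["18.0", "8", "307.0", "130.0", "3504.", "12.0", "70", "1",
          '"chevrolet chevelle malibu"']
sample = "".join(f.ljust(w) for f, w in zip(fields, WIDTHS)) + "\n"
print(load_auto_mpg_fwf(io.StringIO(sample)))
```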