
What's the easiest way of getting this data into a Pandas Dataframe?



I came across this dataset:


and I couldn't find a simple way of getting it into a pandas DataFrame. I manually parsed it into a list of lists and then called the DataFrame constructor, but is there an easier way of doing this? Thanks!

vgoklani asked Nov 06 '12 03:11



1 Answer

Try using pandas.read_fwf and specify a list of column widths (including whitespace):

In [34]: import pandas as pd

In [35]: url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'

In [36]: widths = [7, 4, 10, 10, 11, 7, 4, 4, 30]

In [37]: df = pd.read_fwf(url, widths=widths, header=None, na_values=['?'])

In [38]: df.iloc[0]  # originally df.irow(0), which newer pandas versions no longer provide
X0                              18
X1                               8
X2                             307
X3                             130
X4                            3504
X5                              12
X6                              70
X7                               1
X8    "chevrolet chevelle malibu"
Name: 0
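For anyone who wants to try this without hitting the network, here is a minimal sketch of the same read_fwf call on an in-memory sample. The two rows are illustrative and are padded to the widths above, and the column names are my own assumption (they are not part of the answer or of the call above). Note that newer pandas versions label unnamed columns 0 through 8 rather than X0 through X8, which is one reason to pass names explicitly.

```python
import io

import pandas as pd

# Column widths from the answer above (each width includes trailing whitespace).
widths = [7, 4, 10, 10, 11, 7, 4, 4, 30]

# Hypothetical column names, chosen for this example only.
names = ["mpg", "cylinders", "displacement", "horsepower", "weight",
         "acceleration", "model_year", "origin", "car_name"]

# Illustrative rows, padded to the widths above so the example runs offline.
rows = [
    ["18.0", "8", "307.0", "130.0", "3504.", "12.0", "70", "1",
     '"chevrolet chevelle malibu"'],
    ["25.0", "4", "98.00", "?", "2046.", "19.0", "71", "1",
     '"ford pinto"'],
]
sample = "\n".join(
    "".join(field.ljust(width) for field, width in zip(row, widths))
    for row in rows
)

# Same call as in the answer, reading from a buffer instead of a URL.
# na_values=["?"] turns the "?" placeholder into NaN.
df = pd.read_fwf(io.StringIO(sample), widths=widths, header=None,
                 names=names, na_values=["?"])

# Remove the literal double quotes around each car name, if present.
df["car_name"] = df["car_name"].str.strip('"')

print(df.iloc[0])
```

Swapping io.StringIO(sample) for the url from the answer should give the full dataset, assuming the file still matches those widths.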
Chang She answered Oct 08 '22 11:10