The nature of pandas DataFrame

Tags:

pandas

As a follow-up to my question on mixed types in a column:

Can I think of a DataFrame as a list of columns or is it a list of rows?

In the former case, it means that (optimally) each column has to be homogeneous (type-wise), while different columns can be of different types. The latter case suggests that each row is type-wise homogeneous.

From the documentation:

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

This implies that a DataFrame is a list of columns.

Does it mean that appending a row to a DataFrame is more expensive than appending a column?


asked Dec 09 '14 08:12

Dror

1 Answer

You are fully correct that a DataFrame can be seen as a list of columns, or, even more accurately, as an (ordered) dictionary of columns (see explanation here).
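To make the dict-of-columns view concrete, here is a minimal sketch using only public pandas API (the variable names are just for illustration): iterating and membership tests work on column labels the way they work on dict keys, indexing by a label returns a whole column, and the frame converts back to a plain dict of columns.

```python
import pandas as pd

# Construct the DataFrame from a dict mapping column labels to column values
df = pd.DataFrame({"int_col": [0, 1, 2], "float_col": [0.0, 1.1, 2.5]})

# Iterating over a DataFrame yields its column labels, like iterating over a dict yields keys
labels = list(df)
print(labels)  # ['int_col', 'float_col']

# Membership tests check column labels, not row labels or cell values
print("int_col" in df)  # True
print(0 in df)          # False: 0 is a row label here, not a column label

# Indexing by a label returns one whole column as a Series
column = df["float_col"]
print(type(column).__name__)  # Series

# Round trip back to a plain dict of {column label: list of values}
as_dict = df.to_dict(orient="list")
```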

Indeed, each column has to be of a single type, and different columns can be of different types. By using the object dtype, however, you can still hold objects of different types in one column (although this is not recommended, except for, e.g., strings).
To illustrate: if you ask for the data types of a DataFrame, you get the dtype of each column:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'int_col':[0,1,2], 'float_col':[0.0,1.1,2.5], 'bool_col':[True, False, True]})

In [3]: df.dtypes
Out[3]:
int_col        int64
float_col    float64
bool_col        bool
dtype: object
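The object dtype mentioned above can be seen directly. In this small sketch (the column name is made up for illustration), a column holding an int, a str and a float gets the generic object dtype, and each cell keeps its own Python type:

```python
import pandas as pd

# No single native dtype fits an int, a str and a float together,
# so pandas stores this column with the generic object dtype
df = pd.DataFrame({"mixed_col": [1, "two", 3.0]})
print(df["mixed_col"].dtype)  # object

# Each cell of an object column is a Python object that keeps its own type
cell_types = [type(value).__name__ for value in df["mixed_col"]]
print(cell_types)  # ['int', 'str', 'float']
```

Operations on such a column fall back to per-element Python calls instead of vectorized NumPy code, which is one reason the object dtype is best reserved for cases like strings.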

Internally, the values are stored in blocks grouped by dtype: each column, or collection of columns sharing the same dtype, is stored in a separate NumPy array.
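The block layout itself is private API, but one visible consequence of column-wise storage can be shown with public calls only. This is a sketch, not a view of the internals: a row has no typed storage of its own, so when you select one, pandas builds it from the column arrays and has to pick a single dtype that fits every value.

```python
import pandas as pd

df = pd.DataFrame({
    "int_col": [0, 1, 2],
    "float_col": [0.0, 1.1, 2.5],
    "bool_col": [True, False, True],
})

# Each column keeps its own dtype because it lives in a typed array
print(df.dtypes.astype(str).tolist())  # ['int64', 'float64', 'bool']

# A row is assembled from the column arrays and needs one common dtype;
# with int, float and bool mixed, the only common choice is object
row = df.iloc[0]
print(row.dtype)  # object

# With only numeric columns the common dtype is float64, so the int is upcast
numeric_row = df[["int_col", "float_col"]].iloc[1]
print(numeric_row.dtype)       # float64
print(numeric_row["int_col"])  # 1.0
```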

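And this indeed implies that appending a row is more expensive: a new row adds one value to every column, so every column array has to be reallocated and copied, whereas a new column can be stored as an additional array. In general, appending many single rows one at a time is not a good idea. It is better to, e.g., preallocate a DataFrame of the final size and fill it, or to collect the new rows/columns in a list and pd.concat them all at once.
See the note at the end of the concat/append docs (just before the first subsection "Set logic on the other axes"). Note that DataFrame.append itself was deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat is the way to combine frames in current versions.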
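The three strategies can be compared in a short sketch. The input `records` and the function names `grow_row_by_row`, `concat_once` and `preallocate_and_fill` are hypothetical, chosen only for this illustration; all three produce the same frame, but only the first one copies the accumulated data on every step.

```python
import numpy as np
import pandas as pd

# Hypothetical input: rows arriving one at a time as dicts
records = [{"x": i, "y": i * 0.5} for i in range(500)]


def grow_row_by_row(rows):
    # Anti-pattern: each pd.concat call allocates new column arrays and copies
    # every row accumulated so far, so the total work grows quadratically
    df = pd.DataFrame([rows[0]])
    for row in rows[1:]:
        df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
    return df


def concat_once(rows):
    # Collect the new pieces in a plain list, then concatenate a single time
    pieces = [pd.DataFrame([row]) for row in rows]
    return pd.concat(pieces, ignore_index=True)


def preallocate_and_fill(rows):
    # Allocate full-length typed columns up front, then write values in place
    n = len(rows)
    df = pd.DataFrame({"x": np.zeros(n, dtype="int64"), "y": np.zeros(n, dtype="float64")})
    for i, row in enumerate(rows):
        df.at[i, "x"] = row["x"]
        df.at[i, "y"] = row["y"]
    return df


slow = grow_row_by_row(records)
fast = concat_once(records)
filled = preallocate_and_fill(records)
```

When the rows are already plain dicts, as here, pd.DataFrame(records) also builds the frame in a single step.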

answered Oct 26 '22 19:10

joris
