According to the pandas documentation, pandas.read_csv allows me to specify a dtype for the columns in the CSV file.
dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} (Unsupported with engine='python'). Use str or object to preserve and not interpret dtype.
To treat every column as text data, I can use either
df = pandas.read_csv(..., dtype=str)
or
df = pandas.read_csv(..., dtype=object)
As far as I know, these two options always behave exactly the same. Are there any situations in which they behave differently? If so, what are the differences?
These had a subtle difference until release 0.11.1 (see issue #3795).
Every element in a numpy array must occupy the same number of bytes. The issue with strings is that their byte size is not fixed, so the object dtype instead stores pointers to the strings, and the pointers do have a fixed byte size. In short, numpy's str dtype reserves a fixed width for every item in the array, whereas object allows variable string lengths, or really any Python object.
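A small illustration of the difference in plain numpy (not specific to pandas; the fixed width "U5" here is an arbitrary choice for the example):

```python
import numpy as np

strings = ["short", "a much longer string"]

# Fixed-width string dtype: every element gets the same number of bytes,
# so longer strings are silently truncated to fit the declared width.
fixed = np.array(strings, dtype="U5")
print(fixed)   # ['short' 'a muc']

# object dtype: the array stores pointers (which have a fixed size),
# so each string keeps its full, variable length.
boxed = np.array(strings, dtype=object)
print(boxed)   # ['short' 'a much longer string']
```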
In any case, since release 0.11.1 dtype=str is automatically converted to dtype=object whenever it is seen, so it does not matter which one you use, although I would advise avoiding str altogether and just using dtype=object.
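A quick sanity check of that equivalence in a recent pandas version (a minimal sketch; the tiny inline CSV and its column names are made up for the example):

```python
import io
import pandas as pd

data = "a,b\n1,2.5\n3,4.5\n"

df_str = pd.read_csv(io.StringIO(data), dtype=str)
df_obj = pd.read_csv(io.StringIO(data), dtype=object)

# Both calls produce object-dtype columns holding the raw text,
# with no numeric interpretation of the values.
print(df_str.dtypes.tolist())   # [dtype('O'), dtype('O')]
print(df_obj.dtypes.tolist())   # [dtype('O'), dtype('O')]
print(df_str.equals(df_obj))    # True
```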