For pandas, does anyone know whether every datatype other than (i) float64, int64 (and other variants of np.number such as float32, int8, etc.), (ii) bool and (iii) datetime64, timedelta64, for example string columns, always has a dtype of object? Put the other way: are there any datatypes apart from (i), (ii) and (iii) above whose dtype pandas does not make object?


what are all the dtypes that pandas recognizes?

2 Answers

pandas borrows its dtypes from numpy. To demonstrate this, see the following:

    import pandas as pd

    df = pd.DataFrame({'A': [1, 'C', 2.]})
    df['A'].dtype
    # -> dtype('O')

    type(df['A'].dtype)
    # -> numpy.dtype (newer NumPy versions report a subclass of numpy.dtype here)
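To see at a glance which columns of a frame end up as object, DataFrame.dtypes and select_dtypes are handy. The sketch below uses a made-up frame (the column names are hypothetical) and assumes a recent pandas, where a column mixing ints and strings is stored as object while plain ints, floats and bools get specific NumPy dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one column per kind of data
df = pd.DataFrame({
    "count": [1, 2],        # Python ints -> int64
    "ratio": [0.5, 1.5],    # Python floats -> float64
    "flag": [True, False],  # Python bools -> bool
    "mixed": [1, "C"],      # mixed int/str -> object
})

print(df.dtypes)

# Columns that pandas could only store with the generic object dtype
object_cols = df.select_dtypes(include="object").columns.tolist()
print(object_cols)

# Every dtype in this frame is a plain NumPy dtype object
print(all(isinstance(dt, np.dtype) for dt in df.dtypes))
```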

You can find the list of valid NumPy type characters in the documentation:

'?' boolean

'b' (signed) byte

'B' unsigned byte

'i' (signed) integer

'u' unsigned integer

'f' floating-point

'c' complex-floating point

'm' timedelta

'M' datetime

'O' (Python) objects

'S', 'a' zero-terminated bytes (not recommended)

'U' Unicode string

'V' raw data (void)
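The characters above can be passed straight to numpy.dtype. The sketch below resolves one example per character, adding an explicit item size (bytes per item) where a size is needed; the particular sizes such as "i4" or "S5" are arbitrary choices for illustration. The deprecated 'a' alias is left out.

```python
import numpy as np

# One example per type character from the list above; the digits
# after a character give the number of bytes per item.
codes = ["?", "b", "B", "i4", "u1", "f8", "c16",
         "m8[ns]", "M8[ns]", "O", "S5", "U5", "V8"]

resolved = {code: np.dtype(code) for code in codes}

for code, dt in resolved.items():
    # dt.name is NumPy's canonical name, dt.kind its one-letter category
    print(f"{code:8} -> {dt.name:16} kind={dt.kind}")
```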

pandas should support these types. Calling the astype method of a pandas.Series with any of the above codes makes pandas try to convert the Series to that type (or, at the very least, fall back to object type); 'u' is the only one I have seen pandas fail to understand at all:

    df['A'].astype('u')
    # -> TypeError: data type "u" not understood

This is a NumPy error: 'u' must be followed by a number giving the number of bytes per item, and that number must be a valid size:

    import numpy as np

    np.dtype('u')
    # -> TypeError: data type "u" not understood
    np.dtype('u1')
    # -> dtype('uint8')
    np.dtype('u2')
    # -> dtype('uint16')
    np.dtype('u4')
    # -> dtype('uint32')
    np.dtype('u8')
    # -> dtype('uint64')

    # testing another invalid argument
    np.dtype('u3')
    # -> TypeError: data type "u3" not understood

To summarise, the astype methods of pandas objects will try to do something sensible with any argument that is valid for numpy.dtype. Note that numpy.dtype('f') is the same as numpy.dtype('float32'), numpy.dtype('f8') is the same as numpy.dtype('float64'), and so on. The same goes for arguments passed to the pandas astype methods.
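A minimal sketch of the same codes going through Series.astype, assuming a small integer Series. The exact wording of the error for a bare 'u' differs between NumPy versions, so the example only checks that a TypeError is raised.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

as_f = s.astype("f")    # same as np.float32
as_f8 = s.astype("f8")  # same as np.float64
as_u1 = s.astype("u1")  # same as np.uint8

print(as_f.dtype, as_f8.dtype, as_u1.dtype)

# A bare "u" has no item size, so NumPy rejects it
try:
    s.astype("u")
    u_rejected = False
except TypeError as exc:
    u_rejected = True
    print("TypeError:", exc)
```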

To locate the respective data type classes in NumPy, the pandas docs recommend this:

    import numpy as np

    def subdtypes(dtype):
        # Recursively collect the subclasses of a NumPy scalar type
        subs = dtype.__subclasses__()
        if not subs:
            return dtype
        return [dtype, [subdtypes(dt) for dt in subs]]

    subdtypes(np.generic)

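Output (this varies with the platform and NumPy version; for example, float128 and complex256 are not available everywhere):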

    [numpy.generic,
     [[numpy.number,
       [[numpy.integer,
         [[numpy.signedinteger,
           [numpy.int8,
            numpy.int16,
            numpy.int32,
            numpy.int64,
            numpy.int64,
            numpy.timedelta64]],
          [numpy.unsignedinteger,
           [numpy.uint8,
            numpy.uint16,
            numpy.uint32,
            numpy.uint64,
            numpy.uint64]]]],
        [numpy.inexact,
         [[numpy.floating,
           [numpy.float16, numpy.float32, numpy.float64, numpy.float128]],
          [numpy.complexfloating,
           [numpy.complex64, numpy.complex128, numpy.complex256]]]]]],
      [numpy.flexible,
       [[numpy.character, [numpy.bytes_, numpy.str_]],
        [numpy.void, [numpy.record]]]],
      numpy.bool_,
      numpy.datetime64,
      numpy.object_]]
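Because these classes form a hierarchy, numpy.issubdtype can test a column's dtype against an abstract class such as np.integer instead of listing concrete types. A sketch with a hypothetical frame; note that, following the hierarchy, NumPy treats timedelta64 as a signed integer, whereas pandas.api.types.is_integer_dtype leaves timedelta64 out.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical frame with a few different NumPy dtypes
df = pd.DataFrame({
    "small": np.array([1, 2], dtype=np.int8),
    "big": [1, 2],
    "ratio": [0.5, 1.5],
    "gap": pd.to_timedelta(["1D", "2D"]),
})

# np.issubdtype walks the NumPy class hierarchy
numpy_integer = {c: np.issubdtype(df[c].dtype, np.integer) for c in df.columns}
numpy_floating = {c: np.issubdtype(df[c].dtype, np.floating) for c in df.columns}

# pandas' own check excludes timedelta64 from the integer types
pandas_integer = {c: is_integer_dtype(df[c]) for c in df.columns}

print(numpy_integer)
print(numpy_floating)
print(pandas_integer)
```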

Pandas accepts these classes as valid types. For example, dtype={'A': np.float64}. (The older alias np.float was deprecated in NumPy 1.20 and removed in 1.24.)
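For instance, read_csv accepts such a class in its dtype mapping, and astype takes the same kind of mapping after loading. A sketch using an in-memory CSV; the column names and data are made up.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content held in memory
csv_text = "A,B\n1,x\n2,y\n"

# Ask for float64 instead of the int64 pandas would otherwise infer for A
df = pd.read_csv(io.StringIO(csv_text), dtype={"A": np.float64})

# The same kind of column -> class mapping works with astype
df32 = df.astype({"A": np.float32})

print(df.dtypes)
print(df32["A"].dtype)
```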

The NumPy docs on dtypes contain more details and a chart.

answered Oct 02 '22 15:10

lcameron05

EDIT Feb 2020 following pandas 1.0.0 release

Pandas mostly uses NumPy arrays and dtypes for each Series (a DataFrame is a collection of Series, each of which can have its own dtype). NumPy's documentation further explains dtype, data types, and data type objects. In addition, the answer by @lcameron05 gives an excellent description of the NumPy dtypes. Furthermore, the pandas docs on dtypes have a lot of additional information.

The main types stored in pandas objects are float, int, bool, datetime64[ns], timedelta[ns], and object. In addition these dtypes have item sizes, e.g. int64 and int32.

By default integer types are int64 and float types are float64, REGARDLESS of platform (32-bit or 64-bit). The following will all result in int64 dtypes.
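A minimal check of that default: a frame built from plain Python lists gets int64 and float64 columns, whereas a NumPy array passed in keeps its own dtype (int32 is chosen explicitly here purely for illustration).

```python
import numpy as np
import pandas as pd

# Built from plain Python lists: pandas picks int64 / float64
frame = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
print(frame.dtypes)

# Built from a NumPy array: pandas keeps the array's dtype
from_array = pd.DataFrame({"a": np.array([1, 2], dtype=np.int32)})
print(from_array.dtypes)
```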

NumPy, however, will choose platform-dependent types when creating arrays. The following WILL result in int32 on a 32-bit platform.

One of the major changes in version 1.0.0 of pandas is the introduction of pd.NA to represent scalar missing values (rather than the previous values of np.nan, pd.NaT or None, depending on usage).
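pd.NA appears once a column uses one of the nullable extension dtypes. A sketch assuming pandas 1.0 or later; the default float64 dtype still marks the same missing value with np.nan.

```python
import pandas as pd

# Nullable extension dtypes store missing values as pd.NA
ints = pd.Series([1, None, 3], dtype="Int64")
flags = pd.Series([True, None], dtype="boolean")

print(ints)
print(ints[1] is pd.NA, flags[1] is pd.NA)

# Without a nullable dtype, the same input becomes float64 with NaN
floats = pd.Series([1, None, 3])
print(floats.dtype, floats.isna().tolist())
```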

Pandas extends NumPy's type system and also allows users to write their own extension types. The following lists all of the pandas extension types as of the 1.0.0 release.

1) Time zone handling

Kind of data: tz-aware datetime (note that NumPy does not support timezone-aware datetimes).

Data type: DatetimeTZDtype

Scalar: Timestamp

Array: arrays.DatetimeArray

String Aliases: 'datetime64[ns, <tz>]'

2) Categorical data

Kind of data: Categorical

Data type: CategoricalDtype

Scalar: (none)

Array: Categorical

String Aliases: 'category'

3) Time span representation

Kind of data: period (time spans)

Data type: PeriodDtype

Scalar: Period

Array: arrays.PeriodArray

String Aliases: 'period[<freq>]', 'Period[<freq>]'

4) Sparse data structures

Kind of data: sparse

Data type: SparseDtype

Scalar: (none)

Array: arrays.SparseArray

String Aliases: 'Sparse', 'Sparse[int]', 'Sparse[float]'

5) IntervalIndex

Kind of data: intervals

Data type: IntervalDtype

Scalar: Interval

Array: arrays.IntervalArray

String Aliases: 'interval', 'Interval', 'Interval[<numpy_dtype>]', 'Interval[datetime64[ns, <tz>]]', 'Interval[timedelta64[<freq>]]'

6) Nullable integer data type

Kind of data: nullable integer

Data type: Int64Dtype, ...

Scalar: (none)

Array: arrays.IntegerArray

String Aliases: 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'

7) Working with text data

Kind of data: Strings

Data type: StringDtype

Scalar: str

Array: arrays.StringArray

String Aliases: 'string'

8) Boolean data with missing values

Kind of data: Boolean (with NA)

Data type: BooleanDtype

Scalar: bool

Array: arrays.BooleanArray

String Aliases: 'boolean'
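Each string alias above can be resolved with pandas.api.types.pandas_dtype, and every result is a pandas ExtensionDtype rather than a numpy.dtype, which is why columns using these types are not reported as object. A sketch assuming pandas 1.0 or later; the time zone 'UTC', frequency 'D' and subtypes 'int'/'int64' used to fill in the aliases are arbitrary example values.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import pandas_dtype

aliases = [
    "datetime64[ns, UTC]",  # 1) DatetimeTZDtype
    "category",             # 2) CategoricalDtype
    "period[D]",            # 3) PeriodDtype
    "Sparse[int]",          # 4) SparseDtype
    "interval[int64]",      # 5) IntervalDtype
    "Int64",                # 6) Int64Dtype
    "string",               # 7) StringDtype
    "boolean",              # 8) BooleanDtype
]

resolved = {alias: pandas_dtype(alias) for alias in aliases}
for alias, dtype in resolved.items():
    print(f"{alias:22} -> {type(dtype).__name__}")

# A text column stored with the 'string' dtype instead of object
names = pd.Series(["a", "b", None], dtype="string")
print(names.dtype, names.dtype == object)
```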

answered Oct 02 '22 16:10

Alexander

Tags: python, python-3.x, pandas

uday
