
Pandas dataframe to structured array with Boolean series

I have a Pandas dataframe which I wish to convert to either a NumPy records array or structured array. I am using Python 3.6 / Pandas 0.19.2 / NumPy 1.11.3.

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[True, 1, 2],[False, 10, 20]], columns=['a','b','c'])

print(df.dtypes)

a     bool
b    int64
c    int64
dtype: object

My attempts are below:

# record array
res1 = df.to_records(index=False)

# structured array
s = df.dtypes
res2 = np.array([tuple(x) for x in df.values], dtype=list(zip(s.index, s)))

However, the Boolean type doesn't seem to be evident in the dtype attribute of these results:

print(res1.dtype)

(numpy.record, [('a', '?'), ('b', '<i8'), ('c', '<i8')])

print(res2.dtype)

[('a', '?'), ('b', '<i8'), ('c', '<i8')]

Why is this? More generally, is this the only exception, or do we have to check manually each time to ensure the dtype conversion has worked as expected?

Edit: On the other hand, it seems the conversion is correct:

print(res1.a.dtype)     # bool
print(res2['a'].dtype)  # bool

So is this just a display issue?

asked Mar 31 '26 01:03 by jpp


1 Answer

This is only a display convention: the field is still a genuine bool. Curiously, NumPy uses the character code ? to represent Boolean. From Data type objects (dtype):

'?' boolean
'b' (signed) byte
'B' unsigned byte
'i' (signed) integer
'u' unsigned integer
'f' floating-point
'c' complex-floating point
'm' timedelta
'M' datetime
'O' (Python) objects
'S', 'a'    zero-terminated bytes (not recommended)
'U' Unicode string
'V' raw data (void)
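
As a quick check that '?' in the table above really is the Boolean type, the sketch below rebuilds the dataframe from the question and compares dtypes directly instead of reading the printed string. The variable names is_bool_code, bool_char and fields_match are illustrative only, and the comparison loop is one possible way to verify a conversion, not an official recipe.

```python
import numpy as np
import pandas as pd

# Same dataframe as in the question
df = pd.DataFrame(data=[[True, 1, 2], [False, 10, 20]], columns=['a', 'b', 'c'])
res1 = df.to_records(index=False)

# '?' is the one-character code for NumPy's Boolean type
is_bool_code = np.dtype('?') == np.bool_
bool_char = np.dtype(bool).char

# Compare each field's dtype with the matching pandas column dtype,
# so the check does not depend on how the dtype is displayed
fields_match = all(res1.dtype[name] == df[name].dtype for name in df.columns)

print(is_bool_code, bool_char, fields_match)
```

If fields_match is true, every column kept its dtype, which is probably a more robust test than inspecting the repr by eye.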

Confusingly, the NumPy Array Interface for access from C extensions uses a different mapping:

t   Bit field (following integer gives the number of bits in the bit field).
b   Boolean (integer type where all values are only True or False)
i   Integer
u   Unsigned integer
f   Floating point
c   Complex floating point
m   Timedelta
M   Datetime
O   Object (i.e. the memory contains a pointer to PyObject)
S   String (fixed-length sequence of char)
U   Unicode (fixed-length sequence of Py_UNICODE)
V   Other (void * – each item is a fixed-size chunk of memory)

Credit to @bobrobbob for finding this in the docs.
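
The two mappings can be seen side by side on a plain Boolean array. This is a small sketch, assuming a reasonably recent NumPy; the names arr, dtype_char, dtype_kind and typestr are chosen here for illustration.

```python
import numpy as np

arr = np.array([True, False])

# One-character code from the first table: '?' for Boolean
dtype_char = arr.dtype.char

# Kind code, which follows the array-interface letters: 'b' for Boolean
dtype_kind = arr.dtype.kind

# Type string exposed through the array interface: byte order, kind, item size
typestr = arr.__array_interface__['typestr']

print(dtype_char, dtype_kind, typestr)
```

So a '?' in a printed structured dtype and a 'b' in an array-interface type string describe the same one-byte Boolean field.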

answered Apr 02 '26 02:04 by jpp


