I have the following code:
import numpy as np
import pandas as pd
test_array = np.array([(1, 2, 3), (4, 5, 6)],
                      dtype={'names': ('a', 'b', 'c'), 'formats': ('f8', 'f8', 'f8')})
test_df = pd.DataFrame.from_records(test_array)
test_df.to_records().view(np.float64).reshape(test_array.shape + (-1, ))
I expect a view on the original test_array to be returned, with shape (2, 3). However, I get this (2, 4) array:
rec.array([[0.e+000, 1.e+000, 2.e+000, 3.e+000],
[5.e-324, 4.e+000, 5.e+000, 6.e+000]],
dtype=float64)
Where did the extra column, column 0, come from?
Edit: I've just learned I can use DataFrame.values (an attribute, not a method) to do the same thing, but I remain curious why this behavior exists.
If you need a record array, use np.rec.fromrecords:
np.rec.fromrecords(test_df, names=[*test_df])
# rec.array([(1., 2., 3.), (4., 5., 6.)],
# dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
In my tests, this was somewhat faster than df.to_records.
to_records is capturing the index too. Note that this is stated in the docs:
Index will be included as the first field of the record array if requested
If you want to exclude it, simply set index=False.
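To make this concrete, here is a minimal sketch that rebuilds the DataFrame from the question and compares to_records() with to_records(index=False). The variable names with_index, without_index, and result are introduced here for illustration only. It also checks that the odd-looking 5.e-324 in the question's output is just the integer index value 1 with its bytes reinterpreted as float64 (this assumes the default integer index is stored as 64-bit, which is the usual case).

```python
import numpy as np
import pandas as pd

test_array = np.array([(1, 2, 3), (4, 5, 6)],
                      dtype={'names': ('a', 'b', 'c'), 'formats': ('f8', 'f8', 'f8')})
test_df = pd.DataFrame.from_records(test_array)

# Default: the index becomes the first field of the record array.
with_index = test_df.to_records()
print(with_index.dtype.names)

# index=False: only the data columns are kept.
without_index = test_df.to_records(index=False)
print(without_index.dtype.names)

# With three float64 fields per record, the float64 view reshapes to (2, 3).
result = without_index.view(np.float64).reshape(test_array.shape + (-1,))
print(result.shape)

# The 64-bit integer 1, reinterpreted as float64, is the smallest
# positive subnormal double, which NumPy prints as 5.e-324.
one_as_float = np.array([1], dtype=np.int64).view(np.float64)[0]
print(one_as_float)
```

The first field name printed for with_index is the index, which accounts for the extra column 0 in the (2, 4) result.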
In your case, though, you can simply use to_numpy (or values):
test_df.to_numpy().view(np.float64).reshape(test_array.shape + (-1, ))
array([[1., 2., 3.],
[4., 5., 6.]])