I'll preface this by saying that I wouldn't do this in the first place; I ran across it while helping a friend.
Consider the data frame df:

import numpy as np
import pandas as pd

df = pd.DataFrame(pd.Series([[1.2]]))

df

       0
0  [1.2]
This is a data frame of objects where the objects are lists. In my friend's code, they had:
df.astype(float)
Which breaks, as I had hoped:
ValueError: setting an array element with a sequence.
However, if those values were numpy arrays instead:
df = pd.DataFrame(pd.Series([np.array([1.2])]))

df

       0
0  [1.2]
And I tried the same thing:
df.astype(float)

     0
0  1.2
It's happy enough to go ahead and convert my length-1 arrays to scalars. This feels very dirty!
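Presumably (this is my guess at the mechanism, not something I've verified against the pandas/NumPy source) it works because NumPy lets a size-1 array collapse to a Python scalar, whereas a plain list cannot be converted that way:

import numpy as np

# A size-1 ndarray can be converted to a Python scalar...
print(float(np.array([1.2])))   # prints 1.2 (newer NumPy versions also emit a DeprecationWarning here)

# ...but a plain list cannot.
try:
    float([1.2])
except TypeError as exc:
    print(exc)                  # float() refuses the list outright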
If instead they were not length-1 arrays:
df = pd.DataFrame(pd.Series([np.array([1.2, 1.3])]))

df

            0
0  [1.2, 1.3]
Then it breaks:
ValueError: setting an array element with a sequence.
Question
Please tell me this is a bug and we can fix it. Or can someone explain why and in what world this makes sense?
Response to @root
You are right. Is this worth an issue? Do you expect/want this?
a = np.empty((1,), object)
a[0] = np.array([1.2])
a.astype(float)
array([ 1.2])
And
a = np.empty((1,), object)
a[0] = np.array([1.2, 1.3])
a.astype(float)
ValueError: setting an array element with a sequence.
This is due to the unsafe default value for the casting argument of astype. In the docs, the casting argument is described as follows:

"Controls what kind of data casting may occur. Defaults to ‘unsafe’ for backwards compatibility." (my emphasis)
All of the other possible castings raise a TypeError.
a = np.empty((1,), object)
a[0] = np.array([1.2])
a.astype(float, casting='same_kind')
Results in:
TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'same_kind'
This is true for all castings except unsafe, namely: no, equiv, safe, and same_kind.
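A quick way to check this (my own sketch, not part of the original answer) is to loop over the remaining casting rules and watch each one refuse the object-to-float cast:

import numpy as np

a = np.empty((1,), object)
a[0] = np.array([1.2])

# every rule other than 'unsafe' rejects the dtype('O') -> dtype('float64') cast
for rule in ('no', 'equiv', 'safe', 'same_kind'):
    try:
        a.astype(float, casting=rule)
    except TypeError as exc:
        print(rule, '->', exc)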