I am a little confused by the "object" datatype in pandas. What exactly is "object"?
I would like to change the variable "SpT" (see below) from object to string.
> df_cleaned.dtypes
Vmag float64
RA float64
DE float64
Plx float64
pmRA float64
pmDE float64
B-V float64
SpT object
M_V float64
distance float64
dtype: object
For this I do the following:
df_cleaned['SpT'] = df_cleaned['SpT'].astype(str)
But that has no effect on the dtype of SpT.
The reason for doing this is that when I run the following:
f = lambda s: (len(s) >= 2) and (s[0].isalpha()) and (s[1].isdigit())
i = df_cleaned['SpT'].apply(f)
df_cleaned = df_cleaned[i]
I get:
TypeError: object of type 'float' has no len()
Hence, I believe that if I convert "object" to "string", I will be able to do what I want.
More info: This is what SpT looks like:
HIP
1 F5
2 K3V
3 B9
4 F0V
5 G8III
6 M0V:
7 G0
8 M6e-M8.5e Tc
9 G5
10 F6V
11 A2
12 K4III
13 K0III
14 K0
15 K2
...
118307 M2III:
118308 K:
118309 A2
118310 K5
118312 G5
118313 F0
118314 K0
118315 K0III
118316 F2
118317 F8
118318 K2
118319 G2V
118320 K0
118321 G5V
118322 B9IV
Name: SpT, Length: 114472, dtype: object
If a column contains strings, or is treated as containing strings, it will have a dtype of object (the converse is not necessarily true; more on that below). Here is a simple example:
import pandas as pd

# Three columns: plain strings, numeric-looking strings, and a mix of str and float.
df = pd.DataFrame({'SpT': ['string1', 'string2', 'string3'],
                   'num': ['0.1', '0.2', '0.3'],
                   'strange': ['0.1', '0.2', 0.3]})
print(df.dtypes)
#SpT        object
#num        object
#strange    object
#dtype: object
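On the dtype itself: in the pandas versions this answer was written against, strings are stored as ordinary Python objects, which is why astype(str) leaves the reported dtype as object. Newer releases (pandas 1.0 and later) also offer a dedicated string dtype. A minimal sketch, assuming such a version is installed; the sample values are taken from the SpT listing in the question:

```python
import pandas as pd

# A few spectral types copied from the question's SpT column.
s = pd.Series(['F5', 'K3V', 'B9'])

# Request the dedicated string dtype explicitly (assumes pandas >= 1.0).
as_string = s.astype('string')
print(as_string.dtype)

# String methods work the same way through the .str accessor.
lengths = as_string.str.len()
print(lengths)
```

Whether the dtype is reported as object or as a string dtype does not change the filtering logic; what matters is that every value in the column really is a string.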
If a column contains only strings, applying len to it, as you did, works fine:
print(df['num'].apply(lambda x: len(x)))
#0    3
#1    3
#2    3
However, a dtype of object does not mean that the column contains only strings. For example, the column strange contains objects of mixed types: some str values and a float. Applying the function len to it raises an error similar to the one you have seen:
print(df['strange'].apply(lambda x: len(x)))
# TypeError: object of type 'float' has no len()
Thus, the problem could be that you have not properly converted the column to string, and the column still contains mixed object types.
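To check whether a column really holds mixed types, one option is to count the Python type of each cell. The following is a small sketch on the example column from above; the variable names type_counts, is_str and not_str are only illustrative:

```python
import pandas as pd

# Same mixed-type column as in the example above (two str values, one float).
df = pd.DataFrame({'strange': ['0.1', '0.2', 0.3]})

# Map every cell to the name of its Python type, then count each type.
type_counts = df['strange'].map(lambda x: type(x).__name__).value_counts()
print(type_counts)

# Boolean mask that is True for the cells that are not strings.
is_str = df['strange'].map(lambda x: isinstance(x, str)).astype(bool)
not_str = ~is_str
print(df[not_str])
```

Running the same check on df_cleaned['SpT'] should show which rows hold the float values that trigger the TypeError.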
Continuing the above example, let us convert strange to strings and check whether apply works:
df['strange'] = df['strange'].astype(str)
print(df['strange'].apply(lambda x: len(x)))
#0    3
#1    3
#2    3
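Applied to the filter from the question, a possible sketch looks like this. It assumes that the float values in SpT are missing entries (NaN), which the question does not confirm, and it uses a small hypothetical stand-in for df_cleaned. Two variants are shown: guarding the original lambda with isinstance, and using the vectorised .str.match with na=False.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_cleaned: one missing spectral type (NaN is a float).
df_cleaned = pd.DataFrame({'SpT': ['F5', 'K3V', np.nan, 'K:', 'G8III']})

# Variant 1: the original test, guarded so that non-string values return False
# instead of raising TypeError.
f = lambda s: isinstance(s, str) and len(s) >= 2 and s[0].isalpha() and s[1].isdigit()
mask_guard = df_cleaned['SpT'].apply(f)

# Variant 2: regular expression anchored at the start of each string;
# na=False treats missing values as non-matching.
mask_str = df_cleaned['SpT'].str.match(r'[A-Za-z]\d', na=False)

# Keep only rows whose SpT starts with a letter followed by a digit.
filtered = df_cleaned[mask_str]
print(filtered)
```

Note that the regular expression only accepts ASCII letters, whereas isalpha also accepts other alphabetic characters; for spectral types such as those in the question this should make no difference.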
(There is a suspicious discrepancy between df_cleaned and df_clean in your question. Is it a typo, or a mistake in the code that causes the problem?)