The dataframe looks like:
        col1  col2      col3  col4     col5  col6  col7
points
x1       0.6   '0'   'first'  0.93   'lion'  0.34  0.98
x2       0.7   '1'  'second'  0.47    'cat'  0.43  0.76
x3       NaN   '0'   'third'  0.87  'tiger'  0.24  0.10
x4       0.6   '0'   'first'  0.93   'lion'  0.34  0.98
x5       0.5   '1'   'first'  0.32      NaN  0.09   NaN
x6       0.4   '0'   'third'  0.78  'tiger'  0.18  0.17
x7       0.5   '1'  'second'  0.98    'cat'  0.47  0.78
numeric = df.select_dtypes(include=["number"])
others = df.select_dtypes(exclude=["number"])
print(numeric)
Output:
        col4  col6
points
x1      0.93  0.34
x2      0.47  0.43
x3      0.87  0.24
x4      0.93  0.34
x5      0.32  0.09
x6      0.78  0.18
x7      0.98  0.47
But I need the output to look like this:
        col1  col4  col6  col7
points
x1       0.6  0.93  0.34  0.98
x2       0.7  0.47  0.43  0.76
x3       NaN  0.87  0.24  0.10
x4       0.6  0.93  0.34  0.98
x5       0.5  0.32  0.09   NaN
x6       0.4  0.78  0.18  0.17
x7       0.5  0.98  0.47  0.78
I understand that NaN is being treated as object, so those columns end up in others. How can I detect numeric columns based on the values they contain?
You can use df.isnull().sum(). It lists every column together with its total number of NaN values.
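As a rough illustration of that call, here is a minimal sketch on a three-row frame. The data is a made-up excerpt loosely based on the question, not the asker's actual frame.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row excerpt, loosely modelled on the question's data.
df = pd.DataFrame(
    {
        "col1": [0.6, 0.7, np.nan],
        "col5": ["lion", "cat", "tiger"],
        "col7": [0.98, np.nan, np.nan],
    },
    index=pd.Index(["x1", "x2", "x3"], name="points"),
)

# One entry per column: the number of missing values in that column.
nan_counts = df.isnull().sum()
print(nan_counts)
```

Note that the count only reports missing values per column. It says nothing about a column's dtype.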
To check for numeric columns, you could use df[c].dtype.kind in 'iufcb', where c is any given column name. The expression evaluates to True or False.
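A minimal sketch of that check applied to every column might look like the following. The frame and the name numeric_cols are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame(
    {
        "col1": [0.6, 0.7, np.nan],   # floats, dtype kind 'f'
        "col2": ["0", "1", "0"],      # strings, not a numeric kind
        "col4": [1, 2, 3],            # integers, dtype kind 'i'
    }
)

# Kind codes: i = signed int, u = unsigned int, f = float, c = complex, b = bool.
numeric_cols = [c for c in df.columns if df[c].dtype.kind in "iufcb"]
print(numeric_cols)
```

This inspects the dtype only, so a column that stores numbers under object dtype would still be reported as non-numeric.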
Your question boils down to: how can I convert columns which are meant to be numeric but currently have object dtype?
Once this issue is resolved, pd.DataFrame.select_dtypes will work as desired. The implication is that you don't know in advance which series are meant to be numeric. What you can do is try to convert every column that currently has object dtype to numeric. If the converted series contains any non-null values, you can apply the conversion.
import pandas as pd

for col in df.select_dtypes(include=['object']):
    s = pd.to_numeric(df[col], errors='coerce')
    # Keep the conversion only if at least one value parsed as a number.
    if s.notnull().any():
        df[col] = s
print(df.dtypes)
points     object
col1      float64
col2       object
col3       object
col4      float64
col5       object
col6      float64
col7      float64
dtype: object
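To see the whole flow end to end, here is a self-contained sketch. It assumes, purely for illustration, that col1 and col7 hold floats stored under object dtype and that col3 holds genuine strings. The real frame may have been built differently.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of part of the question's frame (hypothetical data):
# col1 and col7 contain floats but carry object dtype.
df = pd.DataFrame(
    {
        "col1": pd.Series([0.6, 0.7, np.nan, 0.6], dtype=object),
        "col3": pd.Series(["first", "second", "third", "first"], dtype=object),
        "col4": [0.93, 0.47, 0.87, 0.93],
        "col7": pd.Series([0.98, 0.76, 0.10, 0.98], dtype=object),
    }
)
df.index = pd.Index(["x1", "x2", "x3", "x4"], name="points")

# Convert object columns that yield at least one number.
for col in df.select_dtypes(include=["object"]):
    s = pd.to_numeric(df[col], errors="coerce")
    if s.notnull().any():
        df[col] = s

# select_dtypes now picks up the converted columns as well.
numeric = df.select_dtypes(include=["number"])
print(list(numeric.columns))
```

One caveat: pd.to_numeric also parses digit-only strings such as "0" and "1", so a column holding those as plain strings would be converted too.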
The logic will work for the data you have presented. It won't work, for example, when you have a series of mainly strings and a few numbers. In that situation, you will need to define more precise logic to determine which series should be considered numeric.
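One possible way to tighten the rule is to convert a column only when most of its non-null values parse as numbers. The sketch below is an assumption about what "more precise logic" could look like. The helper coerce_mostly_numeric and its min_fraction parameter are hypothetical names, and the 0.8 threshold is arbitrary.

```python
import pandas as pd


def coerce_mostly_numeric(df, min_fraction=0.8):
    """Hypothetical helper: convert an object column to numeric only if at
    least min_fraction of its non-null values parse as numbers."""
    out = df.copy()
    for col in out.select_dtypes(include=["object"]):
        non_null = out[col].notnull().sum()
        if non_null == 0:
            continue  # nothing to judge the column by
        converted = pd.to_numeric(out[col], errors="coerce")
        if converted.notnull().sum() / non_null >= min_fraction:
            out[col] = converted
    return out


# Made-up data: one column of mainly strings, one of mainly numbers.
df = pd.DataFrame(
    {
        "mostly_text": pd.Series(["lion", "cat", "7", "tiger"], dtype=object),
        "mostly_numbers": pd.Series(["0.6", "0.7", None, "0.5"], dtype=object),
    }
)
result = coerce_mostly_numeric(df)
print(result.dtypes)
```

With this rule the column of mainly strings is left alone. Keep in mind that any value that fails to parse in a converted column becomes NaN, so a lower threshold means more data is silently dropped.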