 

How should I find the numeric columns in a dataframe that also contain null values?

The dataframe looks like:

          col1  col2   col3    col4    col5    col6    col7
points                                                    
x1         0.6  '0'   'first'  0.93   'lion'   0.34   0.98
x2         0.7  '1'  'second'  0.47    'cat'   0.43   0.76
x3         NaN  '0'   'third'  0.87  'tiger'   0.24   0.10
x4         0.6  '0'   'first'  0.93   'lion'   0.34   0.98
x5         0.5  '1'   'first'  0.32     NaN    0.09   NaN
x6         0.4  '0'   'third'  0.78  'tiger'   0.18   0.17
x7         0.5  '1'  'second'  0.98    'cat'   0.47   0.78 

numeric=df.select_dtypes(include=["number"])
others=df.select_dtypes(exclude=["number"])
print(numeric)

output:
          col4   col6
points                                                    
x1        0.93   0.34
x2        0.47   0.43   
x3        0.87   0.24   
x4        0.93   0.34   
x5        0.32   0.09   
x6        0.78   0.18   
x7        0.98   0.47   

But I need the output to look like this:

          col1  col4    col6    col7
points                                                    
x1         0.6  0.93    0.34   0.98
x2         0.7  0.47    0.43   0.76
x3         NaN  0.87    0.24   0.10
x4         0.6  0.93    0.34   0.98
x5         0.5  0.32    0.09   NaN
x6         0.4  0.78    0.18   0.17
x7         0.5  0.98    0.47   0.78 

I understand that NaN is being treated as object, so those columns are being moved to others. How can I detect numeric columns based on the values they contain?
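A quick check of the premise, as a sketch: a real missing value (np.nan) does not by itself push a float column to object dtype. A more likely cause, assuming the numbers were read in as text (for example the string 'NaN' or quoted values from a file), is that some entries are strings, and any string makes pandas fall back to object.

```python
import numpy as np
import pandas as pd

# A float column with a genuine missing value keeps a numeric dtype.
real_nan = pd.Series([0.6, np.nan, 0.5])

# If the missing value arrived as text (here the string 'NaN'),
# the mixed float/str column is stored as object dtype.
text_nan = pd.Series([0.6, 'NaN', 0.5])

print(real_nan.dtype)  # float64
print(text_nan.dtype)  # object
```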

Vas asked Sep 29 '18 11:09



People also ask

How do you get the list of columns that have NULL values in a Dataframes?

You can use df.isnull().sum(). It shows every column together with its total number of NaN values.
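A minimal sketch on a small made-up frame: isnull().sum() gives the per-column counts, and indexing df.columns with isnull().any() returns just the names of the columns that contain at least one NaN.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the column names mirror the question.
df = pd.DataFrame({
    'col1': [0.6, np.nan, 0.5],
    'col4': [0.93, 0.47, 0.87],
    'col7': [0.98, 0.76, np.nan],
})

# Number of NaN values in each column.
print(df.isnull().sum())

# Names of only the columns that contain at least one NaN.
null_cols = df.columns[df.isnull().any()].tolist()
print(null_cols)  # ['col1', 'col7']
```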

How do you check if a column has numeric values in pandas?

To check whether a column is numeric, you can use df[c].dtype.kind in 'iufcb', where c is any given column name. The comparison yields a boolean True or False.
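Applied to every column, that check can be sketched as a list comprehension; the frame below is illustrative. Note that, like select_dtypes, it only looks at the stored dtype, so numbers held as strings in an object column are not reported.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [0.6, np.nan],       # float64, kind 'f'
    'col3': ['first', 'second'],  # text, kind 'O'
    'col6': [1, 2],               # int64, kind 'i'
})

# dtype.kind codes: i = signed int, u = unsigned int, f = float,
# c = complex, b = bool
numeric_cols = [c for c in df.columns if df[c].dtype.kind in 'iufcb']
print(numeric_cols)  # ['col1', 'col6']
```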


1 Answer

Your question boils down to:

How can I convert columns which are meant to be numeric but currently have object dtype?

Once this issue is resolved, pd.DataFrame.select_dtypes will work as desired. The complication is that you don't know in advance which series are meant to be numeric. What you can do is try to convert each column that currently has object dtype to numeric, using errors='coerce' so that values which cannot be parsed become NaN. If the result contains any non-null values, apply the conversion.

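import pandas as pd

# Try converting every object-dtype column; errors='coerce' turns
# unparseable values into NaN. Keep the result only if something parsed.
for col in df.select_dtypes(include=['object']):
    s = pd.to_numeric(df[col], errors='coerce')
    if s.notnull().any():
        df[col] = s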

print(df.dtypes)

points     object
col1      float64
col2       object
col3       object
col4      float64
col5       object
col6      float64
col7      float64
dtype: object
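An end-to-end sketch, assuming (not confirmed by the question) that col1 and col7 hold numbers stored as text in object columns: after the conversion loop, select_dtypes picks up col1, col4, col6 and col7, and isnull().any() then narrows them to the numeric columns that contain nulls. Only the first three rows of the question's data are used.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of part of the question's frame.
data = {
    'col1': ['0.6', '0.7', np.nan],        # numbers stored as text
    'col2': ["'0'", "'1'", "'0'"],          # quoted strings, not numbers
    'col3': ['first', 'second', 'third'],
    'col4': [0.93, 0.47, 0.87],
    'col5': ['lion', 'cat', 'tiger'],
    'col6': [0.34, 0.43, 0.24],
    'col7': ['0.98', np.nan, '0.10'],      # numbers stored as text
}
index = pd.Index(['x1', 'x2', 'x3'], name='points')
# Force object dtype on the text-number columns, as in the question.
df = pd.DataFrame(data, index=index).astype({'col1': object, 'col7': object})

# Same conversion loop as in the answer.
for col in df.select_dtypes(include=['object']):
    s = pd.to_numeric(df[col], errors='coerce')
    if s.notnull().any():
        df[col] = s

numeric = df.select_dtypes(include=['number'])
print(numeric.columns.tolist())  # ['col1', 'col4', 'col6', 'col7']

# Numeric columns that also contain null values.
with_nulls = numeric.columns[numeric.isnull().any()].tolist()
print(with_nulls)  # ['col1', 'col7']
```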

The logic will work for the data you have presented. It won't work, for example, when you have a series of mainly strings and a few numbers. In that situation, you will need to define more precise logic to determine which series should be considered numeric.
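One possible refinement, sketched here as a hypothetical helper (convert_mostly_numeric and its min_fraction threshold are illustrative, not a pandas API): convert a non-numeric column only when at least a given fraction of its non-null values parse as numbers.

```python
import numpy as np
import pandas as pd


def convert_mostly_numeric(df, min_fraction=0.8):
    """Hypothetical helper: return a copy of df in which each non-numeric
    column is converted to numeric only if at least `min_fraction` of its
    non-null values parse as numbers."""
    out = df.copy()
    for col in out.select_dtypes(exclude=['number']):
        original = out[col]
        converted = pd.to_numeric(original, errors='coerce')
        n_present = original.notnull().sum()
        if n_present == 0:
            continue  # an all-null column gives nothing to judge on
        # Share of the originally present values that parsed successfully.
        fraction = converted.notnull().sum() / n_present
        if fraction >= min_fraction:
            out[col] = converted
    return out


df = pd.DataFrame({
    'mostly_numbers': ['1.5', '2.0', 'n/a', '4.0', '5.5', '6.0'],
    'mostly_words': ['cat', 'dog', '3', 'lion', 'tiger', 'bear'],
})
result = convert_mostly_numeric(df)
print(result.dtypes)
```

Here mostly_numbers (5 of 6 values parse) is converted, while mostly_words (1 of 6) is left alone. The notnull().any() test in the answer would have converted both, turning the animal names into NaN. The threshold is a judgment call that depends on your data.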

jpp answered Oct 17 '22 04:10
