Python: Check if a dataframe column contains string type

I want to check whether the columns in a dataframe consist of strings so I can label them with numbers for machine learning purposes. Some columns consist of numbers, and I don't want to change those. Example columns are shown below:

TRAIN FEATURES
  Age              Level  
  32.0              Silver      
  61.0              Silver  
  66.0              Silver      
  36.0              Gold      
  20.0              Silver     
  29.0              Silver     
  46.0              Silver  
  27.0              Silver

Thank you=)

274

asked Mar 27 '17 14:03

s900n

4 Answers

Note that the other answers will also include datetime, timestamp, category and other dtypes.

Filtering on the object dtype is more restrictive (although I am not sure whether other kinds of data would also end up with object dtype):

Create the dataframe:

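import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['a', 'b', 'c', 'd'],
    'b': [1, 'b', 'c', 2],
    'c': [np.nan, 2, 3, 4],
    'd': ['A', 'B', 'B', 'A'],
    # normalize() drops the time of day so the column shows dates only
    'e': pd.to_datetime('today').normalize()})
df['d'] = df['d'].astype('category')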

That will look like this:

   a  b    c  d          e
0  a  1  NaN  A 2018-05-17
1  b  b  2.0  B 2018-05-17
2  c  c  3.0  B 2018-05-17
3  d  2  4.0  A 2018-05-17

You can check the types by calling dtypes:

df.dtypes

a            object
b            object
c           float64
d          category
e    datetime64[ns]
dtype: object

You can list the string columns by using the items() method and filtering on object:
```
>>> [col for col, dt in df.dtypes.items() if dt == object]
['a', 'b']
```

Or you can use select_dtypes to display a dataframe with only the string columns:

df.select_dtypes(include=[object])
   a  b
0  a  1
1  b  b
2  c  c
3  d  2
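To tie this back to the question, here is a minimal sketch of splitting column names into object and numeric groups with select_dtypes. The train frame is a made-up stand-in modelled on the question's Age and Level columns, and dtype=object is forced on Level as an assumption so the example does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical frame modelled on the question's example (not the asker's real data).
train = pd.DataFrame({
    "Age": [32.0, 61.0, 66.0, 36.0],
    # Assumption: force object dtype so the result does not depend on
    # pandas' string-inference settings.
    "Level": pd.Series(["Silver", "Silver", "Silver", "Gold"], dtype=object),
})

# Columns stored as object (usually strings) versus numeric columns.
object_cols = train.select_dtypes(include=[object]).columns.tolist()
numeric_cols = train.select_dtypes(include="number").columns.tolist()

print(object_cols)
print(numeric_cols)
```

Keep in mind that an object column can still hold mixed values, as column b in the answer's dataframe shows.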

198

answered Oct 17 '22 15:10

toto_tico

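Yes, it's possible. You can use dtype:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': ['a','b','c','d']})
# Comparing a dtype to np.number with != is always True,
# so test the column with is_numeric_dtype instead
if not pd.api.types.is_numeric_dtype(df['a']):
    print('yes')
else:
    print('no')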

You can also select your columns by dtype using select_dtypes:

df_subset = df.select_dtypes(exclude=[np.number])
# Now you can label encode df_subset
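One possible way to do the label encoding step is sketched below with pd.factorize, which assigns integer codes in order of first appearance. This is only an illustration, not necessarily what the answer had in mind: the train frame is invented from the question's example, dtype=object is forced as an assumption, and encoded is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical frame modelled on the question's example columns.
train = pd.DataFrame({
    "Age": [32.0, 61.0, 66.0, 36.0],
    "Level": pd.Series(["Silver", "Silver", "Silver", "Gold"], dtype=object),
})

encoded = train.copy()
# Encode every non-numeric column; numeric columns are left untouched.
for col in train.select_dtypes(exclude=[np.number]).columns:
    codes, uniques = pd.factorize(train[col])
    encoded[col] = codes  # integer label per row, in order of first appearance

print(encoded)
```

Note that exclude=[np.number] also picks up datetime and category columns, so on a wider frame you may want include=[object] instead.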

answered Oct 17 '22 16:10

Scratch'N'Purr

I use a 2-step approach: first I check whether dtype == object, and if so, I look at the first row of data to see whether that column's value is a string.

c = 'my_column_name'
if df[c].dtype == object and isinstance(df[c].iloc[0], str):
    # do something with the string column
    pass
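One caveat with checking only the first row is that it may be missing. A small variation, sketched here under the assumption that skipping missing values is acceptable, looks at the first non-null value instead. The helper name first_value_is_str and the sample frame are made up for illustration.

```python
import pandas as pd

def first_value_is_str(df, col):
    """Return True if col has object dtype and its first non-null value is a str."""
    s = df[col]
    if s.dtype != object:
        return False
    non_null = s.dropna()  # skip None/NaN before inspecting a value
    return len(non_null) > 0 and isinstance(non_null.iloc[0], str)

# Hypothetical sample data; dtype=object is forced as an assumption.
sample = pd.DataFrame({
    "a": pd.Series([None, "x", "y"], dtype=object),
    "b": pd.Series([1, "x", "y"], dtype=object),
    "c": [1.0, 2.0, 3.0],
})

for name in sample.columns:
    print(name, first_value_is_str(sample, name))
```

Like the original, this only samples one value, so a mixed column whose first value happens to be a string would still pass.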

answered Oct 17 '22 14:10

hamx0r

Four years after this question was asked, I believe there is still no definitive answer.

I don't think strings were ever treated as first-class citizens in Pandas (even in >= 1.0.0). As an example:

import pandas as pd
import datetime

df = pd.DataFrame({
    'str': ['a', 'b', 'c', None],
    'hete': [1, 2.0, datetime.datetime.utcnow(), None]
})

string_series = df['str']
print(string_series.dtype)
print(pd.api.types.is_string_dtype(string_series.dtype))

heterogenous_series = df['hete']
print(heterogenous_series.dtype)
print(pd.api.types.is_string_dtype(heterogenous_series.dtype))

prints

object
True
object
True

So although hete does not contain any strings, its dtype is still reported as a string dtype.

After reading the documentation, I think the only way to make sure a series contains only strings is:

def is_string_series(s: pd.Series) -> bool:
    if isinstance(s.dtype, pd.StringDtype):
        # The series was explicitly created as a string series (Pandas>=1.0.0)
        return True
    elif s.dtype == 'object':
        # Object series, check each value
        return all((v is None) or isinstance(v, str) for v in s)
    else:
        return False
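For comparison, pandas also provides pandas.api.types.infer_dtype, which inspects the values rather than the dtype. The sketch below assumes that ignoring missing values (skipna=True) is what you want. The function name looks_like_strings and the sample series are hypothetical.

```python
import pandas as pd
from pandas.api.types import infer_dtype

def looks_like_strings(s):
    """Return True if every non-missing value in s is inferred to be a string."""
    return infer_dtype(s, skipna=True) == "string"

# Hypothetical sample series; dtype=object is forced as an assumption.
strs = pd.Series(["a", "b", "c", None], dtype=object)
mixed = pd.Series([1, "b", "c", 2], dtype=object)
nums = pd.Series([1.0, 2.0])

print(looks_like_strings(strs))
print(looks_like_strings(mixed))
print(looks_like_strings(nums))
```

Both this and is_string_series have to scan the values, so the cost grows with the length of the column.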

answered Oct 17 '22 16:10

vc 74

Tags:

python

dataframe