I want to check if a column in a dataframe contains strings. I would have thought this could be done just by checking dtype, but that isn't the case. A pandas series that contains strings just has dtype 'object', which is also used for other data structures (like lists):
df = pd.DataFrame({'a': [1,2,3], 'b': ['Hello', '1', '2'], 'c': [[1],[2],[3]]})
df = pd.DataFrame({'a': [1,2,3], 'b': ['Hello', '1', '2'], 'c': [[1],[2],[3]]})
print(df['a'].dtype)
print(df['b'].dtype)
print(df['c'].dtype)
Produces:
int64
object
object
Is there some way of checking if a column contains only strings?
contains() function is used to test if pattern or regex is contained within a string of a Series or Index. The function returns boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
To check the data type of a Series we have a dedicated attribute in the pandas series properties. The “dtype” is a pandas attribute that is used to verify data type in a pandas Series object. This attribute will return a dtype object which represents the data type of the given series.
Using “contains” to Find a Substring in a Pandas DataFrame The contains method in Pandas allows you to search a column for a specific substring. The contains method returns boolean values for the Series with True for if the original Series value contains the substring and False if not.
You can use this to see if all elements in a column are strings
df.applymap(type).eq(str).all()
a False
b True
c False
dtype: bool
To just check if any are strings
df.applymap(type).eq(str).any()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With