I am parsing a pandas dataframe df1
containing string object rows. I have a reference list of keywords and need to delete every row in df1
containing any word from the reference list.
Currently, I do it like this:
reference_list = ["words", "to", "remove"]
df1 = df1[~df1[0].str.contains(r"words")]
df1 = df1[~df1[0].str.contains(r"to")]
df1 = df1[~df1[0].str.contains(r"remove")]
This is not scalable to thousands of words. However, when I do:
df1 = df1[~df1[0].str.contains(reference_word for reference_word in reference_list)]
I get the error: first argument must be string or compiled pattern.
Following this solution, I tried:
reference_list = "words|to|remove"
df1 = df1[~df1[0].str.contains(reference_list)]
This doesn't raise an exception, but it doesn't match all of the words either.
How can I effectively use str.contains with a list of words?
For a scalable solution, do the following: join all the words using the regex OR pipe |, then pass the resulting pattern to str.contains to build a single boolean mask over df1.
To index the 0th column, don't use df1[0] (as this might be considered ambiguous). It would be better to use loc or iloc (see below).
words = ["words", "to", "remove"]
# \b(?:words|to|remove)\b matches any listed word, but only as a whole word
mask = df1.iloc[:, 0].str.contains(r'\b(?:{})\b'.format('|'.join(words)))
df1 = df1[~mask]
Note: This will also work if words is a Series.
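One caveat worth knowing: str.contains treats its pattern as a regex by default, so keywords containing metacharacters such as . or + would be misinterpreted. Here is a minimal sketch that escapes each word with re.escape first; the DataFrame contents and the extra keyword a.b.c are made up for illustration:

import re
import pandas as pd

df1 = pd.DataFrame(["version a.b.c is old", "keep this row"])  # made-up data
words = ["words", "to", "remove", "a.b.c"]  # "a.b.c" contains literal dots

# Escape each word so '.' matches literally, then join with the regex OR pipe
pattern = r'\b(?:{})\b'.format('|'.join(map(re.escape, words)))
df1 = df1[~df1.iloc[:, 0].str.contains(pattern)]
print(df1)
#                0
# 1  keep this row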
Alternatively, if your 0th column is a column of words only (not sentences), then you can use df.isin, which should be faster:
df1 = df1[~df1.iloc[:, 0].isin(words)]
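To make the difference concrete, here is a small runnable sketch with made-up data: isin keeps only rows whose cell is exactly equal to one of the words, so it is the right tool when each cell holds a single word rather than a sentence.

import pandas as pd

words = ["words", "to", "remove"]

# Cells hold single words, so test exact equality against the list
df1 = pd.DataFrame(["keep", "remove", "words", "keeper"])
print(df1[~df1.iloc[:, 0].isin(words)])
#         0
# 0    keep
# 3  keeper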