I have a list of words I want to look for:
word_list = ['one','three']
I also have a column in a pandas DataFrame containing the text below.
TEXT |
-------------------------------------------|
"Perhaps she'll be the one for me." |
"Is it two or one?" |
"Mayhaps it be three afterall..." |
"Three times and it's a charm." |
"One fish, two fish, red fish, blue fish." |
"There's only one cat in the hat." |
"One does not simply code into pandas." |
"Two nights later..." |
"Quoth the Raven... nevermore." |
The desired output, shown below, keeps the original text column and extracts the matching word from word_list into a new column:
TEXT | EXTRACT
-------------------------------------------|---------------
"Perhaps she'll be the one for me." | one
"Is it two or one?" | one
"Mayhaps it be three afterall..." | three
"Three times and it's a charm." | three
"One fish, two fish, red fish, blue fish." | one
"There's only one cat in the hat." | one
"One does not simply code into pandas." | one
"Two nights later..." |
"Quoth the Raven... nevermore." |
Is there a way to do this in Python 2.7?
You can replace a substring in a pandas DataFrame column with the DataFrame.replace() method. By default, this method replaces only values that match the given string exactly. Use regex=True to replace substrings.
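As a small illustration of the difference between exact and substring replacement, here is a minimal sketch. The two-row frame and the replacement value '1' are made up for the example and are not part of the question.

```python
import pandas as pd

# Hypothetical two-row frame, used only to contrast the two modes.
df = pd.DataFrame({'TEXT': ['one fish', 'one']})

# Default: only cells whose entire value equals 'one' are replaced.
exact = df.replace('one', '1')

# regex=True: the pattern is applied inside each string, so substrings match.
partial = df.replace('one', '1', regex=True)
```

Here `exact` leaves 'one fish' untouched, while `partial` rewrites it to '1 fish'.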
The str.extract() function extracts the capture groups of the regex pat as columns in a DataFrame. For each subject string in the Series, it extracts the groups from the first match of the regular expression pat.
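To make the capture-group behaviour concrete, here is a minimal sketch on a made-up Series (the sample strings are assumptions, not data from the question). It shows that only the first match per string is used and that `expand=False` with a single group returns a Series.

```python
import pandas as pd

# Hypothetical sample strings; the last one has no match.
s = pd.Series(["a1 b2", "c3", "no digits"])

# Two capture groups -> a DataFrame with one column per group.
# Only the first match in each string is used ("a1", not "b2").
pairs = s.str.extract(r"([a-z])(\d)")

# One capture group with expand=False -> a Series instead of a DataFrame.
letters = s.str.extract(r"([a-z])\d", expand=False)
```

Rows without a match come back as missing values, which is why the answer further down ends with `fillna('')`.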
Use str.extract:
import re  # needed for re.IGNORECASE
df['EXTRACT'] = df.TEXT.str.extract('({})'.format('|'.join(word_list)),
                                    flags=re.IGNORECASE, expand=False).str.lower().fillna('')
df['EXTRACT']
0 one
1 one
2 three
3 three
4 one
5 one
6 one
7
8
Name: EXTRACT, dtype: object
Each word in word_list is joined by the regex separator | and then passed to str.extract for regex pattern matching. The re.IGNORECASE switch is turned on for case-insensitive comparisons, and the resultant matches are lowercased to match your expected output.
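For reference, here is a self-contained sketch of the same approach that rebuilds the sample frame from the question. It adds two things the answer does not use, both assumptions about what is wanted: `re.escape` in case a word contains regex metacharacters, and `\b` word boundaries so that 'one' does not match inside a longer word such as 'someone'.

```python
import re
import pandas as pd

word_list = ['one', 'three']
df = pd.DataFrame({'TEXT': [
    "Perhaps she'll be the one for me.",
    "Is it two or one?",
    "Mayhaps it be three afterall...",
    "Three times and it's a charm.",
    "One fish, two fish, red fish, blue fish.",
    "There's only one cat in the hat.",
    "One does not simply code into pandas.",
    "Two nights later...",
    "Quoth the Raven... nevermore.",
]})

# Assumption: whole-word matches are wanted, so the alternation is wrapped
# in \b; re.escape guards against metacharacters in the words.
pattern = r'\b({})\b'.format('|'.join(re.escape(w) for w in word_list))

df['EXTRACT'] = (
    df['TEXT']
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .str.lower()   # normalise "One" / "Three" to lowercase
    .fillna('')    # rows with no match become empty strings
)
```

On this data the result is the same as the answer's output. The two versions differ only on text where a listed word appears inside another word.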