I have this dummy df:
import pandas as pd

columns = ['answer', 'some_number']
data = [['hello how are you doing', '1.0'],
        ['hello', '1.0'],
        ['bye bye bye bye', '0.0'],
        ['no', '0.0'],
        ['yes', '1.0'],
        ['Who let the dogs out', '0.0'],
        ['1 + 1 + 1 + 2', '1.0']]
df = pd.DataFrame(columns=columns, data=data)
I want to output the rows with a word count greater than 3.
Here that would be the rows 'hello how are you doing', 'bye bye bye bye', 'Who let the dogs out', '1 + 1 + 1 + 2'.
My approach doesn't work: df[len(df.answer) > 3]
Output: KeyError: True
A couple more options using str.split() (here n = 3 is the word-count threshold):
Combine with str.len():
df[df.answer.str.split().str.len().gt(n)]
Or combine with apply(len):
df[df.answer.str.split().apply(len).gt(n)]
Fastest overall (BENY's list comprehension):
df[[x.count(' ') >= n for x in df.answer]]
Fastest pandas-based (anky's first answer):
df[df.answer.str.count(' ').ge(n)]
(Timed with ~20 words per sentence.)
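A rough way to rerun such a comparison, as a sketch only: the benchmark data below (1,000 identical 20-word sentences), the repeat counts, and the names `candidates` and `timings` are assumptions for illustration, not the setup used for the original timings. Absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: 1,000 sentences of 20 single-space-separated words.
sentence = " ".join(["word"] * 20)
df = pd.DataFrame({"answer": [sentence] * 1_000})
n = 3

# The four approaches listed above, wrapped so timeit can call them.
candidates = {
    "split + str.len": lambda: df[df.answer.str.split().str.len().gt(n)],
    "split + apply(len)": lambda: df[df.answer.str.split().apply(len).gt(n)],
    "list comprehension": lambda: df[[x.count(" ") >= n for x in df.answer]],
    "str.count": lambda: df[df.answer.str.count(" ").ge(n)],
}

# Best of 3 repeats, 10 calls per repeat.
timings = {
    name: min(timeit.repeat(fn, number=10, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:<20} {seconds:.4f} s for 10 runs")
```

Note that the two space-counting variants only match the split-based ones when words are separated by exactly one space.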
Why doesn't df[len(df.answer) > 3] work?
len(df.answer) returns the length of the answer column itself (7), not the number of words per answer (5, 1, 4, 1, 1, 5, 7).
That means the final expression evaluates to df[7 > 3] or df[True], which breaks because there is no column True:
>>> len(df.answer)
7
>>> len(df.answer) > 3 # 7 > 3
True
>>> df[len(df.answer) > 3] # df[True] doesn't exist
KeyError: True
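To make the contrast concrete, here is a small sketch that puts the single boolean next to the per-row boolean mask that indexing needs. The variable names `scalar` and `mask` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "answer": [
        "hello how are you doing", "hello", "bye bye bye bye", "no",
        "yes", "Who let the dogs out", "1 + 1 + 1 + 2",
    ]
})

# One bool for the whole column: len() counts rows, not words.
scalar = len(df.answer) > 3

# One bool per row: split each answer into words, then count them.
mask = df.answer.str.split().str.len() > 3

print(scalar)         # a single Python bool
print(mask.tolist())  # one entry per row
print(df[mask])       # boolean indexing keeps the rows where mask is True
```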
If the separator is ' ', you can try series.str.count; otherwise, replace the separator:
n = 3
df[df['answer'].str.count(' ').gt(n-1)]
To handle multiple spaces (credit: @piRSquared):
df[df['answer'].str.count(r'\s+').gt(n-1)]
Or using a list comprehension:
n = 3
df[[len(i.split()) > n for i in df['answer']]]  # should be faster than the above
                    answer some_number
0  hello how are you doing         1.0
2          bye bye bye bye         0.0
5     Who let the dogs out         0.0
6            1 + 1 + 1 + 2         1.0
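The variants in this answer agree on the sample data, but they can diverge on messier input. A small sketch with made-up strings (repeated, leading, and trailing spaces) shows where counting separators and counting words differ:

```python
import pandas as pd

# Hypothetical inputs: clean, double space, leading/trailing space.
s = pd.Series(["a b c d", "a  b", " a b "])

spaces = s.str.count(" ")        # every single space character
runs = s.str.count(r"\s+")       # every run of whitespace
words = s.str.split().str.len()  # actual words (split() ignores extra whitespace)

print(pd.DataFrame({"text": s, "spaces": spaces, "runs": runs, "words": words}))
```

For clean text, words equals separators plus one. With padding or repeated spaces that relationship breaks, so the split-based count is the safer choice when the input is not normalized.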
If I understand this correctly, here's one way:
>>> df.loc[df['answer'].str.split().apply(len) > 3, 'answer']
0    hello how are you doing
2            bye bye bye bye
5       Who let the dogs out
6              1 + 1 + 1 + 2
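One caveat worth checking if the column can contain missing values, which the sample data does not: apply(len) calls len() on every element, including the missing one. The sketch below uses a made-up Series with a None entry and compares it with the .str.len() form, which propagates the missing value instead.

```python
import pandas as pd

# Hypothetical column with a missing answer.
s = pd.Series(["hello how are you doing", None, "no"], dtype=object)

# .str.len() yields NaN for the missing row; NaN > 3 is False, so the row is dropped.
mask = s.str.split().str.len().gt(3)
print(mask.tolist())

# apply(len) tries len() on the missing value and raises TypeError.
apply_failed = False
try:
    s.str.split().apply(len)
except TypeError as exc:
    apply_failed = True
    print("apply(len) failed:", exc)
```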
Try the plain string method count:
n = 3
df[[x.count(' ') > n-1 for x in df.answer]]
Out[31]:
                    answer some_number
0  hello how are you doing         1.0
2          bye bye bye bye         0.0
5     Who let the dogs out         0.0
6            1 + 1 + 1 + 2         1.0