I have a dataframe containing one sentence per row. I need to search these sentences for occurrences of certain words. This is how I currently do it:
import pandas as pd
p = pd.DataFrame({"sentence" : ["this is a test", "yet another test", "now two tests", "test a", "no test"]})
test_words = ["yet", "test"]
p["word_test"] = ""
p["word_yet"]  = ""
for i in range(len(p)):
    for word in test_words:
        p.loc[i, "word_"+word] = p.loc[i, "sentence"].find(word)
This works as intended, but is it possible to optimize it? It runs fairly slowly on large dataframes.
You can use str.find, which operates on the whole column at once:
p['word_test'] = p.sentence.str.find('test')
p['word_yet'] = p.sentence.str.find('yet')
           sentence  word_test  word_yet
0    this is a test         10        -1
1  yet another test         12         0
2     now two tests          8        -1
3            test a          0        -1
4           no test          3        -1
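Two things about the output above are worth spelling out: str.find returns -1 when the substring is absent, and it matches substrings, so 'test' is also found inside 'tests'. As a minimal sketch, assuming you only care whether a word occurs and not where, the positions can be turned into a boolean flag (the names pos and found are just for illustration):

```python
import pandas as pd

p = pd.DataFrame({"sentence": ["this is a test", "yet another test",
                               "now two tests", "test a", "no test"]})

# str.find gives the index of the first match, or -1 if the substring is absent
pos = p["sentence"].str.find("yet")

# Illustrative flag: True where the word occurs at least once
found = pos >= 0
print(found.tolist())
```

Only the second sentence contains 'yet', so only that row is flagged.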
IIUC, use a simple dict comprehension and call str.find for each word (df here is the dataframe called p in the question):
u = pd.DataFrame({
    # on Python < 3.6, use 'word_{}'.format(w) instead of the f-string
    f'word_{w}': df.sentence.str.find(w) for w in test_words}, index=df.index)
u
   word_yet  word_test
0        -1         10
1         0         12
2        -1          8
3        -1          0
4        -1          3
pd.concat([df, u], axis=1)
           sentence  word_yet  word_test
0    this is a test        -1         10
1  yet another test         0         12
2     now two tests        -1          8
3            test a        -1          0
4           no test        -1          3
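For completeness, here is a self-contained sketch of the same idea using the question's variable names (p and test_words). It swaps the pd.concat step for DataFrame.assign, which is just one possible way to attach the new columns in a single step; result is an illustrative name, not something from the question:

```python
import pandas as pd

p = pd.DataFrame({"sentence": ["this is a test", "yet another test",
                               "now two tests", "test a", "no test"]})
test_words = ["yet", "test"]

# Build one str.find column per word and attach them all at once,
# instead of looping over rows with .loc
result = p.assign(**{f"word_{w}": p["sentence"].str.find(w) for w in test_words})
print(result)
```

This leaves p itself unchanged and returns a new dataframe with the word_yet and word_test columns appended.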