
How should I save a BM25Okapi object to a file?

We are working on an information retrieval task and need to rank research papers by relevance to a query.

After cleaning the data and creating a dataframe, we tokenized the paper texts and now need to save the result to a file.

import csv  # needed for csv.writer further down
#tokenized_corpus = [doc.split(" ") for doc in corpus]

corpus = list(df.body_text)

tokenized_corpus1 = [doc.split(" ") for doc in corpus[:20000]]
tokenized_corpus2 = [doc.split(" ") for doc in corpus[20000:40000]]
#tokenized_corpus3 = [doc.split(" ") for doc in corpus[40000:]]

tokenized_corpus = tokenized_corpus1 + tokenized_corpus2 # + tokenized_corpus3 

The cell above creates the tokenized corpus.

with open('file.csv', 'w', newline='', encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(tokenized_corpus)

Then we save the data to a .csv file.

After that, we call the BM25Okapi constructor:
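For completeness, the tokenized corpus written this way can be read back with csv.reader. The following is only a minimal sketch of that round trip: the helper names save_tokens and load_tokens, the sample documents, and the temporary file path are all made up for illustration and are not part of the original code.

```python
import csv
import os
import tempfile


def save_tokens(path, tokenized_docs):
    """Write one document per CSV row, one token per cell."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(tokenized_docs)


def load_tokens(path):
    """Read the rows back as a list of token lists."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.reader(f)]


# Hypothetical sample data; one token contains a comma to show that
# the csv module quotes it on write and restores it on read.
sample_docs = [["coronavirus", "origin"], ["bat", "host,", "reservoir"]]
tmp_path = os.path.join(tempfile.mkdtemp(), "tokens.csv")

save_tokens(tmp_path, sample_docs)
restored_docs = load_tokens(tmp_path)
```

Reloading the tokens avoids repeating the tokenization step, but the BM25 index itself still has to be rebuilt from them unless the object is saved separately.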

from rank_bm25 import BM25Okapi  # assuming the rank_bm25 package

bm25 = BM25Okapi(tokenized_corpus)

Because this step takes a long time and consumes gigabytes of memory (causing frequent errors), we want to save the result so that we do not have to call the function again every time.

To retrieve results for a query, we used the following steps:

query = "coronavirus origin"
tokenized_query = query.split(" ")

doc_scores = bm25.get_scores(tokenized_query)
doc_scores
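The scores line up with the documents by position, so ranking comes down to sorting the indices by score. Here is a small sketch of that step. The function name top_k_indices and the example scores are invented for illustration; in the real notebook the input would be doc_scores.

```python
def top_k_indices(scores, k=5):
    """Return the positions of the k highest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical scores standing in for doc_scores.
example_scores = [0.0, 2.7, 1.3, 0.0, 4.1]
best = top_k_indices(example_scores, k=3)
```

The returned indices can then be used to look up the matching rows in the dataframe the corpus was built from.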

I was not able to save the BM25 object to a file, and I did not see any method for this in the source code. How should I do it?

Ulvi Shukurzade asked Nov 06 '22 01:11


1 Answer

The question is framed the wrong way. What we need is a way to save Python objects in general, not BM25Okapi results specifically.

So, here is the solution:

import pickle

#To save bm25 object
with open('bm25result', 'wb') as bm25result_file:
    pickle.dump(bm25, bm25result_file)

Then, to read the object back:

#to read bm25 object
with open('bm25result', 'rb') as bm25result_file:
    bm25result = pickle.load(bm25result_file)

A detailed description can be found in this article.
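The same pattern can be wrapped in two small helpers and checked with a round trip. This is only a sketch: TinyIndex is a hypothetical stand-in for an object that is expensive to build (such as bm25), and save_object, load_object, and the file path are illustrative names, not part of any library.

```python
import os
import pickle
import tempfile


class TinyIndex:
    """Hypothetical stand-in for an expensive-to-build object such as bm25."""

    def __init__(self, tokenized_docs):
        self.doc_lengths = [len(doc) for doc in tokenized_docs]


def save_object(obj, path):
    """Serialize any picklable object to a binary file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Load an object previously written by save_object."""
    with open(path, "rb") as f:
        return pickle.load(f)


index_path = os.path.join(tempfile.mkdtemp(), "index.pkl")
original = TinyIndex([["a", "b"], ["c"]])

save_object(original, index_path)
restored = load_object(index_path)
```

Two caveats apply to pickle in general: the class of the saved object must be importable when the file is loaded, and pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.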

Ulvi Shukurzade answered Nov 14 '22 04:11