
Labeled text dataset for evaluating information retrieval results

I have built a semantic search engine for text. However, I cannot find a labeled dataset that I can use to evaluate my system's retrieval quality.

Is there a publicly available collection of labeled text documents? I need one to evaluate the retrieval results (recall, precision, F1 score, ...).

Thanks.

dd90p asked Jun 01 '26 08:06



1 Answer

I do research in this area. In all of my work I have used the AOL dataset, which consists of ~20M web queries collected from ~650k users over three months (March 01, 2006 to May 31, 2006). The data is sorted by anonymous user ID and arranged sequentially.

Each record in the dataset contains the fields {AnonID, Query, QueryTime, ItemRank, ClickURL}. More details can be found in the link mentioned above. I would be interested to know how you implemented your engine and, if possible, to see its code. I would also like to know how your search engine performs on the AOL dataset.
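To show how those fields could feed the precision/recall/F1 evaluation you asked about, here is a minimal sketch. It rests on assumptions that are mine, not something the dataset guarantees: that the file is tab-separated with a header row naming the five fields, and that a clicked URL can be treated as a (noisy) relevance label for its query. The function names `load_click_judgments` and `precision_recall_f1` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_click_judgments(lines):
    """Map each query to the set of URLs that users clicked for it.

    Assumes tab-separated rows with a header containing at least the
    columns Query and ClickURL (an assumption about the file layout).
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    judgments = defaultdict(set)
    for row in reader:
        query = (row.get("Query") or "").strip().lower()
        url = (row.get("ClickURL") or "").strip()
        if query and url:  # rows without a click carry no relevance signal
            judgments[query].add(url)
    return judgments


def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and F1 for a single query."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up rows in the assumed layout, only to exercise the functions.
    sample = (
        "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
        "1\tpython csv\t2006-03-01 10:00:00\t1\thttp://a.example\n"
        "1\tpython csv\t2006-03-01 10:01:00\t3\thttp://b.example\n"
        "2\tlinux\t2006-03-02 09:00:00\t\t\n"
    )
    judgments = load_click_judgments(io.StringIO(sample))
    # Pretend the engine returned these two URLs for the query.
    engine_results = ["http://a.example", "http://c.example"]
    print(precision_recall_f1(engine_results, judgments["python csv"]))
```

Keep in mind that clicks are only a proxy for relevance: users click on irrelevant results and never see many relevant ones, so numbers obtained this way are better for comparing systems against each other than as absolute scores.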

You can find the dataset in my git repository. Thanks!

Wasi Ahmad answered Jun 04 '26 13:06
