I have the following MongoDB document structure:
{
  "_id" : "519817e508a16b447c00020e",
  "keyword" : "Just an example query",
  "rankings" :
  {
    "results" :
    {
      "1" : { "domain" : "example1.com", "href" : "http://www.example1.com/"},
      "2" : { "domain" : "example2.com", "href" : "http://www.example2.com/"},
      "3" : { "domain" : "example3.com", "href" : "http://www.example3.com/"},
      "4" : { "domain" : "example4.com", "href" : "http://www.example4.com/"},
      "5" : { "domain" : "example5.com", "href" : "http://www.example5.com/"},
      ...
      ...
      "99" : { "domain" : "example99.com", "href" : "http://www.example99.com/"},
      "100" : { "domain" : "example100.com", "href" : "http://www.example100.com/"}
    },
    "plus" : "many",
    "other" : "not",
    "interesting" : "stuff",
    "for" : "this question"
  }
}
In a previous question, I asked how to index the text so that I could search for the keyword and domain using, for example:
db.ranking.find({ $text: { $search: "\"example9.com\" \"Just an example query\""}})
The awesome answer by John Petrone was:
db.ranking.ensureIndex(
  {
    "keyword" : "text",
    "rankings.results.1.domain" : "text",
    "rankings.results.2.domain" : "text",
    ...
    ...
    "rankings.results.99.domain" : "text",
    "rankings.results.100.domain" : "text"
  }
)
However, while that works just great when I have 10 results, I run into an "Index key pattern too large" error (code 67) in the Mongo shell when I try to index 100 results.
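The key pattern grows by one field per rank position. As a minimal sketch in plain JavaScript (the helper name buildTextIndexSpec is hypothetical, and it does not connect to MongoDB), the same spec can be generated instead of typed out, which makes that growth explicit:

```javascript
// Hypothetical helper: builds the text-index key pattern from the
// answer above for ranks 1..n. It only constructs the object; it does
// not connect to MongoDB or create any index.
function buildTextIndexSpec(n) {
  const spec = { keyword: "text" };
  for (let rank = 1; rank <= n; rank++) {
    // One extra indexed field per rank position.
    spec[`rankings.results.${rank}.domain`] = "text";
  }
  return spec;
}

const small = buildTextIndexSpec(10);
const large = buildTextIndexSpec(100);
console.log(Object.keys(small).length); // 11
console.log(Object.keys(large).length); // 101
```

Passing buildTextIndexSpec(100) to db.ranking.ensureIndex(...) would reproduce the same 101-field pattern that fails; generating it does not make it any smaller, which is why restructuring the data is the real fix.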
So the big question is:
How (the hell) can I resolve that "index key pattern too large" error?
EDIT (18/08/2014): clarified document structure
{
  "_id" : "519817e508a16b447c00020e", // from MongoDB
  "keyword" : "Just an example query",
  "date" : "2014-03-28",
  "rankings" :
  {
    "1" : { "domain" : "example1.com", "href" : "http://www.example1.com/", "plus" : "stuff1"},
    ...
    "100" : { "domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100"}
  },
  "plus" : "many",
  "other" : "not",
  "interesting" : "stuff",
  "for" : "this question"
}
The problem with your suggested structure:
{
  "keyword" : "Just an example query",
  "rankings" :
  [
    { "rank" : 1, "domain" : "example1.com", "href" : "example1.com" },
    ...
    { "rank" : 99, "domain" : "example99.com", "href" : "example99.com" }
  ]
}
is that, although you can now do
db.ranking.ensureIndex({"rankings.href":"text", "rankings.domain":"text"})
and then run queries like:
db.ranking.find({$text:{$search:"example1"}});
this will return the whole document, including the entire array, whenever any one array element matches.
You might want to consider referencing instead, so that each ranking result is a separate document that references the keyword and other metadata, rather than repeating all of that information.
So you would have keyword/metadata documents like:
{_id:1, "keyword":"example query", "querydate": date, "other stuff":"other meta data"},
{_id:2, "keyword":"example query 2", "querydate": date, "other stuff":"other meta data 2"}
and then result documents like:
{"keyword_id" : 1, "rank" : 1, "domain" : "example1.com", "href" : "example1.com"},
...
{"keyword_id" : 1, "rank" : 99, "domain" : "example99.com", "href" : "example99.com"},
{"keyword_id" : 2, "rank" : 1, "domain" : "example1.com", "href" : "example1.com"},
...
{"keyword_id" : 2, "rank" : 99, "domain" : "example99.com", "href" : "example99.com"}
where keyword_id links back to (references) the keyword/metadata collection. In practice, the _ids will look like "_id" : "519817e508a16b447c00020e"; the small integers here are just for readability. You could now index on keyword_id, domain and href, either together or separately, depending on your query types. Because each index then names only a few fields, you will not get the "index key pattern too large" error, and each match returns a single result document rather than a whole array.
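The restructuring described above can be sketched in plain JavaScript. This is a minimal sketch, assuming input documents shaped like the clarified structure in the question (rankings keyed by rank strings "1" to "100"); the function name splitRankings and the abbreviated sample data are hypothetical. It returns the keyword/metadata document plus one result document per rank, each carrying keyword_id as the reference:

```javascript
// Hypothetical helper: splits one embedded document into a metadata
// document and one result document per rank (referencing pattern).
function splitRankings(doc) {
  // Everything except "rankings" stays on the keyword/metadata document.
  const { rankings, ...meta } = doc;
  const results = Object.entries(rankings)
    .map(([rank, entry]) => ({
      keyword_id: doc._id, // reference back to the metadata document
      rank: Number(rank), // "1" -> 1, so results can be sorted by rank
      ...entry, // domain, href and any per-result fields such as "plus"
    }))
    .sort((a, b) => a.rank - b.rank);
  return { meta, results };
}

// Sample input shaped like the clarified structure, abbreviated to 3 ranks.
const original = {
  _id: "519817e508a16b447c00020e",
  keyword: "Just an example query",
  date: "2014-03-28",
  rankings: {
    "1": { domain: "example1.com", href: "http://www.example1.com/", plus: "stuff1" },
    "2": { domain: "example2.com", href: "http://www.example2.com/", plus: "stuff2" },
    "3": { domain: "example3.com", href: "http://www.example3.com/", plus: "stuff3" },
  },
};

const { meta, results } = splitRankings(original);
console.log(meta.keyword); // Just an example query
console.log(results.length); // 3
console.log(results[1].domain); // example2.com
```

Once the data is split this way, meta would go into the keyword/metadata collection and each element of results into a separate results collection, where an index such as db.results.ensureIndex({"domain": "text", "href": "text"}) names only two fields (the collection name results is an assumption).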
I am not entirely clear on where you need fuzzy/regex-style searches and whether you will be searching metadata or just href and domain, but I think this structure is a cleaner starting point for indexing, without maxing out the number of indexed fields as before. It will also allow you to combine finds on normal indexes with text indexes, depending on your query pattern.
You might find the answer to "MongoDB relationships: embed or reference?" useful when considering your document structure.