I am working on a web crawler, and the number of results saved to Raven varies with the size of the website. I'm trying to delete a specific set of results that exceeds the "requests per session is limited to 30" limit. I don't want to raise it to the 1,000 limit; I want to batch delete instead.
The code I have written, which I think should work, is:
public void DeleteCrawledLinks(string baseUrl)
{
    DocumentStore().DatabaseCommands.DeleteByIndex(
        "Auto/UrlContainers/ByBaseUrlAndUrl",
        new IndexQuery
        {
            Query = "BaseUrl:" + baseUrl // where BaseUrl contains baseUrl
        },
        allowStale: false);
}
For this example, the value stored in Raven is "BaseUrl": "http://localhost:2125/", and the baseUrl argument has the same value. When I run the delete function I get this error message:
Url: "/bulk_docs/Auto/UrlContainers/ByBaseUrlAndUrl?query=BaseUrl%253Ahttp%253A%252F%252Flocalhost%253A2125%252F&start=0&pageSize=128&aggregation=None&allowStale=False"
System.ArgumentException: The field 'http' is not indexed, cannot query on fields that are not indexed
Is it because of the : in my query? Is there a way around this, or another way to do the delete? I don't want to extend the limit because the sites I crawl could return more than 1,000 results.
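Yes, the unescaped : is the likely cause: the Lucene query parser reads the text before a : as a field name, which is why the error complains about the field 'http'. When constructing the query yourself, escape search terms as follows: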
Query = "BaseUrl:" + RavenQuery.Escape(baseUrl)
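To make it clearer what escaping does to the term, here is a minimal standalone sketch. It is not RavenQuery.Escape's actual implementation; LuceneEscapeSketch and its character list are hypothetical, assumed from the special characters of the classic Lucene query parser. It only shows the idea of putting a backslash in front of each character the parser would otherwise treat as syntax.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; in real code use RavenQuery.Escape.
public static class LuceneEscapeSketch
{
    // Assumed set of characters the classic Lucene query parser treats as syntax.
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

    public static string Escape(string term)
    {
        if (term == null) throw new ArgumentNullException("term");

        var sb = new StringBuilder(term.Length + 8);
        foreach (char c in term)
        {
            // Prefix each special character with a backslash so it is read literally.
            if (SpecialChars.IndexOf(c) >= 0) sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string query = "BaseUrl:" + Escape("http://localhost:2125/");
        Console.WriteLine(query); // prints BaseUrl:http\://localhost\:2125/
    }
}
```

Only the colons inside the value are escaped, so the parser still sees BaseUrl as the field and the whole URL as the term.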