RethinkDB

Question

I'm trying to write the most optimal query to find all of the documents that do not have a specific field. Is there any better way to do this than the examples I have listed below?

// Get the ids of all documents missing "location"
r.db("mydb").table("mytable").filter({location: null},{default: true}).pluck("id")

// Get a count of all documents missing "location"
r.db("mydb").table("mytable").filter({location: null},{default: true}).count()

Right now, these queries take about 300-400ms on a table with ~40k documents, which seems rather slow. Furthermore, in this specific case, the "location" attribute contains latitude/longitude and has a geospatial index.

Is there any way to accomplish this? Thanks!

Jorge Silva · Accepted Answer

A naive suggestion

You could use the hasFields method along with the not method on to filter out unwanted documents:

r.db("mydb").table("mytable")
  .filter(function (row) {
    return row.hasFields({ location: true }).not()
  })

This might or might not be faster, but it's worth trying.

Using a secondary index

Ideally, you'd want a way to make location a secondary index and then use getAll or between since queries using indexes are always faster. A way you could work around that is making all rows in your table have a value false value for their location, if they don't have a location. Then, you would create a secondary index for location. Finally, you can then query the table using getAll as much as you want!

Adding a location property to all fields without a location

For that, you'd need to first insert location: false into all rows without a location. You could do this as follows:

r.db("mydb").table("mytable")
  .filter(function (row) {
    return row.hasFields({ location: true }).not()
  })
  .update({
    location: false
  })

After this, you would need to find a way to insert location: false every time you add a document without a location.

Create secondary index for the table

Now that all documents have a location field, we can create a secondary index for location.

r.db("mydb").table("mytable")
 .indexCreate('location')

Keep in mind that you only have to add the { location: false } and create the index only once.

Use getAll

Now we can just use getAll to query documents using the location index.

r.db("mydb").table("mytable")
 .getAll(false, { index: 'location' })

This will probably be faster than the query above.

Using a secondary index (function)

You can also create a secondary index as a function. Basically, you create a function and then query the results of that function using getAll. This is probably easier and more straight-forward than what I proposed before.

Create the index

Here it is:

r.db("mydb").table("mytable")
 .indexCreate('has_location', 
   function(x) { return x.hasFields('location'); 
 })

Use getAll.

Here it is:

r.db("mydb").table("mytable")
 .getAll(false, { index: 'has_location' })

RethinkDB - Find documents with missing field

Tags:

query-optimization

ambient

1 Answers

Jorge Silva

Recent Activity

Donate For Us