Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Running advanced MongoDB queries in R with rmongodb

As MySQL is driving me nuts I'm trying to make myself acquainted with my first "NoSQL" DBMS and it happened to be MongoDB. I'm connecting to it via rmongodb.

The more I play around with rmongodb, the more questions/problems come up with respect to running advanced queries.

First I present some example data before I go into detail about the different types of queries that I can't seem to specify correctly.

Example Data

The example is taken from the MongoDB website and has been simplified a bit.

pkg <- "rmongodb"
if (!require(pkg, character.only=TRUE)) {
    install.packages(pkg)
    require(pkg, character.only=TRUE)   
}

# Connect to DB
db <- "test"
ns <- "posts"
mongo <- mongo.create(db=db)

# Insert document to collection 'test.users'
b <- mongo.bson.from.list(list(
    "_id"="alex", 
    name=list(first="Alex", last="Benisson"),
    karma=1.0,
    age=30,
    test=c("a", "b")
))
mongo.insert(mongo, "test.users", b)

# Insert document to collection 'test.posts'
b <- mongo.bson.from.list(list(
        "_id"="abcd",
        when=mongo.timestamp.create(strptime("2011-09-19 02:00:00",
            "%Y-%m-%d %H:%M:%s"), increment=1),
        author="alex",
        title="Some title",
        text="Some text.",
        tags=c("tag.1", "tag.2"),
        votes=5,
        voters=c("jane", "joe", "spencer", "phyllis", "li"),
        comments=list(
            list(
                who="jane", 
                when=mongo.timestamp.create(strptime("2011-09-19 04:00:00",
                    "%Y-%m-%d %H:%M:%s"), increment=1),
                comment="Some comment."
            ),
            list(
                who="meghan", 
                when=mongo.timestamp.create(strptime("2011-09-20 13:00:00",
                    "%Y-%m-%d %H:%M:%s"), increment=1),
                comment="Some comment."
            )
        )
    )
)
b
mongo.insert(mongo, "test.posts", b)

Two questions related to inserting JSON/BSON objects:

  1. Document 'test.posts', field voters: is it correct to use c() in this case?
  2. Document 'test.posts', field comments: what's the right way to specify this, c() or list()?

Top Level Queries: they work a treat

Top level queries work just fine:

# Get all posts by 'alex' (only titles)
res <- mongo.find(mongo, "test.posts", query=list(author="alex"), 
    fields=list(title=1L))
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
[[1]]
                       _id                      title 
                     "abcd"            "No Free Lunch" 

Question 1: Basic Sub Level Queries

How can run a simple "sub level queries" (as opposed to top level queries) that need to reach into arbitrarily deep sublevels of a JSON/BSON style MongoDB object? These sub level queries make use of MongoDB's dot notation and I can't seem to figure out how to map that to a valid rmongodb query

In plain MongoDB syntax, something like

> db.posts.find( { comments.who : "meghan" } )

would work. But I can't figure out how to do that with rmongodb functions

Here's what I tried so far

# Get all comments by 'meghan' from 'test.posts'

#--------------------
# Approach 1)
#--------------------
res <- mongo.find(mongo, "test.posts", query=list(comments=list(who="meghan")))
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
NULL
# Does not work

#--------------------
# Approach 2) 
#--------------------
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "comments")
mongo.bson.buffer.append(buf, "who", "meghan")
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
res <- mongo.find(mongo, "test.posts", query=query)
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
NULL
# Does not work

Question 2: Queries Using $ Operators

These work

Query 1

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "age")
mongo.bson.buffer.append(buf, "$lte", 30)
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria

> mongo.find.one(mongo, "test.users", query=criteria)
    _id : 2      alex
    name : 3     
        first : 2    Alex
        last : 2     Benisson

    karma : 1    1.000000
    age : 1      30.000000
    test : 4     
        0 : 2    a
        1 : 2    b

Query 2

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "test")
mongo.bson.buffer.append(buf, "$in", c("a", "z"))
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria
mongo.find.one(mongo, "test.users", query=criteria)

However, notice that an atomic set will result in a return value of NULL

mongo.bson.buffer.append(buf, "$in", "a")
# Instead of 'mongo.bson.buffer.append(buf, "$in", c("a", "z"))'

Trying the same with sub level queries I'm lost again

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "name")
mongo.bson.buffer.start.object(buf, "first")
mongo.bson.buffer.append(buf, "$in", c("Alex", "Horst"))
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria <- mongo.bson.from.buffer(buf)
> criteria
    name : 3     
        first : 3    
            $in : 4      
                0 : 2    Alex
                1 : 2    Horst

> mongo.find.one(mongo, "test.users", query=criteria)
NULL
like image 983
Rappster Avatar asked May 29 '12 11:05

Rappster


People also ask

Can we run query efficiently in MongoDB?

Performance. Because the index contains all fields required by the query, MongoDB can both match the query conditions and return the results using only the index. Querying only the index can be much faster than querying documents outside of the index.

Can we use MongoDB for caching?

Does MongoDB handle caching? Yes. MongoDB keeps most recently used data in RAM. If you have created indexes for your queries and your working data set fits in RAM, MongoDB serves all queries from memory.

How much RAM do I need for MongoDB?

MongoDB requires approximately 1 GB of RAM per 100.000 assets. If the system has to start swapping memory to disk, this will have a severely negative impact on performance and should be avoided.


3 Answers

Either c() or list() can be ok. Depends on whether the components are named and whether they all have the same type (for list). Best thing to do is look at the generated BSON and see if you are getting what you want. For the best control of the generated object use mongo.bson.buffer and the functions that operate on it. In fact this is why the sub-queries are failing. 'comments' is being created as a subobject rather than an array. mongo.bson.from.list() is handy but it doesn't give you the same control and sometimes it guesses wrong about what to generate from complicated structures.

The query on the other set of data can be corrected like so though:

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "name.first")
mongo.bson.buffer.append(buf, "$in", c("Alex", "Horst"))
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)

Note that you definitely need to use a buffer here since R will choke on the dotted name.

I hope this straightens out your problem. Let me know if you have any further questions.

like image 92
Gerald Lindsly Avatar answered Oct 08 '22 18:10

Gerald Lindsly


I'm still not very clear on what's the preferred way here on SO to progress once a question has been posted but one wishes to elaborate a bit more, possibly adding further questions and answer approaches.

As I was often told not to blow up my original question with future edits, in this "answer" I'm simply taking the suggestions by Gerald Lindsly and try to put it into actual code (because it still didn't work out for me):

Preparations

pkg <- "rmongodb"
if (!require(pkg, character.only=TRUE)) {
    install.packages(pkg)
    require(pkg, character.only=TRUE)   
}

# Connect to DB
db <- "test"
ns <- "posts"
mongo <- mongo.create(db=db)

# Make sure we start with an empty collection
mongo.drop(mongo, paste(db, ns, sep="."))

Insert document

As Gerald has pointed out in his answer, mongo.bson.from.list() sometimes makes wrong guesses about the resulting BSON structure, so I tried to go ahead an explicitly create BSON array objects:

buf <- mongo.bson.buffer.create()

# 'REGULAR' APPENDING
mongo.bson.buffer.append(buf, "_id", "abcd")
mongo.bson.buffer.append(buf, "when", 
    mongo.timestamp.create(strptime("2011-09-19 02:00:00",
        "%Y-%m-%d %H:%M:%s"), increment=1))
mongo.bson.buffer.append(buf, "author", "alex")
mongo.bson.buffer.append(buf, "title", "Some title")
mongo.bson.buffer.append(buf, "text", "Some text.")
mongo.bson.buffer.append(buf, "tags", c("tag.1", "tag.2"))
mongo.bson.buffer.append(buf, "votes", 5)
# /

# VOTERS ARRAY
mongo.bson.buffer.start.array(buf, "voters")
voters <- c("jane", "joe", "spencer", "phyllis", "li")
i=1
for (i in seq(along=voters)) {
    mongo.bson.buffer.append(buf, as.character(i), voters[i])
}
mongo.bson.buffer.finish.object(buf)
# /

# COMMENTS ARRAY
mongo.bson.buffer.start.array(buf, "comments")

mongo.bson.buffer.start.object(buf, "1")
mongo.bson.buffer.append(buf, "who", "jane")
mongo.bson.buffer.append(buf, "when", 
    mongo.timestamp.create(strptime("2011-09-19 04:00:00",
            "%Y-%m-%d %H:%M:%s"), increment=1))
mongo.bson.buffer.append(buf, "comment", "some comment.")
mongo.bson.buffer.finish.object(buf)

mongo.bson.buffer.start.object(buf, "2")
mongo.bson.buffer.append(buf, "who", "meghan")
mongo.bson.buffer.append(buf, "when", 
    mongo.timestamp.create(strptime("2011-09-20 13:00:00",
            "%Y-%m-%d %H:%M:%s"), increment=1))
mongo.bson.buffer.append(buf, "comment", "some comment.")
mongo.bson.buffer.finish.object(buf)
# /

# FINALIZE
mongo.bson.buffer.finish.object(buf)
b <- mongo.bson.from.buffer(buf)
> b
_id : 2      abcd
when : 17    i: 1, t: 1316390400
author : 2   alex
title : 2    Some title
text : 2     Some text.
tags : 4     
    0 : 2    tag.1
    1 : 2    tag.2

votes : 1    5.000000
voters : 4   
    1 : 2    jane
    2 : 2    joe
    3 : 2    spencer
    4 : 2    phyllis
    5 : 2    li

comments : 4     
    1 : 3    
        who : 2      jane
        when : 17    i: 1, t: 1316397600
        comment : 2      some comment.

    2 : 3    
        who : 2      meghan
        when : 17    i: 1, t: 1316516400
        comment : 2      some comment.

mongo.insert(mongo, "test.posts", b)

Basic sub-level query

# Get all comments by 'meghan' from 'test.posts'

#--------------------
# Approach 1)
#--------------------
res <- mongo.find(mongo, "test.posts", query=list(comments=list(who="meghan")))
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
NULL
# Does not work

#--------------------
# Approach 2) 
#--------------------
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "comments")
mongo.bson.buffer.append(buf, "who", "meghan")
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
res <- mongo.find(mongo, "test.posts", query=query)
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
NULL
# Does not work

I still must be doing something wrong here when specifying the document ;-)

like image 28
Rappster Avatar answered Oct 08 '22 19:10

Rappster


Regarding atomic queries and the $in operator, I got Query 2 from your first question to work as follows:

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "test")
mongo.bson.buffer.start.array(buf, "$in")
mongo.bson.buffer.append(buf, "a", "a")
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria

I guess explicitly starting and ending the array does the trick, if the array is going to end up holding only one element.

One thing that can be useful is monitoring the mongod console or log (after starting mongod with the -v option). Running your old query, you'll see:

Tue Nov 20 16:09:04 [conn23] User Assertion: 12580:invalid query
Tue Nov 20 16:09:04 [conn23] assertion 12580 invalid query ns:test.users query:{ test: { $in: "a" } }
Tue Nov 20 16:09:04 [conn23] problem detected during query over test.users : { $err: "invalid query", code: 12580 }
Tue Nov 20 16:09:04 [conn23] query test.users query: { test: { $in: "a" } } ntoreturn:0 keyUpdates:0 exception: invalid query code:12580 locks(micros) r:440 reslen:59 0ms

Running the modified query, it looks ok:

Tue Nov 20 16:10:14 [conn23] query test.users query: { test: { $in: [ "a" ] } } ntoreturn:0 keyUpdates:0 locks(micros) r:168 nreturned:1 reslen:142 0ms
like image 39
Dan Tenenbaum Avatar answered Oct 08 '22 18:10

Dan Tenenbaum