Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

regex with mongolite in R

Trying to do a regex match with the awesome mongolite library and I am still not sure what I am doing wrong, going really nuts on what am i missing here.

library(mongolite)
m <- mongo(url = "mongodb://192.168.1.5:27017", db = "products", collection =    "sku")

m$count()
#gives 54524

a1 <- m$find('{"item"  : { "$regex" : "/.*A*./i" }}')
returns Imported 0 records. Simplifying into dataframe...

#but when you do
a1 <- m$find('{"item"  : "ABC"}')
#returns 8 records
a1 <- m$find('{"item"  : "AAC"}')
#returns 5 records
a1 <- m$find('{"item"  : "AAAC"}')
#returns 18 records

so on. So I am not sure what is wrong in the way I am calling the regex operator in mongodb. Any clues.? Thanks

like image 391
Kate K Avatar asked May 17 '16 22:05

Kate K


1 Answers

In the mongo shell you would use the / ... / without quotes. However, within mongolite you need the quotes otherwise it's invalid JSON

You therefore need to use ... { "$regex" : ".*A*.", "$options" : "i"}...

Consider this example

library(mongolite)

m <- mongo(db = "test", collection = "test", url = "mongodb://localhost")

## create and insert some dummy data
set.seed(2016)
df <- data.frame(id = seq(1:100),
                val = sample(letters, size = 100, replace = T))

m$insert(df)

## valid regex query in mongolite
m$find('{ "val" : { "$regex" : "^a", "$options" : "i" }  }')
# Imported 5 records. Simplifying into dataframe...
#     id val
# 1  26   a
# 2  53   a
# 3  61   a
# 4  76   a
# 5 100   a

## these queries don't work. 
m$find('{ "val" : { "$regex" : "/^a/", "$options" : "i" }  }')
# Imported 0 records. Simplifying into dataframe...
# data frame with 0 columns and 0 row

m$find('{ "val" : { "$regex" : /^a/, "$options" : "i" }  }')
# Error: Invalid JSON object: { "val" : { "$regex" : /^a/, "$options" : "i" }  }

Whereas in the mongo shell (I use robomongo) you can use either

db.test.find({ "val" : { "$regex" : /^a/ }  })
## or
db.test.find({ "val" : { "$regex" : "^a" }  })

Now, if you're after a bit more speed getting your data into R, and your result can be coerced into a data.table without loss of data, you can use a package I've written that extends mongolite that uses data.table::rbindlist to convert the results to a data.table. The speed is gained as it assumes your data is in a "tabular" structure and avoids the recursive calls in mongolite that simplifies JSON to a data.frame. See my github page for more details.

# library(devtools)
# install_github("SymbolixAU/mongolitedt")
library(mongolitedt)
bind_mongolitedt(m)

m$finddt('{ "val" : { "$regex" : "^A", "$options" : "i" }  }')
## returns a data.table
#  Imported 5 records.
#     id val
# 1:  26   a
# 2:  53   a
# 3:  61   a
# 4:  76   a
# 5: 100   a
like image 104
SymbolixAU Avatar answered Oct 21 '22 22:10

SymbolixAU