 

Use R's mongolite to correctly (insert? update?) add data to an existing collection

I have the following function written in R that (I think) is doing a poor job of updating the collections in my MongoDB database.

library(mongolite) 

con <- mongolite::mongo(collection = "mongo_collection_1", db = 'mydb', url = 'myurl')
myRdataframe1 <- con$find(query = '{}', fields = '{}')
rm(con)

con <- mongolite::mongo(collection = "mongo_collection_2", db = 'mydb', url = 'myurl')
myRdataframe2 <- con$find(query = '{}', fields = '{}')
rm(con)

... code to update my dataframes (rbind additional rows onto each of them) ...

# write dataframes to database
write.dfs.to.mongodb.collections <- function() {

  collections <- c("mongo_collection_1", "mongo_collection_2") 
  my.dataframes <- c("myRdataframe1", "myRdataframe2")

  # loop over the dataframes and write each one to its collection
  for(i in seq_along(collections)) {

    # connect and add data to this table
    con <- mongo(collection = collections[i], db = 'mydb', url = 'myurl')
    con$remove('{}')
    con$insert(get(my.dataframes[i]))
    con$count()

    rm(con)
  }
}
write.dfs.to.mongodb.collections()

My dataframes myRdataframe1 and myRdataframe2 are very large, currently ~100K rows and ~50 columns each. Each time my script runs, it:

  • uses con$find('{}') to pull the mongodb collection into R, saved as a dataframe myRdataframe1
  • scrapes new data from a data provider that gets appended as new rows to myRdataframe1
  • uses con$remove() and con$insert to fully remove the data in the mongodb collection, and then re-insert the entire myRdataframe1

This last bullet point is iffy: I run this R script daily as a cron job, and I don't like that each run entirely wipes the MongoDB collection and re-inserts the whole R dataframe.

If I remove the con$remove() line, I receive an error that states I have duplicate _id keys. It appears I cannot simply append using con$insert().

Any thoughts on this are greatly appreciated!

Canovice asked Oct 29 '22 03:10



1 Answer

When you attempt to insert documents whose primary key (_id) already exists in the collection, MongoDB raises a duplicate key error. To work around that, you can unset the _id column before the con$insert call, using something like this:

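df <- get(my.dataframes[i])  # my.dataframes[i] is only the name, so fetch the dataframe itself
df[["_id"]] <- NULL          # drop the _id column, then call con$insert(df)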

This way, each newly inserted document will automatically be assigned a new _id.
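
Keep in mind that if you drop _id and then insert the whole dataframe without con$remove(), the rows that are already stored would be added a second time, so it probably makes sense to insert only the newly scraped rows. A minimal sketch, assuming you keep those new rows in their own dataframe before rbind-ing them (append_new_rows and scraped_rows are hypothetical names, not part of mongolite):

```r
# Hypothetical helper: append only freshly scraped rows to a collection.
# `con` is expected to be a mongolite::mongo() connection and `new_rows`
# a dataframe holding just the rows that are not in the collection yet.
append_new_rows <- function(con, new_rows) {
  # Drop any _id column so MongoDB generates a fresh _id per document.
  # Assigning NULL is harmless if the column does not exist.
  new_rows[["_id"]] <- NULL

  # Only call insert when there is something to add.
  if (nrow(new_rows) > 0) {
    con$insert(new_rows)
  }

  # Return the number of rows handed to insert, invisibly.
  invisible(nrow(new_rows))
}

# Usage sketch (needs a reachable MongoDB; 'myurl' is the placeholder from the question):
# con <- mongolite::mongo(collection = "mongo_collection_1", db = "mydb", url = "myurl")
# append_new_rows(con, scraped_rows)
```

With this approach the daily run no longer needs con$remove('{}'), since the existing documents stay untouched and only the new ones are written.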

dnickless answered Nov 15 '22 07:11
