 

Mongoengine - How to perform a "save new item or increment counter" operation?

I'm using MongoEngine in a web-scraping project. I would like to keep track of all the images I've encountered on all the scraped web pages.

To do so, I store the image src URL and the number of times the image has been encountered.

The MongoEngine model definition is the following:

from mongoengine import Document, IntField, URLField


class ImagesUrl(Document):
    """ Model representing images encountered during web-scraping.

    When an image is encountered on a web page during scraping,
    we store its URL and the number of times it has been
    seen (default counter value is 1).
    If the image has been seen before, we do not insert a new document
    in the collection, but merely increment the corresponding counter value.

    """

    # The URL of the image. There cannot be any duplicates.
    src = URLField(required=True, unique=True)

    # Total number of occurrences of the image during
    # the data-mining process.
    counter = IntField(min_value=0, required=True, default=1)

I'm looking for the proper way to implement the "save or increment" process.

So far, I'm handling it this way, but I feel there might be a better, built-in way of doing it with MongoEngine:

def save_or_increment(self):
    """ If it is the first time the image has been encountered, insert
        its src in mongo, along with a counter=1 value.
        If not, increment its counter value by 1.

    """
    # check if item is already stored
    # if not, save a new item
    if not ImagesUrl.objects(src=self.src):
        ImagesUrl(src=self.src, counter=self.counter).save()
    else:
        # if item already stored in Mongo, just increment its counter
        ImagesUrl.objects(src=self.src).update_one(inc__counter=1)
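One reason to look for something better: the check-then-insert approach above needs two round trips and is not atomic. If two scrapers hit the same image at once, both can see no existing document and both call save(); assuming the unique index declared by unique=True exists, the second save() then fails (MongoEngine raises NotUniqueError). The toy sketch below replays that interleaving deterministically against a plain dict standing in for the collection. `FakeCollection`, `DuplicateKey` and `replay_race` are hypothetical names for illustration, not MongoEngine API.

```python
class DuplicateKey(Exception):
    """Hypothetical stand-in for the error a unique index raises."""


class FakeCollection:
    """Toy in-memory collection keyed by src (not MongoEngine)."""

    def __init__(self):
        self.docs = {}  # src -> counter

    def exists(self, src):
        return src in self.docs

    def insert(self, src, counter=1):
        # Mimic the unique index: a second insert with the same src fails.
        if src in self.docs:
            raise DuplicateKey(src)
        self.docs[src] = counter

    def inc(self, src, amount=1):
        self.docs[src] += amount


def replay_race(coll, src):
    """Interleave two workers running check-then-insert on the same src."""
    # Both workers run their existence check before either one writes.
    checks = [coll.exists(src), coll.exists(src)]
    errors = []
    for already_stored in checks:
        try:
            if not already_stored:
                coll.insert(src)
            else:
                coll.inc(src)
        except DuplicateKey:
            errors.append(src)
    return errors


coll = FakeCollection()
print(replay_race(coll, "http://example.com/a.png"))  # ['http://example.com/a.png']
print(coll.docs)  # {'http://example.com/a.png': 1}
```

The image was seen twice, yet one worker hit an error and the stored counter is only 1.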

Is there a better way of doing it?

Thank you very much for your time.

asked Jan 31 '13 10:01 by Balthazar Rouberol


1 Answer

You should be able to just do an upsert, e.g.:

ImagesUrl.objects(src=self.src).update_one(
    upsert=True,        # insert a new document if none matches src
    inc__counter=1,     # $inc on a missing field creates it with value 1
    set__src=self.src,
)
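To see why this single call covers both cases, here is a minimal in-memory model of the update it asks MongoDB to perform, assuming standard upsert semantics: if a document matches the filter, $set and $inc are applied to it; if none matches, a new document is seeded from the filter's equality fields and then updated, and $inc on a missing field sets it to the increment value. The function name `upsert_inc` and the list-of-dicts "collection" are hypothetical and only illustrate the semantics; this is not MongoEngine or PyMongo code.

```python
def upsert_inc(collection, filter_, inc=None, set_=None):
    """Toy model of an update with $inc/$set and upsert=True.

    `collection` is a plain list of dicts, not a real MongoDB collection.
    """
    inc = inc or {}
    set_ = set_ or {}
    for doc in collection:
        if all(doc.get(k) == v for k, v in filter_.items()):
            target = doc
            break
    else:
        # No match: seed the new document with the filter's equality fields.
        target = dict(filter_)
        collection.append(target)
    target.update(set_)
    for field, amount in inc.items():
        # $inc on a missing field creates it with the increment value.
        target[field] = target.get(field, 0) + amount
    return target


images = []
for src in ["a.png", "b.png", "a.png"]:
    upsert_inc(images, {"src": src}, inc={"counter": 1}, set_={"src": src})
print(images)  # [{'src': 'a.png', 'counter': 2}, {'src': 'b.png', 'counter': 1}]
```

The model also suggests that `set__src` is mostly explicit documentation: on insert, the equality condition `src=self.src` from the filter already ends up in the new document. On the real server, the match and the write happen in one update call rather than two separate round trips.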
answered Nov 11 '22 15:11 by Ross