Inserting a document with Pymongo - InvalidDocument: Cannot encode object

I am trying to insert a document (Twitter data in this case) into a MongoDB database with PyMongo.

As you can see below, tweets_listdt[0] is exactly the same as

{
     'created_at': u'Sun Aug 03 17:07:24 +0000 2014',
     'id': 2704548373,
     'name': u'NoSQL',
     'text': u'RT @BigdataITJobs: Data Scientist \u2013 Machine learning, Python, Pandas, Statistics @adam_rab in London, United Kingdom http://t.co/pIIJVPCuN8\u2026'
}

But I couldn't save tweets_listdt[0] into MongoDB, while I could save the latter (the literal dict).

In[529]: tweets_listdt[0] == {'created_at': u'Sun Aug 03 17:07:24 +0000 2014',
 'id': 2704548373,
 'name': u'NoSQL',
 'text': u'RT @BigdataITJobs: Data Scientist \u2013 Machine learning, Python, Pandas, Statistics @adam_rab in London, United Kingdom http://t.co/pIIJVPCuN8\u2026'}
Out[528]: True

This one fails:

In[530]: tweetsdb.save(tweets_listdt[0])
tweetsdb.save({'created_at': u'Sun Aug 03 17:07:24 +0000 2014',
 'id': 2704548373,
 'name': u'NoSQL',
 'text': u'RT @BigdataITJobs: Data Scientist \u2013 Machine learning, Python, Pandas, Statistics @adam_rab in London, United Kingdom http://t.co/pIIJVPCuN8\u2026'})
Traceback (most recent call last):
  File "D:\Program Files\Anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 3035, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-529-b1b81c04d5ad>", line 1, in <module>
    tweetsdb.save(tweets_listdt[0])
  File "D:\Program Files\Anaconda\lib\site-packages\pymongo\collection.py", line 1903, in save
    check_keys, manipulate, write_concern)
  File "D:\Program Files\Anaconda\lib\site-packages\pymongo\collection.py", line 430, in _insert
    gen(), check_keys, self.codec_options, sock_info)
InvalidDocument: Cannot encode object: 2704548373

This one works okay:

In[531]: tweetsdb.save({'created_at': u'Sun Aug 03 17:07:24 +0000 2014',
 'id': 2704548373,
 'name': u'NoSQL',
 'text': u'RT @BigdataITJobs: Data Scientist \u2013 Machine learning, Python, Pandas, Statistics @adam_rab in London, United Kingdom http://t.co/pIIJVPCuN8\u2026'})
Out[530]: ObjectId('554b38d5c3d89c09688b1149')

Update on 5/10

Thanks Bernie. The PyMongo version I'm using is 3.0.1.

Here is a check of the id's data type:

In[36]:type(tweets_listdt[0]['id'])
Out[37]:long

If I just use:

for tweet in tweets_listdt:
    tweetsdb.save(tweet)

The error mentioned above would happen.

But if I add in this line, everything is okay:

tweet['id'] = int(tweet['id'])

And when I directly assign

tweets_listdtw = {'created_at': u'Sun Aug 03 17:07:24 +0000 2014',
 'id': 2704548373,
 'name': u'NoSQL',
 'text': u'RT @BigdataITJobs: Data Scientist'}

tweetsdb.save(tweets_listdtw) works, and

print type(tweets_listdtw['id'])
<type 'numpy.int64'>

Now I'm confused again. The long type is evidently okay, so why does saving only work after I convert 'id' to int?
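Since `==` compares values rather than types, the equality check earlier can return True even when a field is not a plain Python number, and the error message only prints the value. A small diagnostic can make the difference visible. The sketch below is for Python 3; the helper name `non_builtin_fields` is made up for illustration. It flags every non-builtin type, including ones PyMongo can encode (such as datetime), so treat the output as a hint rather than a verdict.

```python
def non_builtin_fields(doc, prefix=""):
    """Return (path, type name) pairs for values whose type is not a Python builtin.

    Hypothetical helper: it walks nested dicts only and reports any value
    whose type is defined outside the builtins module.
    """
    found = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the dotted path.
            found.extend(non_builtin_fields(value, prefix=path + "."))
        elif type(value).__module__ != "builtins":
            type_name = f"{type(value).__module__}.{type(value).__qualname__}"
            found.append((path, type_name))
    return found
```

Calling `non_builtin_fields(tweets_listdt[0])` before saving would list the fields worth converting.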

asked May 07 '15 10:05 by aeoluseros


2 Answers

If the dict you want to send to MongoDB through PyMongo (data_dict here) contains NumPy objects, for example NumPy ints or floats, you may get a "Cannot encode object" error. To resolve this, I used a custom encoder like this:

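import json

import numpy


class CustomEncoder(json.JSONEncoder):
    """JSON encoder that converts NumPy scalars and arrays to builtin types."""

    def default(self, obj):
        if isinstance(obj, numpy.integer):
            return int(obj)
        elif isinstance(obj, numpy.floating):
            return float(obj)
        elif isinstance(obj, numpy.ndarray):
            return obj.tolist()
        else:
            return super(CustomEncoder, self).default(obj)


# data_dict is the dict you want to insert; the JSON round trip
# replaces every NumPy value with its plain Python equivalent.
data_dict_1 = json.dumps(data_dict, cls=CustomEncoder)
data_dict_final = json.loads(data_dict_1)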
  • See the json module docs: https://docs.python.org/3/library/json.html
  • This approach works no matter how your JSON data is nested.
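One thing to keep in mind with a JSON round trip is that it normalises more than numeric types. The snippet below is a small standard-library illustration (Python 3, made-up sample data): non-string keys become strings, tuples become lists, and values such as datetime are rejected by the default encoder unless `default()` handles them.

```python
import json
from datetime import datetime

# Integer keys are turned into strings and tuples into lists.
doc = {1: (1, 2), "name": "NoSQL"}
round_tripped = json.loads(json.dumps(doc))

# The stock encoder raises TypeError for datetime values.
try:
    json.dumps({"created": datetime(2014, 8, 3, 17, 7, 24)})
    datetime_supported = True
except TypeError:
    datetime_supported = False
```

If the documents carry dates or other types that should reach MongoDB unchanged, converting only the offending fields may be the safer route.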
answered Oct 16 '22 00:10 by rgv


Your problem is that numpy.int64 is foreign to MongoDB: PyMongo's BSON encoder does not know how to serialise it. I have had the same problem.

The solution is to convert the offending values to a datatype that MongoDB understands. Here is an example of how I converted them in my code:

# Python 2 fragment taken from inside a class method; it assumes
# `import pymongo` and `import numpy as np`, and that self.info logs a message.
try:
    collection.insert(r)
except pymongo.errors.InvalidDocument:
    # Python 2.7.10 on Windows and PyMongo are not forgiving:
    # foreign data types have to be converted before inserting.
    n = {}
    for k, v in r.items():
        if isinstance(k, unicode):
            for i in ['utf-8', 'iso-8859-1']:
                try:
                    k = k.encode(i)
                    break  # stop at the first encoding that succeeds
                except (UnicodeEncodeError, UnicodeDecodeError):
                    continue
        if isinstance(v, np.int64):
            self.info("k is %s , v is %s" % (k, v))
            v = int(v)
            self.info("V is %s" % v)
        if isinstance(v, unicode):
            for i in ['utf-8', 'iso-8859-1']:
                try:
                    v = v.encode(i)
                    break  # stop at the first encoding that succeeds
                except (UnicodeEncodeError, UnicodeDecodeError):
                    continue

        n[k] = v

    collection.insert(n)

I hope this helps you.
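The loop above converts one flat dictionary. For nested documents on Python 3, a recursive variant might look like the sketch below. The function name `to_builtin_numbers` is made up, and the approach relies on NumPy registering its integer and floating scalar types with the `numbers` ABCs, which is my understanding of current NumPy releases. It leaves arrays and NumPy booleans untouched, so those would still need separate handling.

```python
import numbers


def to_builtin_numbers(value):
    """Recursively replace non-builtin numeric scalars with int or float.

    Hypothetical helper: dicts are rebuilt, lists and tuples become lists,
    and anything that is not a number is returned unchanged.
    """
    if isinstance(value, dict):
        return {key: to_builtin_numbers(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_builtin_numbers(item) for item in value]
    if isinstance(value, bool):
        # bool is a subclass of int; keep it as a bool.
        return value
    if isinstance(value, numbers.Integral):
        return int(value)
    if isinstance(value, numbers.Real):
        return float(value)
    return value
```

With this in place, something like `collection.insert_one(to_builtin_numbers(r))` would send only plain ints and floats for the numeric fields.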

answered Oct 15 '22 23:10 by oz123