Python Segmentation Fault when using json.loads -- alternative way to load JSON into a list?

Question

I am trying to load some JSON Twitter data into a list, but instead I'm getting a segmentation fault (core dumped).

While I would love to upgrade my memory, that simply isn't an option right now. Is there some way to iterate over this data instead of trying to load it all into memory? Or is there a different kind of solution that will allow me to load this JSON data into a list?

In [1]: import json

In [2]: data = []

In [3]: for i in open('tweets.json'):
   ...:     try:
   ...:         data.append(json.loads(i))
   ...:     except:
   ...:         pass
   ...:     

Segmentation fault (core dumped)

The data was collected using the Twitter Streaming API over about 10 days and is 213 MB in size.

Machine Specs:

Oracle VM Virtual Box
Operating System: Ubuntu (64 bit)
Base Memory: 1024 MB
Video Memory: 128 MB
Storage (Virtual Size): 8.00 GB Dynamically allocated

I'm using IPython (Python version 2.7.6), and accessing it through a Linux terminal window.

ely · Accepted Answer

On almost any modern machine, a 213 MB file is small and fits easily into memory. I've loaded larger tweet datasets into memory on average modern machines. But perhaps you (or someone reading this later) aren't working on a modern machine, or are on one with especially little memory.

If it is indeed the size of the data causing the segmentation fault, then you may try the ijson module for iterating over chunks of the JSON document.

Here's an example from that project's page:

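import sys

import ijson

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

stream = sys.stdout  # assumption: any object with a write() method works here
continent = None  # set once the first key under 'earth' is seen

parser = ijson.parse(urlopen('http://.../'))
stream.write('<geo>')
for prefix, event, value in parser:
    if (prefix, event) == ('earth', 'map_key'):
        # a new key (continent) under the top-level 'earth' object
        stream.write('<%s>' % value)
        continent = value
    elif prefix.endswith('.name'):
        stream.write('<object name="%s"/>' % value)
    elif (prefix, event) == ('earth.%s' % continent, 'end_map'):
        stream.write('</%s>' % continent)
stream.write('</geo>')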
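
If the file holds one JSON object per line, as the loop in the question suggests, the standard library alone may be enough to avoid holding every full tweet in memory. The following is only a rough sketch under that assumption: the helper names iter_tweets and slim_tweets are made up for illustration, and the field names "id_str" and "text" are assumptions about which parts of each tweet you actually need. It has not been tested against the data in the question, and it won't help if the crash comes from something other than memory pressure.

```python
import io
import json


def iter_tweets(path):
    """Yield one parsed JSON value per non-empty line of the file at `path`."""
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                yield json.loads(line)
            except ValueError:
                # Skip lines that are not valid JSON (e.g. a truncated record),
                # without swallowing unrelated errors as a bare except would.
                continue


def slim_tweets(path, fields=("id_str", "text")):
    """Build a list that keeps only the selected fields of each tweet."""
    return [
        {name: tweet.get(name) for name in fields}
        for tweet in iter_tweets(path)
        if isinstance(tweet, dict)  # ignore lines that parse to non-objects
    ]
```

Calling slim_tweets('tweets.json') would still return a list, but each entry keeps only the chosen fields, so the list should be much smaller than one built from the full tweet objects. If you only need one pass over the data, looping over iter_tweets('tweets.json') directly avoids building a list at all.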

Tags:

python

json

CurtLH
