I am trying to load some JSON Twitter data into a list, but instead I'm getting a segmentation fault (core dumped).
While I would love to upgrade my memory, that simply isn't an option right now. Is there some way to iterate over this data instead of trying to load it all into memory at once? Or maybe there is a different kind of solution to this problem that will still allow me to load this JSON data into a list?
In [1]: import json
In [2]: data = []
In [3]: for i in open('tweets.json'):
   ...:     try:
   ...:         data.append(json.loads(i))
   ...:     except:
   ...:         pass
   ...:
Segmentation fault (core dumped)
The data was collected using the Twitter Streaming API over about 10 days and is 213M in size.
Machine Specs:
I'm using IPython (on Python 2.7.6) and accessing it through a Linux terminal window.
On almost any modern machine, a 213 MB file is tiny and easily fits into memory; I've loaded considerably larger tweet datasets without trouble on ordinary hardware. But perhaps you (or someone else reading this later) are working on a machine with an unusually small amount of memory.
If it is indeed the size of the data causing the segmentation fault, you can try the ijson module, which iterates over the JSON document in chunks rather than loading it all at once.
Here's an example adapted from that project's documentation (with the missing imports and output stream filled in):
import ijson
from urllib2 import urlopen   # on Python 3: from urllib.request import urlopen
import sys

stream = sys.stdout  # the example writes to a stream; stdout works for a demo

parser = ijson.parse(urlopen('http://.../'))
stream.write('<geo>')
for prefix, event, value in parser:
    if (prefix, event) == ('earth', 'map_key'):
        stream.write('<%s>' % value)
        continent = value
    elif prefix.endswith('.name'):
        stream.write('<object name="%s"/>' % value)
    elif (prefix, event) == ('earth.%s' % continent, 'end_map'):
        stream.write('</%s>' % continent)
stream.write('</geo>')
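For what it's worth, the Streaming API usually writes one JSON object per line, which is exactly what your loop already assumes; the memory pressure comes from appending every parsed tweet to data. If that line-per-tweet layout holds for your file, a generator is a simpler alternative to ijson: it parses and yields one tweet at a time, so only a single record is in memory at once. This is just a sketch under that assumption (the iter_tweets name and the 'text' field filtering are illustrative):

import json

def iter_tweets(path):
    # Yield one parsed tweet at a time instead of building a giant list.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # the Streaming API sends blank keep-alive lines
            try:
                yield json.loads(line)
            except ValueError:
                continue  # skip truncated or malformed records

# Usage: keep only the fields you actually need, not the whole tweet objects.
texts = []
for tweet in iter_tweets('tweets.json'):
    if 'text' in tweet:
        texts.append(tweet['text'])

If even the reduced list is too big, you can write each extracted field out to a file (or a database) inside the loop instead of accumulating anything in memory.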