I'm fairly new to programming and Python. I know there are a lot of explanations of this in previous questions, but I read all of them carefully and didn't find a solution.
I'm trying to read a JSON file which contains about 1 billion records like this:
334465|{"color":"33ef","age":"55","gender":"m"}
334477|{"color":"3444","age":"56","gender":"f"}
334477|{"color":"3999","age":"70","gender":"m"}
I tried hard to get past the 6-digit number at the beginning of each line, but I don't know how to read multiple JSON objects. Here is my code; I can't work out why it isn't working:
import json
T =[]
s = open('simple.json', 'r')
ss = s.read()
for line in ss:
    line = ss[7:]
    T.append(json.loads(line))
s.close()
And here is the error that I got:
ValueError: Extra Data: line 3 column 1 - line 5 column 48 (char 42 - 138)
Any suggestion would be very helpful for me!
If you want several objects in one JSON file, you need to create an array and put the objects into that array.
There are several problems with the logic of your code.
ss = s.read()
reads the entire file s into a single string. The next line
for line in ss:
iterates over each character in that string, one by one, so on each iteration line is a single character. In
line = ss[7:]
you take the entire file contents apart from the first 7 characters (positions 0 through 6, inclusive) and replace the previous content of line with that. And then
T.append(json.loads(line))
attempts to parse that string as JSON and store the resulting object in the T list. Because the string still contains every remaining line of the file, json.loads finds more text after the first object and raises the "Extra data" error.
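To see these problems in isolation, here is a small sketch that uses an in-memory string in place of the file. The sample text is taken from the question; the names first_three and error_message are only there for illustration.

```python
import json

# Stand-in for the result of s.read(): the whole file as one string.
ss = (
    '334465|{"color":"33ef","age":"55","gender":"m"}\n'
    '334477|{"color":"3444","age":"56","gender":"f"}\n'
)

# Iterating over a string yields single characters, not lines.
first_three = [ch for ch in ss][:3]
print(first_three)  # ['3', '3', '4']

# Slicing the whole string only drops the first prefix. The second
# line is still attached, so json.loads sees more text after the
# first object and raises ValueError (JSONDecodeError in Python 3).
try:
    json.loads(ss[7:])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The printed message should begin with "Extra data", the same kind of error as in the question.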
Here's some code that does what you want. We don't need to read the entire file into a string with .read, or into a list of lines with .readlines; we can simply put the file handle into a for loop, and that will iterate over the file line by line.
We use a with statement to open the file, so that it gets closed automatically when we exit the with block, or if there's an IO error.
import json
table = []
with open('simple.json', 'r') as f:
    for line in f:
        table.append(json.loads(line[7:]))
for row in table:
    print(row)
output
{'color': '33ef', 'age': '55', 'gender': 'm'}
{'color': '3444', 'age': '56', 'gender': 'f'}
{'color': '3999', 'age': '70', 'gender': 'm'}
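The slice line[7:] assumes every line starts with exactly six digits followed by |. If the prefix length can vary in your real data, splitting on the first | is a safer variant. This is only a sketch under that assumption: parse_line is a hypothetical helper name, and the second sample line has a made-up longer id to show the difference.

```python
import json

def parse_line(line):
    """Split 'id|json' on the first '|' and return (id, parsed object)."""
    prefix, _, payload = line.partition('|')
    return prefix, json.loads(payload)

sample = [
    '334465|{"color":"33ef","age":"55","gender":"m"}\n',
    # Made-up id with more digits, to show the prefix length no longer matters.
    '1234567890|{"color":"3444","age":"56","gender":"f"}\n',
]

rows = [parse_line(line) for line in sample]
for key, obj in rows:
    print(key, obj)
```

This also keeps the numeric id, which line[7:] throws away.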
We can make this more compact by building the table list in a list comprehension:
import json
with open('simple.json', 'r') as f:
    table = [json.loads(line[7:]) for line in f]
for row in table:
    print(row)
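The question mentions about a billion lines, and a list holding that many dicts may not fit in memory. If you can process the records one at a time, a generator avoids building the list at all. A minimal sketch, assuming the same fixed 7-character prefix; iter_records is a hypothetical name, and the temporary file only exists so the example runs on its own.

```python
import json
import os
import tempfile

def iter_records(path):
    """Yield one parsed object per line instead of building a list."""
    with open(path, 'r') as f:
        for line in f:
            yield json.loads(line[7:])

# Write a tiny sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    tmp.write('334465|{"color":"33ef","age":"55","gender":"m"}\n')
    tmp.write('334477|{"color":"3444","age":"56","gender":"f"}\n')
    sample_path = tmp.name

# Process records as they stream past; here we just count by gender.
counts = {}
for record in iter_records(sample_path):
    counts[record['gender']] = counts.get(record['gender'], 0) + 1
print(counts)

os.remove(sample_path)
```

Only one line and one parsed object are held in memory at a time, so the memory used does not grow with the size of the file.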