I have a large file formatted like the following:
"string in quotes"
string
string
string
number
|-
...this repeats for a while. I'm trying to convert it to JSON, so each of the chunks is like this:
"name": "string in quotes"
"description": "string"
"info": "string"
"author": "string"
"year": number
This is what I have so far:
import shutil
import os
import urllib
myFile = open('unformatted.txt','r')
newFile = open("formatted.json", "w")
newFile.write('{'+'\n'+'list: {'+'\n')
for line in myFile:
    newFile.write()  # this is where I'm not sure what to write
newFile.write('}'+'\n'+'}')
myFile.close()
newFile.close()
I think I could do something with the line number modulo something, but I'm not sure if that's the right way to go about it.
You can use itertools.groupby to group the lines of each section, then json.dump each resulting dict to your JSON file, one per line:
from itertools import groupby
import json
names = ["name", "description","info","author", "year"]
with open("test.csv") as f, open("out.json","w") as out:
    grouped = groupby(map(str.rstrip,f), key=lambda x: x.startswith("|-"))
    for k,v in grouped:
        if not k:
            json.dump(dict(zip(names,v)),out)
            out.write("\n")
Input:
"string in quotes"
string
string
string
number
|-
"other string in quotes"
string2
string2
string2
number2
Output:
{"author": "string", "name": "\"string in quotes\"", "description": "string", "info": "string", "year": "number"}
{"author": "string2", "name": "\"other string in quotes\"", "description": "string2", "info": "string2", "year": "number2"}
To read the data back, iterate over the file and call json.loads on each line:
In [6]: with open("out.json") as out:
   ...:     for line in out:
   ...:         print(json.loads(line))
   ...:
{'name': '"string in quotes"', 'info': 'string', 'author': 'string', 'year': 'number', 'description': 'string'}
{'name': '"other string in quotes"', 'info': 'string2', 'author': 'string2', 'year': 'number2', 'description': 'string2'}
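Note that the output above keeps the year as a string and leaves the literal quotes in the name. If you would rather have a single JSON document with a numeric year, as in the question, something like the following may work. It is only a sketch: parse_records and the sample lines are made up for illustration, and it assumes every chunk has exactly five lines and that the year line always parses as an integer.

```python
import json
from itertools import groupby

NAMES = ["name", "description", "info", "author", "year"]

def parse_records(lines):
    """Yield one dict per chunk; chunks are separated by lines starting with '|-'."""
    stripped = (line.rstrip() for line in lines)
    for is_separator, chunk in groupby(stripped, key=lambda s: s.startswith("|-")):
        if is_separator:
            continue  # skip the '|-' separator groups
        record = dict(zip(NAMES, chunk))
        # Assumption: the quotes around the name should not end up in the JSON value.
        record["name"] = record["name"].strip('"')
        # Assumption: the fifth line of every chunk is an integer year.
        record["year"] = int(record["year"])
        yield record

# Hypothetical sample data standing in for the lines of the input file.
sample = [
    '"string in quotes"\n', "a description\n", "some info\n", "an author\n", "1999\n",
    "|-\n",
    '"other string in quotes"\n', "a description 2\n", "some info 2\n", "an author 2\n", "2005\n",
]

records = list(parse_records(sample))
print(json.dumps({"list": records}, indent=2))
```

With a real file you would pass the open file object to parse_records instead of the sample list, and use json.dump to write the result to the output file.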