I have been trying to extract only certain data from a JSON file. I managed to decode the JSON and get the wanted data into a Python dict. When I print the dict it shows all the wanted data, but when I try to write the dict to a new file, only the last object gets written. I also can't understand why printing the dict gives me multiple dict objects instead of one, as I would expect.
My code:
import json
input_file=open('json.json', 'r')
output_file=open('test.json', 'w')
json_decode=json.load(input_file)
for item in json_decode:
    my_dict={}
    my_dict['title']=item.get('labels').get('en').get('value')
    my_dict['description']=item.get('descriptions').get('en').get('value')
    my_dict['id']=item.get('id')
    print my_dict
back_json=json.dumps(my_dict, output_file)
output_file.write(back_json)
output_file.close()
my json.json file:
[
{"type":"item","labels":{"en":{"language":"en","value":"George Washington"}},"descriptions":{"en":{"language":"en","value":"American politician, 1st president of the United States (in office from 1789 to 1797)"}},"id":"Q23"},
{"type":"item","aliases":{"en":[{"language":"en","value":"Douglas Noël Adams"},{"language":"en","value":"Douglas Noel Adams"}]},"labels":{"en":{"language":"en","value":"Douglas Adams"}},"descriptions":{"en":{"language":"en","value":"English writer and humorist"}},"id":"Q42"},
{"type":"item","aliases":{"en":[{"language":"en","value":"George Bush"},{"language":"en","value":"George Walker Bush"}]},"labels":{"en":{"language":"en","value":"George W. Bush"}},"descriptions":{"en":{"language":"en","value":"American politician, 43rd president of the United States from 2001 to 2009"}},"id":"Q207"},
{"type":"item","aliases":{"en":[{"language":"en","value":"Velázquez"},{"language":"en","value":"Diego Rodríguez de Silva y Velázquez"}]},"labels":{"en":{"language":"en","value":"Diego Velázquez"}},"descriptions":{"en":{"language":"en","value":"Spanish painter who was the leading artist in the court of King Philip IV"}},"id":"Q297"},
{"type":"item","labels":{"en":{"language":"en","value":"Eduardo Frei Ruiz-Tagle"}},"descriptions":{"en":{"language":"en","value":"Chilean politician and former President"}},"id":"Q326"}
]
print my_dict output:
{'id': u'Q23', 'description': u'American politician, 1st president of the United States (in office from 1789 to 1797)', 'title': u'George Washington'}
{'id': u'Q42', 'description': u'English writer and humorist', 'title': u'Douglas Adams'}
{'id': u'Q207', 'description': u'American politician, 43rd president of the United States from 2001 to 2009', 'title': u'George W. Bush'}
{'id': u'Q297', 'description': u'Spanish painter who was the leading artist in the court of King Philip IV', 'title': u'Diego Vel\xe1zquez'}
{'id': u'Q326', 'description': u'Chilean politician and former President', 'title': u'Eduardo Frei Ruiz-Tagle'}
output in the file test.json:
{"id": "Q326", "description": "Chilean politician and former President", "title": "Eduardo Frei Ruiz-Tagle"}
I would also like to know why the dict outputs 'title': u'Diego Vel\xe1zquez', but if I run print my_dict.values()[2] I get the name written normally as Diego Velázquez.
Many thanks
Your code creates a new dictionary object for each item with:
my_dict={}
Moreover, each assignment rebinds the variable, so the previous dictionary is discarded. By the time the loop finishes, my_dict holds only the last item, and that is the only thing written to the file.
Try creating a list before your for loop and storing each result there.
result = []
for item in json_decode:
    my_dict={}
    my_dict['title']=item.get('labels').get('en').get('value')
    my_dict['description']=item.get('descriptions').get('en').get('value')
    my_dict['id']=item.get('id')
    print(my_dict)
    result.append(my_dict)
Finally, write the result to the output:
back_json=json.dumps(result)
output_file.write(back_json)
output_file.close()
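The steps above can be sketched end to end as follows. This is only a minimal sketch in Python 3, not the exact code from the question: the sample records are shortened, the extract helper is a hypothetical name, and the output goes to a temporary directory rather than the real test.json. It assumes every record has English labels and descriptions.

```python
import json
import os
import tempfile

# Two sample records in the same shape as json.json (shortened for illustration).
records = [
    {"labels": {"en": {"value": "George Washington"}},
     "descriptions": {"en": {"value": "American politician"}},
     "id": "Q23"},
    {"labels": {"en": {"value": "Diego Velázquez"}},
     "descriptions": {"en": {"value": "Spanish painter"}},
     "id": "Q297"},
]


def extract(item):
    """Hypothetical helper: keep only the title, description and id of one record."""
    return {
        "title": item["labels"]["en"]["value"],
        "description": item["descriptions"]["en"]["value"],
        "id": item["id"],
    }


# Collect one dict per record in a list instead of rebinding a single variable.
result = [extract(item) for item in records]

# Assumed output location; a temp directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(out_path, "w", encoding="utf-8") as output_file:
    # ensure_ascii=False writes non-ASCII characters as-is instead of \u escapes.
    json.dump(result, output_file, ensure_ascii=False, indent=2)

# Read the file back to confirm that every record was written.
with open(out_path, encoding="utf-8") as check_file:
    written = json.load(check_file)
```

Writing the whole list with one json.dump call also keeps the file valid JSON, which writing several dicts one after another would not.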
Printing a dictionary shows the repr of each value, which helps the developer by revealing the type of the data. In u'Diego Vel\xe1zquez', the u prefix indicates a Unicode string and \xe1 is the escaped form of á. When the string itself is printed, it is encoded for display according to the encoding settings of your terminal, so you see Diego Velázquez.
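The difference between the escaped and the plain form can be reproduced in Python 3 as well, where the built-in ascii() gives roughly what Python 2's repr showed (without the u prefix). This is a small sketch for illustration, with variable names chosen here rather than taken from the question.

```python
# The same string the question asks about; \xe1 is the code point for "á".
title = "Diego Vel\xe1zquez"

# ascii() returns a repr-style form with non-ASCII characters escaped,
# similar to what Python 2 showed when printing the whole dict.
escaped = ascii(title)

# str() is the form used when the string itself is printed.
plain = str(title)

# Both forms describe the same 15-character string.
print(escaped)
print(len(plain))
```

The escape is only a display detail: the data stored in the dict is identical in both cases.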
When you do this:
for item in json_decode:
You are looping through each object in the decoded JSON list.
Every time through the loop you overwrite the my_dict variable, which is why only one object ends up in your output.
Once you have loaded the file, you can simply print the json_decode variable to see all of the decoded data.
https://docs.python.org/3.3/library/json.html