
Validate and format JSON files

Tags:

python

json

I have around 2000 JSON files that I'm trying to run through a Python program. The problem is that some files are not valid JSON, so loading them fails (Error: ValueError: No JSON object could be decoded) and I can't read them into my program.

I am currently doing something like the below:

    for files in folder:
        with open(files) as f:
            data = json.load(f)  # It causes an error at this part

I know there are offline methods for validating and formatting JSON files, but is there a programmatic way to check and format these files? If not, is there a free/cheap alternative for fixing all of these files offline, i.e. I just run the program on the folder containing all the JSON files and it formats them as required?


SOLVED using @reece's comment:

    import os
    import simplejson

    invalid_json_files = []
    read_json_files = []

    def parse():
        # Try to load every file in the working directory and sort it
        # into the "readable" or "invalid" list.
        for files in os.listdir(os.getcwd()):
            with open(files) as json_file:
                try:
                    simplejson.load(json_file)
                    read_json_files.append(files)
                except ValueError as e:
                    print("JSON object issue: %s" % e)
                    invalid_json_files.append(files)
        print(invalid_json_files, len(read_json_files))

Turns out that I was saving a file which is not in JSON format in my working directory which was the same place I was reading data from. Thanks for the helpful suggestions.

Black asked Apr 28 '14 15:04



2 Answers

The built-in json module can be used as a validator:

    import json

    def parse(text):
        try:
            return json.loads(text)
        except ValueError as e:
            print('invalid json: %s' % e)
            return None  # or: raise

You can make it work with files by using:

    with open(filename) as f:
        return json.load(f)

instead of json.loads and you can include the filename as well in the error message.
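A minimal sketch of that file-based variant, which also covers the formatting half of the question: if a file parses, it is written back with consistent indentation; if not, the error is reported together with the filename and the file is left untouched. The function name `reformat_json_file` and the `indent` default are assumptions made for illustration, and the sketch assumes Python 3 and UTF-8 encoded files.

```python
import json


def reformat_json_file(filename, indent=4):
    """Pretty-print a JSON file in place.

    Returns True if the file was valid and has been rewritten,
    False if it could not be parsed (the file is then left as-is).
    """
    try:
        with open(filename, encoding="utf-8") as f:
            data = json.load(f)
    except ValueError as e:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        print("invalid json in %s: %s" % (filename, e))
        return False

    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=indent, ensure_ascii=False)
        f.write("\n")  # end the file with a newline
    return True
```

Because the file is only opened for writing after it has parsed successfully, an invalid file is never overwritten.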

On Python 3.3.5, for {test: "foo"}, I get:

invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1) 

and on 2.7.6:

invalid json: Expecting property name: line 1 column 2 (char 1) 

This is because JSON requires property names to be double-quoted strings; the correct JSON is {"test": "foo"}.

When handling the invalid files, it is best not to process them any further. You can build a skipped.txt file listing the files with errors, so they can be checked and fixed by hand.
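One way to build such a list, as a rough sketch: walk the folder, try to load each file, and write the failures to skipped.txt. The function name `find_invalid_json` and the choice to look only at files ending in `.json` are assumptions; the extension filter also keeps stray non-JSON files in the same directory (including the report itself) out of the run. Python 3 and UTF-8 files are assumed.

```python
import json
import os


def find_invalid_json(folder, report="skipped.txt"):
    """Validate every *.json file in folder and list the failures in report.

    Returns the list of report lines, one per invalid file.
    """
    skipped = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".json"):
            continue  # ignore anything that is not meant to be JSON
        path = os.path.join(folder, name)
        try:
            with open(path, encoding="utf-8") as f:
                json.load(f)
        except ValueError as e:
            # Record the filename and the parser's message, tab-separated.
            skipped.append("%s\t%s" % (name, e))

    with open(os.path.join(folder, report), "w", encoding="utf-8") as out:
        for line in skipped:
            out.write(line + "\n")
    return skipped
```

Calling `find_invalid_json(os.getcwd())` would check the current working directory and leave skipped.txt next to the data files.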

If possible, you should check the site/program that generated the invalid json files, fix that and then re-generate the json file. Otherwise, you are going to keep having new files that are invalid JSON.

Failing that, you will need to write a custom JSON parser that fixes common errors. If you do, keep the originals under source control (or archived) so you can review the changes the automated tool makes, as a sanity check. Ambiguous cases should be fixed by hand.
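As an illustration only, here is a sketch of a very conservative fixer that handles a single common error: a trailing comma before a closing ] or }. It is not a general JSON repair tool. The names `strip_trailing_commas` and `try_fix_file` and the `.orig` backup suffix are hypothetical. The fix is written to disk only if the result parses, and the original is copied aside first so the difference can be reviewed.

```python
import json
import shutil


def strip_trailing_commas(text):
    """Drop commas that directly precede a closing ] or }, outside strings."""
    out = []
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False      # this character was escaped
            elif ch == "\\":
                escaped = True       # next character is escaped
            elif ch == '"':
                in_string = False    # closing quote
            continue
        if ch == '"':
            in_string = True
            out.append(ch)
        elif ch == ",":
            # Look ahead past whitespace for the next significant character.
            j = i + 1
            while j < len(text) and text[j] in " \t\r\n":
                j += 1
            if j < len(text) and text[j] in "]}":
                continue             # trailing comma: skip it
            out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


def try_fix_file(filename):
    """Apply the trailing-comma fix; return True if the file is now valid."""
    with open(filename, encoding="utf-8") as f:
        original = f.read()
    fixed = strip_trailing_commas(original)
    try:
        json.loads(fixed)
    except ValueError:
        return False                 # still invalid: leave it for manual repair
    if fixed != original:
        shutil.copy2(filename, filename + ".orig")  # keep the original
        with open(filename, "w", encoding="utf-8") as f:
            f.write(fixed)
    return True
```

Anything this does not repair, such as the unquoted property name in {test: "foo"}, is left untouched for a person to look at.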

reece answered Oct 03 '22 23:10


Yes, there are ways to validate that a JSON file is valid. One way is to use a JSON parsing library that will throw exceptions if the input you provide is not well-formatted.

    try:
        load_json_file(filename)
    except InvalidDataException:  # or whatever your library raises
        pass  # oops, guess it's not valid

Of course, if you want to fix it, you naturally cannot use a JSON loader since, well, it's not valid JSON in the first place. Unless the library you're using will automatically fix things for you, in which case you probably wouldn't even have this question.

One way is to load the file manually and tokenize it and attempt to detect errors and try to fix them as you go, but I'm sure there are cases where the error is just not possible to fix automatically and would be better off throwing an error and asking the user to fix their files.
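Short of writing a tokenizer, the standard parser can at least tell the user where the first error is. The sketch below relies on the `msg`, `lineno` and `colno` attributes of `json.JSONDecodeError`, which exists in Python 3.5 and later; the function name `describe_json_error` is made up for this example.

```python
import json


def describe_json_error(text):
    """Return None for valid JSON, else a message pointing at the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        lines = text.splitlines()
        # Guard against an error reported past the last line (e.g. empty input).
        offending = lines[e.lineno - 1] if e.lineno <= len(lines) else ""
        pointer = " " * (e.colno - 1) + "^"
        return "%s (line %d, column %d)\n%s\n%s" % (
            e.msg, e.lineno, e.colno, offending, pointer)
    return None
```

For the input {test: "foo"} this reports line 1, column 2 and prints a caret under the unquoted `test`, which is usually enough for someone to fix the file by hand.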

I have not written a JSON fixer myself so I can't provide any details on how you might go about actually fixing errors.

However, I am not sure whether it would be a good idea to fix all errors, since then you'd have to assume your fixes are what the user actually wants. If it's a missing comma or an extra trailing comma, then that might be OK, but there may be cases where it is ambiguous what the user wants.

MxLDevs answered Oct 04 '22 00:10