
Python: Read several json files from a folder

I would like to know how to read several json files from a single folder (without specifying the file names, just that they are json files).

Also, is it possible to turn them into a pandas DataFrame?

Can you give me a basic example?

donpresente asked May 29 '15 21:05




1 Answer

One option is listing all files in a directory with os.listdir and then finding only those that end in '.json':

    import os, json
    import pandas as pd

    path_to_json = 'somedir/'
    json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
    print(json_files)  # for me this prints ['foo.json']
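As a variation on the listing step above, here is a minimal sketch that uses `pathlib` glob matching instead of `os.listdir` plus `endswith`. The helper name `find_json_files` is made up for illustration and is not part of the original answer; it only looks at files directly inside the folder, not in subfolders.

```python
from pathlib import Path


def find_json_files(folder):
    """Return the .json files directly inside `folder`, sorted by name.

    `find_json_files` is a hypothetical helper name used for illustration.
    """
    # Path.glob yields Path objects that already include the folder,
    # so no os.path.join is needed when opening them later.
    return sorted(Path(folder).glob("*.json"))
```

Calling `find_json_files('somedir/')` should give the same files as the list comprehension above, but as `Path` objects that can be passed straight to `open()`.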

Once a file has been parsed with json.load (so it is a Python dictionary at this point), you can use pandas DataFrame.from_dict to turn it into a pandas DataFrame:

    # many_jsons is a list of already-parsed JSON files (Python dicts)
    montreal_json = pd.DataFrame.from_dict(many_jsons[0])
    print(montreal_json['features'][0]['geometry'])

Prints:

    {'type': 'Point', 'coordinates': [-73.6051013, 45.5115944]}

In this case I had appended the parsed JSON files to a list called many_jsons. The first item in my list is actually a GeoJSON with some geo data on Montreal. I'm already familiar with its content, so I print out the 'geometry', which gives me the lon/lat of Montreal.
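The answer does not show how many_jsons was filled, so the following is only a sketch of one way it could be done, assuming every `.json` file in the folder parses cleanly. The function name `load_json_folder` is hypothetical.

```python
import json
import os


def load_json_folder(path_to_json):
    """Parse every .json file in the folder and return the results as a list.

    `load_json_folder` is a hypothetical name; the original answer only says
    the parsed files were appended to a list called many_jsons.
    """
    many_jsons = []
    # sorted() makes the order predictable; os.listdir order is arbitrary
    for name in sorted(os.listdir(path_to_json)):
        if name.endswith(".json"):
            with open(os.path.join(path_to_json, name), encoding="utf-8") as json_file:
                many_jsons.append(json.load(json_file))
    return many_jsons
```

With this in place, `many_jsons = load_json_folder('somedir/')` gives a list whose first element can be passed to `pd.DataFrame.from_dict` as shown above.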

The following code sums up everything above:

    import os, json
    import pandas as pd

    # this finds our json files
    path_to_json = 'json/'
    json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]

    # here I define my pandas Dataframe with the columns I want to get from the json
    jsons_data = pd.DataFrame(columns=['country', 'city', 'long/lat'])

    # we need both the json and an index number so use enumerate()
    for index, js in enumerate(json_files):
        with open(os.path.join(path_to_json, js)) as json_file:
            json_text = json.load(json_file)

            # here you need to know the layout of your json and each json has to have
            # the same structure (obviously not the structure I have here)
            country = json_text['features'][0]['properties']['country']
            city = json_text['features'][0]['properties']['name']
            lonlat = json_text['features'][0]['geometry']['coordinates']
            # here I push a list of data into a pandas DataFrame at row given by 'index'
            jsons_data.loc[index] = [country, city, lonlat]

    # now that we have the pertinent json data in our DataFrame let's look at it
    print(jsons_data)

For me this prints:

      country           city                   long/lat
    0  Canada  Montreal city  [-73.6051013, 45.5115944]
    1  Canada        Toronto  [-79.3849008, 43.6529206]

It may be helpful to know that for this code I had two GeoJSON files in a directory named 'json'. Each file had the following structure:

    {
      "features": [
        {
          "properties": {
            "osm_key": "boundary",
            "extent": [-73.9729016, 45.7047897, -73.4734865, 45.4100756],
            "name": "Montreal city",
            "state": "Quebec",
            "osm_id": 1634158,
            "osm_type": "R",
            "osm_value": "administrative",
            "country": "Canada"
          },
          "type": "Feature",
          "geometry": {
            "type": "Point",
            "coordinates": [-73.6051013, 45.5115944]
          }
        }
      ],
      "type": "FeatureCollection"
    }
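Given that structure, a possible variation on the summary code is to collect one plain dict per feature and build the DataFrame in a single call, rather than assigning rows with `.loc` one at a time. This is a sketch, not the original author's method: the names `feature_rows` and `to_dataframe` are hypothetical, it assumes every file follows the layout shown above, and it uses `.get` so that a missing key yields `None` instead of raising.

```python
import json
import os


def feature_rows(path_to_json):
    """Collect one row (a dict) per GeoJSON feature found in the folder.

    Hypothetical helper; assumes files shaped like the example structure.
    """
    rows = []
    for name in sorted(os.listdir(path_to_json)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(path_to_json, name), encoding="utf-8") as json_file:
            data = json.load(json_file)
        # unlike the [0] indexing above, this keeps every feature in the file
        for feature in data.get("features", []):
            props = feature.get("properties", {})
            rows.append({
                "country": props.get("country"),
                "city": props.get("name"),
                "long/lat": feature.get("geometry", {}).get("coordinates"),
            })
    return rows


def to_dataframe(rows):
    """Build the DataFrame in one call from the collected rows."""
    # imported here so feature_rows can be used even without pandas installed
    import pandas as pd
    return pd.DataFrame(rows, columns=["country", "city", "long/lat"])
```

Usage would look like `jsons_data = to_dataframe(feature_rows('json/'))`. Building the frame once from a list of dicts tends to be simpler and faster than growing it row by row when there are many files.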
Scott answered Sep 28 '22 07:09
