
How can I improve this loop in Python to increase speed?

I am reading in data from police.uk's API as a JSON file and then iterating through the file to pass the data into a pandas dataframe.

The loop that I am using to extract from the JSON and put into the DataFrame is going very slowly.

The police API only allows you to download data one month at a time. I have therefore used a list of dates and a loop to download more than one month of data.

The code is below. Is there any way to improve this loop to make it go faster? I would like to download more data from the API.

import pandas as pd
import numpy as np
import requests
import json
import matplotlib.pyplot as plt
import datetime

%matplotlib inline
plt.rcParams['figure.figsize'] = (10, 5)

#Create list of periods to download data for
periods = ['2017-03', '2017-04', '2017-05', '2017-06']

#Create empty list to store retrieved JSON data
data = []

#Police API URL

url = 'https://data.police.uk/api/crimes-street/all-crime'

#Loop to download data and append list

for date in periods:
    parameters = {'poly': '51.6,0.06:51.6,0.2:51.5,0.2:51.5,0.062', 'date': date}
    #Query API for data
    response = requests.get(url, params=parameters)
    data += json.loads(response.content)
    print(len(data))

#Create empty dataframe

df = pd.DataFrame()

#=============================================
#---This is the part that runs very slow -----
#extract relevant parts from JSON file into Dataframe

for i in range(len(data)):
    df.loc[i, 'id'] = data[i]['id']
    df.loc[i, 'category'] = data[i]['category']
    df.loc[i, 'month'] = data[i]['month']
    df.loc[i, 'latitude'] = data[i]['location']['latitude']
    df.loc[i, 'longitude'] = data[i]['location']['longitude']

#==============================================

Michael Sinclair asked Dec 07 '25 10:12


1 Answer

Use json_normalize to flatten the nested JSON in one call, then rename the columns:

import pandas as pd

#Sample data with the same nested structure as the API response
data = [{'id': 1, 'category': 2, 'month': 1, 'location': {'latitude': 100, 'longitude': 200}},
        {'id': 2, 'category': 3, 'month': 2, 'location': {'latitude': 500, 'longitude': 100}}]

#pd.json_normalize is available in pandas >= 1.0; the pandas.io.json import path is deprecated
df = pd.json_normalize(data)
df = df.rename(columns={'location.latitude':'latitude','location.longitude':'longitude'})
print(df)
   id  category  month  latitude  longitude
0   1         2      1       100        200
1   2         3      2       500        100
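
As a rough sketch of how this might carry over to records shaped like the ones in the question, the helper below flattens the list, renames the two coordinate columns, and keeps only the five fields the original loop extracted. The sample records, the extra `street` key, and the function name `crimes_to_frame` are made up for illustration, and the numeric conversion assumes the coordinates may arrive as strings.

```python
import pandas as pd

# Made-up records that mimic the nested shape used in the question;
# a real API response may carry more keys than shown here.
sample = [
    {"id": 1, "category": "burglary", "month": "2017-03",
     "location": {"latitude": "51.55", "longitude": "0.10",
                  "street": {"id": 10, "name": "On or near A Road"}}},
    {"id": 2, "category": "robbery", "month": "2017-04",
     "location": {"latitude": "51.56", "longitude": "0.12",
                  "street": {"id": 11, "name": "On or near B Road"}}},
]


def crimes_to_frame(records):
    """Flatten nested crime records and keep only the five wanted columns."""
    # One call flattens every record; nested keys become dotted column names.
    flat = pd.json_normalize(records)
    flat = flat.rename(columns={"location.latitude": "latitude",
                                "location.longitude": "longitude"})
    wanted = ["id", "category", "month", "latitude", "longitude"]
    out = flat[wanted].copy()
    # Assumption: coordinates may be strings, so coerce them to numbers.
    coords = ["latitude", "longitude"]
    out[coords] = out[coords].apply(pd.to_numeric)
    return out


df = crimes_to_frame(sample)
```

Building the frame in a single call avoids growing it one cell at a time with `df.loc`, which is the slow part of the loop in the question.
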
jezrael answered Dec 10 '25 01:12