
How to update a pandas dataframe, from multiple API calls

I need to write a Python script that does the following:

  1. Read a CSV file with the columns (person_id, name, flag). The file has 3000 rows.
  2. For each person_id in the CSV file, send a GET request with the person_id in the URL, for example http://api.myendpoint.intranet/get-data/1234. The URL returns information about that person_id, as in the response example below. I need to take every object in rents and save it to my CSV. My attempt so far, followed by the output I need:
import pandas as pd
import requests

path = '.'  # assumption: the folder that contains data.csv
ids = pd.read_csv(f"{path}/data.csv", delimiter=';')
rows = []  # one entry per rent; DataFrame.append was removed in pandas 2.0

for person_id in ids['person_id']:
    response = requests.get(f'http://api.myendpoint.intranet/get-data/{person_id}')
    # every object in 'rents' becomes one output row for this person
    for rent in response.json()['rents']:
        rows.append([person_id, rent['carId'], rent['price'], rent['rentStatus']])

person_rents = pd.DataFrame(rows, columns=['person_id', 'carId', 'price', 'rentStatus'])
person_id;name;flag;carId;price;rentStatus
1000;Joseph;1;6638;1000;active
1000;Joseph;1;5566;2000;active

Response example

{
    "active": false,
    "ctodx": false,
    "rents": [{
            "carId": 6638,
            "price": 1000,
            "rentStatus": "active"
        }, {
            "carId": 5566,
            "price": 2000,
            "rentStatus": "active"
        }
    ],
    "responseCode": "OK",
    "status": [{
            "request": 345,
            "requestStatus": "F"
        }, {
            "requestId": 678,
            "requestStatus": "P"
        }
    ],
    "transaction": false
}
  3. After saving the rent data from the response to the CSV, I need to get data from another endpoint, using the carId in the URL. The mileage result must be saved in the same CSV. For example:
     http://api.myendpoint.intranet/get-mileage/6638
     http://api.myendpoint.intranet/get-mileage/5566

Each call returns something like this:

{"mileage":1000.0000}
{"mileage":550.0000}

The final output must be

person_id;name;flag;carId;price;rentStatus;mileage
1000;Joseph;1;6638;1000;active;1000.0000
1000;Joseph;1;5566;2000;active;550.0000

Can someone help me with this script? It could use pandas or any other Python 3 library.

asked Sep 29 '20 at 20:09 by Malkath



1 Answer

Code Explanation

  • Create the dataframe, df, with pd.read_csv.
    • All values in 'person_id' are expected to be unique.
  • Use .apply on 'person_id' to call prepare_data for each row.
    • prepare_data expects 'person_id' to be a str or an int, as indicated by the type annotation Union[int, str].
  • Inside prepare_data, call the API through call_api, which returns a dict.
  • Convert the list under the 'rents' key of the dict into a dataframe, data, with pd.json_normalize.
  • Use .apply on 'carId' to call the mileage API for each car, and add the extracted 'mileage' to data as a column.
  • Add 'person_id' to data as a column, so that it can be used to merge df with s.
  • .apply returns s, a pd.Series of dataframes. Combine them into a single dataframe with pd.concat, then merge df and s on person_id.
  • Save to a csv with df.to_csv in the desired form.
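The pd.json_normalize step can be tried without any network access. The sketch below is only an illustration: it hard-codes a trimmed copy of the response example from the question and the person_id 1000 from the sample output, so nothing here comes from a real API call.

```python
import pandas as pd

# Sample payload copied from the "Response example" in the question,
# trimmed to a few keys; only 'rents' is used below.
response = {
    "active": False,
    "rents": [
        {"carId": 6638, "price": 1000, "rentStatus": "active"},
        {"carId": 5566, "price": 2000, "rentStatus": "active"},
    ],
    "responseCode": "OK",
}

# One row per element of the 'rents' list, one column per key.
data = pd.json_normalize(response["rents"])

# Tag every rent row with its owner, so it can later be merged onto the csv rows.
data["person_id"] = 1000

print(data)
```

Each dict in the list becomes one row, which is why a person with two rents ends up on two lines of the final csv.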

Potential Issues

  • If there's an issue, it's most likely to occur in the call_api function.
  • As long as call_api returns a dict, like the response shown in the question, the remainder of the code will work correctly to produce the desired output.
import pandas as pd
import requests
from typing import Union

def call_api(url: str) -> dict:
    r = requests.get(url)
    return r.json()

def prepare_data(uid: Union[int, str]) -> pd.DataFrame:
    
    d_url = f'http://api.myendpoint.intranet/get-data/{uid}'
    m_url = 'http://api.myendpoint.intranet/get-mileage/'
    
    # get the rent data from the api call
    rents = call_api(d_url)['rents']
    # normalize rents into a dataframe
    data = pd.json_normalize(rents)
    
    # get the mileage data from the api call and add it to data as a column
    data['mileage'] = data.carId.apply(lambda cid: call_api(f'{m_url}{cid}')['mileage'])
    # add person_id as a column to data, which will be used to merge data to df
    data['person_id'] = uid
    
    return data
    

# read data from file
df = pd.read_csv('file.csv', sep=';')

# call prepare_data
s = df.person_id.apply(prepare_data)

# s is a Series of DataFrames, which can be combined with pd.concat
s = pd.concat([v for v in s])

# join df with s, on person_id
df = df.merge(s, on='person_id')

# save to csv
df.to_csv('output.csv', sep=';', index=False)
  • If there are any errors when running this code:
    1. Leave a comment, to let me know.
    2. Edit your question, and paste the entire traceback, as text, into a code block.
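Since the Potential Issues section points at call_api as the most likely place for a failure, a stricter variant may make such failures easier to diagnose. This is a sketch only: the name call_api_checked, the session parameter, and the 10 second default timeout are assumptions for illustration, not part of the code above.

```python
import requests


def call_api_checked(url: str, session=None, timeout: float = 10.0) -> dict:
    """Hypothetical stricter variant of call_api (name and defaults are assumptions)."""
    # Use the given session if there is one, otherwise the module-level requests.get.
    http = session if session is not None else requests
    r = http.get(url, timeout=timeout)
    # Raise requests.HTTPError on a 4xx/5xx status instead of parsing an error page.
    r.raise_for_status()
    payload = r.json()
    # The rest of the pipeline indexes the result by key, so insist on a JSON object.
    if not isinstance(payload, dict):
        raise ValueError(f'Expected a JSON object from {url}, got {type(payload).__name__}')
    return payload
```

With 3000 rows in the file, passing one requests.Session() as session lets the calls reuse a connection, and the timeout keeps a single unresponsive request from stalling the whole run.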

Example

# given the following start dataframe
   person_id    name  flag
0       1000  Joseph     1
1        400     Sam     1

# resulting dataframe using the same data for both id 1000 and 400
   person_id    name  flag  carId  price rentStatus  mileage
0       1000  Joseph     1   6638   1000     active   1000.0
1       1000  Joseph     1   5566   2000     active   1000.0
2        400     Sam     1   6638   1000     active   1000.0
3        400     Sam     1   5566   2000     active   1000.0
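
The example above can be reproduced offline by swapping the API for canned data. In this sketch, fake_call_api is a hypothetical stand-in that returns the question's sample rents for every person and a mileage of 1000.0 for every car; passing it in through a call_api parameter is an assumption made so the snippet runs without the intranet endpoint.

```python
import pandas as pd


def fake_call_api(url: str) -> dict:
    """Hypothetical stand-in for call_api: canned data, no network access."""
    if '/get-mileage/' in url:
        return {'mileage': 1000.0}
    return {'rents': [
        {'carId': 6638, 'price': 1000, 'rentStatus': 'active'},
        {'carId': 5566, 'price': 2000, 'rentStatus': 'active'},
    ]}


def prepare_data(uid, call_api=fake_call_api) -> pd.DataFrame:
    base = 'http://api.myendpoint.intranet'
    # one row per rent of this person
    data = pd.json_normalize(call_api(f'{base}/get-data/{uid}')['rents'])
    # one mileage lookup per car
    data['mileage'] = data['carId'].apply(
        lambda cid: call_api(f'{base}/get-mileage/{cid}')['mileage'])
    data['person_id'] = uid
    return data


# the start dataframe from the example
df = pd.DataFrame({'person_id': [1000, 400], 'name': ['Joseph', 'Sam'], 'flag': [1, 1]})

# build one dataframe per person, stack them, and merge onto the original rows
rents = pd.concat([prepare_data(uid) for uid in df['person_id']], ignore_index=True)
result = df.merge(rents, on='person_id')
print(result)
```

Run as written, this should give four rows with the same columns as the resulting dataframe shown above.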

answered Oct 23 '22 at 04:10 by Trenton McKinney