I am trying to convert a Pydantic model to a Pandas DataFrame, but I am getting various errors.
Here is the code:
from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel
import pickle
import sklearn
import pandas as pd
import numpy as np

class Userdata(BaseModel):
    current_res_month_dec: Optional[int] = 0
    current_res_month_nov: Optional[int] = 0

async def return_recurrent_user_predictions_gb(user_data: Userdata):
    empty_dataframe = pd.DataFrame([Userdata(**{
        'current_res_month_dec': user_data.current_res_month_dec,
        'current_res_month_nov': user_data.current_res_month_nov})], ignore_index=True)
This is the DataFrame that is returned when trying to execute it through /docs in my local environment:
Response body
{
  "0": {
    "0": [
      "current_res_month_dec",
      0
    ]
  },
  "1": {
    "0": [
      "current_res_month_nov",
      0
    ]
  }
}
But if I try to use this DataFrame for a prediction:
model_has_afternoon = pickle.load(open('./models/model_gbclf_prob_current_product_has_afternoon.pickle', 'rb'))
result_afternoon = model_has_afternoon.predict_proba(empty_dataframe)[:, 1]
I get this error:
ValueError: setting an array element with a sequence.
I have tried building my own DataFrame before, and the predictions should work with a DataFrame.
You first need to convert the Pydantic model into a dictionary using Pydantic's dict() method (renamed model_dump() in Pydantic v2, where dict() is deprecated). Note that other methods, such as Python's dict() function and the .__dict__ attribute, have been found to be faster alternatives to Pydantic's dict() method (see this answer). However, since you are using a Pydantic model, it might be best to use Pydantic's dict() method and then pass the dictionary to pandas.DataFrame() surrounded by square brackets; for example, pd.DataFrame([data.dict()]). As described in this answer, this approach can be used when you need the keys of the passed dict to be the columns and the values to be the rows. If you need to specify a different orientation, you can also use pandas.DataFrame.from_dict(). Afterwards, you can call model.predict(df) to get predictions, as demonstrated here and here.
from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel
import pandas as pd

app = FastAPI()

class Userdata(BaseModel):
    col1: Optional[int] = 0
    col2: Optional[int] = 0
    col3: str = "foo"

@app.post('/submit')
def submit_data(data: Userdata):
    df = pd.DataFrame([data.dict()])
    # pred = model.predict(df)
    return "Success"
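To make the difference with the code in the question concrete, here is a minimal sketch, assuming pandas and Pydantic are installed. The model_to_dict helper is a hypothetical name introduced only so the snippet runs on both Pydantic v1 and v2, and the field values are made up for illustration. It shows that iterating a model yields (field name, value) pairs, which is consistent with the cells seen in the question's response body, whereas wrapping the dictionary in a list gives one row with one column per field.

```python
from typing import Optional

import pandas as pd
from pydantic import BaseModel


class Userdata(BaseModel):
    current_res_month_dec: Optional[int] = 0
    current_res_month_nov: Optional[int] = 0


def model_to_dict(model: BaseModel) -> dict:
    """Hypothetical helper: model_dump() on Pydantic v2, dict() on v1."""
    dump = getattr(model, "model_dump", None)
    return dump() if dump is not None else model.dict()


# Illustrative values, not taken from the question.
user_data = Userdata(current_res_month_dec=3, current_res_month_nov=1)

# Iterating a Pydantic model yields (field_name, value) pairs.
pairs = list(user_data)

# Wrapping the dictionary in a list gives one row, one column per field.
df = pd.DataFrame([model_to_dict(user_data)])

# from_dict() covers other orientations; orient="index" puts the field
# names in the row index instead of the columns.
df_by_field = pd.DataFrame.from_dict(
    model_to_dict(user_data), orient="index", columns=["value"]
)
```

Here df has the two field names as columns, which is the tabular layout an estimator expects, while df_by_field is only useful when the fields should be rows.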
As you mentioned that you would like to use the DataFrame for Machine Learning predictions, it should be noted that there are a few other options for passing the data to the predict() and predict_proba() functions that do not require creating a DataFrame. These options include:
model.predict([[data.col1, data.col2, data.col3]])
and
model.predict([list(data.dict().values())])
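One thing to watch with the list-based options is that the values are positional, so they have to arrive in the order the model was trained on. The following is a rough sketch of making that order explicit. EchoModel is a hypothetical stand-in for a fitted estimator and FEATURE_ORDER is an assumed training-time column order; neither comes from the original code.

```python
from typing import Optional

from pydantic import BaseModel


class Userdata(BaseModel):
    col1: Optional[int] = 0
    col2: Optional[int] = 0


class EchoModel:
    """Hypothetical stand-in for a fitted estimator; it sums each row."""

    def predict(self, rows):
        return [sum(row) for row in rows]


# Assumed training-time column order; a real model defines its own.
FEATURE_ORDER = ["col2", "col1"]

data = Userdata(col1=1, col2=2)

# model_dump() on Pydantic v2, dict() on v1.
values = data.model_dump() if hasattr(data, "model_dump") else data.dict()

# One inner list per sample, arranged in the order the model expects.
row = [values[name] for name in FEATURE_ORDER]
prediction = EchoModel().predict([row])
```

If the fields of the Pydantic model are already declared in the training order, list(data.dict().values()) produces the same row without the lookup.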
Please have a look at this answer for more details. In case you would also need to respond back to the client with a DataFrame in JSON format, please take a look here.
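As one possible way to do this (not necessarily the approach in the linked answer), the DataFrame can be serialised with to_json() and loaded back into plain Python objects, which a FastAPI endpoint can then return as its response. The column names and values below are illustrative only.

```python
import json

import pandas as pd

# Illustrative frame, shaped like the one built from the request body.
df = pd.DataFrame([{"col1": 1, "col2": 2, "col3": "foo"}])

# to_json() converts NumPy types to plain JSON; loading the string back
# gives built-in Python objects (a list with one dict per row).
records = json.loads(df.to_json(orient="records"))
```

Returning records from the endpoint leaves the final JSON encoding to FastAPI, so the client receives a list of row objects.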