I can't seem to find any built-in way of simply converting a list of Pydantic BaseModels to a Pandas Dataframe.
from pydantic import BaseModel
import pandas as pd

class SomeModel(BaseModel):
    col1: str
    col2: str

data = [SomeModel(**{'col1': 'foo', 'col2': 'bar'})] * 10
pd.DataFrame(data)
Output
>> 0 1
>> 0 (col1, foo) (col2, bar)
>> 1 (col1, foo) (col2, bar)
>> ...
This way, the field names end up in the cells as data instead of becoming column labels. A workaround is to convert each model to a dict first:
pd.DataFrame([model.dict() for model in data])
Output
>> col1 col2
>> 0 foo bar
>> 1 foo bar
>> ...
However this method is a bit slow for larger amounts of data. Is there a faster way?
When a DataFrame is created from a list of dictionaries, the dictionary keys become the columns and each dictionary becomes one row. If a dictionary lacks a key that appears in the others, NaN is inserted in that cell of the resulting DataFrame.
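As a small illustration of that list-of-dicts behaviour, here is a minimal sketch using plain dictionaries. The record values are made up for the example.

```python
import pandas as pd

# Two records; the second one has no "col2" key.
records = [
    {"col1": "foo", "col2": "bar"},
    {"col1": "baz"},
]

# Keys become column labels; the missing value is filled with NaN.
df = pd.DataFrame(records)
print(df)
```

This is the reason the model.dict() workaround yields proper column labels: pandas sees a list of dicts rather than a list of opaque objects.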
Some quick-and-dirty profiling yields the following timings:
import cProfile

from pydantic import BaseModel
import pandas as pd
from fastapi.encoders import jsonable_encoder

class SomeModel(BaseModel):
    col1: int
    col2: str

# 800,000 model instances in total
data = [SomeModel(col1=1, col2="foo"), SomeModel(col1=2, col2="bar")] * 4 * 10**5

cProfile.run('pd.DataFrame([s.dict() for s in data])')      # around 8.2s
cProfile.run('pd.DataFrame(jsonable_encoder(data))')        # around 30.8s
cProfile.run('pd.DataFrame([s.__dict__ for s in data])')    # around 1.7s
cProfile.run('pd.DataFrame([dict(s) for s in data])')       # around 3s
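To reproduce the comparison without profiler overhead, something like the following sketch with timeit may help. It uses a smaller list than above, leaves out the FastAPI variant, and the to_dict helper is a hypothetical shim (not part of Pydantic) that picks model_dump() on Pydantic v2, where .dict() is deprecated, and .dict() on v1. Absolute numbers will differ by machine and library version.

```python
import timeit

import pandas as pd
from pydantic import BaseModel


class SomeModel(BaseModel):
    col1: int
    col2: str


def to_dict(model: BaseModel) -> dict:
    # Hypothetical helper: Pydantic v2 renamed .dict() to .model_dump().
    if hasattr(model, "model_dump"):
        return model.model_dump()
    return model.dict()


# Smaller than the 800,000 rows above so the sketch runs quickly.
data = [SomeModel(col1=1, col2="foo"), SomeModel(col1=2, col2="bar")] * 10_000

candidates = {
    "to_dict": lambda: pd.DataFrame([to_dict(s) for s in data]),
    "dict(s)": lambda: pd.DataFrame([dict(s) for s in data]),
    "__dict__": lambda: pd.DataFrame([s.__dict__ for s in data]),
}

# Build each frame once so the results can be compared for equality.
frames = {name: build() for name, build in candidates.items()}

for name, build in candidates.items():
    seconds = timeit.timeit(build, number=3)
    print(f"{name}: {seconds:.3f}s for 3 runs")
```

For a flat model like this one, all three variants should produce the same DataFrame, so the choice comes down to speed.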
Not sure if it's faster, but FastAPI exposes jsonable_encoder, which essentially performs that same transformation on an arbitrarily nested structure of BaseModel:
from fastapi.encoders import jsonable_encoder
pd.DataFrame(jsonable_encoder(data))
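Nesting is also where the faster shortcuts differ from a full conversion. The sketch below uses two made-up models, Inner and Outer (assumptions for illustration, not from the question), to show that dict(model) and model.__dict__ leave a nested model as an object, while .dict() (or model_dump() on Pydantic v2) converts it recursively.

```python
import pandas as pd
from pydantic import BaseModel


class Inner(BaseModel):
    x: int


class Outer(BaseModel):
    name: str
    inner: Inner


item = Outer(name="a", inner=Inner(x=1))

# Recursive conversion: the nested model becomes a plain dict.
dump = item.model_dump() if hasattr(item, "model_dump") else item.dict()

# Shallow conversion: the nested model stays an Inner instance.
shallow = dict(item)

print(type(dump["inner"]))
print(type(shallow["inner"]))
print(type(item.__dict__["inner"]))

# Nested dicts can be flattened into dotted column names.
flat = pd.json_normalize([dump])
print(flat.columns.tolist())
```

So the __dict__ and dict(s) variants are only a drop-in replacement when the models are flat; with nested models the cells would hold model objects rather than plain values.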