I have an array of objects of this class
    class CancerDataEntity(Model):
        age = columns.Text(primary_key=True)
        gender = columns.Text(primary_key=True)
        cancer = columns.Text(primary_key=True)
        deaths = columns.Integer()
        ...

When printed, the array looks like this:
[CancerDataEntity(age=u'80-85+', gender=u'Female', cancer=u'All cancers (C00-97,B21)', deaths=15306), CancerDataEntity(...
I want to convert this to a data frame so I can work with it more conveniently: aggregate, count, sum and so on. I would like the data frame to look something like this:

          age  gender  cancer  deaths
    0  80-85+  Female     ...   15306
    1     ...
Is there a way to achieve this using numpy/pandas easily, without manually processing the input array?
How do you convert an array to a DataFrame in Python? To convert a NumPy array to a DataFrame, 1) have your NumPy array (e.g., np_array), and 2) pass it to the pd.DataFrame() constructor like this: df = pd.DataFrame(np_array, columns=['Column1', 'Column2']). Note that this covers numeric arrays, not lists of arbitrary objects.
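As a minimal sketch of the constructor call described above, assuming a small two-column integer array (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative 3x2 array; the values are placeholders
np_array = np.array([[1, 2], [3, 4], [5, 6]])

# Each array column becomes a DataFrame column with the given name
df = pd.DataFrame(np_array, columns=['Column1', 'Column2'])
print(df)
```

The number of names in columns has to match the number of columns in the array, otherwise pandas raises an error.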
When we create a DataFrame from a list of dictionaries, the dictionary keys become the columns and each dictionary supplies the values for one row. If a dictionary lacks a key that appears in another dictionary, NaN is inserted into the corresponding cell of the resulting DataFrame.
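A short sketch of that behaviour, assuming two dictionaries where the second one has no deaths key (the second record is invented purely to show the NaN):

```python
import pandas as pd

records = [
    {'age': '80-85+', 'gender': 'Female', 'deaths': 15306},
    {'age': '80-85+', 'gender': 'Male'},  # made-up row with no 'deaths' key
]

# Keys become columns; the missing 'deaths' value is filled with NaN
df = pd.DataFrame(records)
print(df)
```

Because of the NaN, pandas stores the deaths column as floats rather than integers.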
A much cleaner way to do this is to define a to_dict method on your class and then use pandas.DataFrame.from_records:

    class Signal(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y

        def to_dict(self):
            return {
                'x': self.x,
                'y': self.y,
            }
e.g.
    In [87]: signals = [Signal(3, 9), Signal(4, 16)]

    In [88]: pandas.DataFrame.from_records([s.to_dict() for s in signals])
    Out[88]:
       x   y
    0  3   9
    1  4  16
Just use:
DataFrame([o.__dict__ for o in my_objs])
Full example:
    import pandas as pd

    # define some class
    class SomeThing:
        def __init__(self, x, y):
            self.x, self.y = x, y

    # make an array of the class objects
    things = [SomeThing(1, 2), SomeThing(3, 4), SomeThing(4, 5)]

    # fill dataframe with one row per object, one attribute per column
    df = pd.DataFrame([t.__dict__ for t in things])

    print(df)
This prints:
       x  y
    0  1  2
    1  3  4
    2  4  5
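The __dict__ approach only picks up attributes stored directly on the instance, so it may not give the expected columns for classes that keep their values elsewhere (for example with __slots__ or descriptor-based fields). A possible variation, sketched here with a hypothetical stand-in class Entry and made-up rows apart from the first one, is to list the wanted attribute names and read them with getattr. The resulting frame can then be aggregated as the question asks:

```python
import pandas as pd

# Hypothetical plain-Python stand-in for the question's model class
class Entry:
    def __init__(self, age, gender, cancer, deaths):
        self.age = age
        self.gender = gender
        self.cancer = cancer
        self.deaths = deaths

# Attribute names to pull from each object, in column order
fields = ['age', 'gender', 'cancer', 'deaths']

entries = [
    Entry('80-85+', 'Female', 'All cancers (C00-97,B21)', 15306),
    Entry('80-85+', 'Male', 'All cancers (C00-97,B21)', 100),   # made-up row
    Entry('75-79', 'Female', 'All cancers (C00-97,B21)', 50),   # made-up row
]

# One row per object, one column per listed attribute
df = pd.DataFrame(
    [{f: getattr(e, f) for f in fields} for e in entries],
    columns=fields,
)

# Example aggregation: total deaths per gender
totals = df.groupby('gender')['deaths'].sum()
print(totals)
```

Listing the fields explicitly also fixes the column order and leaves out any internal attributes the objects may carry.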