Iterable from pandas DataFrame

I need to create an iterable of the form (id, {feature name: feature weight}) to pass to a Python package.

My data is stored in a pandas DataFrame. Here is an example:

import pandas as pd

data = pd.DataFrame({"id":[1,2,3],
                    "gender":[1,0,1],
                    "age":[25,23,40]})

For the {feature name: feature weight} part, I know I can use this:

fe = data.to_dict(orient='records')
Out[28]: 
[{'age': 25, 'gender': 1, 'id': 1},
 {'age': 23, 'gender': 0, 'id': 2},
 {'age': 40, 'gender': 1, 'id': 3}]

I know I can also iterate over the DataFrame to get the id, like this:

(row[1] for row in data.itertuples())

But I can't get these two together into one iterable (a generator object). I tried:

((row[1] for row in data.itertuples()),fe[i] for i in range(len(data)))

but the syntax is wrong. Does anyone know how to do this?

asked Jun 25 '18 10:06 by blabla



1 Answer

pd.DataFrame.itertuples returns named tuples. You can iterate and convert each row to a dictionary via the purpose-built method _asdict. You can wrap this in a generator function to create a lazy reader:

import pandas as pd

data = pd.DataFrame({"id":[1,2,3],
                    "gender":[1,0,1],
                    "age":[25,23,40]})

def gen_rows(df):
    for row in df.itertuples(index=False):
        yield row._asdict()

G = gen_rows(data)

print(next(G))  # OrderedDict([('age', 25), ('gender', 1), ('id', 1)])
print(next(G))  # OrderedDict([('age', 23), ('gender', 0), ('id', 2)])
print(next(G))  # OrderedDict([('age', 40), ('gender', 1), ('id', 3)])

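Note that on Python versions before 3.8 the results are OrderedDict objects, as in the output above. As a subclass of dict, this should be sufficient for most purposes. From Python 3.8 onward, _asdict returns a regular dict.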
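The generator above yields the whole row as one dictionary, while the question asks for (id, {feature name: feature weight}) pairs. One possible way to get that shape, as a minimal sketch building on the same itertuples / _asdict approach, is to pop the id out of each row's dictionary. The function name gen_id_features and its id_col parameter are hypothetical names chosen for illustration, and the sketch assumes the column names are valid Python identifiers (itertuples renames columns that are not).

```python
import pandas as pd

data = pd.DataFrame({"id": [1, 2, 3],
                     "gender": [1, 0, 1],
                     "age": [25, 23, 40]})

def gen_id_features(df, id_col="id"):
    """Lazily yield (id, {feature name: feature weight}) for each row.

    gen_id_features and id_col are illustrative names, not part of pandas.
    """
    for row in df.itertuples(index=False):
        # _asdict() maps column names to this row's values.
        features = row._asdict()
        # pop() removes the id entry and returns its value,
        # leaving only the feature columns in the dictionary.
        row_id = features.pop(id_col)
        yield row_id, features

pairs = gen_id_features(data)

# Rows are produced one at a time, only when requested.
for row_id, features in pairs:
    print(row_id, dict(features))
```

Because this is a generator, nothing is computed until the consuming package starts iterating, and the object can only be iterated once. If the package needs to pass over the data several times, call gen_id_features(data) again or wrap the result in list().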

answered Sep 21 '22 14:09 by jpp