pandas iterrows changes ints into floats

Tags: python, pandas, python-2.7

I'm trying to iterate over the rows of a DataFrame that contains some int64s and some floats. iterrows() seems to be turning my ints into floats, which breaks everything I want to do downstream:

>>> import pandas as pd
>>> df = pd.DataFrame([[10000000000000001, 1.5], [10000000000000002, 2.5]], columns=['id', 'prc'])
>>> [id for id in df.id]
[10000000000000001, 10000000000000002]
>>> [r['id'] for (idx,r) in df.iterrows()]
[10000000000000000.0, 10000000000000002.0]

Iterating directly over df.id is fine. But through iterrows(), I get different values. Is there a way to iterate over the rows in such a way that I can still index by column name and get all the correct values?

asked Jan 12 '16 17:01

Barry

2 Answers

Here's the relevant part of the docs:

Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames) [...] To preserve dtypes while iterating over the rows, it is better to use itertuples() which returns namedtuples of the values and which is generally faster than iterrows.

Example for your data:

>>> df = pd.DataFrame([[10000000000000001, 1.5], [10000000000000002, 2.5]], columns=['id', 'prc'])
>>> [t[1] for t in df.itertuples()]
[10000000000000001, 10000000000000002]

answered Oct 13 '22 22:10

timgeb

If possible, you're better off avoiding iteration altogether. Check whether you can vectorize your work first.
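As a rough illustration of what vectorizing can look like, the sketch below replaces a per-row loop with column-wise operations. The downstream task here (flagging odd ids and doubling the price) is purely hypothetical, since the question does not say what the real one is; the new column names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    [[10000000000000001, 1.5], [10000000000000002, 2.5]],
    columns=["id", "prc"],
)

# Hypothetical downstream step, expressed on whole columns instead of rows.
df["id_is_odd"] = df["id"] % 2 == 1   # integer arithmetic stays in int64
df["prc_x2"] = df["prc"] * 2          # float arithmetic stays in float64

# The id column is never converted to float, so no precision is lost.
print(df.dtypes)
```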

If vectorization is impossible, you probably want DataFrame.itertuples. That will return an iterable of (named)tuples where the first element is the index label.

In [2]: list(df.itertuples())
Out[2]:
[Pandas(Index=0, id=10000000000000001, prc=1.5),
 Pandas(Index=1, id=10000000000000002, prc=2.5)]

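iterrows returns a Series for each row. Since a Series is backed by a NumPy array, whose elements must all share a single type, your ints were cast to floats. Your ids are larger than 2**53, the point beyond which float64 can no longer represent every integer, which is why 10000000000000001 came back as 10000000000000000.0.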
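The cast can be observed directly. The sketch below checks the dtype of a row produced by iterrows() and reproduces the rounding with a plain Python float. The last step, casting the frame to object before iterating, is one possible workaround if iterrows() has to be kept; it is an assumption of this example rather than something the docs recommend, and it gives up the speed of native dtypes.

```python
import pandas as pd

df = pd.DataFrame(
    [[10000000000000001, 1.5], [10000000000000002, 2.5]],
    columns=["id", "prc"],
)

# A row mixes an int64 and a float64 value, so the row Series is float64.
_, first_row = next(df.iterrows())
row_dtype = first_row.dtype

# float64 has a 53-bit significand: above 2**53 only some integers fit.
limit = 2 ** 53
as_float = float(10000000000000001)

# Possible workaround (assumption): object columns hold Python ints,
# so the row Series is object dtype and no numeric cast takes place.
obj_ids = [r["id"] for _, r in df.astype(object).iterrows()]

print(row_dtype, as_float, obj_ids)
```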

answered Oct 13 '22 22:10

TomAugspurger
