I have a dataframe where I want to print each row to a different file. When the dataframe consists of e.g. only 50 rows, len(df) will print 50, and iterating over the rows of the dataframe like
for index, row in df.iterrows():
    print(index)
will print the index from 0 to 49.
However, if my dataframe contains more than 50'000 rows, len(df) and the number of iterations when iterating over df.iterrows() differ significantly. For example, len(df) will say e.g. 50'554, while printing the index will go up to over 400'000.
How can this be? What am I missing here?
First, as @EdChum noted in the comment, your question's title refers to iterrows, but the example you give refers to iteritems, which iterates over columns rather than rows, so its iteration count is unrelated to len. I assume you meant iterrows (as in the title).
Note that a DataFrame's index need not be a running index, irrespective of the size of the DataFrame. For example:
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3, 4]}, index=[2, 4, 5, 1000])
>>> for index, row in df.iterrows():
...     print(index)
...
2
4
5
1000
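Presumably, your long DataFrame was created differently, or underwent some manipulation that affected the index. In that case len(df) still counts the rows, while the values printed in the loop are index labels, which can be much larger than the row count.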
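As a minimal sketch of how this can happen, assume (purely for illustration) that the DataFrame was filtered at some point. The data and the filter condition below are made up; the point is only that the surviving rows keep their original labels, so the largest label can exceed the row count while the number of iterations still equals len.

```python
import pandas as pd

# Hypothetical data: ten rows with the default 0..9 index.
df = pd.DataFrame({'a': range(10)})

# Boolean filtering keeps the original labels of the surviving rows.
filtered = df[df['a'] % 4 == 0]

n_rows = len(filtered)                               # number of rows
n_iterations = sum(1 for _ in filtered.iterrows())   # one iteration per row
labels = list(filtered.index)                        # labels from before the filter

# reset_index(drop=True) replaces the labels with a fresh 0..n-1 range.
renumbered = filtered.reset_index(drop=True)
new_labels = list(renumbered.index)

print(n_rows, n_iterations, labels, new_labels)
```

If the original labels carry no meaning for you, resetting the index this way is one option for getting a running index back without changing the loop.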
If you really must iterate with a running index, you can use Python's enumerate:
>>> for index, row in enumerate(df.iterrows()):
...     print(index)
...
0
1
2
3
(Note that, in this case, row is itself a tuple of the original index label and the row's Series.)
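Since the goal in the question is to write each row to a different file, here is a rough sketch that combines enumerate with tuple unpacking, so that both the running position and the original label are available. The temporary output directory and the row_N.txt naming scheme are assumptions made for this example, not something taken from the question.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]}, index=[2, 4, 5, 1000])

# Hypothetical output location; replace with a real directory as needed.
out_dir = Path(tempfile.mkdtemp())

written = []
# Unpack the (label, row) tuple that iterrows yields for each row.
for position, (label, row) in enumerate(df.iterrows()):
    path = out_dir / f"row_{position}.txt"   # hypothetical naming scheme
    path.write_text(row.to_string())         # one file per row
    written.append((position, label, path.name))

print(written)
```

Naming the files by position rather than by label keeps the file names consecutive even when the index has gaps.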