My final goal is to build a list that holds, for each location in the two dataframes, the pair of corresponding values plus the column and index labels, like below:
[df_one_first_element, df_two_first_element, column_first, index_first]
e.g. [0.619159, 0.510162, 20140109, 0.50], [0.264191, 0.269053, 20140213, 0.50]...
So I am trying to iterate over two dataframes, but I am stuck. How can I iterate over two dataframes that have exactly the same format but different data?
For example, I have two dataframes, df_one and df_two, that look like this:
df_one =
      20140109  20140213  20140313  20140410  20140508  20140612  20140710 \
0.50  0.619159  0.264191  0.438849  0.465287  0.445819  0.412582  0.397366
0.55  0.601379  0.303953  0.457524  0.432335  0.415333  0.382093  0.382361
df_two =
      20140109  20140213  20140313  20140410  20140508  20140612  20140710 \
0.50  0.510162  0.269053  0.308494  0.300554  0.294360  0.286980  0.280494
0.55  0.489953  0.258690  0.290044  0.283933  0.278180  0.271426  0.266580
I want to access the same location in both dataframes while iterating over all the values.
First I tried iterrows():
i = 0
for index, row in df_one.iterrows():
    j = 0
    for item in row:
        print df_two(i,j)
        j = j+1
    i = i+1
but, as you know, we cannot access a value like this:
df_two(i,j)
So I am currently lost. Could we instead access the data by index name and column name?
The equals() function tests whether two objects contain the same elements. It allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. For df_one and df_two above it would return False, because the two dataframes hold different elements. Note: NaNs in the same location are considered equal.
Vectorization is usually the first and best choice. You can also convert the dataframe to a NumPy array or to a dictionary to speed up iteration. Iterating over the key-value pairs of a dictionary is reported to be the fastest of the looping options, with a speed-up of around 280x on 20 million records.
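As a rough illustration of the NumPy route, here is a minimal sketch that builds the list asked for in the question without calling iterrows(). The sample frames hold only the first two columns of the data shown above, and paired_values is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

# Sample data: only the first two columns of the frames in the question.
df_one = pd.DataFrame(
    [[0.619159, 0.264191], [0.601379, 0.303953]],
    index=[0.50, 0.55], columns=[20140109, 20140213],
)
df_two = pd.DataFrame(
    [[0.510162, 0.269053], [0.489953, 0.258690]],
    index=[0.50, 0.55], columns=[20140109, 20140213],
)


def paired_values(a, b):
    """Hypothetical helper: [a_value, b_value, column, index] per cell, row by row."""
    if not (a.index.equals(b.index) and a.columns.equals(b.columns)):
        raise ValueError("both frames must share the same index and columns")
    n_rows, n_cols = a.shape
    # ravel() flattens row by row, so labels are tiled/repeated to match.
    cols = np.tile(a.columns.to_numpy(), n_rows)
    rows = np.repeat(a.index.to_numpy(), n_cols)
    return [
        [x, y, c, r]
        for x, y, c, r in zip(
            a.to_numpy().ravel().tolist(),
            b.to_numpy().ravel().tolist(),
            cols.tolist(),
            rows.tolist(),
        )
    ]


pairs = paired_values(df_one, df_two)
```

The label check at the top is a design choice: flattening by position only makes sense when both frames are laid out identically, which is the situation described in the question.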
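You could use itertools.izip (Python 2 only; in Python 3 the built-in zip does the same job, and Series.iteritems() was removed in pandas 2.0 in favour of items()):
import itertools

for ( idxRow, s1 ), ( _, s2 ) in itertools.izip( df0.iterrows(), df1.iterrows() ) :
    for ( idxCol, v1 ), ( _, v2 ) in itertools.izip( s1.iteritems(), s2.iteritems() ) :
        print ( v1, v2, idxCol, idxRow )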
In:
X Y Z
a 1.171124 0.853229 1.416635
b 0.971665 -1.727410 -0.055180
Out:
(1.1711241491561419, 1.3715317727366974, 'X', 'a')
(0.85322862359611618, 0.72799908412372294, 'Y', 'a')
(1.4166350896829785, 2.0068549773211006, 'Z', 'a')
(0.9716653056530119, 0.94413346620976102, 'X', 'b')
(-1.727409829928936, 2.9839447205351157, 'Y', 'b')
(-0.055180403519242693, 0.0030448769325464513, 'Z', 'b')
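For readers on Python 3 with a recent pandas, the same paired iteration might look like the sketch below. It assumes the built-in zip in place of itertools.izip and Series.items() in place of iteritems(); the two small frames are made-up stand-ins for df0 and df1, and the tuples are collected in a list rather than printed.

```python
import pandas as pd

# Made-up frames standing in for df0 and df1; the values are illustrative only.
df0 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=["a", "b"], columns=["X", "Y"])
df1 = pd.DataFrame([[10.0, 20.0], [30.0, 40.0]], index=["a", "b"], columns=["X", "Y"])

result = []
# Walk both frames row by row in lockstep, then both rows cell by cell.
for (idx_row, s1), (_, s2) in zip(df0.iterrows(), df1.iterrows()):
    for (idx_col, v1), (_, v2) in zip(s1.items(), s2.items()):
        result.append((v1, v2, idx_col, idx_row))
```

This relies on both frames having the same row and column order, because zip pairs items purely by position.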
The code below also lets you read the values at the same location in both dataframes.
Python 2.x:
for i in range(0, len(df_one.index)):
    for j in range(0, len(df_one.columns)):
        print df_one.values[i,j],df_two.values[i,j],i,j
Python 3.x:
for i in range(0, len(df_one.index)):
    for j in range(0, len(df_one.columns)):
        print(df_one.values[i,j],df_two.values[i,j],i,j)
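On the question of accessing data by index name and column name: pandas provides the .at accessor for a single label-based lookup and .iat for a single position-based lookup. The following is a minimal sketch, assuming the labels are unique and using only the first two columns of the sample data from the question.

```python
import pandas as pd

# Sample data: only the first two columns of the frames in the question.
df_one = pd.DataFrame(
    [[0.619159, 0.264191], [0.601379, 0.303953]],
    index=[0.50, 0.55], columns=[20140109, 20140213],
)
df_two = pd.DataFrame(
    [[0.510162, 0.269053], [0.489953, 0.258690]],
    index=[0.50, 0.55], columns=[20140109, 20140213],
)

pairs = []
for row_label in df_one.index:
    for col_label in df_one.columns:
        pairs.append([
            df_one.at[row_label, col_label],  # lookup by index name and column name
            df_two.at[row_label, col_label],
            col_label,
            row_label,
        ])

# Positional equivalent of the df_two(i, j) attempt in the question.
first_row_second_col = df_two.iat[0, 1]
```

Because the lookups go through labels, this version still pairs the right cells if df_two stores the same rows or columns in a different order.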