
Pandas: How could I iterate two dataframes which have exactly same format?

Tags: python, pandas

My final goal is to build a list that contains, for each location in the dataframes, a group of the form

 [df_one_first_element, df_two_first_element, column_first, index_first]

 e.g. [0.619159, 0.510162, 20140109, 0.50], [0.264191, 0.269053, 20140213, 0.50], ...

So I am trying to iterate over two dataframes at once, but I am stuck. How can I iterate over two dataframes that have exactly the same format but different data?

For example, I have two dataframes, df_one and df_two, that look like this:

df_one = 

      20140109  20140213  20140313  20140410  20140508  20140612  20140710  \
0.50  0.619159  0.264191  0.438849  0.465287  0.445819  0.412582  0.397366   
0.55  0.601379  0.303953  0.457524  0.432335  0.415333  0.382093  0.382361  

df_two = 

      20140109  20140213  20140313  20140410  20140508  20140612  20140710  \
0.50  0.510162  0.269053  0.308494  0.300554  0.294360  0.286980  0.280494   
0.55  0.489953  0.258690  0.290044  0.283933  0.278180  0.271426  0.266580    

I want to access the same location in both dataframes while iterating over every value.

First I tried iterrows():

i = 0
for index, row in df_one.iterrows():
    j= 0
    for item in row:
        print df_two(i,j)
        j= j+1
    i = i+1

but, as you know, a dataframe cannot be accessed like this:

df_two(i,j)

So I am currently stuck. Is there a way to access the data by index name and column name instead?

JonghoKim asked Jul 12 '14 04:07




2 Answers

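You could use itertools.izip (Python 2 only; on Python 3 the built-in zip does the same job, and Series.iteritems() is named Series.items() in current pandas):

import itertools

for ( idxRow, s1 ), ( _, s2 ) in itertools.izip( df0.iterrows(), df1.iterrows() ) :
    for ( idxCol, v1 ), ( _, v2 ) in itertools.izip( s1.iteritems(), s2.iteritems() ) :
        print ( v1, v2, idxCol, idxRow )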

In:

          X         Y         Z
a  1.171124  0.853229  1.416635
b  0.971665 -1.727410 -0.055180

Out:

(1.1711241491561419, 1.3715317727366974, 'X', 'a')
(0.85322862359611618, 0.72799908412372294, 'Y', 'a')
(1.4166350896829785, 2.0068549773211006, 'Z', 'a')
(0.9716653056530119, 0.94413346620976102, 'X', 'b')
(-1.727409829928936, 2.9839447205351157, 'Y', 'b')
(-0.055180403519242693, 0.0030448769325464513, 'Z', 'b')
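
For Python 3 and recent pandas, the same pairing idea might be sketched as below. This is only an illustration: the helper name paired_values and the two small frames (cut down from the question's data) are assumptions, not part of the original answer. It collects the groups into a list in the order the question asks for rather than printing them.

```python
import pandas as pd


def paired_values(df_a, df_b):
    """Return [a_value, b_value, column_label, index_label] for every cell.

    Hypothetical helper; assumes both frames have the same shape and the
    same row/column order.
    """
    pairs = []
    # Walk the rows of both frames in lockstep.
    for (idx_row, s1), (_, s2) in zip(df_a.iterrows(), df_b.iterrows()):
        # Walk the cells of the two matching rows in lockstep.
        for (idx_col, v1), (_, v2) in zip(s1.items(), s2.items()):
            pairs.append([v1, v2, idx_col, idx_row])
    return pairs


# Two-by-two excerpt of the frames shown in the question.
df_one = pd.DataFrame(
    [[0.619159, 0.264191], [0.601379, 0.303953]],
    index=[0.50, 0.55], columns=[20140109, 20140213])
df_two = pd.DataFrame(
    [[0.510162, 0.269053], [0.489953, 0.258690]],
    index=[0.50, 0.55], columns=[20140109, 20140213])

pairs = paired_values(df_one, df_two)
print(pairs[0])
```

Because the frames are walked by position, this relies on both having identical row and column order; it does not align them by label.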
usual me answered Oct 24 '22 05:10


The code below also reads the values at the same position in both dataframes. Note that i and j here are integer positions, not the index and column labels.

Python 2.x:

for i in range(0, len(df_one.index)):
    for j in range(0, len(df_one.columns)):
        print df_one.values[i,j],df_two.values[i,j],i,j

Python 3.x:

for i in range(0, len(df_one.index)):
    for j in range(0, len(df_one.columns)):
        print(df_one.values[i,j],df_two.values[i,j],i,j)
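
If the labels are wanted instead of the positions, which is what the question asks about, a label-based variant could look like the sketch below. It assumes both frames carry exactly the same, unique index and column labels; the two small frames are an excerpt of the question's data, and the name result is only illustrative.

```python
import pandas as pd

# Two-by-two excerpt of the frames shown in the question.
df_one = pd.DataFrame(
    [[0.619159, 0.264191], [0.601379, 0.303953]],
    index=[0.50, 0.55], columns=[20140109, 20140213])
df_two = pd.DataFrame(
    [[0.510162, 0.269053], [0.489953, 0.258690]],
    index=[0.50, 0.55], columns=[20140109, 20140213])

# .loc[row_label, column_label] looks a cell up by name, so the same pair of
# labels addresses the matching cell in both frames.
result = [
    [df_one.loc[idx, col], df_two.loc[idx, col], col, idx]
    for idx in df_one.index
    for col in df_one.columns
]
print(result[0])
```

Looking cells up one at a time is slow on large frames, so this is best suited to small tables like the ones in the question.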
kimal answered Oct 24 '22 04:10