I have a Panda's dataframe like the following:

And i want to get the countries where its PIB in 2007 was less than in 2002, but i couldn´t code something to do that only using Pandas built in methods without use python iterations or something like that. The most i've got is the following line:
df[df[df.year == 2007].PIB < df[df.year == 2002].PIB].country
But i get the following error:
ValueError: Can only compare identically-labeled Series objects
Till' now i've only used Pandas to filter data from different columns, but i don't know how to compare data from the same column, in this case the year. Any support is welcome.
The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.
The diff() method returns a DataFrame with the difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.
Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise. Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.
My strategy is using the pivot_table. There is an assumption that there are no two rows having the same ('country','year') pair. With this assumption, aggfunc=np.sum represents the only single PIB value.
table = pd.pivot_table(df, values='PIB', index=['country'],
                    columns=['year'], aggfunc=np.sum)[[2002,2007]]
list(table[table[2002] > table[2007]].index)
The pivot_table looks like this:

I suggest create Series with index by country column, but is necessary same number of countries in 2007 and 2002 for compare Series with same index values:
df = pd.DataFrame({'country': ['Afganistan', 'Zimbabwe', 'Afganistan', 'Zimbabwe'],
                  'PIB': [200, 200, 100, 300], 
                  'year': [2002, 2002, 2007, 2007]})
print (df)
      country  PIB  year
0  Afganistan  200  2002
1    Zimbabwe  200  2002
2  Afganistan  100  2007
3    Zimbabwe  300  2007
df = df.set_index('country')
print (df)
            PIB  year
country              
Afganistan  200  2002
Zimbabwe    200  2002
Afganistan  100  2007
Zimbabwe    300  2007
s1 = df.loc[df.year == 2007, 'PIB'] 
s2 = df.loc[df.year == 2002, 'PIB']
print (s1)
country
Afganistan    100
Zimbabwe      300
Name: PIB, dtype: int64
print (s2)
country
Afganistan    200
Zimbabwe      200
Name: PIB, dtype: int64
countries = s1.index[s1 < s2]
print (countries)
Index(['Afganistan'], dtype='object', name='country')
Another idea is first pivoting by DataFrame.pivot and then seelct columns by years and compare with index in boolean indexing:
df1 = df.pivot('country','year','PIB')
print (df1)
year        2002  2007
country               
Afganistan   200   100
Zimbabwe     200   300
countries = df1.index[df1[2007] < df1[2002]]
print (countries)
Index(['Afganistan'], dtype='object', name='country')
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With