
Equivalent of R/ifelse in Python/Pandas? Compare string columns?

My goal is to compare two columns and add the result as a new column. R uses ifelse for this, but I need to know the pandas way.

R

> head(mau.payment)
  log_month user_id install_month payment
1   2013-06       1       2013-04       0
2   2013-06       2       2013-04       0
3   2013-06       3       2013-04   14994

> mau.payment$user.type <- ifelse(mau.payment$install_month == mau.payment$log_month, "install", "existing")
> head(mau.payment)
  log_month user_id install_month payment user.type
1   2013-06       1       2013-04       0  existing
2   2013-06       2       2013-04       0  existing
3   2013-06       3       2013-04   14994  existing
4   2013-06       4       2013-04       0  existing
5   2013-06       6       2013-04       0  existing
6   2013-06       7       2013-04       0  existing

Pandas

>>> maupayment
user_id  log_month  install_month
1        2013-06    2013-04              0
         2013-07    2013-04              0
2        2013-06    2013-04              0
3        2013-06    2013-04          14994

I tried a few approaches, but none of them worked. It seems that string comparison does not work.

>>> np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')
TypeError: 'str' object cannot be interpreted as an integer

Could you help me please?


Pandas and numpy version.

>>> pd.version.version
'0.16.2'
>>> np.version.full_version
'1.9.2'

After updating the versions, it worked!

>>> np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')
array(['existing', 'install', 'existing', ..., 'install', 'install',
       'install'],
      dtype='<U8')
zono asked Feb 27 '16 05:02




1 Answer

You need to upgrade pandas to the latest version; in version 0.17.1 this works correctly.
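To confirm which versions are installed before and after upgrading, both packages expose a `__version__` attribute. This is only a small sketch for checking the environment and is not part of the original answer:

```python
import numpy as np
import pandas as pd

# __version__ is the standard version attribute on both packages.
print(pd.__version__)
print(np.__version__)
```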

Sample (first value in column install_month is changed for matching):

print(maupayment)
  log_month  user_id install_month  payment
1   2013-06        1       2013-06        0
2   2013-06        2       2013-04        0
3   2013-06        3       2013-04    14994

print(np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing'))
['install' 'existing' 'existing']
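Since the question asks for a result column rather than a printed array, here is a minimal sketch that assigns the np.where output back to the frame. It assumes the sample above is rebuilt by hand as a flat DataFrame; the construction and the column name `user_type` are illustrative and do not come from the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the three-row sample shown above.
maupayment = pd.DataFrame(
    {
        "log_month": ["2013-06", "2013-06", "2013-06"],
        "user_id": [1, 2, 3],
        "install_month": ["2013-06", "2013-04", "2013-04"],
        "payment": [0, 0, 14994],
    },
    index=[1, 2, 3],
)

# Element-wise comparison of the two string columns, analogous to R's ifelse().
# "user_type" is an assumed column name mirroring user.type in the R code.
maupayment["user_type"] = np.where(
    maupayment["log_month"] == maupayment["install_month"],
    "install",
    "existing",
)

print(maupayment)
```

The comparison yields a boolean Series, and np.where picks 'install' where it is True and 'existing' elsewhere, so the new column lines up row by row with the original data.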
jezrael answered Sep 28 '22 02:09