
comparing columns pandas python

I have a csv file with 5 columns and many rows in the following format:

BAL 27  DEN 49  2013-09-05T20:30:00   

I want to compare the 2 scores and return the name of the winner as a 6th column

I tried this:

from pandas import read_csv
Games = open("games.csv","rb")
df = read_csv(Games, header=None)
#print df
#print df[0]

if df[3] > df[1]:
    print df[2]
else:
    print df[0]

I am getting a ValueError: The truth value of a Series is ambiguous.

Any ideas how I can accomplish my goal?

kegewe asked Feb 27 '14 20:02


2 Answers

Basically, you have to remember that the comparison df["home"] > df["guest"] produces a boolean vector with one value per row. You can take advantage of this to assign the home team name to each row where the vector is True. You could try something like this:

Simulate some data:

In [22]: df = pandas.DataFrame({"home": [10, 13, 7, 24, 17],
                               "guest": [13, 7, 7, 30, 17],
                               "home_name": list("ABCDE"),
                               "guest_name": list("abcde")})

Make a new column, and assign the guest name to each row that has the guest score greater than the home score (note that the other rows in the "winner" column will be NaN after the first assignment, and will get filled in progressively):

In [23]: df.loc[df["guest"]>df["home"], "winner"] = df["guest_name"]

In [24]: df.loc[df["guest"]<df["home"], "winner"] = df["home_name"]

In [25]: df.loc[df["guest"]==df["home"], "winner"] = "tie"

In [26]: df
Out[26]: 
  home_name guest_name  home  guest winner
0         A          a    10     13      a
1         B          b    13      7      B
2         C          c     7      7    tie
3         D          d    24     30      d
4         E          e    17     17    tie
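The three assignments above can also be written without building the column up step by step. This is a minimal sketch, not part of the original answer, that uses Series.where and Series.mask on the same simulated data; the variable name winner is only illustrative.

```python
import pandas as pd

# Same simulated data as in the answer above.
df = pd.DataFrame({
    "home": [10, 13, 7, 24, 17],
    "guest": [13, 7, 7, 30, 17],
    "home_name": list("ABCDE"),
    "guest_name": list("abcde"),
})

# Keep the home name where the home team scored more,
# otherwise fall back to the guest name.
winner = df["home_name"].where(df["home"] > df["guest"], df["guest_name"])

# Overwrite rows with equal scores with the label "tie".
df["winner"] = winner.mask(df["home"] == df["guest"], "tie")

print(df["winner"].tolist())
```

Both approaches give the same winner column; the .loc version is easier to extend with more conditions, while this one avoids the intermediate NaN values.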
Noah answered Sep 30 '22 09:09


The problem with your code is that df[3] > df[1] returns a pandas.Series of booleans (one per row) and, as the message says, the truth value of a Series is ambiguous: an if statement needs a single True or False.
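To make this concrete, here is a small sketch with toy values (the second row is invented for illustration and is not from the asker's file). It shows that the comparison yields one boolean per row and that using it in an if statement raises the error from the question.

```python
import pandas as pd

# Toy frame with the same integer column labels as the question.
# The second row is hypothetical data, made up for this example.
df = pd.DataFrame({
    0: ["BAL", "AAA"],
    1: [27, 30],
    2: ["DEN", "BBB"],
    3: [49, 10],
})

# Element-wise comparison: one boolean per row, not a single value.
mask = df[3] > df[1]
print(mask.tolist())

# Using the Series directly as a condition raises ValueError.
try:
    if mask:
        pass
except ValueError as err:
    message = str(err)
    print(message)
```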

Try this:

df[6] = df[0]  # default value: the first team (also used when the scores are equal)
df.loc[df[3] > df[1], 6] = df[2]  # overwrite where the second team wins

Then you can do print df or print df[6].

You can also simplify the reading step by passing the file name directly, assuming the fields are separated by whitespace: df = read_csv('games.csv', delim_whitespace=True, header=None)
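Putting the pieces together, here is a self-contained sketch under a few assumptions: the file is whitespace-delimited like the sample row, the in-memory text stands in for games.csv, and the second row is made up for illustration. It uses sep=r"\s+", which does the same job as delim_whitespace=True (that flag is deprecated in newer pandas releases), and it stores the winner under label 5 so that it is literally the sixth column.

```python
import io

import pandas as pd

# Stand-in for games.csv; the second row is hypothetical.
text = (
    "BAL 27 DEN 49 2013-09-05T20:30:00\n"
    "AAA 30 BBB 10 2013-09-08T13:00:00\n"
)

# Split on runs of whitespace; there is no header row in the data.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# Default to the first team, then overwrite rows where the second team scored more.
df[5] = df[0]
df.loc[df[3] > df[1], 5] = df[2]

print(df[5].tolist())
```

As in the answer, ties fall through to the first team; add a third assignment with df[3] == df[1] if they should be labelled differently.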

Alvaro Fuentes answered Sep 30 '22 08:09