I have a CSV file with 5 columns and many rows in the following format:
BAL 27 DEN 49 2013-09-05T20:30:00
I want to compare the two scores and add the name of the winner as a 6th column.
I tried this:
from pandas import read_csv
Games = open("games.csv","rb")
df = read_csv(Games, header=None)
#print df
#print df[0]
if df[3] > df[1]:
    print df[2]
else:
    print df[0]
I am getting a ValueError: The truth value of a Series is ambiguous.
Any ideas how I can accomplish my goal?
Basically, you have to remember that the boolean df["home"] > df["guest"] is a vector. You can take advantage of this to assign the home team name to each row where the vector is True. You could try something like this:
Simulate some data:
In [21]: import pandas
In [22]: df = pandas.DataFrame({"home":[10,13,7,24,17],
   ....:                        "guest":[13, 7, 7, 30, 17],
   ....:                        "home_name":list("ABCDE"),
   ....:                        "guest_name":list("abcde")})
Make a new column, and assign the guest name to each row where the guest score is greater than the home score. After the first assignment the other rows in the "winner" column are NaN; the later assignments fill them in:
In [23]: df.loc[df["guest"]>df["home"], "winner"] = df["guest_name"]
In [24]: df.loc[df["guest"]<df["home"], "winner"] = df["home_name"]
In [25]: df.loc[df["guest"]==df["home"], "winner"] = "tie"
In [26]: df
Out[26]:
  home_name guest_name  home  guest winner
0         A          a    10     13      a
1         B          b    13      7      B
2         C          c     7      7    tie
3         D          d    24     30      d
4         E          e    17     17    tie
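If you would rather build the column in one step, here is a rough sketch using numpy.select in place of the three .loc assignments. This is a swapped-in alternative, not the approach shown above, and it is written for Python 3. It reuses the same simulated data.

```python
import numpy as np
import pandas as pd

# Same simulated data as above.
df = pd.DataFrame({
    "home": [10, 13, 7, 24, 17],
    "guest": [13, 7, 7, 30, 17],
    "home_name": list("ABCDE"),
    "guest_name": list("abcde"),
})

# Conditions are evaluated row by row, in order; a row that matches
# none of them (equal scores) gets the default value.
conditions = [df["guest"] > df["home"], df["guest"] < df["home"]]
choices = [df["guest_name"], df["home_name"]]
df["winner"] = np.select(conditions, choices, default="tie")

print(df)
```

The result should match the "winner" column produced by the step-by-step assignments, without the intermediate NaN values.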
The problem with your code is that df[3] > df[1] returns a pandas.Series of booleans and, as the message says, the truth value of a Series is ambiguous.
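To see the ambiguity for yourself, here is a small Python 3 sketch. The two-row frame is made up for illustration (the first row borrows the scores from the question's sample line), and the column labels 1 and 3 mirror the score columns in the question.

```python
import pandas as pd

# Made-up two-row frame; columns 1 and 3 hold the two scores.
scores = pd.DataFrame({1: [27, 30], 3: [49, 10]})

# The comparison is elementwise: one boolean per row, not a single bool.
mask = scores[3] > scores[1]
print(mask.tolist())

# Using the whole Series in an `if` raises, because pandas cannot decide
# whether "some rows are True" should count as True or False.
raised = False
try:
    if mask:
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Reducing explicitly is fine, but it answers a different question
# (any row / every row) than "who won each game".
print(mask.any(), mask.all())
```

So for a per-row result you want to use the boolean Series as a mask, not as the condition of an if statement.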
Try this:
df[6] = df[0] #sets default value
df.loc[df[3]>df[1],6] = df[2] #change when second wins
Then you can do print df or print df[6].
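You can also simplify the reading step, since the file is whitespace-separated: df = read_csv('games.csv', delim_whitespace=True, header=None). In newer pandas releases delim_whitespace is deprecated, and sep=r'\s+' does the same job.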
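Putting the pieces together, something along these lines should work on Python 3. It is only a sketch: the data is read from an in-memory string, not from games.csv, the second row is made up, and the column names (team_a, score_a, team_b, score_b, date) are my own labels and not anything from the original file.

```python
import io
import pandas as pd

# In-memory stand-in for games.csv. The first row is the sample line from
# the question; the second row is invented for illustration.
sample = io.StringIO(
    "BAL 27 DEN 49 2013-09-05T20:30:00\n"
    "AAA 30 BBB 10 2013-01-01T00:00:00\n"
)

# Hypothetical column names; the file itself has no header row.
cols = ["team_a", "score_a", "team_b", "score_b", "date"]
df = pd.read_csv(sample, sep=r"\s+", header=None, names=cols)

# Default to the first team, then overwrite the rows where the second
# team scored more. Ties keep the default, as in the answer above.
df["winner"] = df["team_a"]
df.loc[df["score_b"] > df["score_a"], "winner"] = df["team_b"]

print(df)
```

For the real file you would pass the path 'games.csv' in place of the StringIO object.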