I have this dataset:
         Game1  Game2  Game3  Game4  Game5
Player1      2      6      5      2      2
Player2      6      4      1      8      4
Player3      8      3      2      1      5
Player4      4      9      4      7      9
I want to calculate the sum of the 5 games for every player.
This is my code:
import csv
f=open('Games','rb')
f=csv.reader(f,delimiter=';')
lst=list(f)
lst
import numpy as np
myarray = np.asarray(lst)
x=myarray[1,1:] #First player
y=np.sum(x)
I got the error "cannot perform reduce with flexible type". I'm very new to Python and I need your help.
Thank you
You can still use a structured array as long as you familiarize yourself with its dtypes. Since your dataset is extremely small, the following may serve as an example of using numpy together with list comprehensions when the fields all share one dtype but are named:
dt = [('Game1', '<i4'), ('Game2', '<i4'), ('Game3', '<i4'),
('Game4', '<i4'), ('Game5', '<i4')]
a = np.array([(2, 6, 5, 2, 2),
(6, 4, 1, 8, 4),
(8, 3, 2, 1, 5),
(4, 9, 4, 7, 9)], dtype= dt)
nms = a.dtype.names
by_col = [(i, a[i].sum()) for i in nms if a[i].dtype.kind in ('i', 'f')]
by_col
[('Game1', 20), ('Game2', 22), ('Game3', 12), ('Game4', 18), ('Game5', 20)]
by_row = [("player {}".format(i), sum(a[i])) for i in range(a.shape[0])]
by_row
[('player 0', 17), ('player 1', 23), ('player 2', 19), ('player 3', 33)]
In this example, it would be a real pain to get each sum individually by typing out every column name. That is where the ... a[i] for i in nms bit is useful, since the list of names was retrieved by nms = a.dtype.names. Because you are computing a sum, you want to restrict the summation to integer and float fields only, hence the a[i].dtype.kind check.
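To see why the kind check matters, here is a minimal sketch using a hypothetical mixed array. The 'Name' field and the two-row data are made up for illustration and are not part of the question's dataset.

```python
import numpy as np

# Hypothetical structured array: one text field and two integer fields.
dt = [('Name', 'U10'), ('Game1', '<i4'), ('Game2', '<i4')]
a = np.array([('Player1', 2, 6),
              ('Player2', 6, 4)], dtype=dt)

# Keep only integer ('i') and float ('f') fields; the text field has kind 'U'
# and is skipped instead of being passed to .sum().
by_col = [(name, int(a[name].sum()))
          for name in a.dtype.names
          if a[name].dtype.kind in ('i', 'f')]
print(by_col)
```

Without the filter, the comprehension would also try to sum the 'Name' field, which is not numeric.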
Summing by row is just as easy, but notice that I used Python's built-in sum() rather than the record's .sum() method, in order to avoid this error message:
a[0].sum() # massive failure
....snip out huge error stuff...
TypeError: cannot perform reduce with flexible type
# whereas, this works....
sum(a[0]) # use list/tuple summation
Perhaps 'flexible' data types don't live up to their name. Still, you can work with structured arrays and recarrays if that is the way your data comes in, and you can become adept at reformatting your data by slicing and altering dtypes to suit your purpose. For example, since your fields all share the same data type and you don't have a monstrous dataset, you can use many methods to convert it to a plain (unstructured) array.
b = np.array([list(a[i]) for i in range(a.shape[0])])
b
array([[2, 6, 5, 2, 2],
[6, 4, 1, 8, 4],
[8, 3, 2, 1, 5],
[4, 9, 4, 7, 9]])
b.sum(axis=0)
array([20, 22, 12, 18, 20])
b.sum(axis=1)
array([17, 23, 19, 33])
So you have many options when dealing with structured arrays. Depending on whether you need to work in pure Python, numpy, pandas or a hybrid, it pays to familiarize yourself with all of them.
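One more of those options, sketched here as an alternative rather than as part of the answer above: NumPy ships a helper, structured_to_unstructured, in numpy.lib.recfunctions. This assumes NumPy 1.16 or later, and the array is rebuilt from the question's numbers.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same data as above: five named int32 fields, one record per player.
dt = [('Game%d' % i, '<i4') for i in range(1, 6)]
a = np.array([(2, 6, 5, 2, 2),
              (6, 4, 1, 8, 4),
              (8, 3, 2, 1, 5),
              (4, 9, 4, 7, 9)], dtype=dt)

# Convert the named fields into an ordinary 2-D array (rows = players).
b = rfn.structured_to_unstructured(a)

row_sums = b.sum(axis=1)   # total per player
col_sums = b.sum(axis=0)   # total per game
print(row_sums, col_sums)
```

The helper also handles fields of differing numeric types by casting them to a common dtype, which the list-comprehension approach leaves to np.array.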
ADDENDUM
As a shortcut, I failed to mention taking 'views' of structured arrays whose fields all share the same dtype. In the above case, a simple way to get a plain array suitable for row or column calculations is as follows. The earlier example made a copy of the array, but that is not necessary:
b = a.view(np.int32).reshape(len(a), -1)
b
array([[2, 6, 5, 2, 2],
[6, 4, 1, 8, 4],
[8, 3, 2, 1, 5],
[4, 9, 4, 7, 9]])
b.dtype
dtype('int32')
b.sum(axis=0)
array([20, 22, 12, 18, 20])
b.sum(axis=1)
array([17, 23, 19, 33])
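A small sketch of what "no copy" means in practice, assuming a little-endian machine so that np.int32 matches the '<i4' fields. The two-field array is a cut-down, illustrative version of the data above.

```python
import numpy as np

dt = [('Game1', '<i4'), ('Game2', '<i4')]
a = np.array([(2, 6), (6, 4)], dtype=dt)

# Reinterpret the same bytes as plain int32 values, one row per record.
b = a.view(np.int32).reshape(len(a), -1)

# The view and the structured array use the same underlying buffer.
shared = np.shares_memory(a, b)

# Writing through the view therefore changes the structured array too.
b[0, 0] = 10
first_game1 = int(a['Game1'][0])
print(shared, first_game1)
```

If you need an independent array, call .copy() on the view before modifying it.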
The complication with using numpy is that there are two sources of errors (and two sets of documentation to read), namely Python itself and numpy.
I believe your problem here is that you are working with a so-called structured (numpy) array.
Consider the following example:
>>> import numpy as np
>>> a = np.array([(1,2), (4,5)], dtype=[('Game 1', '<f8'), ('Game 2', '<f8')])
>>> a.sum()
TypeError: cannot perform reduce with flexible type
Now, I first select the data I want to use:
>>> import numpy as np
>>> a = np.array([(1,2), (4,5)], dtype=[('Game 1', '<f8'), ('Game 2', '<f8')])
>>> a["Game 1"].sum()
5.0
Which is what I wanted.
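In the question's own code, the array is built from csv.reader rows, so every cell is a string. NumPy also treats string dtypes as flexible types, so the same idea applies: select the numeric part and convert it before summing. A minimal sketch, assuming a ';'-separated file with an empty first header cell (the actual file layout is not shown in the question):

```python
import csv
import io

import numpy as np

# Assumed layout of the 'Games' file; an in-memory string stands in for it.
text = """;Game1;Game2;Game3;Game4;Game5
Player1;2;6;5;2;2
Player2;6;4;1;8;4
Player3;8;3;2;1;5
Player4;4;9;4;7;9
"""

rows = list(csv.reader(io.StringIO(text), delimiter=';'))
myarray = np.asarray(rows)

# Every cell is text, which NumPy classes as a flexible type.
is_flexible = np.issubdtype(myarray.dtype, np.flexible)

# Drop the header row and the name column, then convert the rest to integers.
scores = myarray[1:, 1:].astype(int)
totals = scores.sum(axis=1)   # one total per player
print(is_flexible, totals)
```

Here totals holds the per-player sums 17, 23, 19 and 33.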
Maybe you would consider using pandas (python library), or change language to R.
Personal opinions
Even though "numpy" certainly is a mighty library I still avoid using it for data-science and other "activities" where the program is designed around "flexible" data-types. Personally I use numpy when I need something to be fast and maintainable (it is easy to write "code for the future"), but I do not have the time to write a C program.
As far as Pandas goes it is convenient for us "Python hackers" because it is "R data structures implemented in Python", whereas "R" is (obviously) an entirely new language. I personally use R as I consider Pandas to be under rapid development, which makes it difficult to write "code with the future in mind".
As suggested in a comment (by @jorijnsmit, I believe), there is no need to introduce large dependencies such as pandas for "simple" cases. The minimalistic example below, which is compatible with both Python 2 and 3, uses "typical" Python tricks to massage the data in the question.
import csv
## Data-file
data = \
'''
, Game1, Game2, Game3, Game4, Game5
Player1, 2, 6, 5, 2, 2
Player2, 6, 4 , 1, 8, 4
Player3, 8, 3 , 2, 1, 5
Player4, 4, 9 , 4, 7, 9
'''
# Write data to file
with open('data.csv', 'w') as FILE:
    FILE.write(data)
print("Raw data:")
print(data)
# 1) Read the data-file (and strip away spaces), the result is data by row.
#    Text mode ('r') is used because csv.reader rejects bytes on Python 3:
with open('data.csv', 'r') as FILE:
    raw = [[item.strip() for item in line]
           for line in csv.reader(FILE, delimiter=',') if line]
print("Data after Read:")
print(raw)
# 2) Convert numerical data to integers ("float" would also work)
for (i, line) in enumerate(raw[1:], 1):
    for (j, item) in enumerate(line[1:], 1):
        raw[i][j] = int(item)
print("Data after conversion:")
print(raw)
# 3) Use the data...
print("Use the data")
for i in range(1, len(raw)):
    print("Sum for Player %d: %d" % (i, sum(raw[i][1:])))
# Loop over the columns of the header row so that all five games are covered
columns = list(zip(*raw))
for i in range(1, len(raw[0])):
    print("Total points in Game %d: %d" % (i, sum(columns[i][1:])))
The output would be:
Raw data:
, Game1, Game2, Game3, Game4, Game5
Player1, 2, 6, 5, 2, 2
Player2, 6, 4 , 1, 8, 4
Player3, 8, 3 , 2, 1, 5
Player4, 4, 9 , 4, 7, 9
Data after Read:
[['', 'Game1', 'Game2', 'Game3', 'Game4', 'Game5'], ['Player1', '2', '6', '5', '2', '2'], ['Player2', '6', '4', '1', '8', '4'], ['Player3', '8', '3', '2', '1', '5'], ['Player4', '4', '9', '4', '7', '9']]
Data after conversion:
[['', 'Game1', 'Game2', 'Game3', 'Game4', 'Game5'], ['Player1', 2, 6, 5, 2, 2], ['Player2', 6, 4, 1, 8, 4], ['Player3', 8, 3, 2, 1, 5], ['Player4', 4, 9, 4, 7, 9]]
Use the data
Sum for Player 1: 17
Sum for Player 2: 23
Sum for Player 3: 19
Sum for Player 4: 33
Total points in Game 1: 20
Total points in Game 2: 22
Total points in Game 3: 12
Total points in Game 4: 18
Total points in Game 5: 20
Consider using the pandas module:
import pandas as pd
df = pd.read_csv('/path/to.file.csv', sep=';')
Resulting DataFrame:
In [196]: df
Out[196]:
         Game1  Game2  Game3  Game4  Game5
Player1      2      6      5      2      2
Player2      6      4      1      8      4
Player3      8      3      2      1      5
Player4      4      9      4      7      9
Sum:
In [197]: df.sum(axis=1)
Out[197]:
Player1 17
Player2 23
Player3 19
Player4 33
dtype: int64
In [198]: df.sum(1).values
Out[198]: array([17, 23, 19, 33], dtype=int64)
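For a version that can be run without the original file, here is a self-contained sketch. It assumes a ';'-separated layout with an empty first header cell and uses an in-memory string in place of the file; index_col=0 is passed explicitly so the player names become the row labels.

```python
import io

import pandas as pd

# Assumed file contents; the real file is not shown in the question.
text = """;Game1;Game2;Game3;Game4;Game5
Player1;2;6;5;2;2
Player2;6;4;1;8;4
Player3;8;3;2;1;5
Player4;4;9;4;7;9
"""

# Use the first column (player names) as the index.
df = pd.read_csv(io.StringIO(text), sep=';', index_col=0)

totals = df.sum(axis=1)       # sum across games, one value per player
game_totals = df.sum(axis=0)  # sum across players, one value per game
print(totals)
print(game_totals)
```

Because the game columns are parsed as integers on load, no manual conversion is needed before summing.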