
ndarray field names for both row and column?

Tags: python, numpy

I'm a computer science teacher trying to create a little gradebook for myself using NumPy. But I think it would make my code easier to write if I could create an ndarray that uses field names for both the rows and columns. Here's what I've got so far:

import numpy as np
num_stud = 23
num_assign = 2
grades = np.zeros(num_stud, dtype=[('assign 1','i2'), ('assign 2','i2')]) #etc
gv = grades.view(dtype='i2').reshape(num_stud,num_assign)

So, if my first student gets a 97 on 'assign 1', I can write either of:

grades[0]['assign 1'] = 97
gv[0][0] = 97

Also, I can do the following:

np.mean( grades['assign 1'] ) # class average for assignment 1
np.sum( gv[0] ) # total points for student 1

This all works. But what I can't figure out how to do is use a student id number to refer to a particular student (assume that two of my students have student ids as shown):

grades['123456']['assign 2'] = 95
grades['314159']['assign 2'] = 83

...or maybe create a second view with the different field names?

np.sum( gview2['314159'] ) # total points for the student with the given id

I know that I could create a dict mapping student ids to indices, but that seems fragile and crufty, and I'm hoping there's a better way than:

id2i = { '123456': 0, '314159': 1 }
np.sum( gv[ id2i['314159'] ] )

I'm also willing to re-architect things if there's a cleaner design. I'm new to NumPy, and I haven't written much code yet, so starting over isn't out of the question if I'm Doing It Wrong.

I will need to sum all the assignment points for over a hundred students once a day, as well as compute standard deviations and other stats. I'll be waiting on the results, so I'd like it to run in only a couple of seconds.

Thanks in advance for any suggestions.

Graham Mitchell asked Oct 11 '10 22:10




1 Answer

From your description, you'd be better off using a different data structure than a standard numpy array. ndarrays aren't well suited to this... They're not spreadsheets.

However, there has been extensive recent work on a type of numpy array that is well suited to this use. Here's a description of the recent work on DataArrays. It will be a while before this is fully incorporated into numpy, though...

One of the projects that the upcoming numpy DataArrays is (sort of) based on is "larry" (short for "Labeled Array"). This project sounds like exactly what you're wanting to do: have named rows and columns, but otherwise act transparently as a numpy array. It should be stable enough to use (and from my limited playing around with it, it's pretty slick!), but keep in mind that it will probably be replaced by a built-in numpy class eventually.

Nonetheless, you can make good use of the fact that (simple) indexing of a numpy array returns a view into that array, and make a class that provides both interfaces...
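As a rough illustration of that idea, here is a minimal sketch of such a class. It is not code from the original answer: the name Gradebook and the methods student() and assignment() are hypothetical, and it assumes every grade fits in a 16-bit integer, as in the question. The id-to-index mapping still exists, but it is built once and hidden inside the class, and the row and column accessors return views, so writes go straight through to the underlying array.

```python
import numpy as np


class Gradebook:
    """Hypothetical wrapper: rows labeled by student id, columns by assignment name."""

    def __init__(self, student_ids, assignments, dtype="i2"):
        self.student_ids = list(student_ids)
        self.assignments = list(assignments)
        # Build the label -> integer position lookups once, inside the class.
        self._row = {sid: i for i, sid in enumerate(self.student_ids)}
        self._col = {name: j for j, name in enumerate(self.assignments)}
        # Plain 2-D array: one row per student, one column per assignment.
        self.data = np.zeros(
            (len(self.student_ids), len(self.assignments)), dtype=dtype
        )

    def student(self, sid):
        """Return a view of one student's row (writes modify self.data)."""
        return self.data[self._row[sid]]

    def assignment(self, name):
        """Return a view of one assignment's column (writes modify self.data)."""
        return self.data[:, self._col[name]]

    def __getitem__(self, key):
        sid, name = key
        return self.data[self._row[sid], self._col[name]]

    def __setitem__(self, key, value):
        sid, name = key
        self.data[self._row[sid], self._col[name]] = value


grades = Gradebook(["123456", "314159"], ["assign 1", "assign 2"])
grades["123456", "assign 2"] = 95
grades["314159", "assign 2"] = 83
grades.student("314159")[0] = 90  # write through the row view

total = int(grades.student("314159").sum())  # total points for one student
average = float(grades.assignment("assign 2").mean())  # class average
```

Because grades.data is an ordinary 2-D ndarray, whole-class statistics such as grades.data.sum(axis=1) or grades.data.std(axis=0) remain available without going through the labels.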

Alternatively, @unutbu's suggestion above is another (simpler and more direct) way of handling it, if you decide to roll your own.

Joe Kington answered Oct 15 '22 21:10