Please clarify the following Python NumPy array initialization and slicing examples

I am using Python version 2.6 and am learning NumPy version 1.3.

I have tried out several NumPy array initialization and column slicing examples below, and added some inline questions as comments and a list of findings at the end. Hopefully someone can explain what is behind the differences in behavior. There are lots of interrelated questions and it is a rather long post, but each example is small, so feel free to answer just one or a couple.

import numpy as np

print "Initializing a number of numpy arrays:\n"

a) Initialize from a list of tuples

a = np.zeros((3,),dtype=('i4,i4,a1'))
a[:] = [(1, 2, 'A'), (3, 4, 'B'),(5, 6, 'A')]
print "a: "
print a         # print => [(1, 2, 'A') (3, 4, 'B') (5, 6, 'A')]
print repr(a)   # print => array([(1, 2, 'A'), (3, 4, 'B'), (5, 6, 'A')],
                #     dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '|S1')])
print '\n'

b) A normal list of tuples

b = []
b[:] = [(1, 2, 'A'), (3, 4, 'B'),(5, 6, 'A')]
print "b: "
print b         # print => [(1, 2, 'A'), (3, 4, 'B'), (5, 6, 'A')]
print '\n'

Question 1: a) looks like a list of tuples when printed, except that there are no commas between the tuples. If I print it with repr(a), it even has the commas. Even so, it should no longer be considered the same as b), correct?

c) Fail: Try to initialize an array returned from np.zeros with a list of lists

Question 2: Is the below failing because the dtype does not match the list that I passed in?

c = np.zeros((3,),dtype=('i4,i4,a1'))
#c[:] = [[1, 2, 'A'], [3, 4, 'B'],[5, 6, 'A']]
# TypeError: expected a readable buffer object
print '\n'

d) Fail: Same as c) but try to set the dtype as a list

Question 3: Is the below failing because I am not allowed to specify a dtype that is a list?

#d = np.zeros((3,),dtype=['i4,i4,a1'])
# TypeError: data type not understood
#d[:] = [[1, 2, 'A'], [3, 4, 'B'],[5, 6, 'A']]
print '\n'

e) Try to initialize an array using np.array from a list of lists

Question 4: Why does e) below, which is also a list of lists, work while d) fails?

e = np.array( [[1, 2, 'A'], [3, 4, 'B'],[5, 6, 'A']] )
print "e: "
print e     # print =>  [['1' '2' 'A']
            #   ['3' '4' 'B']
            #   ['5' '6' 'A']]
print '\n'

f) Try to initialize an array using np.array from a list of tuples

Question 5: Same example as e), but this time initializing with a list of tuples. The printout of f) is identical to e), so are initializing with a list of tuples and initializing with a list of lists really identical?

f = np.array( [(1, 2, 'A'), (3, 4, 'B'),(5, 6, 'A')] )
print "f: "
print f     # print =>  [['1' '2' 'A']
            #   ['3' '4' 'B']
            #   ['5' '6' 'A']]
print '\n'

g) Try to initialize an array using np.genfromtxt from a CSV file

Question 6: Same example as e) and f), but this time initializing from a file. There is a minor difference in quoting in the printout. There should be no difference between the array generated like this and e) and f), right?

from StringIO import StringIO
data = StringIO( """
1, 2, A
3, 4, B
5, 6, A
""".strip())
g = np.genfromtxt(data, dtype=object, delimiter=',')
print "g: "
print g     # print =>  [[1 2 A]
            #   [3 4 B]
            #   [5 6 A]]
print '\n'

h) Slicing the NumPy arrays by column

#print "a: "
#print a[:,2]   # IndexError: invalid index
print "a: "
print a['f2']   # This is ok though

# Slicing a normal list of tuples is not expected to work
#print "b: "
#print b[:,2]   # TypeError: list indices must be integers, not tuple

Question 7: Why does slicing e below work, while slicing a above fails with an IndexError using the same syntax?

print "e: "
print e[:,2]    # print => ['A' 'B' 'A']

print "f: "
print f[:,2]    # print => ['A' 'B' 'A']

print "g: "
print g[:,2]    # print => [A B A]

Finding 1: Initializing a numpy.ndarray with np.array from a list of tuples, from a list of lists, or from a CSV file gives identical results. This seems to contradict another answer I read, which says np.array expects a list of tuples (Stack Overflow question "Define dtypes in NumPy using a list?").

Finding 2: When initializing a numpy.ndarray with np.zeros, I am unable to fill the ndarray from a list of lists.

Finding 3: For column slicing, an array initialized with np.array can be sliced by column (that is, e[:,2]), but an array initialized with np.zeros needs a different syntax (a['f2']). A normal list of tuples cannot be sliced this way.

frank asked Jul 11 '13 08:07


1 Answer

Question 1

a) looks like a list of tuples when printed, except that there are no commas between the tuples. If I print it with repr(a), it even has the commas. Even so, it should no longer be considered the same as b), correct?

Absolutely. a and b have different types: type(a) is numpy.ndarray, whereas type(b) is list.
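The difference is easy to check interactively. The sketch below assumes Python 3 and a current NumPy rather than the Python 2.6 / NumPy 1.3 setup in the question, and uses 'S1', the usual spelling of a one-byte string type in current NumPy, in place of 'a1'.

```python
import numpy as np

# Structured array: 3 records, each with two int32 fields and one 1-byte string field
a = np.zeros((3,), dtype='i4,i4,S1')
a[:] = [(1, 2, 'A'), (3, 4, 'B'), (5, 6, 'A')]

# Plain Python list of tuples
b = [(1, 2, 'A'), (3, 4, 'B'), (5, 6, 'A')]

print(type(a))      # <class 'numpy.ndarray'>
print(type(b))      # <class 'list'>
print(type(a[0]))   # <class 'numpy.void'>: one structured record, not a tuple

# tolist() converts the records back into ordinary Python tuples
print(a.tolist())   # [(1, 2, b'A'), (3, 4, b'B'), (5, 6, b'A')]
```

Each element of a is a numpy.void record rather than a tuple, which is why it prints differently from b; tolist() is the usual way to get plain Python tuples back.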

Question 2

Is the below failing because the dtype does not match the list that I passed in?

No. The issue is that you're trying to fill it with a list of lists rather than a list of tuples, as you did with a. See here. When filling a structured array, NumPy treats each tuple as a single record, whereas a nested list is read as another array dimension, so each inner list is interpreted as three separate elements rather than as one (i4, i4, a1) record.
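A sketch of the failing and working assignments, again assuming Python 3 and a current NumPy with 'S1' in place of 'a1'. The exact exception raised for the list-of-lists case varies between NumPy versions (NumPy 1.3 reported "expected a readable buffer object"), so the sketch catches both TypeError and ValueError instead of relying on a particular message.

```python
import numpy as np

dt = np.dtype('i4,i4,S1')      # same layout as 'i4,i4,a1' in the question
rows = [[1, 2, 'A'], [3, 4, 'B'], [5, 6, 'A']]

c = np.zeros((3,), dtype=dt)
try:
    c[:] = rows                # inner lists are read as a second dimension, not as records
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# Converting each row to a tuple makes it a single record, so the assignment succeeds
c[:] = [tuple(r) for r in rows]
print(c['f0'])   # [1 3 5]
```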

Question 3

Is the below failing because I am not allowed to specify a dtype that is a list?

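Partly. A dtype can be given as a list, but only as a list of (name, format) tuples such as [('f0', 'i4'), ('f1', 'i4'), ('f2', 'a1')]; a list containing a single comma-separated string, as in d), is not understood. Even with a valid dtype, you would also fail to fill d with a list of lists (see previous answer).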
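The sketch below shows which list forms np.dtype accepts: a list of (name, format) tuples is a valid structured dtype spec, whereas a list holding one comma-separated string, as in d), is rejected. Python 3, a current NumPy, and 'S1' in place of 'a1' are assumed; the error is caught broadly because its wording differs between versions.

```python
import numpy as np

# A list of (name, format) tuples is a valid dtype specification
dt_list = np.dtype([('f0', 'i4'), ('f1', 'i4'), ('f2', 'S1')])
dt_str = np.dtype('i4,i4,S1')     # comma-separated form; fields get the default names f0, f1, f2
print(dt_list == dt_str)          # True: same field names, formats and offsets

# A list containing a single comma-separated string is not a valid spec
try:
    np.dtype(['i4,i4,S1'])
    understood = True
except (TypeError, ValueError) as exc:
    understood = False
    print(exc)

# The list-based dtype still has to be filled with tuples, not lists
d = np.zeros((3,), dtype=dt_list)
d[:] = [(1, 2, 'A'), (3, 4, 'B'), (5, 6, 'A')]
```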

Question 4

Why does e) below, which is also a list of lists, work while d) fails?

Look at the dtype of e - it is |S1, i.e. every element in the array is a string of length 1. If you don't specify a dtype for the array constructor, the type will be determined as the minimum type required to hold all of the objects in the sequence. In this case, since you handed it a sequence containing some strings, it will upcast the integers to strings.
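A sketch of the upcasting, assuming Python 3 and a current NumPy. Under Python 3 the chosen dtype is a unicode string type (kind 'U') rather than the |S1 seen under Python 2 and NumPy 1.3, but the effect is the same: the integers become strings. Passing dtype=object is one way to keep the original Python objects instead.

```python
import numpy as np

rows = [[1, 2, 'A'], [3, 4, 'B'], [5, 6, 'A']]

e = np.array(rows)            # no dtype given: NumPy picks a string dtype wide enough for all items
print(e.dtype)                # a string dtype ('|S1' under Python 2 / NumPy 1.3)
print(repr(e[0, 0]))          # the integer 1 is now the string '1'

e_obj = np.array(rows, dtype=object)   # keep the original Python objects
print(type(e_obj[0, 0]))      # <class 'int'>
```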

Question 5

Same example as e), but this time initializing with a list of tuples. The printout of f) is identical to e), so are initializing with a list of tuples and initializing with a list of lists really identical?

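Again, since you don't give the constructor a dtype, everything gets upcast to |S1. Without a structured dtype, np.array treats tuples and lists alike as nested sequences, so e and f are the same 3x3 string array.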
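The sketch below (Python 3 and a current NumPy assumed) checks that e and f come out the same, and contrasts them with what happens once a structured dtype is supplied, which is the case where the tuple/list distinction matters.

```python
import numpy as np

as_lists = [[1, 2, 'A'], [3, 4, 'B'], [5, 6, 'A']]
as_tuples = [(1, 2, 'A'), (3, 4, 'B'), (5, 6, 'A')]

# Without a structured dtype, tuples and lists are both just nested sequences
e = np.array(as_lists)
f = np.array(as_tuples)
print(e.shape, f.shape, e.dtype == f.dtype)   # (3, 3) (3, 3) True
print(np.array_equal(e, f))                   # True

# With a structured dtype, each tuple becomes one record, giving a 1-d array
rec = np.array(as_tuples, dtype='i4,i4,S1')
print(rec.shape)                              # (3,)
```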

Question 6

Same example as e) and f), but this time initializing from a file. There is a minor difference in quoting in the printout. There should be no difference between the array generated like this and e) and f), right?

No, now you're telling the constructor to create an array with dtype=object, whereas e and f will have dtype=|S1.
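The dtype difference can be seen without a file. To avoid depending on how a particular NumPy version's genfromtxt parses the space-padded fields, this sketch builds g directly with np.array(..., dtype=object) as a stand-in for the object array returned in g); that substitution is an assumption of the sketch, as are Python 3 and a current NumPy.

```python
import numpy as np

rows = [[1, 2, 'A'], [3, 4, 'B'], [5, 6, 'A']]
e = np.array(rows)                 # fixed-width string dtype
g = np.array(rows, dtype=object)   # stand-in for an object array such as genfromtxt(..., dtype=object) gives

print(e.dtype, g.dtype)            # a string dtype versus object
print(e.dtype == g.dtype)          # False

# An object array stores references to arbitrary Python objects,
# so an element can even be replaced by something that is not a string
g[0, 0] = [1, 2, 3]
print(g[0, 0])                     # [1, 2, 3]
```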

Question 7

Why does slicing e below work, while slicing a above fails with an IndexError using the same syntax?

Look at a.shape - you'll see that it's (3,), i.e. a is a 1d vector of length 3. Although it does have fields that you can index it by, it has no second dimension for you to index into. By contrast, e.shape is (3,3), so you can index it by column.
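A sketch of the shape difference (Python 3, a current NumPy, and 'S1' in place of 'a1' assumed). A field name selects one field across all records of the 1-d array a, whereas [:, 2] selects an index along a second axis, which only e has.

```python
import numpy as np

a = np.zeros((3,), dtype='i4,i4,S1')
a[:] = [(1, 2, 'A'), (3, 4, 'B'), (5, 6, 'A')]
e = np.array([[1, 2, 'A'], [3, 4, 'B'], [5, 6, 'A']])

print(a.shape, a.ndim)   # (3,) 1
print(e.shape, e.ndim)   # (3, 3) 2

print(a['f2'])           # field access: the third field of every record
print(e[:, 2])           # column slice: the third column of every row

raised = False
try:
    a[:, 2]              # a has only one axis, so a second index is an error
except IndexError as exc:
    raised = True
    print('IndexError:', exc)
```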

ali_m answered Sep 28 '22 23:09