
Queries about using NumPy Arrays to parse a CSV file

Q1: I have a strange issue which I can't seem to figure out.

I'm parsing through a CSV File using the NumPy Module, where a portion of the CSV File (which contains 253 rows and 4 Columns) is shown below:

Code            Date       NetPrice   Gain
MICRO US       01/05/2012   613.98   0
MICRO US       01/06/2012   622.75   1.09342432
MICRO US       01/07/2012   690.99  -0.44342342
MICRO US       01/08/2012   611.26  -3.242423423

I'm parsing through the CSV File using the code below:

micro_info = np.genfromtxt('MICRO.csv', delimiter=',', dtype=None, names=True)

However, when I run the code below, the first line prints (253,), while the second line prints the expected contents of the CSV file, with all 253 rows and 4 columns. I don't understand why the shape is (253,) rather than (253, 4).

print micro_info.shape
print micro_info

Q2: Does what I am doing below make sense?

I'm essentially looking to convert the Dates to floats so that I can use Matplotlib to plot the NetPrice values of MICRO US against each Date. For this I use the code below:

convertingdates = strpdate2num(micro_info[1:,2])
datesasfloat = {1: convertingdates}
micro_info = np.genfromtxt('MICRO.csv', delimiter=',', dtype=None, converters = datesasfloat, names=True)

I will then access the Dates and NetPrice as required.

Thank You

user131983 asked Feb 12 '26 15:02



1 Answer

With your sample text, this works:

In [314]: dconverter=pylab.strpdate2num('%M/%S/%Y')
In [316]: names='code us Date NetPrice Gain'.split()
In [317]: data=np.genfromtxt(ss,skip_header=1,dtype=None,
             converters={'Date':dconverter},names=names)
In [318]: data.shape
Out[318]: (4,)
In [319]: data['Date']
Out[319]: 
array([ 734503.00075231,  734503.00076389,  734503.00077546,
        734503.00078704])
In [320]: data['NetPrice']
Out[320]: array([ 613.98,  622.75,  690.99,  611.26])

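This uses the default whitespace delimiter. Because that splits 'MICRO US' into two fields, I used a custom names list rather than the header line. I also changed how strpdate2num is used: it takes a format string and gives back a converter, which is then passed to genfromtxt through converters. Note that '%M/%S/%Y' reads the first two date fields as minutes and seconds, which is why the Date values above all fall on the same day and differ only in the fraction; '%m/%d/%Y' (month/day/year) is the correct format.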
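The effect of the format string can be checked without Matplotlib. Below is a small sketch using only the standard library. `to_day_number` is a hypothetical helper written for this illustration, not a Matplotlib function; it assumes the day-number convention that the values in this answer follow (days counted from 0001-01-01 as day 1, with the time of day as a fraction).

```python
from datetime import datetime


def to_day_number(text, fmt):
    """Hypothetical helper: ordinal day plus time of day as a fraction."""
    parsed = datetime.strptime(text, fmt)
    seconds = parsed.hour * 3600 + parsed.minute * 60 + parsed.second
    return parsed.toordinal() + seconds / 86400.0


# '%M/%S/%Y' treats "01/05" as 1 minute 5 seconds; month and day default to 1.
wrong = to_day_number('01/05/2012', '%M/%S/%Y')

# '%m/%d/%Y' treats "01/05" as January 5.
right = to_day_number('01/05/2012', '%m/%d/%Y')

print(wrong)  # a value just above 734503 (January 1, 2012 plus 65 seconds)
print(right)  # 734507.0 (January 5, 2012)
```

With the minute/second format every row lands on January 1 and only the fraction changes, which matches the Date output shown above.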

If the file were comma-delimited, this would work (with a corrected date converter):

In [410]: dconverter=pylab.strpdate2num('%m/%d/%Y')
In [412]: data=np.genfromtxt(ss,names=True,delimiter=',',dtype=None,
                autostrip=True,converters={'Date':dconverter})
In [413]: data
Out[413]: 
array([('MICRO US', 734507.0, 613.98, 0.0),
       ('MICRO US', 734508.0, 622.75, 1.09342432),
       ('MICRO US', 734509.0, 690.99, -0.44342342),
       ('MICRO US', 734510.0, 611.26, -3.242423423)], 
      dtype=[('Code', 'S8'), ('Date', 'O'), ('NetPrice', '<f8'), ('Gain', '<f8')])
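On Q1: the dtype above shows why the shape is one-dimensional. With names=True (or any compound dtype) genfromtxt returns a 1-D structured array, one record per row, and the columns become named fields rather than a second axis. So (4,) here and (253,) for the full file is expected. The following is a minimal sketch under some assumptions of mine: Python 3, a NumPy version whose genfromtxt accepts the `encoding` argument, and a comma-delimited copy of the sample. `date_to_ordinal` is a hypothetical stand-in for strpdate2num, which I believe is no longer available in recent Matplotlib releases.

```python
import io
from datetime import datetime

import numpy as np

# Comma-delimited version of the sample rows (an assumption about the real file).
sample = io.StringIO(
    "Code,Date,NetPrice,Gain\n"
    "MICRO US,01/05/2012,613.98,0\n"
    "MICRO US,01/06/2012,622.75,1.09342432\n"
    "MICRO US,01/07/2012,690.99,-0.44342342\n"
    "MICRO US,01/08/2012,611.26,-3.242423423\n"
)


def date_to_ordinal(text):
    """Hypothetical converter: 'mm/dd/yyyy' -> ordinal day number as a float."""
    if isinstance(text, bytes):  # older NumPy versions may pass bytes
        text = text.decode()
    return float(datetime.strptime(text.strip(), '%m/%d/%Y').toordinal())


data = np.genfromtxt(sample, names=True, delimiter=',', dtype=None,
                     autostrip=True, encoding='utf-8',
                     converters={'Date': date_to_ordinal})

print(data.shape)        # (4,): one record per row
print(data.dtype.names)  # the four column names, taken from the header
print(data['NetPrice'])  # a plain 1-D float array for that column

# The converted column may come back with object dtype; cast it before plotting.
dates = data['Date'].astype(float)
```

Columns are then accessed by field name, for example `data['Date']` and `data['NetPrice']`, instead of by a second index such as `data[:, 2]`.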

Another way to handle the delimiters is to pass a list of field widths instead of a delimiter character. For some reason this required an explicit dtype.

dt=np.dtype([('Code', 'S8'), ('Date', 'O'), ('NetPrice', '<f8'), ('Gain', '<f8')])
data=np.genfromtxt(ss, names=True, delimiter=[15,10,11,12],
    converters={'Date':dconverter}, dtype=dt)
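A self-contained sketch of the field-width approach, under assumptions of mine: Python 3, a NumPy version with the `encoding` argument, and sample lines padded to exactly the widths 15, 10, 11 and 12. Unlike the call above, it skips the header line and takes the names from the explicit dtype, and it leaves Date as a string so that no date converter is needed.

```python
import io

import numpy as np

widths = [15, 10, 11, 12]

rows = [
    ("Code", "Date", "NetPrice", "Gain"),
    ("MICRO US", "01/05/2012", "613.98", "0"),
    ("MICRO US", "01/06/2012", "622.75", "1.09342432"),
    ("MICRO US", "01/07/2012", "690.99", "-0.44342342"),
    ("MICRO US", "01/08/2012", "611.26", "-3.242423423"),
]

# Pad every field to its column width so each line has the same layout.
text = "\n".join(
    "".join(field.ljust(width) for field, width in zip(row, widths))
    for row in rows
) + "\n"

# Explicit dtype; the field names here replace the skipped header line.
dt = np.dtype([('Code', 'U8'), ('Date', 'U10'),
               ('NetPrice', 'f8'), ('Gain', 'f8')])

data = np.genfromtxt(io.StringIO(text), delimiter=widths, dtype=dt,
                     skip_header=1, autostrip=True, encoding='utf-8')

print(data.shape)     # (4,)
print(data['Code'])   # 'MICRO US' stays in one field despite the space
print(data['Gain'])
```

Because the fields are cut by position, the space inside 'MICRO US' no longer splits the code into two columns.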
hpaulj answered Feb 14 '26 04:02
