I have data similar to that seen in this gist, and I am trying to extract it with NumPy. I am rather new to Python, so I tried the following code:
import numpy as np
from datetime import datetime
convertfunc = lambda x: datetime.strptime(x, '%H:%M:%S:.%f')
col_headers = ["Mass", "Thermocouple", "T O2 Sensor",
               "Igniter", "Lamps", "O2", "Time"]
data = np.genfromtxt(files[1], skip_header=22,
                     names=col_headers,
                     converters={"Time": convertfunc})
As can be seen in the gist, there are 22 rows of header material. In IPython, when I run the code above, I receive an error that ends with the following:
TypeError: float() argument must be a string or a number
The full IPython error trace can be seen here.
I am able to extract the six columns of numeric data just fine using an argument to genfromtxt like usecols=range(0,6), but when I try to use a converter to try and tackle the last column I'm stumped. Any and all comments would be appreciated!
This is happening because np.genfromtxt is trying to create a float array, which fails because convertfunc returns a datetime object, which cannot be cast as float. The easiest solution would be to just pass the argument dtype='object' to np.genfromtxt, ensuring the creation of an object array and preventing a conversion to float. However, this would mean that the other columns would be saved as strings. To get them properly saved as floats you need to specify the dtype of each to get a structured array. Here I'm setting them all to double except the last column, which will be an object dtype:
dd = [(a, 'd') for a in col_headers[:-1]] + [(col_headers[-1], 'object')]
data = np.genfromtxt(files[1], skip_header=22, dtype=dd,
names=col_headers, converters={'Time': convertfunc})
This will give you a structured array which you can access with the names you gave:
In [74]: data['Mass']
Out[74]: array([ 0.262 , 0.2618, 0.2616, 0.2614])
In [75]: data['Time']
Out[75]: array([1900-01-01 15:49:24.546000, 1900-01-01 15:49:25.171000,
1900-01-01 15:49:25.405000, 1900-01-01 15:49:25.624000],
dtype=object)
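For anyone who wants to try the structured-dtype approach without the original file, here is a minimal self-contained sketch. The sample rows, the reduced set of column names, the helper name parse_time, and the time format '%H:%M:%S.%f' are all made up for illustration; adjust them to match the real data. Passing encoding="utf-8" is an assumption intended to make the converter receive str rather than bytes on recent NumPy versions.

```python
import io
from datetime import datetime

import numpy as np

# Made-up sample: two header lines, then mass, O2 and a time stamp per row.
sample = io.StringIO(
    "run 1\n"
    "mass o2 time\n"
    "0.2620 20.9 15:49:24.546\n"
    "0.2618 20.8 15:49:25.171\n"
)


def parse_time(text):
    # Assumed time layout; change the format string to match the real file.
    return datetime.strptime(text, "%H:%M:%S.%f")


names = ["Mass", "O2", "Time"]
# Doubles for the numeric columns, Python objects for the time column.
dtype = [("Mass", "f8"), ("O2", "f8"), ("Time", object)]

data = np.genfromtxt(sample, skip_header=2, dtype=dtype, names=names,
                     converters={"Time": parse_time}, encoding="utf-8")

print(data["Mass"])
print(data["Time"])
```

The numeric fields come back as float64 arrays, while the Time field holds datetime objects, so no float conversion is attempted on it.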
You can use pandas read_table:
import pandas as pd
frame = pd.read_table('/tmp/gist', header=None, skiprows=22, delimiter=r'\s+')
This worked for me. You need to process the header separately, since its fields are separated by a variable number of spaces.
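As a rough sketch of how the pandas route could be completed, the snippet below reads whitespace-separated rows, assigns column names by hand, and converts the time column afterwards with pd.to_datetime. The sample rows, the column names, and the time format are assumptions for illustration, not taken from the gist.

```python
import io

import pandas as pd

# Made-up sample: two header lines, then rows with uneven spacing.
sample = io.StringIO(
    "run 1\n"
    "mass   o2  time\n"
    "0.2620  20.9   15:49:24.546\n"
    "0.2618 20.8 15:49:25.171\n"
)

# Skip the header lines and supply the column names explicitly.
frame = pd.read_table(sample, sep=r"\s+", header=None, skiprows=2,
                      names=["Mass", "O2", "Time"])

# Assumed time layout; with no date in the string, pandas uses 1900-01-01.
frame["Time"] = pd.to_datetime(frame["Time"], format="%H:%M:%S.%f")

print(frame.dtypes)
```

Converting the column after reading keeps the parsing step simple and leaves the numeric columns as floats.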