How to use split() on Python's numpy.bytes_ type? (read dictionary from file)

Question

I want to read data from a (very large, whitespace-separated, two-column) text file into a Python dictionary. I tried to do this with a for loop, but that was too slow. MUCH faster is reading it with numpy's loadtxt into a structured array and then converting it to a dictionary:

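import numpy as np

# field1 is read as a 20-byte string (bytes), field2 as an integer
data = np.loadtxt('filename.txt', dtype=[('field1', 'a20'), ('field2', int)], ndmin=1)
# each record is a (field1, field2) pair, so dict() maps field1 -> field2
result = dict(data)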

But this is surely not the best way? Any advice?

The main reason I need something else is that the following does not work:

data[0]['field1'].split(sep='-')

It leads to the error message:

TypeError: Type str doesn't support the buffer API

If the split() method exists, why can't I use it? Should I use a different dtype? Or is there a different (fast) way to read the text file? Is there anything else I am missing?

Versions: Python 3.3.2, numpy 1.7.1

Edit: changed data['field1'].split(sep='-') to data[0]['field1'].split(sep='-')

Jaime · Accepted Answer

The standard library split returns a list whose length varies with how many times the separator is found in the string, and it is therefore not very suitable for array operations. My numpy string arrays (I'm running 1.7) do not have a split method, by the way.
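To illustrate the point about variable length, here is a minimal sketch using plain Python strings (the sample values are made up for illustration). It only compares the number of pieces that split and partition return:

```python
# Made-up sample strings with zero, one and two separators.
samples = ["a-b", "a-b-c", "abc"]

# split() returns a list whose length depends on how often "-" occurs.
split_lengths = [len(s.split("-")) for s in samples]

# partition() always returns exactly three parts: head, separator, tail.
partition_lengths = [len(s.partition("-")) for s in samples]

print(split_lengths)      # [2, 3, 1]
print(partition_lengths)  # [3, 3, 3]
```

Because partition always yields three parts, its results fit into a regular array, which a ragged list of split results would not.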

You do have np.core.defchararray.partition, which is similar but poses no problems for vectorization, as well as all the other string operations:

>>> a = np.array(['a - b', 'c - d', 'e - f'], dtype=np.string_)
>>> a
array(['a - b', 'c - d', 'e - f'], 
      dtype='|S5')
>>> np.core.defchararray.partition(a, '-')
array([['a ', '-', ' b'],
       ['c ', '-', ' d'],
       ['e ', '-', ' f']], 
      dtype='|S2')

Louic · Answer

Because type(data[0]['field1']) gives <class 'numpy.bytes_'>, the split() method does not work when it is given a "normal" (str) string as its argument (is this a bug?).

The way I solved it: data[0]['field1'].split(sep=b'-') (the key is to put the b in front of '-', which makes the separator a bytes object).
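A small sketch of that fix, assuming a made-up value in place of data[0]['field1'] (the name value and its contents are hypothetical). It also shows a second option, decoding to str first, which is not part of the original solution:

```python
import numpy as np

# Hypothetical sample value standing in for data[0]['field1'].
value = np.bytes_(b"abc-def")

# A str separator is rejected because the value holds bytes, not str.
try:
    value.split(sep="-")
    str_separator_failed = False
except TypeError:
    str_separator_failed = True

# Option 1: stay in bytes and pass a bytes separator.
bytes_parts = value.split(sep=b"-")

# Option 2 (an alternative, not from the answer): decode to str, then split.
str_parts = value.decode("ascii").split(sep="-")

print(str_separator_failed)
print(bytes_parts)
print(str_parts)
```

The wording of the TypeError differs between Python versions, so the sketch only checks that one is raised.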

Jaime's suggestion to use np.core.defchararray.partition(a, '-') was of course also very helpful, but in this case too b'-' is needed to make it work.
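For completeness, a sketch of the partition call with a bytes separator on Python 3. It assumes the same sample array as in Jaime's answer and uses np.char, which as far as I know exposes the same functions as np.core.defchararray. The strip step is an extra, added only to show how the surrounding spaces could be removed:

```python
import numpy as np

# Same sample data as in the accepted answer, written as bytes literals.
a = np.array([b"a - b", b"c - d", b"e - f"])

# The separator must be bytes as well, since the array holds bytes.
parts = np.char.partition(a, b"-")

# Column 0 is the text before the separator, column 2 the text after it.
heads = np.char.strip(parts[:, 0])
tails = np.char.strip(parts[:, 2])

print(parts.shape)
print(heads.tolist())
print(tails.tolist())
```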

In fact, a related question was answered here: "Type str doesn't support the buffer API", although at first sight I did not realise this was the same issue.

Tags: python, dictionary, split, numpy
