
Convert a string with brackets to numpy array

Description of the problem:

I have an array-like structure stored as a string in a dataframe column (I read the dataframe from a CSV file).

One string element of this column looks like this:

In [1]: df.iloc[0]['points']
Out[1]: '[(-0.0426, -0.7231, -0.4207), (0.2116, -0.1733, -0.1013), (...)]'

So it's really an array-like structure, which looks 'ready for numpy' to me.

numpy.fromstring() doesn't help, as it doesn't accept brackets (see the related question "Convert string representation of array to numpy array in python").

Calling numpy.array() on the literal itself, i.e. copying and pasting the string's contents into the array() call, returns a NumPy array.
But passing the variable that holds the string, as in np.array(df.iloc[0]['points']), does not work and gives me a ValueError: could not convert string to float

Related question: "Convert string to numpy array"

The question:

Is there any function to do that in a simple way (without replacing or regex-ing the brackets)?

s.k asked Aug 17 '18 14:08


1 Answer

You can use ast.literal_eval before passing to numpy.array:

from ast import literal_eval
import numpy as np

# String as it comes out of the dataframe column
x = '[(-0.0426, -0.7231, -0.4207), (0.2116, -0.1733, -0.1013)]'

# Parse the string into a list of tuples, then build the array from it
res = np.array(literal_eval(x))

print(res)

[[-0.0426 -0.7231 -0.4207]
 [ 0.2116 -0.1733 -0.1013]]

You can do the same for strings in a Pandas series, but it's not clear whether you need to aggregate across rows. If you do, you can combine a list of NumPy arrays, each derived using the logic above.
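As a rough sketch of the per-row case, assuming every row holds the same number of points so the results can be stacked: the column name 'points' comes from the question, while the two-row CSV, the 'id' column and the second row's values are invented here purely for illustration.

```python
from ast import literal_eval
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical CSV standing in for the real file. The first row reuses the
# values from the question; the second row is made up for illustration.
csv_text = '''id,points
0,"[(-0.0426, -0.7231, -0.4207), (0.2116, -0.1733, -0.1013)]"
1,"[(0.5, 0.25, 0.125), (1.0, 2.0, 3.0)]"
'''

df = pd.read_csv(StringIO(csv_text))

# Parse each string into a list of tuples, then into one array per row
arrays = [np.array(literal_eval(s)) for s in df['points']]

# Assumption: all rows have the same shape, so they can be stacked into a
# single (rows, points, coordinates) array
stacked = np.stack(arrays)
print(stacked.shape)
```

If the rows hold different numbers of points, np.stack will raise an error; in that case keeping the plain list of arrays is probably the simpler option.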

The docs explain types acceptable to literal_eval:

Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.

So we are effectively converting a string to a list of tuples, which np.array can then convert to a NumPy array.
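To make the intermediate step visible, here is a small sketch that inspects what literal_eval returns before np.array sees it, and checks that a string which is not a pure literal is rejected rather than executed. The `rejected` flag and the sample expression are only there for illustration.

```python
from ast import literal_eval

import numpy as np

x = '[(-0.0426, -0.7231, -0.4207), (0.2116, -0.1733, -0.1013)]'

# literal_eval returns ordinary Python objects: a list of tuples of floats
parsed = literal_eval(x)
print(type(parsed).__name__, type(parsed[0]).__name__)

# np.array turns that nested structure into a 2-D float array
res = np.array(parsed)
print(res.shape, res.dtype)

# A function call is not a literal, so literal_eval refuses to evaluate it
rejected = False
try:
    literal_eval('__import__("os").getcwd()')
except ValueError:
    rejected = True
print('rejected:', rejected)
```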

jpp answered Sep 29 '22 18:09