
Use Python regex to parse string of floats output by Java Arrays.deepToString

I'm working with someone's Java code where a key data structure is an m x n x p array, float[][][]. I need to get it into Python; currently my approach is to save the array to a text file using Arrays.deepToString and then parse that text file from Python.

I am stuck on how to write a regular expression that will parse the text. What I can do is find all the floats, including those with exponents in scientific notation. I use the following pattern to do so:

float_pat = r'\d\.\d*(?:E-\d+)?'

This works fine to capture floats in scientific notation as they are output by deepToString. Note the values are all positive because they are probabilities. I.e., I don't have any issues with how I'm capturing the numbers themselves.

What I can't figure out is how to have the regex match any number of floats enclosed in left and right brackets. I tried this:

list_of_floats_pat = r'\[(?:\d\.\d*(?:E-\d+)?, )+\]'

where I'm trying to find one or more occurrences of the float format, each followed by a comma and a space, enclosed by square brackets. But that returns []. I'm not sure what I'm misunderstanding.

Here's an example 1x2x9 array:

[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5]]]

What I would want is for the regex to return two matches:

0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5

and

0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5

that I can then just parse as strings with strip and split.

I've figured out a workaround where I just find all the bracket indexes. But I'd like to know what I'm not understanding about regexes.

NickleDave asked Dec 25 '16 18:12



1 Answer

The data that you have is both a valid Python literal and valid JSON:

>>> import ast, json
>>> s = '[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5]]]'
>>> ast.literal_eval(s)
[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 0.01050721017750691, 9.991008092716556e-05], [0.5904776610141782, 0.18175460267577365, 9.991008092716556e-05, 0.22716827582448523, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05]]]
>>> json.loads(s)
[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 0.01050721017750691, 9.991008092716556e-05], [0.5904776610141782, 0.18175460267577365, 9.991008092716556e-05, 0.22716827582448523, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05]]]
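
The same idea as a standalone script might look roughly like the sketch below. The shortened `text` string is only a stand-in for what Arrays.deepToString wrote to the file, and the variable names are made up for illustration.

```python
import ast
import json

# Shortened stand-in for the text produced by Arrays.deepToString
# (hypothetical values; in practice this would be read from the text file).
text = "[[[0.6453525160688715, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365]]]"

# Exponent notation such as 9.99E-5 is a valid JSON number ...
via_json = json.loads(text)

# ... and also a valid Python float literal.
via_ast = ast.literal_eval(text)

# The outermost list has one element, which holds the two inner lists of floats.
rows = via_json[0]
```

Both calls return ordinary nested Python lists, so indexing such as `rows[0][1]` works without any further string handling.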

You'll be better off parsing with those libraries than trying to do so with regex.
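
As for the regex question itself: a likely reason the attempted pattern finds nothing is that, with its parentheses balanced, it requires every float to be followed by a comma and a space, but the last float in each list is followed directly by the closing bracket. One possible fix, sketched below with shortened sample data, is to match one float and then zero or more ", float" repetitions. The capture group and the `rows` variable are additions for illustration, not part of the original question.

```python
import re

# Float pattern from the question: one digit, a dot, more digits, optional E-<exp>.
float_pat = r'\d\.\d*(?:E-\d+)?'

# One float, then zero or more ", float" repetitions, inside one pair of brackets.
# The outer capture group makes findall return the text between the brackets.
list_of_floats_pat = r'\[(' + float_pat + r'(?:, ' + float_pat + r')*)\]'

# Shortened stand-in for the deepToString output (hypothetical values).
text = "[[[0.5, 9.991008092716556E-5, 0.25], [0.125, 0.75, 9.991008092716556E-5]]]"

matches = re.findall(list_of_floats_pat, text)

# Each match is a comma-separated string that can be split and converted.
rows = [[float(x) for x in m.split(', ')] for m in matches]
```

Here `matches` holds one string per innermost list, which is the shape the question asks for before the strip-and-split step.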

mgilson answered Oct 05 '22 12:10
