I'm working with someone's Java code where a key data structure is an m x n x p array, float[][][]. I need to get it into Python; currently my approach is to save the array to a text file using Arrays.deepToString and then parse that text file from Python.
I am stuck on how to write a regular expression that will parse the txt. What I can do is find all the floats with their associated exponents in scientific notation. I use the following pattern to do so:
float_pat = r'\d\.\d*(?:E-\d+)?'
This works fine to capture floats in scientific notation as they are output by deepToString. Note the values are all positive because they are probabilities. I.e., I don't have any issues with how I'm capturing the numbers themselves.
What I cannot do but what I would like to do is have regex search for any number of floats enclosed in left and right brackets. I tried this:
list_of_floats_pat = r'\[(?:\d\.\d*(?:E-\d+)?), )+\]'
where I'm trying to find one or more cases of the float format followed by a comma and a space, enclosed by square brackets. But that returns []. Not sure what I'm not understanding.
Here's an example 1x2x9 array:
[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5]]]
What I would want is for the regex to return two matches:
0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5
and
0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5
that I can then just parse as strings with strip and split.
I've figured out a workaround where I just find all the bracket indexes. But I'd like to know what I'm not understanding about regexes.
The data that you have is both a valid Python literal and valid JSON:
>>> import ast
>>> import json
>>> s = '[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5]]]'
>>> ast.literal_eval(s)
[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 0.01050721017750691, 9.991008092716556e-05], [0.5904776610141782, 0.18175460267577365, 9.991008092716556e-05, 0.22716827582448523, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05]]]
>>> json.loads(s)
[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 0.01050721017750691, 9.991008092716556e-05], [0.5904776610141782, 0.18175460267577365, 9.991008092716556e-05, 0.22716827582448523, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05]]]
You'll be better off parsing with those libraries than trying to do so with regex.
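For completeness, here is a rough sketch of both routes side by side. On the regex itself: the attempted pattern requires every float to be followed by ", ", but the last float in each list is followed directly by "]", so no bracketed list can match (and as posted, the pattern also appears to have one more closing parenthesis than opening ones). One way around that is to match a float followed by zero or more ", float" repeats, with a capturing group around the contents. The sample string below is a shortened version of the data in the question, and the variable names are only illustrative.

```python
import json
import re

# Shortened sample in the format produced by Java's Arrays.deepToString.
text = ("[[[0.6453525160688715, 0.15620941152962334, 9.991008092716556E-5], "
        "[0.5904776610141782, 9.991008092716556E-5, 0.22716827582448523]]]")

# Preferred route: the text is valid JSON, so parse it directly
# and flatten the outer dimension to get the innermost rows.
array = json.loads(text)
rows_from_json = [row for plane in array for row in plane]

# Regex route: one float, then zero or more ", float" repeats,
# captured as a group between literal square brackets.
float_pat = r'\d\.\d*(?:E-\d+)?'
list_of_floats_pat = r'\[(' + float_pat + r'(?:, ' + float_pat + r')*)\]'
matches = re.findall(list_of_floats_pat, text)

# Each match is the comma-separated contents of one innermost list.
rows_from_regex = [[float(x) for x in m.split(', ')] for m in matches]
```

Both routes should give the same rows here, but the JSON route keeps the nesting intact and does not depend on the values being positive or having a single leading digit. For a file on disk, json.load on the open file object does the same job.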