Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get groups of numbers separated by commas in python?

Tags:

python

regex

I have the following text:

Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,
       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,
       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,
       270, 282, 296, 318, 319, 323, 326, 341}
Cluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,
       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,
       336}
Cluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}
Cluster 10: {94, 123, 147}

And i want to extract by cluster the number in each set.

I have tryed using regex without much luck

I have tried:

regex="(Cluster \d+): \{((\d+)[,\}][\n ]+)+|(?:(\d+),[\n ])"

But the groups dont match.

I would like an output as:

["Cluster 7", '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', "Cluster 8", '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', "Cluster 9", '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', "Cluster 10", "94", "123", "147"]

Or maybe this is not the best approach to do this.

Thanks

like image 891
Rafael Rios Avatar asked Jan 28 '23 07:01

Rafael Rios


2 Answers

I would not use regex for this. Your text is within yaml spec and can be loaded directly with an order-preserving yaml loader such as oyaml.

import oyaml as yaml   # pip install oyaml
data = yaml.load(text)

To unpack that dict to the desired "flat" structure, it's just a list comprehension:

[x for (k, v) in data.items() for x in (k, *v)]

Note: I'm the author of oyaml.

like image 86
wim Avatar answered Jan 30 '23 21:01

wim


You can create a more generic regex:

import re
s = '\nCluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n       270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n       336}\nCluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}\n'
data = re.findall('Cluster \d+|\d+', s)

Output:

['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
like image 26
Ajax1234 Avatar answered Jan 30 '23 19:01

Ajax1234