
Splitting a string based on a certain set of words

I have a list of strings like this:

['happy_feet', 'happy_hats_for_cats', 'sad_fox_or_mad_banana','sad_pandas_and_happy_cats_for_people'] 

Given a keyword list like ['for', 'or', 'and'], I want to parse the list into another list in which any string containing one of the keywords is split into multiple parts at that keyword.

For example, the list above would be split into:

['happy_feet', 'happy_hats', 'cats', 'sad_fox', 'mad_banana', 'sad_pandas', 'happy_cats', 'people']

Currently I split each string on underscores, use a for loop to find the index of a keyword, and then rejoin the pieces with underscores. Is there a quicker way to do this?

SharpObject asked Dec 22 '15 07:12


2 Answers

>>> import itertools, re
>>> pat = re.compile("_(?:%s)_" % "|".join(sorted(split_list, key=len)))
>>> list(itertools.chain.from_iterable(pat.split(line) for line in data))

This gives the desired output for the example dataset provided.

Actually, because each keyword is wrapped in _ delimiters, you don't really need to sort by length, so you could just do:

>>> pat = re.compile("_(?:%s)_" % "|".join(split_list))
>>> list(itertools.chain.from_iterable(pat.split(line) for line in data))
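
The same idea can be put together as a self-contained script. This is only a minimal sketch: the function name split_on_keywords is made up for illustration, and the re.escape call is an added precaution (not part of the snippet above) in case a keyword ever contains regex metacharacters.

```python
import itertools
import re

def split_on_keywords(strings, keywords):
    """Split each string wherever _<keyword>_ occurs and flatten the result."""
    # re.escape is an assumption added here; it only matters if a keyword
    # contains characters that are special in a regular expression.
    pattern = re.compile("_(?:%s)_" % "|".join(re.escape(k) for k in keywords))
    # chain.from_iterable flattens the per-string lists into one sequence.
    return list(itertools.chain.from_iterable(pattern.split(s) for s in strings))

data = ['happy_feet', 'happy_hats_for_cats', 'sad_fox_or_mad_banana',
        'sad_pandas_and_happy_cats_for_people']
split_list = ['for', 'or', 'and']

print(split_on_keywords(data, split_list))
# ['happy_feet', 'happy_hats', 'cats', 'sad_fox', 'mad_banana', 'sad_pandas', 'happy_cats', 'people']
```

Because the pattern requires an underscore on both sides of the keyword, a word that merely contains a keyword (such as 'forest') is not split.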
Joran Beasley answered Oct 12 '22 06:10


>>> import re
>>> # f?or matches both "for" and "or"
>>> [re.split(r"_(?:f?or|and)_", s) for s in l]
[['happy_feet'],
 ['happy_hats', 'cats'],
 ['sad_fox', 'mad_banana'],
 ['sad_pandas', 'happy_cats', 'people']]

To combine them into a single list, you can use:

result = []
for s in l:
    result.extend(re.split(r"_(?:f?or|and)_", s))
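
If you prefer a single expression, the loop can probably be written as a nested list comprehension instead. A rough sketch, assuming l is the list from the question and that the keywords stay fixed at for/or/and; the name pat is just for illustration:

```python
import re

# Compile once and reuse the pattern object for every string.
pat = re.compile(r"_(?:f?or|and)_")

l = ['happy_feet', 'happy_hats_for_cats', 'sad_fox_or_mad_banana',
     'sad_pandas_and_happy_cats_for_people']

# The outer loop walks the strings, the inner loop walks the split parts.
result = [part for s in l for part in pat.split(s)]
print(result)
# ['happy_feet', 'happy_hats', 'cats', 'sad_fox', 'mad_banana', 'sad_pandas', 'happy_cats', 'people']
```

Note that a string which starts or ends with a keyword, e.g. 'for_cats', is left whole, since the pattern needs an underscore on both sides.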
Tim Pietzcker answered Oct 12 '22 08:10