
Can you create a Python list from a string, while keeping characters in specific keywords together?

I want to create a list from the characters in a string, but keep specific keywords together.

For example:

keywords: car, bus

INPUT:

"xyzcarbusabccar" 

OUTPUT:

["x", "y", "z", "car", "bus", "a", "b", "c", "car"] 
asked Feb 07 '16 21:02 by Dan Martino




1 Answer

Use re.findall, and list your keywords first in the alternation.

>>> import re
>>> s = "xyzcarbusabccar"
>>> re.findall('car|bus|[a-z]', s)
['x', 'y', 'z', 'car', 'bus', 'a', 'b', 'c', 'car']

In case you have overlapping keywords, note that this solution will find the first one you encounter:

>>> s = 'abcaratab'
>>> re.findall('car|rat|[a-z]', s)
['a', 'b', 'car', 'a', 't', 'a', 'b']
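To make the "first one you encounter" behaviour concrete, here is a rough sketch of the same left-to-right scan written by hand, without re. The function name split_keep_keywords is made up for this illustration. Unlike the [a-z] pattern, it keeps every character, not only lowercase letters.

```python
def split_keep_keywords(text, keywords):
    """Scan left to right; at each position try the keywords in order."""
    result = []
    i = 0
    while i < len(text):
        for kw in keywords:
            # Skip empty keywords (they would never advance i).
            # str.startswith accepts a start offset as its second argument.
            if kw and text.startswith(kw, i):
                result.append(kw)
                i += len(kw)
                break
        else:
            # No keyword starts at this position: emit the single character.
            result.append(text[i])
            i += 1
    return result


print(split_keep_keywords("abcaratab", ["car", "rat"]))
# ['a', 'b', 'car', 'a', 't', 'a', 'b']
```

Once 'car' has been consumed, the scan resumes after it, so the 'rat' that overlaps with it is never seen as a whole.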

You can make the solution more general by replacing the [a-z] part with whatever you like: \w for example, or a simple . to match any character (except a newline, unless you pass re.DOTALL).
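If the keywords come from a list rather than being typed into the pattern, something along these lines might work. This is only a sketch: build_pattern and its fallback parameter are names invented here, and sorting the keywords longest-first is an added assumption so that a keyword such as 'carpet' is tried before its prefix 'car'.

```python
import re


def build_pattern(keywords, fallback="."):
    # Longest first, so "carpet" is tried before its prefix "car".
    ordered = sorted(keywords, key=len, reverse=True)
    # re.escape protects keywords containing regex metacharacters, e.g. "c++".
    alternatives = [re.escape(kw) for kw in ordered if kw]
    # The single-character fallback must come last in the alternation.
    alternatives.append(fallback)
    # re.DOTALL lets "." match newlines as well.
    return re.compile("|".join(alternatives), re.DOTALL)


pattern = build_pattern(["car", "bus"])
print(pattern.findall("xyzcarbusabccar"))
# ['x', 'y', 'z', 'car', 'bus', 'a', 'b', 'c', 'car']
```

Keep the fallback free of capturing groups; with groups in the pattern, re.findall returns the groups instead of the whole match.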

Short explanation of why this works and why the regex '[a-z]|car|bus' would not: The regular expression engine tries the alternatives from left to right and is "eager" to return a match. That means it considers the whole alternation matched as soon as one of the alternatives has fully matched. At that point it does not try any of the remaining alternatives; it stops and reports the match immediately. With '[a-z]|car|bus', the engine reports a match as soon as it sees any character in the character class [a-z], so it never goes on to check whether 'car' or 'bus' could also match.
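The effect of the ordering is easy to check side by side. A small comparison, using the same input string as above (the variable names are just for this example):

```python
import re

s = "xyzcarbusabccar"

# Keywords listed before the character class: keywords stay together.
keywords_first = re.findall("car|bus|[a-z]", s)

# Character class listed first: it always wins, one character at a time.
char_class_first = re.findall("[a-z]|car|bus", s)

print(keywords_first)
# ['x', 'y', 'z', 'car', 'bus', 'a', 'b', 'c', 'car']
print(char_class_first == list(s))
# True
```

With the character class first, the result is the same as list(s): the 'car' and 'bus' alternatives are never reached.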

answered Oct 17 '22 13:10 by timgeb