Problem: split a string into a list of words using delimiter characters passed in as a list.
String: "After the flood ... all the colors came out."
Desired output: ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
I have written the following function. Note that I am aware there are better ways to split a string using some of Python's built-in functions, but for the sake of learning I thought I would proceed this way:
def split_string(source,splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            tmp = source[end+1:]
            for f in tmp:
                if f not in splitlist:
                    start = tmp.find(f)
                    break
            source = tmp[start:]
    return result

out = split_string("After the flood ... all the colors came out.", " .")
print(out)
['After', 'the', 'flood', 'all', 'the', 'colors', 'came out', '', '', '', '', '', '', '', '', '']
I can't figure out why "came out" is not split into "came" and "out" as two separate words. It's as if the whitespace character between the two words is being ignored. I think the rest of the output is junk that stems from the same "came out" problem.
EDIT:
I followed @Ivc's suggestion and came up with the following code:
def split_string(source,splitlist):
    result = []
    lasti = -1
    for i, e in enumerate(source):
        if e in splitlist:
            tmp = source[lasti+1:i]
            if tmp not in splitlist:
                result.append(tmp)
            lasti = i
        if e not in splitlist and i == len(source) - 1:
            tmp = source[lasti+1:i+1]
            result.append(tmp)
    return result

out = split_string("This is a test-of the,string separation-code!"," ,!-")
print(out)
#>>> ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']
out = split_string("After the flood ... all the colors came out.", " .")
print(out)
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",")
print(out)
#>>> ['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code']
out = split_string(" After the flood ... all the colors came out...............", " .")
print(out)
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
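For comparison, the built-in route mentioned at the top of the question might look roughly like the sketch below. The name split_string_re is made up for illustration, and the sketch assumes splitlist is a non-empty string of single-character delimiters. It is only meant as a cross-check for the hand-written versions, not as the intended solution to the exercise.

```python
import re

def split_string_re(source, splitlist):
    # Hypothetical helper, not from the original post.
    # Build a character class from the delimiters; re.escape guards
    # characters such as '-' or ']' that are special inside [...].
    # Assumes splitlist is non-empty (an empty class is a regex error).
    pattern = "[" + re.escape(splitlist) + "]+"
    # re.split yields empty strings for leading/trailing delimiters,
    # so filter those out.
    return [piece for piece in re.split(pattern, source) if piece]

print(split_string_re("After the flood ... all the colors came out.", " ."))
```

Runs of consecutive delimiters are collapsed by the trailing + in the pattern, which is why no empty strings appear between words.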
You don't need the inner loop. This is enough:
def split_string(source,splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            source = source[end+1:]
    return result
You can eliminate the "junk" (that is, the empty strings) by checking whether source[:end] is an empty string before you append it to the list.
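A minimal sketch of that empty-string check is below. Two things in it are additions rather than part of the answer: the unconsumed text is kept under a separate name, remaining, so it is clear the loop still walks the original string, and any text after the last delimiter is appended at the end, since the version above would drop it when the input does not end with a delimiter.

```python
def split_string(source, splitlist):
    result = []
    remaining = source  # the part of the string not yet consumed
    for e in source:    # always iterates over the original string
        if e in splitlist:
            end = remaining.find(e)
            piece = remaining[:end]
            if piece:  # skip the empty strings between adjacent delimiters
                result.append(piece)
            remaining = remaining[end + 1:]
    if remaining:  # addition: keep text after the last delimiter
        result.append(remaining)
    return result

print(split_string("After the flood ... all the colors came out.", " ."))
```

This stays in step because each delimiter the loop meets in the original string is also the first delimiter left in remaining.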
You seem to be expecting:
    source = tmp[start:]
to modify the source that the outer for loop is iterating over. It won't - that loop will keep going over the string you gave it, not whatever object is now using that name. This can mean that the character you're up to mightn't be in what's left of source.
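A small illustration of that point, using a made-up helper called chars_seen: the loop visits every character of the string it started with, even though the name source is rebound to something shorter on every pass.

```python
def chars_seen(source):
    # Hypothetical helper, only here to demonstrate name rebinding.
    seen = []
    for ch in source:
        seen.append(ch)
        # Rebind the name to a shorter string. The loop's iterator was
        # created from the original string object and is unaffected.
        source = source[1:]
    return "".join(seen), source

visited, leftover = chars_seen("a b.c")
print(repr(visited))   # 'a b.c'
print(repr(leftover))  # ''
```

So in the question's function, once source has been trimmed past several delimiters at once, the outer loop can still hand you a delimiter that was already skipped, and find() then matches a later occurrence or returns -1.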
Instead of trying to do that, keep track of the current index in the string this way:
    for i, e in enumerate(source):
        ...
and what you're appending will always be source[lasti+1:i], and you just need to keep track of lasti.