string split issue

Problem: split a string into a list of words, using delimiter characters passed in as a list.

String: "After the flood ... all the colors came out."

Desired output: ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
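For reference, the built-in route that the question deliberately sets aside can be sketched with the standard `re` module. This is only a sketch for checking a hand-written version against; the helper name `split_with_re` is hypothetical. It builds a regex character class from the delimiters and drops the empty pieces that adjacent delimiters produce.

```python
import re


def split_with_re(source, delimiters):
    """Split source on any character in delimiters, dropping empty pieces."""
    # Build a character class from the delimiters (e.g. " ." matches a space
    # or a literal dot); re.escape protects regex-special characters like '.'.
    pattern = "[" + "".join(re.escape(d) for d in delimiters) + "]"
    # re.split yields '' between adjacent delimiters and at the string's ends,
    # so keep only the non-empty pieces.
    return [piece for piece in re.split(pattern, source) if piece]


print(split_with_re("After  the flood   ...  all the colors came out.", " ."))
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```

Because the delimiters are joined character by character, this accepts either a string like `" ."` or a list like `[" ", "."]`.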

I have written the following function. Note that I am aware there are better ways to split a string using some of Python's built-in functions, but for the sake of learning I thought I would do it this way:

def split_string(source, splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            tmp = source[end+1:]
            for f in tmp:
                if f not in splitlist:
                    start = tmp.find(f)
                    break
            source = tmp[start:]
    return result

out = split_string("After  the flood   ...  all the colors came out.", " .")

print(out)

['After', 'the', 'flood', 'all', 'the', 'colors', 'came out', '', '', '', '', '', '', '', '', '']

I can't figure out why "came out" is not split into "came" and "out" as two separate words. It's as if the whitespace character between the two words is being ignored. I think the rest of the output is junk caused by the same bug.

EDIT:

I followed @lvc's suggestion and came up with the following code:

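def split_string(source, splitlist):
    result = []
    lasti = -1  # index of the previous delimiter (-1 = before the start)
    for i, e in enumerate(source):
        if e in splitlist:
            tmp = source[lasti+1:i]  # text since the previous delimiter
            if tmp:  # skip empty pieces caused by adjacent delimiters
                result.append(tmp)
            lasti = i
        if e not in splitlist and i == len(source) - 1:
            # last character is not a delimiter: append the final word
            tmp = source[lasti+1:i+1]
            result.append(tmp)
    return result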

out = split_string("This is a test-of the,string separation-code!", " ,!-")
print(out)
#>>> ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']

out = split_string("After  the flood   ...  all the colors came out.", " .")
print(out)
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

out = split_string("First Name,Last Name,Street Address,City,State,Zip Code", ",")
print(out)
#>>> ['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code']

out = split_string(" After  the flood   ...  all the colors came out...............", " .")
print(out)
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
asked May 30 '12 at 02:05 by codingknob


2 Answers

You don't need the inner loop. This is enough:

def split_string(source,splitlist):
    result = []
    for e in source:
           if e in splitlist:
                end = source.find(e)
                result.append(source[0:end])
                source = source[end+1:]
    return result

You can eliminate the "junk" (the empty strings) by checking whether source[:end] is empty before you append it to the list.
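A sketch of the function above with the suggested empty-string check applied. The check is the only change; the comments are added for explanation.

```python
def split_string(source, splitlist):
    result = []
    for e in source:  # iterates over the original string, even after `source` is rebound
        if e in splitlist:
            end = source.find(e)
            piece = source[0:end]
            # Adjacent delimiters (e.g. "  " or "...") produce empty pieces; skip them.
            if piece:
                result.append(piece)
            source = source[end + 1:]
    return result
```

One limitation worth knowing: words are only appended when a delimiter is reached, so a final word with no delimiter after it is dropped (`split_string("a b", " ")` gives `['a']`). The question's example ends in `.`, so it is not affected.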

answered Sep 24 '22 at 07:09 by Kiet Tran


You seem to be expecting:

source = tmp[start:]

to modify the string that the outer for loop is iterating over. It won't: that loop keeps going over the string you originally gave it, not whatever object the name source now refers to. This means the character you're currently on may not even be in what's left of source.
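A tiny demonstration of this (the function name is hypothetical, made up for illustration): the loop still visits every character of the original string even though the name is rebound to a shorter string on each pass.

```python
def loop_over_rebound_name(text):
    """Show that rebinding the loop's source name does not change the loop."""
    seen = []
    for ch in text:
        seen.append(ch)
        # Rebind `text` to a shorter string. The for statement already holds
        # an iterator over the original object, so iteration is unaffected.
        text = text[1:]
    return seen, text


print(loop_over_rebound_name("abc"))  # (['a', 'b', 'c'], '')
```

In the question's code this is exactly why "came out" stays together: when the loop reaches the first `.` of `...`, source has already shrunk to `"came out."`, so `source.find('.')` skips past the space and `"came out"` is appended as one piece. After that, source is empty, `find` returns `-1`, and `source[0:-1]` is `''`, which is where the trailing empty strings come from.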

Instead of trying to do that, keep track of the current index in the string this way:

for i, e in enumerate(source):
   ...

Then what you append is always source[lasti+1:i]; you just need to keep track of lasti, the index of the previous delimiter.
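A minimal sketch of this suggestion, not lvc's own code. One design choice is assumed here: the text after the last delimiter is appended once, after the loop, rather than by checking for the last index inside the loop. Testing `if piece` (instead of comparing against the delimiters) also means `splitlist` can be either a string or a list.

```python
def split_string(source, splitlist):
    result = []
    lasti = -1  # index of the previous delimiter; -1 means "before the first character"
    for i, e in enumerate(source):
        if e in splitlist:
            piece = source[lasti + 1:i]  # text between the previous delimiter and this one
            if piece:  # adjacent delimiters give an empty piece; skip it
                result.append(piece)
            lasti = i
    # Whatever follows the last delimiter is the final word (it may be empty).
    tail = source[lasti + 1:]
    if tail:
        result.append(tail)
    return result
```

Handling the tail after the loop keeps the loop body to a single case and also covers inputs that contain no delimiters at all.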

answered Sep 22 '22 at 07:09 by lvc