
Cleaning up a string without split/strip/built-in functions

My requirements

Use Python to create a function cleanstring(S) to "clean up" the spaces in a sentence S.

  • The sentence may have extra spaces at the front and/or at the end and/or between words.
  • The subroutine returns a new version of the sentence without the extra spaces.
    • That is, in the new string, the words should be the same but there should be no spaces at the start, only one space between each word and no spaces at the end.

This program is about you writing code to search through a string to find words and so you are not allowed to use the split function in Python.

You can solve this problem with the basic capabilities of the if and while statements and string operations of len and concatenation.

For example: if the input is: " Hello to the world !" then the output should be: "Hello to the world!"

Question

My program deletes more characters than it should.

Input: " Hello World ! "

Output: "HellWorl"

How do I fix the error in my program?

def cleanupstring (S):
    newstring = ["", 0]
    j = 1
    for i in range(len(S) - 1):
        if S[i] != " " and S[i+1] != " ":
            newstring[0] = newstring[0] + S[i]
        else:
            newstring[1] = newstring [1] + 1
    return newstring

# main program

sentence = input("Enter a string: ")

outputList = cleanupstring(sentence)

print("A total of", outputList[1], "characters have been removed from your string.")
print("The new string is:", outputList[0]) 
Mickey Sawkiewicz asked Oct 16 '22 10:10


2 Answers

Welcome to Stack Overflow. When I started reading I thought this was going to be a "please answer my homework" question, but you've actually made a pretty fair effort at solving the problem, so I'm happy to try to help (only you can say whether I actually do).

It's sometimes difficult when you are learning a new language to drop techniques that are much more appropriate in other languages. Doing it character by character you normally just use for c in s rather than incrementing index values like you would in C (though either approach works, index incrementation where not necessary is sometimes regarded as "unpythonic"). Your basic idea seems to be to detect a space followed by another space, otherwise copying characters from the input to the output.
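To see where the extra deletions come from, it may help to replay the question's loop and record what it throws away. The sketch below is only a diagnostic: trace_original and the dropped list are names made up for this illustration, and the loop body is copied from the question.

```python
def trace_original(S):
    """Replay the question's loop and record which characters it discards."""
    kept = ""
    dropped = []                      # (index, character) pairs that were not copied
    for i in range(len(S) - 1):       # note: the final character is never visited
        # A character is copied only if it AND the next one are non-spaces,
        # so the last letter of every word fails this test.
        if S[i] != " " and S[i + 1] != " ":
            kept += S[i]
        else:
            dropped.append((i, S[i]))
    return kept, dropped


kept, dropped = trace_original(" Hello World ! ")
print(repr(kept))    # 'HellWorl'
print(dropped)       # includes (5, 'o'), (11, 'd') and (13, '!')
```

Two things stand out. The test looks at the next character, so any character followed by a space is dropped even when it is a letter, and range(len(S) - 1) stops one short of the end of the string.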

The logic can be simplified by retaining the last character you sent to the output. If it's a space, don't send any more spaces. A loop at the front gets rid of any leading spaces, and since there can be at most one space at the end it can be eliminated easily if present.

I'm not sure why you use a list to keep your results in, as it makes the code much more difficult to understand. If you need to return multiple pieces of information it's much easier to compute them in individual variables and then construct the result in the return statement.

So one desirable modification would be to replace newstring[0] with, say, out_s and newstring[1] with, say, count. That will make it a bit clearer what's going on. Then at the end return [out_s, count] if you really need a list. A tuple using return out_s, count would be more usual.

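def cleanupstring(s):
    out_s = ''
    count = 0          # number of characters removed so far
    last_out = ' '     # pretend a space was just output, so leading spaces are skipped
    for c in s:
        if c != ' ' or last_out != ' ':
            last_out = c
            out_s += c
        else:
            count += 1
    # At most one trailing space can remain; dropping it removes one more character.
    # The out_s check keeps the count right for empty or all-space input.
    if out_s != '' and last_out == ' ':
        count += 1
        out_s = out_s[:-1]
    return out_s, count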

# main program

sentence = input("Enter a string: ")

outputList = cleanupstring(sentence)

print("A total of", outputList[1], "characters have been removed from your string.")
print("The new string is:", outputList[0])
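If the assignment is read strictly (only if, while, len, and concatenation), a variant along the following lines avoids both the for loop and the slice. It is a sketch rather than part of the answer above; cleanupstring_strict and pending_space are hypothetical names. Instead of trimming a trailing space afterwards, it delays writing a space until the next word actually starts.

```python
def cleanupstring_strict(s):
    """Collapse runs of spaces using only if, while, len, indexing and concatenation."""
    out_s = ""
    pending_space = False      # True when a space was seen since the last non-space
    i = 0
    while i < len(s):
        c = s[i]
        if c == " ":
            pending_space = True
        else:
            # Write the separator only between words, never before the first one.
            if pending_space and len(out_s) > 0:
                out_s = out_s + " "
            out_s = out_s + c
            pending_space = False
        i = i + 1
    # Every removed character is a space, so the length difference is the count.
    return out_s, len(s) - len(out_s)


new_s, removed = cleanupstring_strict("  Hello   World !  ")
print(repr(new_s), removed)    # 'Hello World !' 6
```

Because a space is only emitted when another word follows, leading and trailing spaces never reach the output, so there is nothing to strip at the end.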

Sometimes you just don't have certain pieces of information that would help you to answer the question extremely succinctly. You most likely haven't yet been taught about the strip and replace methods, and so I imagine the following (untested) code

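def cleanupstring(s):
    # Strip first, so a single leading or trailing space is removed
    # even when the string contains no double spaces at all.
    out_s = s.strip()
    while '  ' in out_s:
        out_s = out_s.replace('  ', ' ')
    return out_s, len(s)-len(out_s)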

would be right out.

Also, you can use an "unpacking assignment" to bind the different elements of the function's output directly to names by writing

s, c = cleanupstring(...)

I'm sure you will agree that

print("A total of", c, "characters have been removed from your string.")
print("The new string is:", s)

is rather easier to read. Python values readability so highly because with readable code it's easier to understand the intent of the author. If your code is hard to understand there's a good chance you still have some refactoring to do!

holdenweb answered Oct 30 '22 03:10


If "space" means literal space characters rather than whitespace in general, then the following would work:

import re

def clean_string(value):
    return re.sub('[ ]{2,}', ' ', value.strip())

The value is stripped first; any run of two or more consecutive spaces that remains is then replaced with a single space.
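To make the "spaces versus whitespace" distinction concrete, here is a small sketch with two hypothetical helpers, clean_spaces and clean_whitespace (neither name comes from the answer). The first touches only the space character; the second treats tabs and newlines as separators too, which may or may not be what the assignment wants.

```python
import re


def clean_spaces(value):
    # Strip only spaces from the ends, then collapse runs of 2+ spaces.
    return re.sub(r" {2,}", " ", value.strip(" "))


def clean_whitespace(value):
    # Strip any whitespace from the ends, then collapse every whitespace run.
    return re.sub(r"\s+", " ", value.strip())


sample = "  a   b\t\tc "
print(repr(clean_spaces(sample)))       # 'a b\t\tc'
print(repr(clean_whitespace(sample)))   # 'a b c'
```

Note that str.strip() with no argument removes all leading and trailing whitespace, not just spaces, so passing " " explicitly is the stricter reading.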

Django Doctor answered Oct 30 '22 03:10