
Python 3: Removing u200b (zwsp) and newlines (\n) and spaces - chaining List operations?

I'm really stumped as to why this doesn't work. All I want to do is remove the zero-width space (zwsp, u200b), newlines and extra spaces from content read from a file.

Ultimately, I want to write the result out to a new file. That part already works, just not in the desired format yet.

My input (a short test file, which has zwsp / u200b in it) consists of the following:

Australia 1975
    ​Adelaide   ​ 2006  ​ 23,500
    Brisbane (Logan)     2006    29,700
    ​Brisbane II (North Lakes)  ​ 2016  ​ 29,000

Austria 1977
    Graz     1989    26,100
    Innsbruck    2000    16,000
    Klagenfurt   2008    27,000

My code so far is as follows:

input_file = open('/home/me/python/info.txt', 'r')
file_content = input_file.read()
input_file.close()

output_nospace = file_content.replace('\u200b' or '\n' or ' ', '')

print(output_nospace)

f = open('nospace_u200b.txt', 'w')
f.write(output_nospace)
f.close()

However, this doesn't work as I expect.

Whilst it removes u200b, it does not remove newlines or spaces. I have to test for absence of u200b by checking the output file produced as part of my script.

If I remove one of the operands, e.g. '\u200b', like so:

output_nospace = file_content.replace('\n' or ' ', '')

...then sure enough the resulting file is without newlines or spaces, but u200b remains, as expected. If I revert to the original version shown at the top of this post, it doesn't remove all three of u200b, newlines and spaces.

Can anyone advise what I'm doing wrong here? Can you chain list operations like this? How can I get this to work?

Thanks.

Fiddy Bux asked Dec 10 '25 01:12


2 Answers

The result of code like "a or b or c" is just the first thing of a, b, or c that isn't considered false by Python (None, 0, "", [], and False are some false values). In this case the result is the first value, the zwsp character. It doesn't convey to the replace function that you're looking to replace a or b or c with ''; the replace code isn't informed you used 'or' at all. You can chain replacements like this, though: s.replace('a', '').replace('b', '').replace('c', ''). (Also, replace is a string operation, not a list operation, here.)
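To make that concrete, here is a minimal sketch. The `sample` string is a made-up stand-in for the file content, not the asker's actual file. It shows what the `or` expression evaluates to before `replace` ever sees it, and how the chained version behaves.

```python
# `or` returns its first truthy operand, and any non-empty string is truthy,
# so this whole expression is just the zero-width space.
target = '\u200b' or '\n' or ' '
print(target == '\u200b')  # True

# Hypothetical sample text standing in for the file content.
sample = 'Austria\u200b 1977\n    Graz     1989\n'

# Same as sample.replace('\u200b', ''): newlines and spaces are left alone.
only_zwsp_removed = sample.replace('\u200b' or '\n' or ' ', '')

# Chained calls: each replace() returns a new string for the next call to use.
all_removed = sample.replace('\u200b', '').replace('\n', '').replace(' ', '')
```

Each `replace` call handles exactly one target string, which is why one call per character is needed in the chain.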

Based on this question, I'd suggest a tutorial like learnpython.org. Statements in Python or other programming languages are different from human-language sentences in ways that can confuse you when you're just starting out.

twotwotwo answered Dec 11 '25 21:12


As indicated by @twotwotwo, the following implementation of a .replace chain solves the issue.

output_nospace = \
file_content.replace('\u200b', '').replace('\n', '').replace(' ', '')
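If the list of characters to drop grows, one alternative to a long chain (a different technique, not what was suggested above) is `str.translate` with a deletion table built by `str.maketrans`. This is only a sketch; the names `DELETE_TABLE` and `strip_unwanted` and the sample line are made up for illustration.

```python
# The third argument to str.maketrans lists characters to delete:
# each one is mapped to None in the resulting table.
DELETE_TABLE = str.maketrans('', '', '\u200b\n ')


def strip_unwanted(text):
    """Return text with zero-width spaces, newlines and spaces removed."""
    return text.translate(DELETE_TABLE)


# Hypothetical line modelled on the test file, with zwsp characters in it.
sample = '    \u200bAdelaide   \u200b 2006  \u200b 23,500\n'
cleaned = strip_unwanted(sample)
```

Like the chain, this removes every space, not only the repeated ones, so the fields on a line end up joined together.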

Thanks so much for pointing me in the right direction. :)

Fiddy Bux answered Dec 11 '25 21:12


