I am trying to replace newline characters in a unicode string and seem to be missing some magic codes.
My particular case: I am working on App Engine and trying to put titles from HTML pages into a db.StringProperty() in my model.
So I do something like:
link.title = unicode(page_title,"utf-8").replace('\n','').replace('\r','')
and I get:
Property title is not multi-line
Are there other codes I should be using for the replace?
LF (character: \n, Unicode: U+000A, ASCII: 10, hex: 0x0A) is simply the '\n' character familiar from our early programming days. It is commonly known as the 'Line Feed' or 'Newline Character'.
On Windows, a new line is denoted by the two-character sequence “\r\n”: a Carriage Return (CR, U+000D, ASCII 13) followed by a Line Feed, usually called CRLF.
A newline is a character (or character sequence) that marks the end of one line of text and the beginning of the next. Because stored text is just a linear sequence of characters, ASCII reserved control codes such as LF and CR to mark where one line ends and the next begins.
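The codes above can be checked directly. This is a small Python 3 sketch; the names LF, CR and CRLF are just illustrative constants, not anything from the question.

```python
# The two classic line-ending control characters and the Windows pair.
LF = "\n"        # Line Feed, U+000A
CR = "\r"        # Carriage Return, U+000D
CRLF = CR + LF   # Windows line ending: CR followed by LF

print(ord(LF), hex(ord(LF)))  # 10 0xa
print(ord(CR), hex(ord(CR)))  # 13 0xd
print("line one\r\nline two".split(CRLF))  # ['line one', 'line two']
```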
Try ''.join(unicode(page_title, 'utf-8').splitlines()). Calling splitlines() should let the standard library take care of all the possible crazy Unicode line breaks; you then join the pieces back together with the empty string to get a single-line version.
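A minimal Python 3 sketch of the splitlines() approach, assuming the page title arrives as UTF-8 bytes. In Python 3, str is the Unicode type, so unicode(page_title, 'utf-8') becomes page_title.decode('utf-8'). The helper name to_single_line and the sample title are hypothetical. The sketch also shows that a character such as U+2028 LINE SEPARATOR survives the replace('\n','').replace('\r','') chain from the question but not splitlines().

```python
def to_single_line(raw: bytes) -> str:
    """Decode UTF-8 bytes and drop every line boundary that splitlines() knows."""
    text = raw.decode("utf-8")  # Python 3 spelling of unicode(raw, "utf-8")
    # splitlines() breaks on \n, \r, \r\n and the other Unicode line boundaries;
    # joining the pieces with "" removes those boundaries entirely.
    return "".join(text.splitlines())


# Hypothetical page title containing a CRLF and a U+2028 LINE SEPARATOR.
page_title = "Breaking\r\nNews\u2028Today".encode("utf-8")

# The replace() chain from the question leaves U+2028 behind...
naive = page_title.decode("utf-8").replace("\n", "").replace("\r", "")
print("\u2028" in naive)  # True

# ...while the splitlines() version removes it as well.
print(to_single_line(page_title))  # BreakingNewsToday
```

Joining with "" glues the pieces together ("BreakingNewsToday"); if the title should stay readable, " ".join(...) puts a space where each line break was instead.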
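Python uses these characters for splitting in unicode.splitlines() (the same set documented for str.splitlines() in Python 3):
\n (Line Feed)
\r (Carriage Return)
\r\n (Carriage Return + Line Feed)
\v or \x0b (Line Tabulation)
\f or \x0c (Form Feed)
\x1c (File Separator)
\x1d (Group Separator)
\x1e (Record Separator)
\x85 (Next Line, C1 control code)
\u2028 (Line Separator)
\u2029 (Paragraph Separator)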
As Hank says, using splitlines() will let Python take care of all of the details for you, but if you need to do it manually, then this should be the complete list.
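Doing it manually could look like this minimal Python 3 sketch, which deletes each line-boundary character with str.translate(). The LINE_BREAKS string is assumed to match the boundaries documented for Python 3's str.splitlines() ("\r\n" needs no separate entry because it is just "\r" followed by "\n"); the names LINE_BREAKS and strip_line_breaks are hypothetical. The last line compares the manual version with the splitlines() one.

```python
# Single code points that str.splitlines() treats as line boundaries
# (assumed from the Python 3 str.splitlines() documentation).
LINE_BREAKS = "\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029"

# With a third argument, str.maketrans maps each of those characters to None,
# so translate() deletes them.
_STRIP_BREAKS = str.maketrans("", "", LINE_BREAKS)


def strip_line_breaks(text: str) -> str:
    """Remove every line-boundary character without calling splitlines()."""
    return text.translate(_STRIP_BREAKS)


sample = "a\nb\rc\r\nd\x0be\x0cf\x1cg\x1dh\x1ei\x85j\u2028k\u2029l"
print(strip_line_breaks(sample))  # abcdefghijkl
print(strip_line_breaks(sample) == "".join(sample.splitlines()))  # True
```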