Removing Non-breaking Spaces From Strings Using Python
Solution 1:
You don't have a unicode string, but a UTF-8-encoded byte string (which is what str is in Python 2.x).
Try
myString = myString.replace("\xc2\xa0", " ")
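In Python 3 the same byte-level replacement works on a bytes object rather than a str. Here is a minimal sketch, assuming the input is UTF-8-encoded bytes. The helper name replace_nbsp_bytes is hypothetical. Because 0xC2 can only appear as a lead byte in valid UTF-8, the pair 0xC2 0xA0 cannot accidentally match inside some other character.

```python
# Python 3 sketch: U+00A0 (no-break space) encoded as UTF-8 is the two bytes C2 A0.
NBSP_UTF8 = b"\xc2\xa0"


def replace_nbsp_bytes(data: bytes) -> bytes:
    """Replace every UTF-8-encoded non-breaking space in `data` with an ASCII space.

    Hypothetical helper; assumes `data` is valid UTF-8.
    """
    return data.replace(NBSP_UTF8, b" ")


raw = "caf\u00e9\u00a0au\u00a0lait".encode("utf-8")
print(replace_nbsp_bytes(raw))  # b'caf\xc3\xa9 au lait'
```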
It would be better to switch to unicode; see this article for ideas. Then you could write:
uniString = unicode(myString, "UTF-8")
uniString = uniString.replace(u"\u00A0", " ")
and it should also work (caveat: I don't have Python 2.x available right now), although you will generally need to encode it back to bytes (for example with uniString.encode("UTF-8")) when writing it to a file or printing it to the screen.
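For comparison, the same decode, replace, encode round trip in Python 3, where str is already Unicode, might look like the sketch below. It assumes the input bytes are in a known encoding (UTF-8 by default). The helper name replace_nbsp is hypothetical.

```python
def replace_nbsp(data: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: swap non-breaking spaces for ordinary spaces."""
    # Decode to str so the replacement works on characters, not raw bytes.
    text = data.decode(encoding)
    # "\u00A0" is the NO-BREAK SPACE code point.
    text = text.replace("\u00A0", " ")
    # Encode back to bytes before writing to a binary file or socket.
    return text.encode(encoding)


print(replace_nbsp(b"a\xc2\xa0b"))            # b'a b'
print(replace_nbsp(b"a\xa0b", "latin-1"))     # b'a b'
```

Working on decoded text means the same code handles any encoding you can name, not just UTF-8's byte pattern.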
Solution 2:
I hesitate before adding another answer to an old question, but since Python 3 counts a Unicode "non-break space" character as a whitespace character, and since strings are Unicode by default, you can get rid of non-break spaces in a string s using join and split, like this:
s = ' '.join(s.split())
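This will, of course, also replace any other whitespace (tabs, newlines, etc.) with a single space, collapse runs of whitespace into one space, and remove leading and trailing whitespace. And note that, as written, this relies on Python 3, where str is Unicode by default.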
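A small Python 3 sketch, using a made-up sample string, shows both the side effects of split/join and a narrower alternative that touches only the non-breaking spaces:

```python
s = "one\u00a0two\t three\n"

# U+00A0 counts as whitespace in Python 3.
print("\u00a0".isspace())  # True

# str.split() with no argument splits on any run of whitespace (including
# U+00A0) and discards leading/trailing whitespace.
collapsed = " ".join(s.split())
print(repr(collapsed))  # 'one two three'

# If only the non-breaking spaces should change, replace them directly.
targeted = s.replace("\u00a0", " ")
print(repr(targeted))  # 'one two\t three\n'
```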
Solution 3:
No, u"\u00A0" is the escape code for non-breaking spaces. In a Python 2 byte string, "\u00A0" is 6 characters that are not any sort of escape code. Read this.
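Python 3 makes a similar distinction between a literal whose \u escape is interpreted and one whose backslash is kept as-is. In the illustrative sketch below, a raw string literal stands in for the Python 2 byte string:

```python
import unicodedata

# An ordinary Python 3 string literal interprets the \u escape: one character.
nbsp = "\u00A0"
# A raw string keeps the backslash, so this is six plain characters,
# much like "\u00A0" written in a Python 2 byte-string literal.
literal = r"\u00A0"

print(len(nbsp), len(literal))  # 1 6
print(unicodedata.name(nbsp))   # NO-BREAK SPACE
print(nbsp == "\xa0")           # True
```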
Solution 4:
Please note that, when myString is a Unicode string, a simple myString.strip() will remove not only ordinary spaces but also non-breaking spaces from the beginning and end of myString. Not exactly what the OP asked for, but still very handy in many cases.
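A quick Python 3 illustration with a made-up string: strip() removes the outer non-breaking spaces but leaves the inner one alone, so an explicit replace is still needed for that.

```python
s = "\u00a0\u00a0hello\u00a0world\u00a0"

# Only leading and trailing whitespace (including U+00A0) is removed.
stripped = s.strip()
print(repr(stripped))  # 'hello\xa0world'

# Combine with replace() to also normalize the interior non-breaking space.
cleaned = stripped.replace("\u00a0", " ")
print(repr(cleaned))  # 'hello world'
```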
Solution 5:
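You can also get rid of it by forcing an ASCII encoding and ignoring anything that cannot be encoded:
cleaned_string = myString.encode('ascii', 'ignore')
Be aware that this deletes the non-breaking space instead of turning it into a regular space, silently discards every other non-ASCII character too, and in Python 3 returns bytes rather than a str.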
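A small Python 3 sketch of that behavior, using a made-up sample string, plus a variant that replaces the non-breaking spaces first so word boundaries survive:

```python
s = "caf\u00e9\u00a0au\u00a0lait"

# Characters that cannot be encoded as ASCII are dropped: the accented
# 'é' and both non-breaking spaces. The result is bytes.
ascii_bytes = s.encode("ascii", "ignore")
print(ascii_bytes)  # b'cafaulait'

# Replacing U+00A0 first keeps the spaces; decode to get a str back.
kept = s.replace("\u00a0", " ").encode("ascii", "ignore").decode("ascii")
print(kept)  # caf au lait
```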