Removing Non-breaking Spaces From Strings Using Python

October 07, 2023

I am having some trouble with a very basic string issue in Python (that I can't figure out). Basically, I am trying to do the following: '# read file into a string myString = fi

Solution 1:

You don't have a Unicode string, but a UTF-8 encoded byte string (which is what a str is in Python 2.x). In UTF-8, the non-breaking space (U+00A0) is encoded as the two bytes \xc2\xa0.

Try

myString = myString.replace("\xc2\xa0", " ")

It would be better to switch to Unicode (see this article for ideas). You could then write

uniString = unicode(myString, "UTF-8")
uniString = uniString.replace(u"\u00A0", " ")

and it should also work (caveat: I don't have Python 2.x available right now), although you will need to translate it back to bytes (binary) when sending it to a file or printing it to a screen.
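
For readers on Python 3, the same two approaches can be sketched as follows. This is a minimal illustration, not the original poster's code: the sample bytes and the names raw, replaced_bytes, text and encoded are made up for the example, and it assumes the input really is UTF-8.

```python
# Sample UTF-8 bytes containing a non-breaking space (hypothetical data)
raw = b"price:\xc2\xa0100"

# Approach 1: replace the two-byte UTF-8 sequence directly on the bytes
replaced_bytes = raw.replace(b"\xc2\xa0", b" ")

# Approach 2: decode to a Unicode string, then replace U+00A0
text = raw.decode("utf-8").replace("\u00a0", " ")

# Encode back to bytes only when writing to a file or socket
encoded = text.encode("utf-8")
```

Both approaches should yield the same bytes here; decoding first is usually easier to reason about because it works on characters rather than on encoded byte sequences.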

Solution 2:

I hesitate before adding another answer to an old question, but since Python3 counts a Unicode "non-break space" character as a whitespace character, and since strings are Unicode by default, you can get rid of non-break spaces in a string s using join and split, like this:

s = ' '.join(s.split())

This will, of course, also collapse any other whitespace (tabs, newlines, etc.) into single spaces. And note that this is Python 3 only.
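
A small sketch of the difference, assuming Python 3. The sample string and the variable names are invented for illustration; the second line shows a plain replace for cases where tabs and newlines should be left alone.

```python
# Sample string with a non-breaking space, a tab and a newline (hypothetical data)
s = "one\u00a0two\tthree\nfour"

# split() with no arguments splits on any Unicode whitespace, including U+00A0
collapsed = " ".join(s.split())

# replace() touches only the non-breaking space and keeps other whitespace
only_nbsp = s.replace("\u00a0", " ")
```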


Solution 3:

No, u"\u00A0" is the escape code for non-breaking spaces. In Python 2, "\u00A0" without the u prefix is a byte string of 6 characters that are not any sort of escape code. Read this.
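
In Python 3 every ordinary string literal is Unicode, so the u prefix is no longer needed, but the same distinction shows up with raw strings. A brief sketch under that assumption (the variable names are illustrative only):

```python
# In Python 3, "\u00A0" is an escape sequence for one character
escaped = "\u00A0"

# In a raw string the backslash is literal, so this is six separate characters
literal = r"\u00A0"

escaped_length = len(escaped)
literal_length = len(literal)
```

Here escaped_length is 1 and literal_length is 6, which mirrors the Python 2 difference between u"\u00A0" and "\u00A0".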

Solution 4:

Please note that a simple myString.strip() will remove not only spaces, but also non-breaking-spaces from the beginning and end of myString. Not exactly what the OP asked for, but still very handy in many cases.
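
To illustrate, here is a short sketch assuming a Python 3 str (the sample value is made up). Note that the non-breaking space in the middle of the string is left in place.

```python
# Non-breaking spaces and ordinary spaces at both ends, plus one in the middle
s = "\u00a0 hello\u00a0world \u00a0"

# strip() removes leading and trailing Unicode whitespace only
stripped = s.strip()
```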

Solution 5:

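You can also work around this issue by forcing an ASCII encoding, which drops every non-ASCII character (non-breaking spaces included) rather than replacing them with spaces.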

cleaned_string = myString.encode('ascii', 'ignore')
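
The side effects are worth seeing before relying on this. The sketch below assumes Python 3, where encode returns bytes; the sample string is hypothetical.

```python
# Sample text with an accented letter and a non-breaking space (hypothetical data)
s = "caf\u00e9\u00a0au lait"

# 'ignore' silently drops every character that ASCII cannot represent
cleaned = s.encode("ascii", "ignore")

# A targeted replace keeps the accented letter and yields a normal space
replaced = s.replace("\u00a0", " ")
```

In this example cleaned loses both the accented letter and the word separator, while replaced keeps the text intact apart from the swapped space.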
