
Converting Utf-8 Characters To Scandic Letters

I am struggling to convert a string in which Scandic letters appear as UTF-8 byte sequences. For example, I would like to convert the following string: test_string = '\xc3\xa4\xc3\xa4abc'

Solution 1:

I can't reproduce the code snippet that reads the soup message from an incoming webhook, so my answer is based on hard-coded data. It shows in detail how the Python-specific text encodings raw_unicode_escape and unicode_escape work:

test_string = "\\xc3\\xa5\\xc3\\xa4___\xc3\xa5\xc3\xa4"  # hard-coded
print('test_string                  ', test_string)
print('.encode("raw_unicode_escape")',
  test_string.encode( 'raw_unicode_escape'))
print('.decode(    "unicode_escape")',
  test_string.encode( 'raw_unicode_escape').decode( 'unicode_escape'))
print('.encode("latin1").decode()   ', 
  test_string.encode( 'raw_unicode_escape').decode( 'unicode_escape').
              encode( 'latin1').decode( 'utf-8'))

Output: \SO\68069394.py

test_string                   \xc3\xa5\xc3\xa4___Ã¥Ã¤
.encode("raw_unicode_escape") b'\\xc3\\xa5\\xc3\\xa4___\xc3\xa5\xc3\xa4'
.decode(    "unicode_escape") Ã¥Ã¤___Ã¥Ã¤
.encode("latin1").decode()    åä___åä
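The three steps above can be wrapped in a single helper. This is only a minimal sketch: the function name unescape_utf8 is hypothetical (it does not come from the question), and it assumes every character in the input is below U+0100, since anything higher would fail at the latin1 step.

```python
def unescape_utf8(text: str) -> str:
    """Turn literal '\\xNN' escapes and mojibake characters into proper text.

    Hypothetical helper; assumes all input characters are below U+0100.
    """
    # Step 1: str -> bytes. Characters below 256 become single bytes,
    # and literal backslash sequences are left as they are.
    raw = text.encode("raw_unicode_escape")
    # Step 2: interpret the backslash escapes, so each '\xNN' becomes U+00NN.
    mojibake = raw.decode("unicode_escape")
    # Step 3: map every character back to one byte, then decode as UTF-8.
    return mojibake.encode("latin1").decode("utf-8")


sample = "\\xc3\\xa5\\xc3\\xa4___\xc3\xa5\xc3\xa4"
print(unescape_utf8(sample))  # åä___åä
```

Plain ASCII input passes through unchanged, so the helper should be safe to apply to strings that contain no escapes at all.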

Solution 2:

Based on the original question and the discussion in the comments, I suspect that you're just not saving the result of the conversion. Python strings are immutable, and assigning a new value to a parameter inside a function only rebinds the local name, so the caller's string is left unchanged:

In [42]: def change_string(s):
    ...:     s = "hello world"
    ...:
    ...: test_s = "still here"
    ...: change_string(test_s)
    ...: print(test_s)
still here

Instead, you'll want to return the result of the conversion from the function and reassign the variable:

In [43]: def change_string(s):
    ...:     s = s.encode('latin1').decode('u8')
    ...:     return s
    ...:
    ...: test_s = "\xc3\xa4\xc3\xa4abc"
    ...: test_s = change_string(test_s)
    ...: print(test_s)
ääabc
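If some incoming strings are already correct, the latin1 round trip raises an exception for them. A possible variant, sketched here under the assumption that such strings should simply be passed through (the name fix_mojibake is hypothetical), catches both failure modes:

```python
def fix_mojibake(text: str) -> str:
    """Return text re-decoded as UTF-8, or unchanged if that is not possible."""
    try:
        return text.encode("latin1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or its bytes are
        # not valid UTF-8. Assumption: the string was already correct.
        return text


print(fix_mojibake("\xc3\xa4\xc3\xa4abc"))  # ääabc
print(fix_mojibake("ääabc"))                # ääabc (already correct, left alone)
```

As in the example above, the return value has to be assigned by the caller; the function cannot change the string it was given.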
