
Sqlalchemy Result For Utf-8 Column Is Of Type 'str', Why?

I have a SQL query that I execute like this with an SQLAlchemy engine: result = engine.execute('SELECT utf_8_field FROM table') The database is MySQL and the column type is TEXT w

Solution 1:

If you want the data converted automatically, you should specify the charset when you create the engine:

create_engine('mysql+mysqldb:///mydb?charset=utf8')

Setting use_unicode alone does not tell SQLAlchemy which charset to use.

Solution 2:

To convert a UTF-8 bytestring to a unicode object, you need to decode it:

utf_8_field.decode('utf8')
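The line above is Python 2, where a byte string is a str. As a minimal sketch of the same idea in Python 3 (where byte strings are bytes and text is str), the decode can be wrapped in a small helper. The name to_text and the sample value are hypothetical, made up for illustration, and are not part of SQLAlchemy:

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text (or None, a number, ...): hand it back unchanged.
    return value


# Hypothetical sample: the UTF-8 bytes for "caf" followed by U+00E9 (e with acute accent).
raw = b"caf\xc3\xa9"
text = to_text(raw)
```

A helper like this is safe to apply to every column of a result row, because values that are already text pass through untouched.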

Also, when you execute a raw SELECT through .execute, SQLAlchemy has no metadata telling it that the query returns UTF-8 data, so it does not convert the result to unicode for you.

In other words, convert_unicode only works if you use the SQLAlchemy SQL expression API or the ORM functionality.

EDIT: As pointed out, your data is not even UTF-8 encoded. In UTF-8, 0xe9 as the first byte of a sequence would indicate a character between \u9000 and \u9fff, which are CJK unified ideographs, while you said it was a latin-1 character, whose UTF-8 encoding would start with 0xc3 (é is 0xc3 0xa9). The data is probably ISO-8859-1 (latin-1) or similar instead:

>>> u'é'.encode('ISO-8859-1')
'\xe9'
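The same check can be sketched in Python 3, where the difference between the two encodings shows up as a decode error. The function guess_encoding is a hypothetical helper written for this illustration. Note that ISO-8859-1 accepts every possible byte, so it only works as a last fallback and a match on it proves nothing by itself:

```python
def guess_encoding(raw, candidates=("utf-8", "iso-8859-1")):
    """Return the first candidate encoding that decodes raw without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# U+00E9 (e with acute accent) encoded both ways.
utf8_bytes = "\u00e9".encode("utf-8")          # two bytes: 0xc3 0xa9
latin1_bytes = "\u00e9".encode("iso-8859-1")   # one byte: 0xe9

# A lone 0xe9 is an incomplete UTF-8 sequence, so only the fallback matches.
utf8_guess = guess_encoding(utf8_bytes)
latin1_guess = guess_encoding(latin1_bytes)
```

Trying UTF-8 first matters here: it is strict enough to reject a lone 0xe9, whereas ISO-8859-1 would silently accept UTF-8 bytes and produce mojibake.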

The conclusion then is to tell SQLAlchemy to connect with a different character set, using the charset=utf8 parameter, as pointed out by @mata.
