Encoding Issues in Python

I spent some time looking into encoding issues in Python. Python 2.x has the following two string types:

【The str and unicode types for Python strings】

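As a result, the same two Chinese characters 「編碼」 end up with completely different types depending on whether they are stored as str or as unicode. The two assignments differ as follows: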
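As a rough sketch of the difference, the snippet below uses Python 3, where the same two roles exist under different names: Python 3 `bytes` plays the part of Python 2 `str`, and Python 3 `str` plays the part of Python 2 `unicode`. The variable names are illustrative only.

```python
# Python 3 sketch: `bytes` ~ Python 2 `str`, `str` ~ Python 2 `unicode`.
text = "編碼"                # two Unicode code points
raw = text.encode("utf8")   # the same characters stored as UTF-8 bytes

print(type(text).__name__, len(text))  # str 2
print(type(raw).__name__, len(raw))    # bytes 6
print(raw)                             # b'\xe7\xb7\xa8\xe7\xa2\xbc'
```

The lengths differ because the byte string counts UTF-8 bytes (three per character here), while the text string counts characters.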

【Converting between str and unicode】

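Python 2 does provide conversion between these two string types: encode is meant to go from unicode to str, and decode goes the other way. In Python 2.7, however, calling .encode("utf8") on a str is also allowed and yields another str, which makes these calls harder to reason about. Under Anaconda Python 2.7, calling .encode("utf8") on a str containing Chinese is not supported and raises an exception immediately (CPython first decodes the str implicitly with the ASCII codec, which fails on non-ASCII bytes). Under IronPython 2.7 the same call is supported, but it encodes the Chinese text into mojibake, as the two figures below show: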

Anaconda Python 2.7 does not support .encode("utf8") on a Chinese str

IronPython 2.7 supports .encode("utf8") on a Chinese str
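The two behaviours can be approximated in Python 3. This is only a sketch under assumptions: the CPython 2 failure is modelled by an explicit ASCII decode before the encode, and the IronPython result is modelled by treating each byte as one character (Latin-1) before encoding again. The latter is a guess at the mechanism, not something the article confirms.

```python
text = "編碼"
raw = text.encode("utf8")          # unicode -> byte string (encode)
assert raw.decode("utf8") == text  # byte string -> unicode (decode)

# CPython 2 style: str.encode() implies an ASCII decode first, which fails.
try:
    raw.decode("ascii").encode("utf8")
    outcome = "no error"
except UnicodeDecodeError as exc:
    outcome = type(exc).__name__
print(outcome)  # UnicodeDecodeError

# Assumed IronPython style: every byte is read as a character, then encoded again.
garbled = raw.decode("latin-1").encode("utf8")
print(len(raw), len(garbled))  # 6 12
```

In this model the garbled bytes are still recoverable with `garbled.decode("utf8").encode("latin-1")`, which is why double encoding usually shows up as doubled length rather than lost data.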

【The ensure_ascii parameter of json.dumps()】

json.dumps() also behaves differently under Anaconda Python 2.7 and IronPython 2.7. The function has an important parameter, ensure_ascii, which the official Python 2.7 documentation explains as follows: "If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is false, some chunks written to fp may be unicode instances."
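A small sketch of what the parameter does, written for Python 3 (`str.isascii()` needs 3.7 or later). The `record` dictionary is a made-up example, not data from the article.

```python
import json

record = {"name": "編碼"}  # hypothetical sample data

escaped = json.dumps(record)                       # ensure_ascii=True is the default
readable = json.dumps(record, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)             # {"name": "\u7de8\u78bc"}
print(escaped.isascii())   # True
print(readable.isascii())  # False

# Both forms decode back to the same object.
assert json.loads(escaped) == json.loads(readable) == record
```

The escaped form is safe to send through ASCII-only channels. The readable form is easier for people to inspect but has to be written with an encoding that can represent the characters.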

Anaconda Python 2.7 does not support json.dumps(Chinese object, ensure_ascii=True)

IronPython 2.7 supports json.dumps(Chinese object, ensure_ascii=True) in most cases, but fails in a few

For example, traceid:938e8b9c5289aebba98a81b146982d6a

【Behavior in Python 3】

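In Python 3, the str type is reported as class str rather than type str, and the result of encode is a bytes object, which avoids the problem of encoding indiscriminately. However, printing a Chinese object still requires json.dumps(Chinese object, ensure_ascii=False).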
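The points above can be checked with a short Python 3 sketch. The nested `obj` value is hypothetical and only stands in for "a Chinese object".

```python
import json

s = "編碼"
b = s.encode("utf8")

print(type(s))               # <class 'str'>
print(type(b))               # <class 'bytes'>
print(hasattr(b, "encode"))  # False: bytes cannot be encoded a second time

obj = {"keys": [s]}          # hypothetical nested object
as_json = json.dumps(obj, ensure_ascii=False)
assert s in as_json              # the characters survive unescaped
assert s not in json.dumps(obj)  # the default escapes them as \uXXXX
```

Because `bytes` has no `encode` method, the accidental double encoding that Python 2 allowed becomes an `AttributeError` instead of silently producing garbled output.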
