Python 2.7 Chinese encoding

I got a string from an external API:

"\u4ece\u8d77\u70b9\u5411\u6b63\u5357\u65b9\u5411\u51fa\u53d1,\u884c\u9a76170\u7c73,\u76f4\u884c\u8fdb\u5165\u4e2d\u5173\u6751\u4e1c\u8def"

It was assigned to the variable a as a UTF-8 byte string, so:

>>> a
'\\u4ece\\u8d77\\u70b9\\u5411\\u6b63\\u5357\\u65b9\\u5411\\u51fa\\u53d1,\\u884c\\u9a76170\\u7c73,\\u76f4\\u884c\\u8fdb\\u5165\\u4e2d\\u5173\\u6751\\u4e1c\\u8def'

>>> print a
\u4ece\u8d77\u70b9\u5411\u6b63\u5357\u65b9\u5411\u51fa\u53d1,\u884c\u9a76170\u7c73,\u76f4\u884c\u8fdb\u5165\u4e2d\u5173\u6751\u4e1c\u8def

That is the state the data is already in.

How do I turn a into a unicode string?

# turn a into:

u'\u4ece\u8d77\u70b9\u5411\u6b63\u5357\u65b9\u5411\u51fa\u53d1,\u884c\u9a76170\u7c73,\u76f4\u884c\u8fdb\u5165\u4e2d\u5173\u6751\u4e1c\u8def'

So you just want to ignore the escaping in a? Then u'' + a should do it...

import re

s = '\\u4ece\\u8d77\\u70b9\\u5411\\u6b63\\u5357\\u65b9\\u5411\\u51fa\\u53d1\\u884c\\u9a76170\\u7c73,\\u76f4\\u884c\\u8fdb\\u5165\\u4e2d\\u5173\\u6751\\u4e1c\\u8def'

# Match a literal backslash-u followed by exactly four hex digits.
pat = re.compile(r'\\u([0-9a-fA-F]{4})')

def change_str(match):
    # Turn the four hex digits into the corresponding unicode character.
    return unichr(int(match.group(1), 16))

tmp = pat.sub(change_str, s)

print(tmp)

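I spent ages on this and this is all I came up with... I don't know whether there is a better way.

The str.decode('unicode-escape') from the answer below works directly: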

str='\\u4ece\\u8d77\\u70b9\\u5411\\u6b63\\u5357\\u65b9\\u5411\\u51fa\\u53d1\\u884c\\u9a76170\\u7c73,\\u76f4\\u884c\\u8fdb\\u5165\\u4e2d\\u5173\\u6751\\u4e1c\\u8def'

print(str.decode('unicode-escape'))
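One caveat worth keeping in mind: unicode-escape also decodes every non-ASCII byte as Latin-1, so a byte string that mixes real UTF-8 text with \uXXXX escapes would come out garbled. Below is a minimal sketch of one way around that, along the lines of the regex answer above. It assumes the input really is UTF-8 bytes; the helper name unescape_u is made up for illustration, and the unichr fallback is only there so the snippet also runs on Python 3.

```python
import re

try:
    unichr            # exists on Python 2
except NameError:
    unichr = chr      # Python 3 has no unichr; chr does the same job

# A literal backslash, the letter u, then exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def unescape_u(raw):
    """Hypothetical helper: decode UTF-8 bytes, then expand \\uXXXX escapes."""
    text = raw.decode('utf-8')
    return _U_ESCAPE.sub(lambda m: unichr(int(m.group(1), 16)), text)

# One escaped character followed by one real UTF-8 encoded character.
mixed = b'\\u4ece\xe8\xb5\xb7'
result = unescape_u(mixed)
```

For input that is pure ASCII, as in the question, this gives the same result as decode('unicode-escape'), so the extra step only matters when the two forms are mixed.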

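The main cause of this problem is that Python treated the \uXXXX escapes as ordinary characters in a plain byte string, so each original backslash is a literal backslash and shows up escaped with a second backslash in the repr.

Personally, I think the simpler fix is decode('unicode-escape'):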
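A small sketch can make the point about literal backslashes concrete. It uses a shortened two-character sample rather than the full string from the question, and the variable names are purely illustrative. The plain ASCII decode is included because, on Python 2 with the default ASCII encoding, that is what the implicit coercion in u'' + a performs.

```python
# Two escape sequences stored as plain ASCII bytes: backslash, 'u', four hex digits.
raw = b'\\u4ece\\u8d77'

# A plain ASCII decode changes the type but leaves the escapes untouched.
as_text = raw.decode('ascii')

# unicode_escape interprets each six-byte sequence as one character.
decoded = raw.decode('unicode_escape')

print(len(raw))      # number of bytes
print(len(as_text))  # number of characters after a plain ASCII decode
print(len(decoded))  # number of characters after unicode_escape
```

The first two lengths match, which shows that nothing was interpreted; only the unicode_escape decode shrinks the text down to the actual characters.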

str = "\u4ece\u8d77\u70b9\u5411\u6b63\u5357\u65b9\u5411\u51fa\u53d1"

uni_str = str.decode('unicode-escape')

print uni_str

Simple and blunt.
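If the external API returns JSON (an assumption; the question does not say so), the \uXXXX sequences are JSON string escapes, and the standard json module may be a more natural place to decode them. A rough sketch with a shortened sample string and illustrative variable names:

```python
import json

# Shortened sample: escaped text as ASCII bytes, as in the question.
raw = b'\\u4ece\\u8d77\\u70b9,\\u884c\\u9a76170\\u7c73'

# Wrap the text in double quotes so it forms a JSON string literal, then parse it.
# Assumption: the text contains no unescaped double quotes or control characters.
text = json.loads(u'"' + raw.decode('ascii') + u'"')

print(len(text))
```

In practice it is usually simpler to run json.loads on the whole API response, which decodes these escapes for every string in one pass.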
