Summary of Ways to Fix Garbled Chinese Text in Python

When running code like the following:

#!/usr/bin/env python
s = "中文"
print s

I have kept running into errors like these recently:

SyntaxError: Non-ASCII character '\xe4' in file e:\coding\python\untitled 6.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
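PEP 263 is the rule behind this error: a source file containing non-ASCII bytes needs a coding declaration (for example `# -*- coding: utf-8 -*-` near the top). Python 2 otherwise assumes ASCII. Python 3 assumes UTF-8 instead, so the script above would no longer fail there.

The minimal Python 3 sketch below shows the same rule in action. It assumes GB2312-encoded source bytes (which are not valid UTF-8) in place of the original file. `run_source` is a hypothetical helper.

```python
# Python 3 sketch: a PEP 263 coding declaration tells the compiler how the
# source bytes are encoded. Without one, Python 3 assumes UTF-8.
GB2312_TEXT = "中文".encode("gb2312")  # b'\xd6\xd0\xce\xc4', not valid UTF-8


def run_source(source: bytes) -> dict:
    """Compile and execute source bytes; return the resulting namespace."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace


undeclared = b's = "' + GB2312_TEXT + b'"\n'
declared = b"# -*- coding: gb2312 -*-\n" + undeclared

try:
    run_source(undeclared)
except (SyntaxError, UnicodeDecodeError) as exc:
    # CPython reports undecodable source as a SyntaxError; UnicodeDecodeError
    # is also caught defensively in case a version surfaces it differently.
    print("without declaration:", type(exc).__name__)

print("with declaration:", run_source(declared)["s"] == "中文")  # True
```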

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 108: ordinal not in range(128)

UnicodeEncodeError: 'gb2312' codec can't encode character u'\u2014' in position 72366: illegal multibyte sequence
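The last two errors come from the two directions of conversion:

- decoding bytes with a codec that does not match them;
- encoding text into a codec that has no code for some character.

The minimal Python 3 sketch below reproduces both kinds. The sample strings are made up, and an emoji stands in for a character GB2312 cannot represent. `decode_error` and `encode_error` are hypothetical helpers.

```python
# Python 3 sketch: reproduce the two runtime error types listed above.


def decode_error(data: bytes, codec: str):
    """Return the UnicodeDecodeError raised by data.decode(codec), or None."""
    try:
        data.decode(codec)
    except UnicodeDecodeError as exc:
        return exc
    return None


def encode_error(text: str, codec: str):
    """Return the UnicodeEncodeError raised by text.encode(codec), or None."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc
    return None


utf8_bytes = "字".encode("utf-8")                # b'\xe5\xad\x97'
print(decode_error(utf8_bytes, "ascii"))         # bytes >= 0x80 are not ASCII
print(encode_error("中文\U0001F600", "gb2312"))  # the emoji has no GB2312 code
print(encode_error("中文", "gb2312"))            # None: both characters exist in GB2312
```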

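These are all character-encoding problems, and they are frustrating: the Chinese text just would not display properly. I looked at a lot of solutions. Here are some I found over the past few days, shared for everyone.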

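Python represents text internally as Unicode (in Python 2 this is the unicode type; a plain str holds encoded bytes). So when converting between encodings you normally use Unicode as the intermediate form: first decode the string from its original encoding into Unicode, then encode that Unicode into the target encoding.

decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the GB2312-encoded string str1 into Unicode.

encode converts Unicode into a string in some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into a GB2312-encoded string.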
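The decode-then-encode round trip looks like this in Python 3. There, `str` plays the role of Python 2's `unicode` and `bytes` plays the role of Python 2's `str`. This is a minimal sketch; `transcode` is a hypothetical helper name.

```python
# Python 3 sketch: convert between encodings using text (str) as the
# intermediate form: decode from the source encoding, encode to the target.


def transcode(data: bytes, source: str, target: str) -> bytes:
    text = data.decode(source)   # bytes in `source` encoding -> Unicode text
    return text.encode(target)   # Unicode text -> bytes in `target` encoding


gb_bytes = "中文".encode("gb2312")
print(gb_bytes)                                # b'\xd6\xd0\xce\xc4'
print(transcode(gb_bytes, "gb2312", "utf-8"))  # b'\xe4\xb8\xad\xe6\x96\x87'
```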

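In some IDEs, printed strings always come out garbled or even raise errors. This is actually because the IDE's output console cannot display text in the string's encoding, not because of a problem in the program itself.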

For example, run the following code in UliPad:

s = u"中文"
print s

and you get:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

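This is because UliPad's console output window on English Windows XP writes its output as ASCII (the default encoding on an English system is ASCII). The string in the code above is a unicode object, so printing it raises an error.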

Change the last line to: print s.encode('gb2312')

and the two characters "中文" are printed correctly.
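The console behaviour described above can be imitated without an IDE. Python 3's `io.TextIOWrapper` encodes text with whatever encoding it is given, much as a console does. The minimal sketch below assumes an in-memory stream stands in for the console; `console_write` is a hypothetical helper.

```python
import io


def console_write(text: str, encoding: str) -> bytes:
    """Write text to an in-memory stream using `encoding`; return the bytes.

    Raises UnicodeEncodeError if `encoding` cannot represent the text,
    like the ASCII console in the example above.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)  # the text is encoded here
    stream.flush()
    return raw.getvalue()


print(console_write("中文", "gb2312"))  # b'\xd6\xd0\xce\xc4'
try:
    console_write("中文", "ascii")
except UnicodeEncodeError as exc:
    print(exc)  # 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
```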

If the last line is instead changed to: print s.encode('utf8')

the output is \xe4\xb8\xad\xe6\x96\x87. This is what you get when the console output window displays a UTF-8-encoded string as ASCII.
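The escape sequence above is exactly the UTF-8 encoding of "中文": each of the two characters takes three bytes. The short Python 3 check below shows this. It prints the bytes object (whose repr uses \x escapes) and, for contrast, the character code points.

```python
# Python 3 sketch: the UTF-8 bytes of "中文" versus its Unicode code points.
utf8 = "中文".encode("utf-8")
print(utf8)                    # b'\xe4\xb8\xad\xe6\x96\x87'
print(len("中文"), len(utf8))  # 2 6  (two characters, three bytes each)
print(ascii("中文"))           # '\u4e2d\u6587'  (code points, not bytes)
```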

The following code is probably a bit more general:

#!/usr/bin/env python
#coding=utf-8
s = "中文"
if isinstance(s, unicode):
    # s = u"中文"
    print s.encode('gb2312')
else:
    # s = "中文"
    print s.decode('utf-8').encode('gb2312')
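The same idea can be carried over to Python 3, where the check becomes "is this `bytes` or `str`?": normalise whatever you have to text, then encode exactly once for output. The minimal sketch below assumes byte input is UTF-8, as in the code above. `to_gb2312` is a hypothetical name.

```python
from typing import Union


def to_gb2312(value: Union[str, bytes], source_encoding: str = "utf-8") -> bytes:
    """Return `value` encoded as GB2312, decoding it first if it is bytes."""
    if isinstance(value, bytes):           # like the Python 2 str branch
        value = value.decode(source_encoding)
    return value.encode("gb2312")          # like the Python 2 unicode branch


print(to_gb2312("中文"))                    # b'\xd6\xd0\xce\xc4'
print(to_gb2312("中文".encode("utf-8")))    # b'\xd6\xd0\xce\xc4'
```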

Now look at this piece of code:

#!/usr/bin/env python
#coding=utf-8
#python version: 2.7.4
#system: windows xp

import httplib2

def getpagecontent(url):
    '''
    Use httplib2 to fetch a web page's content programmatically from url,
    returning the content (bytes) so it can be converted to a utf-8 string
    '''
    # Use IE9's User-Agent; without a User-Agent the server returns 403 Forbidden
    # (assumed value: the standard IE9 User-Agent string)
    headers = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'}
    if url:
        response, content = httplib2.Http().request(url, headers=headers)
        if response.status == 200:
            return content

import sys
reload(sys)
sys.setdefaultencoding('utf-8')  # change the default encoding (the default is ascii)
print sys.getdefaultencoding()
content = getpagecontent("http://www.jb51.net/")  # assumed: the www.jb51.net homepage described below
print content.decode('utf-8').encode('gb2312')
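For comparison, the fetch step can be sketched in Python 3 with only the standard library (`urllib.request` in place of httplib2). It also returns the charset the server declares, so the caller does not have to guess. This is a minimal sketch, not the author's code:

- `get_page_content` and `USER_AGENT` are hypothetical names;
- the User-Agent value is assumed to be the standard IE9 string.

```python
from typing import Optional, Tuple
from urllib.request import Request, urlopen

# Assumed IE9-style User-Agent, mirroring the comment in the original code.
USER_AGENT = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"


def get_page_content(url: str, timeout: float = 10.0) -> Optional[Tuple[bytes, Optional[str]]]:
    """Return (raw body bytes, declared charset or None), or None if url is empty."""
    if not url:
        return None
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=timeout) as response:
        # urlopen raises HTTPError for most error statuses; this check
        # mirrors the `status == 200` test in the original code.
        if response.status != 200:
            return None
        charset = response.headers.get_content_charset()  # e.g. "utf-8", or None
        return response.read(), charset
```

A caller would then decode with the declared charset, for example `body.decode(charset or "utf-8", errors="replace")`, before printing or re-encoding.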

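What the code above does: it requests the homepage of www.jb51.net. (Printing the UTF-8 content directly does not display the Chinese.) I wanted to convert the content from UTF-8 to GB2312, and that produced the third error above.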

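When I changed print content.decode('utf-8').encode('gb2312') to print content.decode('utf-8').encode('gb2312', 'ignore'), it worked and the Chinese was displayed. However, I can't be sure all of it came through. It seems only part did, because 'ignore' silently drops every character that cannot be encoded in GB2312.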
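What 'ignore' drops can be made visible. The minimal Python 3 sketch below compares the 'ignore' and 'replace' error handlers of `encode()`, then lists the characters a codec cannot represent. An emoji is used as the example of a character missing from GB2312; `unencodable` is a hypothetical helper.

```python
# Python 3 sketch: how encode() error handlers treat a character GB2312 lacks.
text = "中文\U0001F600"

print(text.encode("gb2312", "ignore"))   # b'\xd6\xd0\xce\xc4'   (emoji silently dropped)
print(text.encode("gb2312", "replace"))  # b'\xd6\xd0\xce\xc4?'  (emoji becomes '?')


def unencodable(text: str, encoding: str) -> list:
    """Return the characters of `text` that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


print(ascii(unencodable(text, "gb2312")))  # ['\U0001f600']
```

Using 'replace' instead of 'ignore' at least leaves a visible '?' where something was lost.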

However, when I changed the URL to www.soso.com, there was no need to convert to GB2312: the Chinese displayed correctly as UTF-8.
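One common reason two sites behave differently is that they serve pages in different encodings. Rather than guessing, a program can read the charset declared in the HTTP `Content-Type` header. The minimal Python 3 sketch below uses the standard library's `email.message.Message` to parse that header and falls back to UTF-8 when no charset is given. `decode_body` is a hypothetical name, and the header values are made up.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode a response body using the charset from its Content-Type header."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default  # lower-cased, e.g. "gb2312"
    return body.decode(charset, errors="replace")


gb_page = "中文".encode("gb2312")
print(decode_body(gb_page, "text/html; charset=GB2312") == "中文")  # True
print(decode_body("中文".encode("utf-8"), "text/html") == "中文")   # True (UTF-8 fallback)
```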

To sum up:

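Writing ss (a unicode string) directly to a file raises the same exception. When handling a unicode Chinese string, you must first call encode on it to convert it into some other encoding before outputting it. This is the same in every environment.

In Python, a str object is just a byte array. Whether its contents form a valid string, and which encoding (gbk, utf-8, unicode) that string uses, does not matter to Python; the user has to keep track of and judge this. The same caveat applies to unicode objects: keep in mind that the contents of a unicode object are by no means guaranteed to be a valid Unicode string.

A Chinese-locale Windows console (code page 936) supports both GBK-encoded str objects and unicode objects.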
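The points in this summary carry over to Python 3, where `bytes` is the byte array and `str` holds text. The minimal sketch below illustrates three of them:

- bytes say nothing about their own encoding;
- a `str` can contain a lone surrogate, which is not valid Unicode text and cannot be encoded;
- text written to a file is encoded on the way out, so it is safest to name the encoding explicitly.

The file name `out.txt` in a temporary directory is an arbitrary choice.

```python
import os
import tempfile

# 1) The same bytes only mean something once you know their encoding.
data = "中文".encode("gbk")              # b'\xd6\xd0\xce\xc4'
print(data.decode("gbk") == "中文")      # True: correct because we know it is GBK
try:
    data.decode("utf-8")                 # wrong assumption about the same bytes
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)

# 2) A str can hold a lone surrogate, which is not valid Unicode text.
lone_surrogate = "\ud800"
try:
    lone_surrogate.encode("utf-8")
except UnicodeEncodeError as exc:
    print("utf-8 encode failed:", exc.reason)  # surrogates not allowed

# 3) Text is encoded on its way to a file; state the encoding explicitly.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("中文")
with open(path, "rb") as f:
    written = f.read()
print(written)                           # b'\xe4\xb8\xad\xe6\x96\x87'
```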
