Encoding and Decoding in Python

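# -*- coding: utf-8 -*-
##############################
# Character types in Python  #
##############################
# Python 2 has two string types:
# 1. str: a byte string. Each ASCII character takes one byte, so these are
#    also called byte strings. Literals are written in plain quotes.
# 2. unicode: a string of Unicode characters. Literals carry a u prefix.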
s = 'ok, '
u = u'我, '
u1 = u'我'
u2 = u'愛python'
print 's:', s
print 'u1:', u1
print 'u2:', u2
# The interpreter usually displays non-ASCII unicode characters (for example
# in repr()) as Unicode escape sequences, which start with "\u".
print 'repr(s): ', repr(s)
print 'repr(u1): ', repr(u1)
print 'repr(u2): ', repr(u2)
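# These escape sequences are only recognized inside unicode literals;
# inside a str literal, "\u6211" stays as six literal characters.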
print "'\u6211': ", '\u6211'
print "u'\u6211': ", u'\u6211'
print ''
# str() creates a str string, and unicode() creates a unicode string.
print 'type of str(s): ', type(str(s))
print 'type of unicode(s): ', type(unicode(s))
# unicode() also accepts a unicode string as its argument.
print 'type of unicode(u): ', type(unicode(u))
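# But what happens if we pass the str '我' to unicode()?
try:
    print "unicode('我'):", unicode('我')
except UnicodeDecodeError as e:
    # The error message shows that the interpreter tried to decode our
    # argument with the ascii codec. The reason is explained further down.
    print e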
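##############################
# Encoding and decoding      #
##############################
# Encoding converts a unicode string into a str (byte string) using a chosen codec.
# repr() shows each byte of an encoded non-ASCII character as a hexadecimal escape (\xNN).
# The encoding used for decoding depends on configuration and environment.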
utf8_s = s.encode('utf-8')
utf8_u = u.encode('utf-8')
utf8_u1 = u1.encode('utf-8')
utf8_u2 = u2.encode('utf-8')
print 'utf8_s: ', utf8_s
print 'repr utf8_s: ', repr(utf8_s)
print 'utf8_u: ', utf8_u
print 'repr utf8_u: ', repr(utf8_u)
print 'utf8_u1: ', utf8_u1
print 'repr utf8_u1: ', repr(utf8_u1)
print 'utf8_u2:', utf8_u2
print 'repr utf8_u2: ', repr(utf8_u2)
# If a str literal contains non-ASCII characters, the interpreter stores them
# already encoded, as bytes in the source file's encoding.
print "'我愛python': ", '我愛python'
print "repr '我愛python': ", repr('我愛python')
# Back to the problem we hit earlier: passing '我' (a str) to unicode() raised an error.
try:
    print "unicode('我'):", unicode('我')
except UnicodeDecodeError as e:
    # The call failed: the interpreter tried to decode our argument with the ascii codec.
    print e
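# The argument '我' is a str literal, so it is already encoded as bytes in this
# file's source encoding (here utf-8) before it is passed to unicode().
# The help text for unicode() includes this passage:
'''unicode(string[, encoding[, errors]]) -> unicode object

Create a new Unicode object from the given encoded string.
encoding defaults to the current default string encoding.
'''
# unicode() always decodes its first argument with the encoding given as the
# second argument; if the second argument is omitted, it uses the default encoding.
# The script declares utf-8 at the top, so the '我' passed in here is utf-8 encoded bytes.
# Yet unicode() did not decode with the utf-8 we declared. It used ascii, so it failed.
# Why ascii? The likely explanation is that Python 2.7.x uses ascii as its default
# string encoding (the value returned by sys.getdefaultencoding()), while the utf-8
# declared at the top of the file only governs how the string literals in this file
# are read. It does not change the default encoding that unicode() falls back on.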
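
The failure above comes from relying on the default encoding, so one way around it is to name the codec explicitly when decoding. The following is a minimal sketch of that idea, not part of the original script: the helper names `decode_explicit` and `try_decode` are hypothetical, and the code is written on the assumption that it should run under both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: written to run under both Python 2.7 and Python 3.
from __future__ import print_function

import sys

# The three UTF-8 bytes of the character U+6211, i.e. what a str literal
# holding that character contains in a Python 2 file declared as utf-8.
raw = b'\xe6\x88\x91'


def decode_explicit(data, encoding='utf-8'):
    """Decode bytes with a named codec instead of relying on the default."""
    return data.decode(encoding)


def try_decode(data, encoding):
    """Return the UnicodeDecodeError raised for this codec, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc
    return None


text = decode_explicit(raw)
ascii_error = try_decode(raw, 'ascii')

print('default encoding:', sys.getdefaultencoding())
print('round trip ok:', text.encode('utf-8') == raw)
print('decoding as ascii:', ascii_error)
```

In the Python 2 script above, the equivalent explicit calls would be `unicode('我', 'utf-8')` or `'我'.decode('utf-8')`, both of which pass the codec instead of leaving it to the default.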
