Encoding problems when reading Chinese txt files in Python 3

While trying to build a word cloud in Python, I ran into an encoding problem.

After patching the script here and there following advice from various blogs, all I ended up with was this error: "UnicodeDecodeError: 'utf-8' codec can't decode byte…".

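I fiddled with it for a whole day, txt (that is my inner facial expression right now). In the end I wrote the simplest possible file-reading script, and it still raised the error. So I started to suspect the encoding of the txt file itself. The file was a plain-text file created on a Mac, and I could not find where to check its encoding there, so I copied it to a Windows machine and looked: the encoding was reported as ASCII, not my beloved UTF-8. Mac, you betrayed all the trust I had in you! ε(┬┬﹏┬┬)3 (Strictly speaking, a pure-ASCII file is also valid UTF-8, so the file presumably contained bytes in some other, non-UTF-8 encoding.)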
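The check can also be done from Python itself, without moving the file to another machine. The snippet below is only a minimal sketch: the helper name `is_valid_utf8` and the temporary demo files are made up for illustration, and the function only tells you whether the bytes decode as UTF-8, not which other encoding they might be in.

```python
import os
import tempfile


def is_valid_utf8(path):
    """Return True if the file's raw bytes decode cleanly as UTF-8."""
    with open(path, 'rb') as f:  # binary mode: nothing is decoded yet
        data = f.read()
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# Demo: write the same Chinese text in two encodings and check both files.
text = '中文測試'
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, 'utf8.txt')
    big5_path = os.path.join(tmp, 'big5.txt')
    with open(utf8_path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(big5_path, 'w', encoding='big5') as f:
        f.write(text)
    result_utf8 = is_valid_utf8(utf8_path)
    result_big5 = is_valid_utf8(big5_path)

print(result_utf8, result_big5)
```

A file for which this returns False is exactly the kind that triggers the UnicodeDecodeError above when opened with encoding='utf8'.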

Changing the txt file's encoding to UTF-8 fixes the problem.
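If no suitable editor is at hand, the conversion can be done in Python as well. This is a sketch under one big assumption: you already know the file's current encoding. The function name `convert_to_utf8`, the output file name, and the value 'big5' are all hypothetical and only there for illustration.

```python
def convert_to_utf8(src_path, dst_path, source_encoding):
    """Re-encode a text file from source_encoding to UTF-8."""
    # Decode with the file's actual encoding. newline='' keeps the
    # original line endings instead of translating them.
    with open(src_path, 'r', encoding=source_encoding, newline='') as src:
        text = src.read()
    # Write the same characters back out as UTF-8.
    with open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)


# Example call (placeholder paths and encoding, not from the original post):
# convert_to_utf8('./test3.txt', './test3_utf8.txt', 'big5')
```

Writing to a separate output file rather than overwriting the input means a wrong guess about the source encoding does not destroy the original.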

In addition, when opening the file, pass the third argument encoding='utf8' (written here without the hyphen; Python also accepts 'utf-8').

with open('./test3.txt', 'r', encoding='utf8') as fin:
    for line in fin.readlines():
        line = line.strip('\n')
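One related pitfall, in case the file was re-saved as UTF-8 on Windows: some editors prepend a byte order mark (BOM), and encoding='utf8' leaves it in the text as an invisible '\ufeff' at the start of the first line. Python's 'utf-8-sig' codec strips the BOM when present and reads BOM-less files unchanged. A small sketch, using a temporary demo file rather than the test3.txt from this post:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'bom_demo.txt')
    # Build a small UTF-8 file that starts with a BOM (bytes EF BB BF).
    with open(path, 'wb') as f:
        f.write(b'\xef\xbb\xbf' + '第一行\n第二行\n'.encode('utf-8'))

    # Plain 'utf8' keeps the BOM as the character '\ufeff'.
    with open(path, 'r', encoding='utf8') as fin:
        lines_plain = [line.rstrip('\n') for line in fin]

    # 'utf-8-sig' drops the BOM if there is one.
    with open(path, 'r', encoding='utf-8-sig') as fin:
        lines_clean = [line.rstrip('\n') for line in fin]

print(repr(lines_plain[0]))
print(repr(lines_clean[0]))
```

This matters for word clouds in particular, because a stray '\ufeff' glued to the first word makes it count as a different token.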

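The word cloud script:

import jieba
import jieba.analyse
import numpy as np
from matplotlib import pyplot as plt
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# 1. Read the data
with open("./test.txt", "r", encoding="utf8") as f:
    text = f.read()
# Extract the top 50 keywords with TextRank, restricted to these parts of speech
keywords = jieba.analyse.textrank(text, topK=50, withWeight=False,
                                  allowPOS=('ns', 'n', 'vn', 'v'))
file = ",".join(keywords)
# Specify a Chinese font, otherwise Chinese characters render as empty boxes
font = r'./hyqihei-25j.ttf'
print(file)
# Specify a background image; any image will do.
# The original image path was lost, so './background.png' is a placeholder.
# scipy.misc.imread has been removed from SciPy; Pillow + NumPy loads the image instead.
image = np.array(Image.open('./background.png'))
wc = WordCloud(
    font_path=font,
    background_color='white',  # background color (original value lost; 'white' is a placeholder)
    mask=image,                # background image
    stopwords=STOPWORDS,       # stop words
    max_words=100,             # maximum number of words
    max_font_size=100,         # maximum font size
    width=800,
    height=1000,
)
# Generate the word cloud
image_colors = ImageColorGenerator(image)
wc.generate(file)
# Display the word cloud with matplotlib
plt.imshow(wc)   # show the word cloud image
plt.axis('off')  # hide the axes
plt.show()
# Save (the save call is missing from the original; wc.to_file(<path>) is one way to do it)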