Document Word Cloud Display


Approach: read the source file and build a segmented word list, read the stop-word file and remove the unwanted words, count word frequencies, configure the word cloud parameters, and finally draw the word cloud.
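As a minimal sketch of what the segmentation step produces (the sample sentences below are made up for illustration), jieba.lcut turns one string into a list of tokens, so applying it to every line yields a two-dimensional list, which is then flattened before counting frequencies:

import jieba

lines = ["今天天氣很好", "我們去公園放風箏"]        # hypothetical input lines
data_cut = [jieba.lcut(x) for x in lines]           # two-dimensional list: one token list per line
print(data_cut)                                     # a list of token lists, one per input line

all_words = []
for tokens in data_cut:
    all_words.extend(tokens)                        # flattened one-dimensional list of tokens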

"""匯入相關庫"""

import jieba

import pandas as pd

from imageio import imread

from wordcloud import wordcloud

from matplotlib import pyplot as plt

"""讀取原始檔,然後形成分詞列表"""

with open('e:/yuanwenjian.txt','r',encoding='utf-8') as f:

txt = f.read()

txt = txt.split()

data_cut = [jieba.lcut(x) for x in txt] #分詞後結果,形式為二維列表(裡面是列表)

all_words = #轉化成一維列表(裡面是字串)

for i in data_cut:

all_words.extend(i)

# all_words.count('詞語') #統計詞頻

"""讀取停用詞文件"""

with open("e:\\stopwords.txt",'r',encoding='utf-8') as f:

stop=f.read()

stop = stop.split()

stop = [' ']+stop

data_after = [[j for j in i if j not in stop] for i in data_cut] #判斷是否為停用詞

"""統計詞頻"""

all_words =

for i in data_after:

all_words.extend(i)

num = pd.series(all_words).value_counts()

"""讀取背景"""

pic = imread('e:/素材/logo/python.png')

"""詞云引數"""

wc = wordcloud(background_color = 'white',font_path='‪c:\\windows\\fonts\\simkai.ttf',mask=pic)

'''wc = wordcloud(background_color = 'white',

font_path='‪c:\\windows\\fonts\\simkai.ttf',

max_words=200,

max_font_size=10,

mask=pic)

'''wc2 = wc.fit_words(num) #詞頻傳入

"""詞云展示"""

plt.figure(figsize=(9,9)) #的大小

plt.imshow(wc2)

plt.axis('off') #關閉座標

plt.show()

wc.to_file("ciyun.png") #儲存
