How to count the most common words in an English document with Python 3 (with explanations)

# coding: utf-8
# In[32]:
# (Commented out) fetching the text from a web page instead of a local file:
#import requests
#from bs4 import BeautifulSoup
#res = requests.get("")
#res.encoding = 'utf-8'
#soup = BeautifulSoup(res.text,'lxml')
# In[66]:
speech_new = open("speech.txt", 'r', encoding='utf-8').read()  # you need an English text file, of course
speech = speech_new.lower().split()  # lower() lowercases every letter; split() splits the string, on whitespace by default
# In[70]:
dic = {}
for i in speech:
    if i not in dic:  # if the word is not in the dic dictionary yet
        dic[i] = 1  # add it with a count of 1
    else:
        dic[i] = dic[i] + 1  # otherwise increase its count by 1
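# In[68]:
import operator
# dic.items() yields (word, count) tuples.
# key=operator.itemgetter(1) chooses what to sort by: each tuple has index 0 (the word) and index 1 (the count), so we pass 1.
# reverse=True sorts from the largest count to the smallest.
sorted_words = sorted(dic.items(), key=operator.itemgetter(1), reverse=True)
# In[94]:
from nltk.corpus import stopwords  # NLTK, a natural language processing library
# The stop-word corpus must be available, e.g. after a one-time nltk.download('stopwords').
stop_words = stopwords.words('english')  # the list of English stop words
# In[103]:
for k, v in sorted_words:  # unpack each tuple: index 0 goes to k, index 1 goes to v
    if k not in stop_words:
        print(k, v)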
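
One limitation of the steps above is that split() leaves punctuation attached, so "cat," and "cat" are counted as different words. The following is a minimal sketch of the same count-then-sort idea with a simple regular-expression tokenizer. The function names count_words and top_words, the sample sentence, and the tiny stop-word set are all assumptions made for illustration; they do not come from the original notebook.

```python
import re

def count_words(text):
    """Return a dict mapping each lowercase word to its number of occurrences."""
    counts = {}
    # [a-z']+ keeps letters and apostrophes, so "don't" stays one token,
    # while punctuation such as "," or "." is dropped.
    for word in re.findall(r"[a-z']+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts

def top_words(counts, stop_words=(), n=10):
    """Return up to n (word, count) pairs, most frequent first, skipping stop words."""
    kept = [(w, c) for w, c in counts.items() if w not in stop_words]
    # Sort by count descending, then alphabetically so ties are deterministic.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:n]

# Hypothetical sample text and stop words, for illustration only.
sample = "The cat saw the dog. The dog saw the cat, and the cat ran."
counts = count_words(sample)
print(top_words(counts, stop_words={"the", "and"}, n=3))
# -> [('cat', 3), ('dog', 2), ('saw', 2)]
```

The pattern is deliberately simple: it only recognises ASCII letters and would also keep stray quotation apostrophes, so real text may need a more careful tokenizer.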

However, Python 3 ships with something far handier in its standard library:

# In[108]:
from collections import Counter  # a data structure added in Python 2.7 and 3.1
c = Counter(speech)
# In[111]:
c.most_common(10)
# In[113]:
for sw in stop_words:
    del c[sw]  # remove the stop words; Counter does not raise if a key is absent
# In[114]:
c.most_common(10)

And with that, the word counts come out with very little effort.
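
Counting everything and then deleting the stop words can also be folded into a single pass by filtering while the Counter is built. This is only a sketch under stated assumptions: the helper most_common_words, the sample sentence, and the tiny STOP_WORDS set are hypothetical stand-ins, and in practice the NLTK list from stopwords.words('english') would take the place of STOP_WORDS.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for illustration;
# the NLTK English stop-word list would normally be used instead.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}

def most_common_words(words, stop_words=STOP_WORDS, n=10):
    """Count words with Counter, skipping stop words while counting."""
    counter = Counter(w for w in words if w not in stop_words)
    return counter.most_common(n)

# Hypothetical sample input, already lowercased and split into words.
words = "the quick fox and the lazy dog and the quick cat".split()
print(most_common_words(words, n=1))
# -> [('quick', 2)]
```

Converting the stop words to a set makes each membership test a constant-time lookup, which matters more as the document grows.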

