Word Frequency Counting for English Articles

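I had an interview today and was asked how to implement word frequency counting. Until now I had always just called value_counts(), so when I was asked to do it without that function I got nervous and froze. Back home, I worked through how to reproduce it step by step.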

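There are several ways to do this. The approach I took is to first read the file into a single string and then process it with string functions.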

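# Read the file and turn it into one string
def read_file(text_file):
    string_for_count = []
    with open(text_file) as wf:
        for line in wf:
            # Collect every line; they are joined into one string below
            string_for_count.append(line)
    return ''.join(string_for_count)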

Next, define the word frequency count over the text.

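# Count word frequencies
def word_counts(string):
    # Replace newlines with spaces so words on adjacent lines do not run together
    string_list = string.replace('\n', ' ').lower().split(' ')
    count_dic = {}
    for item in string_list:
        if item in count_dic:
            count_dic[item] += 1
        else:
            count_dic[item] = 1
    # Remove the empty-string entry produced by consecutive spaces
    if '' in count_dic:
        count_dic.pop('')
    # Sort (word, count) pairs by count, highest first
    count_list = sorted(count_dic.items(), key=lambda x: x[1], reverse=True)
    return count_list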
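The final cleanup step exists because split(' ') produces an empty string wherever two spaces sit next to each other. A small sketch on a made-up sentence (the sample text is hypothetical) shows the difference between split(' ') and split() with no argument, which splits on any run of whitespace, including newlines:

```python
# Hypothetical sample: note the double space and the newline.
text = "the cat  sat\non the mat"

# Splitting on a single space keeps an empty token between the two spaces.
tokens_space = text.replace('\n', ' ').lower().split(' ')

# Splitting with no argument treats any run of whitespace as one separator,
# so no empty tokens appear and newlines need no special handling.
tokens_any = text.lower().split()

print(tokens_space)  # ['the', 'cat', '', 'sat', 'on', 'the', 'mat']
print(tokens_any)    # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

With split() the pop('') cleanup would no longer be needed, though keeping it does no harm.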

This was written in a hurry, just to commemorate a dismal interview and my stumbling path as a learner.

While reviewing Machine Learning in Action, I picked up a more concise way to write the second part (the frequency count), as follows:

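def word_counts(string):
    # Replace newlines with spaces so words on adjacent lines do not run together
    string_list = string.replace('\n', ' ').lower().split(' ')
    count_dic = {}
    for item in string_list:
        # get() returns 0 for a word not seen yet, so no if/else is needed
        count_dic[item] = count_dic.get(item, 0) + 1
    # Remove the empty-string entry produced by consecutive spaces
    if '' in count_dic:
        count_dic.pop('')
    # Sort (word, count) pairs by count, highest first
    count_list = sorted(count_dic.items(), key=lambda x: x[1], reverse=True)
    return count_list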
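As a sanity check, here is a minimal end-to-end sketch: it writes a made-up text to a temporary file, reads it back, counts with a plain dict, and compares the result against collections.Counter from the standard library. The function name count_words and the sample text are my own for this illustration, and Counter is only used as a reference here; an interviewer who rules out value_counts() would probably rule it out too.

```python
import os
import tempfile
from collections import Counter


def count_words(text):
    """Count words with a plain dict, then sort by frequency (highest first)."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical sample text, written to a temporary file for the demo.
sample = "The cat sat on the mat\nThe dog sat too\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Read the whole file as one string, then clean up the temporary file.
with open(path) as f:
    text = f.read()
os.remove(path)

manual = count_words(text)
reference = Counter(text.lower().split())

print(manual[:2])                  # [('the', 3), ('sat', 2)]
print(dict(manual) == reference)   # True
```

Punctuation is not handled here: "mat." and "mat" would count as different words, so real articles need an extra cleaning step before splitting.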
