Using Python to find the 10 most frequent words in a text

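That wraps up the Python beginner tutorial. Below is my graduation script (part of it is source code from the book, part is practice exercises I added myself):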

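# Write a text-statistics script: compute and print statistics for a text file,
# including the number of characters, lines and words, plus the 10 most
# frequent words in descending order of count.
import time

# Characters that normalize() keeps: lowercase letters, space, hyphen, apostrophe.
keep=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',' ','-',"'"]
# Common function words left out of the second top-10 list.
stop_words=['the','and','i','to','of','a','you','my','that','in','she','he','her','his','it','be','was','had']

def normalize(s):
    """Lowercase s and drop every character that is not in keep."""
    result=''
    for c in s.lower():
        if c in keep:
            result+=c
        elif c.isspace():
            # Newlines and tabs are not in keep; turn them into spaces so the
            # last word of one line is not glued to the first word of the next.
            result+=' '
    return result

def make_dict(s):
    """Return a dict mapping each word in s to its number of occurrences."""
    words=normalize(s).split()
    d={}
    for w in words:
        if w in d:
            d[w]+=1
        else:
            d[w]=1
    return d

def file_status(f):
    """Print statistics for the text file named f."""
    with open(f) as fopen:
        c=fopen.read()
    # Alternative: read the file one line at a time
    # fopen=open(f)
    # c=''
    # for line in fopen:
    #     c+=line
    print(f,'status:')
    print('Characters:',len(c))
    print('Lines:',c.count('\n'))
    print('Words:',len(normalize(c).split()))
    d=make_dict(c)
    total=sum(d.values())
    print('Words (summed from the dict):',total)
    print('Distinct words:',len(d))
    # Weight each word's length by its count so the average covers every occurrence.
    print('Average word length:',sum(len(w)*d[w] for w in d)/total)
    print('Words that occur only once:',len([w for w in d if d[w]==1]))
    # Sort (count, word) pairs from most to least frequent.
    lst=[(d[w],w) for w in d]
    lst.sort(reverse=True)
    print('Top 10 most frequent words and their counts:')
    i=1
    for count,word in lst[:10]:
        print('%d.%4d %s'%(i,count,word))
        i+=1
    print('Top 10 most frequent words and their counts (function words removed):')
    j=1
    for count,word in lst:
        if word not in stop_words:
            print('%d.%4d %s'%(j,count,word))
            j+=1
            if j==11:
                break

start_time=time.time()
# Raw string so the backslashes in the Windows path are taken literally.
file_status(r'd:\code\python\pg1342.txt')
end_time=time.time()
# Report the elapsed time that the two timestamps were measuring.
print('Elapsed: %.2f s'%(end_time-start_time))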
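
As a side note, the counting and sorting steps in the script above can probably be shortened with collections.Counter from the standard library. The block below is only a sketch of that idea, not the book's method. The function name top_words, its parameters and the sample sentence are made up for illustration. It also tokenizes slightly differently from normalize(): a regular expression pulls out runs of letters, hyphens and apostrophes, so any other character separates words instead of being deleted.

```python
import re
from collections import Counter

# Hypothetical helper, not part of the original script.
def top_words(text, n=10, skip=()):
    """Return the n most common words in text as (word, count) pairs,
    ignoring any word contained in skip."""
    # Lowercase the text, then treat each run of letters, apostrophes
    # and hyphens as one word.
    words = re.findall(r"[a-z'-]+", text.lower())
    counts = Counter(w for w in words if w not in skip)
    # most_common() sorts by count, highest first.
    return counts.most_common(n)

if __name__ == '__main__':
    # Made-up sample text, just to show the call.
    sample = "The cat and the hat. The cat sat."
    print(top_words(sample, 3))
    print(top_words(sample, 3, skip={'the', 'and'}))
```

Passing the stop_words list from the script as skip should give the "function words removed" ranking in one call, although the exact counts can differ from the script because of the different tokenization.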

Giving myself a thumbs-up ^^

Appendix: the tutorial used is 《python程式設計入門（第3版）》 (an introduction to Python programming, 3rd edition) by Toby (Canada)
