Software Engineering Assignment 3: Word Frequency Count

Environment: PyCharm 2018, Python 3.7

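def process_file(dst):  # read the file into a buffer
    try:  # open the file
        f = open(dst, 'r', encoding='gb2312')  # dst is the path to the text file
    except IOError as s:
        print(s)
        return None
    try:  # read the whole file into the buffer
        bvffer = f.read()
    except (IOError, UnicodeDecodeError):
        print('read file error!')
        return None
    finally:
        f.close()  # close the file whether or not the read succeeded
    return bvffer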
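
The open/read/close sequence above can also be written with a with block, which closes the file automatically. This is only a sketch: the name read_text and the choice to catch OSError and UnicodeDecodeError are assumptions for illustration, not part of the original assignment.

```python
def read_text(path, encoding="gb2312"):
    """Return the file's contents as a string, or None if it cannot be read."""
    try:
        # The with block closes the file even if read() raises.
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except (OSError, UnicodeDecodeError) as err:
        # Covers a missing file, a permission problem, or bytes that are not valid gb2312.
        print(err)
        return None
```

A caller would use it the same way as process_file, for example read_text('venv/src/gone_with_the_wind.txt').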

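def process_buffer(bvffer):
    word_freq = {}  # defined up front so an empty buffer still returns a dict
    if bvffer:
        # Process the buffer: count how often each word occurs and store the counts in word_freq
        bvffer = bvffer.lower()
        for ch in '「『!;,.?」':  # replace punctuation with spaces
            bvffer = bvffer.replace(ch, " ")
        words = bvffer.strip().split()  # split on whitespace
        for word in words:
            word_freq[word] = word_freq.get(word, 0) + 1
    return word_freq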
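
As an alternative sketch of the same counting step, the standard library's collections.Counter can replace the hand-written dictionary, and a regular expression can pick out the words instead of replacing punctuation one character at a time. The function name count_words and the pattern [a-z']+ are assumptions made for this example; the pattern treats any run of letters and apostrophes as a word, which differs slightly from the split-based version above.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    # Assumed definition of a word: a run of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


if __name__ == "__main__":
    freq = count_words("The cat saw the other cat. The end!")
    print(freq["the"], freq["cat"], freq["end"])
```

A Counter behaves like a dictionary, so it can be passed to output_result unchanged; looking up a word that never occurred returns 0 instead of raising KeyError.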

def output_result(word_freq):
    if word_freq:
        # sort by v[1], i.e. the word's count, in descending order
        sorted_word_freq = sorted(word_freq.items(), key=lambda v: v[1], reverse=True)
        for item in sorted_word_freq[:10]:  # print the top 10 words
            print("(%s,%d)" % (item[0], item[1]))

def main():
    dst = 'venv/src/gone_with_the_wind.txt'  # path to the "Gone with the Wind" text file
    bvffer = process_file(dst)
    word_freq = process_buffer(bvffer)
    output_result(word_freq)


if __name__ == "__main__":
    main()
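
Since only the ten most frequent words are printed, the full sort in output_result is not strictly needed. The sketch below uses heapq.nlargest from the standard library to pick the top entries directly. The name top_words and the sample dictionary are made up for this illustration.

```python
import heapq


def top_words(word_freq, n=10):
    """Return the n (word, count) pairs with the highest counts, largest first."""
    # nlargest avoids sorting the whole dictionary when only a few entries are needed.
    return heapq.nlargest(n, word_freq.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = {"the": 5, "wind": 2, "gone": 3, "with": 1}
    for word, count in top_words(sample, n=2):
        print("(%s,%d)" % (word, count))
```

For a novel-length text either approach is fast enough; the difference only matters when the vocabulary is very large compared with the number of words shown.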

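Never use tabs, and never mix tabs and spaces. For line continuations, either align the wrapped elements vertically (see the examples in the line-length section) or use a 4-space hanging indent, in which case the first line should carry no arguments. For example: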

def main():
    dst = 'venv/src/gone_with_the_wind.txt'  # path to the "Gone with the Wind" text file
    bvffer = process_file(dst)
    word_freq = process_buffer(bvffer)
    output_result(word_freq)
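
The two continuation styles named in the indentation rule can be sketched as follows. The functions describe and describe_hanging are hypothetical helpers written only to show the layout; they are not part of the assignment code.

```python
def describe(word, count,
             total):
    """Vertical alignment: wrapped parameters line up with the first one."""
    return "%s: %d of %d" % (word, count, total)


def describe_hanging(
        word, count, total):
    """Hanging indent: nothing follows the opening parenthesis on the first line."""
    return "%s: %d of %d" % (word, count, total)


# The same two styles apply to function calls.
aligned = describe("wind", 2,
                   11)
hanging = describe_hanging(
    "wind", 2, 11)
```

Both layouts use spaces only, so the code looks the same in any editor regardless of its tab width.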

Screenshot of the word frequency count run on the text file of Gone with the Wind
