Counting Word Frequencies in Python

# calhamletv1.py
def gettext():
    # Read the whole file and convert it to lower case
    with open("hamlet.txt", "r") as f:
        txt = f.read()
    txt = txt.lower()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")  # replace special characters in the text with spaces
    return txt

hamlettxt = gettext()
words = hamlettxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1  # default to 0 for unseen words
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))

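In items.sort(key=lambda x: x[1]), items is the list being sorted, and key=lambda x: x[1] sorts it by the second element of each entry (that is, the value). The general form is key=lambda var: var[index], where the index can be set as needed.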
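A minimal sketch of how the index in the key function changes the ordering. The (word, count) pairs below are made up for illustration and are not taken from hamlet.txt:

```python
# Hypothetical sample data: (word, count) pairs
items = [("the", 5), ("and", 9), ("hamlet", 2)]

# Index 1 sorts by the second element (the count), largest first
by_count = list(items)
by_count.sort(key=lambda x: x[1], reverse=True)

# Index 0 sorts by the first element (the word), alphabetically
by_word = list(items)
by_word.sort(key=lambda x: x[0])

print(by_count)
print(by_word)
```

Each sort works on its own copy of the list, so the two orderings can be compared side by side.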

This can also be written directly as: sorted(counts.items(), key=lambda x: x[1], reverse=True), which returns a new sorted list.
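As a rough illustration of the sorted() form, assuming a small hand-written counts dictionary in place of the one built from hamlet.txt:

```python
# Hypothetical counts, standing in for the dictionary built from the text
counts = {"the": 5, "and": 9, "hamlet": 2}

# sorted() returns a new list and leaves counts itself unchanged
ranked = sorted(counts.items(), key=lambda x: x[1], reverse=True)

# Slicing stays safe even when there are fewer than 10 distinct words
top = ranked[:10]
for word, count in top:
    print(word, count)
```

Taking a slice rather than looping over range(10) is a design choice made here; it avoids an IndexError on short texts.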

In the format string, < means left-aligned and > means right-aligned; 0 refers to the first argument and 1 to the second.
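A small sketch of these alignment codes, assuming a format string such as "{0:<10}{1:>5}" and made-up values for the word and its count:

```python
# Hypothetical values for illustration
word, count = "hamlet", 462

# {0:<10} places the first argument left-aligned in a field 10 characters wide
# {1:>5} places the second argument right-aligned in a field 5 characters wide
line = "{0:<10}{1:>5}".format(word, count)

# repr() makes the padding spaces visible
print(repr(line))
```

The field widths 10 and 5 are only examples; any widths wide enough for the longest word and largest count keep the columns lined up.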

Reference: a detailed guide to format usage
