Python: Counting Character Appearances in Romance of the Three Kingdoms

2021-08-31 13:46:30 · 1859 words · 5137 views


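#!/usr/bin/env python
# coding=utf-8
# e10.4calthreekingdoms.py
import jieba

# Non-name words to drop from the ranking. The original list is not shown
# here, so this starts empty; add words such as titles or place names as needed.
excludes = set()

# Read raw bytes; jieba decodes the input itself before segmenting.
with open("threekingdom.txt", "rb") as f:
    txt = f.read()
words = jieba.lcut(txt)

# Count every multi-character token, merging aliases of the same person.
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1

for word in excludes:
    counts.pop(word, None)  # no KeyError if the word never occurred

# Sort by frequency, most frequent first, and print up to 55 entries.
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:55]:
    print("{} {}".format(word, count))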

threekingdom.txt  kingdom.py
kou@ubuntu:~/python/file_文字處理$ python3 kingdom.py
building prefix dict from the default dictionary ...
dumping model to file cache /tmp/jieba.cache
loading model cost 2.446 seconds.
prefix dict has been built succesfully.
曹操 1348
劉備 1144
孔明 865
關羽 557
呂布 322
張飛 300
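The alias merging and counting in the script can also be written with a lookup table and `collections.Counter`, which keeps the name mapping in one place. The following is only a minimal sketch: the names `ALIASES` and `count_names` and the sample token list are hypothetical, and the function takes an already segmented list of words (for example, the result of `jieba.lcut`) so that it runs without the text file.

```python
from collections import Counter

# Alias table copied from the if/elif chain in the script above.
# ALIASES and count_names are illustrative names, not part of the original.
ALIASES = {
    "諸葛亮": "孔明", "孔明曰": "孔明",
    "關公": "關羽", "雲長": "關羽",
    "玄德": "劉備", "玄德曰": "劉備",
    "孟德": "曹操", "丞相": "曹操",
}


def count_names(words, aliases=ALIASES, excludes=()):
    """Count multi-character tokens, merging aliases and dropping excluded words."""
    counts = Counter()
    for word in words:
        if len(word) == 1:
            continue  # skip single-character tokens
        counts[aliases.get(word, word)] += 1
    for word in excludes:
        counts.pop(word, None)  # ignore words that never occurred
    return counts


if __name__ == "__main__":
    # Hand-made sample tokens standing in for jieba.lcut(txt).
    sample = ["孔明", "曰", "諸葛亮", "玄德", "丞相", "曹操", "將軍"]
    for name, n in count_names(sample, excludes={"將軍"}).most_common(3):
        print(name, n)
```

Adding another alias then means adding one dictionary entry rather than another `elif` branch, and `most_common(n)` replaces the manual sort and slice.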
