Counting word frequency in Python

2021-10-17 11:27:13

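We are given key-value pairs of the form <shop name, city>, and the task is to count how the shops are distributed by city. Each record is one line: the shop name (optionally followed by a branch name in parentheses), a tab, then the city.

The desired output lists each city on its own line, followed by one line per shop in that city with the shop name and its count.

All of the data is stored in txt files.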
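As a rough illustration of that record layout, the sketch below parses a single line into a (shop, city) pair. The helper name `parse_line` and the sample shop and city names are made up for illustration; only the tab-separated layout and the parenthesised branch name come from the description above.

```python
def parse_line(line):
    """Split one record into (shop, city), dropping any '(branch)' suffix."""
    # Assumed layout: shop name, optional "(branch)", a tab, the city, a newline.
    name_part, city = line.rstrip('\n').split('\t')
    shop = name_part.split('(')[0]
    return shop, city


# Hypothetical sample record, not taken from the real data set.
sample = "ShopA(Road 1 branch)\tBeijing\n"
shop, city = parse_line(sample)
print(shop, city)
```

Stripping the newline before splitting keeps the city name clean, so it can be used directly as a dictionary key.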

# from collections import Counter
# from pprint import pprint
import os
import csv
import codecs

def getnum(l1):
    # Count how many times each item occurs in the list l1.
    dic1 = {}
    for i in l1:
        if i in dic1:
            dic1[i] = dic1[i] + 1
        else:
            dic1[i] = 1
    return dic1

def main():
    # Count shop names per city.
    # dic maps city -> [shop1, shop2, ...]
    dic = {}
    with open("pos.txt", "r", encoding='utf-8') as f2:
        for line in f2:
            # Every line has the form: shop name()\tcity\n
            x = line.split('\t')
            # x should be [shop name(), city\n]
            y0 = x[0].split('(')
            # The data may contain entries such as xx,xx(xx路店),
            # so keep only the shop name before the parenthesis.
            y1 = x[1].split('\n')
            # y1[0] is the city name.
            if y1[0] in dic:
                dic[y1[0]].append(y0[0])
            else:
                dic[y1[0]] = [y0[0]]
    # Write each city, then its shops in ascending order of count.
    with open('params.txt', 'w', encoding='utf-8') as f:
        for key, v in dic.items():
            dic1 = getnum(v)
            f.write(key)
            f.write('\n')
            for k in sorted(dic1, key=dic1.__getitem__):
                f.write(k)
                f.write(' ')
                f.write(str(dic1[k]))
                f.write('\n')

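def main1():
    # Ignore the city and count the nationwide distribution of steamed-bun shops.
    dic = {}
    with open("pos.txt", "r", encoding='utf-8') as f2:
        for line in f2:
            # Keep only the text before the first parenthesis.
            y0 = line.split('(')[0]
            # Drop the trailing newline if the line had no parenthesis.
            if '\n' in y0:
                y0 = y0[0:-1]
            if y0 in dic:
                dic[y0] = dic[y0] + 1
            else:
                dic[y0] = 1
    # Rebuild the dict in ascending order of count.
    dicnew = {}
    for k in sorted(dic, key=dic.__getitem__):
        dicnew[k] = dic[k]
    with open('params.txt', 'w', encoding='utf-8') as f:
        for key, value in dicnew.items():
            f.write(key)
            f.write(' ')
            f.write(str(value))
            f.write('\n')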

if __name__ == '__main__':
    main()
    # main1()
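The per-city count can also be written with `collections.Counter`, which the commented-out import at the top of the script hints at. The following is only a sketch of that alternative, not the code the script actually runs: the function name `count_by_city` and the sample records are hypothetical, and it works on an in-memory list of lines rather than on pos.txt.

```python
from collections import Counter, defaultdict


def count_by_city(lines):
    """Return a mapping city -> Counter of shop names."""
    per_city = defaultdict(Counter)
    for line in lines:
        # Assumed layout: shop name, optional "(branch)", a tab, the city.
        name_part, _, city = line.rstrip('\n').partition('\t')
        shop = name_part.split('(')[0]
        per_city[city][shop] += 1
    return per_city


# Hypothetical sample records for illustration only.
sample = [
    "ShopA(Road 1 branch)\tBeijing\n",
    "ShopA(Road 2 branch)\tBeijing\n",
    "ShopB\tBeijing\n",
    "ShopA(Road 3 branch)\tShanghai\n",
]
counts = count_by_city(sample)
for city, counter in counts.items():
    print(city)
    for shop, n in counter.most_common():
        print(shop, n)
```

Note that `most_common()` lists shops from the highest count to the lowest, whereas the script above sorts in ascending order of count.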
