Counting the occurrences of each word in an article

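The idea of the algorithm is:

Traverse the file from beginning to end, reading each word in turn.

Store each word in a hash_map, incrementing that word's count every time it appears.

Traverse the hash_map and push each word, together with its count, into a priority queue.

Whenever the priority queue holds more than k elements, remove the element with the lowest count, so that the queue never holds more than k elements.

Once the hash_map has been fully traversed, the queue holds the k words with the highest counts.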
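The queue-trimming steps above can be sketched in isolation. This is a minimal sketch rather than the article's own code: the helper name top_k and its signature are hypothetical, and it assumes the counts have already been collected as (count, word) pairs. It uses a min-heap, so the element removed when the size exceeds k is always the word with the lowest count.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the k (count, word) pairs with the highest
// counts, in ascending order of count.
std::vector<std::pair<int, std::string>> top_k(
    const std::vector<std::pair<int, std::string>>& counts, std::size_t k) {
    using Entry = std::pair<int, std::string>;
    // std::greater turns priority_queue into a min-heap: top() is the
    // entry with the smallest count among those currently kept.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const Entry& e : counts) {
        heap.push(e);
        if (heap.size() > k) {
            heap.pop();  // drop the entry with the lowest count
        }
    }
    // Drain the heap; entries come out from lowest to highest count.
    std::vector<Entry> result;
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

Because the heap never holds more than k + 1 entries, processing n distinct words takes O(n log k) time and O(k) extra memory, which is the reason for trimming the queue instead of sorting every count.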

The concrete implementation is as follows:

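// Windows (Visual C++) version, using hash_map.
// The header names, the input file name, the value of k and the loop bodies
// did not survive in the source; they are reconstructed from the steps above
// and are assumptions.
#include "stdafx.h"
#include <fstream>
#include <functional>
#include <hash_map>
#include <iostream>
#include <queue>
#include <string>
#include <vector>
#include <boost/timer.hpp>
using namespace std;
using namespace boost;

void top_k_words() // print the most frequent words (assumed: the top ten)
{
    const size_t k = 10;           // assumed value of k
    boost::timer t;                // assumed from the "time elapsed" output
    ifstream fin("input.txt");     // hypothetical file name
    string s;
    hash_map<string, int> countwords;
    while (true)
    {
        fin >> s;                  // read the next whitespace-delimited word
        if (!fin) break;           // stop at end of file or on a read error
        ++countwords[s];
    }
    cout << "Total number of words (duplicates not counted): " << countwords.size() << endl;

    // Min-heap ordered by count: top() is the least frequent word kept so far.
    priority_queue<pair<int, string>, vector<pair<int, string> >, greater<pair<int, string> > > countmax;
    for (hash_map<string, int>::const_iterator i = countwords.begin();
         i != countwords.end(); i++)
    {
        countmax.push(make_pair(i->second, i->first));
        if (countmax.size() > k)
            countmax.pop();        // discard the word with the lowest count
    }
    while (!countmax.empty())      // prints in ascending order of count
    {
        cout << countmax.top().second << " " << countmax.top().first << endl;
        countmax.pop();
    }
    cout << "time elapsed " << t.elapsed() << endl;
}

int main(int argc, char* argv[])
{
    top_k_words();
    return 0;
}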

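hash_map is a non-standard extension and is not available under that name on Linux, so the following version uses map to count the words instead: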

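// Linux version, using map.
// The header names, the input file name, the value of k and the loop bodies
// did not survive in the source; they are reconstructed from the steps above
// and are assumptions.
#include <fstream>
#include <functional>
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>
using namespace std;

void top_k_words() // print the most frequent words (assumed: the top ten)
{
    const size_t k = 10;           // assumed value of k
    ifstream fin("input.txt");     // hypothetical file name
    string s;
    map<string, int> countwords;
    while (true)
    {
        fin >> s;                  // read the next whitespace-delimited word
        if (!fin) break;           // stop at end of file or on a read error
        ++countwords[s];
    }
    cout << "Total number of words (duplicates not counted): " << countwords.size() << endl;

    // Min-heap ordered by count: top() is the least frequent word kept so far.
    priority_queue<pair<int, string>, vector<pair<int, string> >, greater<pair<int, string> > > countmax;
    for (map<string, int>::const_iterator i = countwords.begin(); i != countwords.end(); i++)
    {
        countmax.push(make_pair(i->second, i->first));
        if (countmax.size() > k)
            countmax.pop();        // discard the word with the lowest count
    }
    while (!countmax.empty())      // prints in ascending order of count
    {
        cout << countmax.top().second << " " << countmax.top().first << endl;
        countmax.pop();
    }
}

int main(int argc, char* argv[])
{
    top_k_words();
    return 0;
}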
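
As a portable alternative to choosing between hash_map and map, the counting step can also be written with std::unordered_map, the hash table that has been part of the standard library since C++11 and is available on both Windows and Linux. The following is a minimal sketch, not the article's code: the helper name count_words is hypothetical, and it assumes words are separated by whitespace, with no handling of punctuation or letter case.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: counts whitespace-delimited words read from `in`.
std::unordered_map<std::string, int> count_words(std::istream& in) {
    std::unordered_map<std::string, int> counts;
    std::string word;
    // operator>> skips whitespace and fails at end of input, ending the loop.
    while (in >> word) {
        ++counts[word];  // a missing key is value-initialized to 0 first
    }
    return counts;
}
```

Taking a std::istream& means the same function can be fed a std::ifstream for a real file or a std::istringstream for a quick check. Compared with map, unordered_map gives average constant-time lookups but no sorted iteration order, which does not matter here because the priority queue does the ordering.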
