Counting Words and Measuring Their Importance

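import jieba
from sklearn.feature_extraction.text import CountVectorizer  # counts word occurrences; works on English text directly

# Build the corpus [English]
content = ['this is the first document.', 'this is the second second document.', 'and the third one.', 'is this the first document? i x y']
# Create the vectorizer instance
con_vet = CountVectorizer()
# Extract the words and count them
x = con_vet.fit_transform(content)
print(x)  # an entry "(0, 1) 1" reads (document index, word index) -> number of times that word occurs in that document; x is a sparse matrix
print(x.toarray())  # convert the sparse matrix to a dense array
# Get the extracted words; one-character tokens such as "i", "x", "y" are dropped by the default token_pattern
names = con_vet.get_feature_names_out()  # on scikit-learn versions before 1.0, use get_feature_names()
print(names)  # the extracted words

# Build the corpus [Chinese]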
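content = ["今天陽光真好", "我要去看北京天安門", "逛完天安門之後我要去王府井", "吃烤蠍子與烤蜈蚣", "晚上去後海游個泳"]
content_list = []
for tmp in content:
    # CountVectorizer cannot segment Chinese, so segment with jieba first
    # Accurate mode: cut_all=False, which is also the default
    res = jieba.cut(tmp, cut_all=False)
    # Join the tokens with commas so CountVectorizer can split on them
    res_str = ','.join(res)
    content_list.append(res_str)
# Create the vectorizer instance, removing the stop words "我要" and "之後"
con_vet = CountVectorizer(stop_words=['我要', '之後'])
# Extract the words and count them
x = con_vet.fit_transform(content_list)
print(x)  # an entry "(0, 1) 1" reads (document index, word index) -> number of times that word occurs in that document; x is a sparse matrix
print(x.toarray())  # convert the sparse matrix to a dense array
# Get the extracted words
names = con_vet.get_feature_names_out()
print(names)  # the extracted words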
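Both runs above depend on the same default tokenization. The following pure-Python sketch imitates what CountVectorizer does with its defaults: lowercase each document, extract tokens with the default token_pattern r"(?u)\b\w\w+\b", drop stop words, sort the vocabulary, and count occurrences per document. The helper name count_vectorize is hypothetical, and options such as ngram_range, min_df and max_df are left out. The segmented string "吃,烤,蠍子,與,烤,蜈蚣" is an assumed jieba output, used to show that one-character words vanish because the pattern needs at least two word characters.

```python
import re
from collections import Counter

# Mirrors scikit-learn's default CountVectorizer token_pattern:
# a token is a run of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_vectorize(docs, stop_words=()):
    """Hypothetical helper: return (vocabulary, count matrix) for docs."""
    stop = set(stop_words)
    # Lowercase, tokenize, then drop stop words
    tokenized = [
        [tok for tok in TOKEN_PATTERN.findall(doc.lower()) if tok not in stop]
        for doc in docs
    ]
    # Vocabulary in sorted (Python string) order; column j holds word vocab[j]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    column = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for word, n in Counter(toks).items():
            row[column[word]] = n
        matrix.append(row)
    return vocab, matrix


content = ['this is the first document.', 'this is the second second document.',
           'and the third one.', 'is this the first document? i x y']
vocab, matrix = count_vectorize(content)
print(vocab)      # ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
print(matrix[1])  # [0, 1, 0, 1, 0, 2, 1, 0, 1]

# Assumed jieba output for "吃烤蠍子與烤蜈蚣": the one-character words are dropped
print(count_vectorize(["吃,烤,蠍子,與,烤,蜈蚣"])[0])
```

If one-character Chinese words should be kept, CountVectorizer accepts a custom token_pattern, for example token_pattern=r"(?u)\b\w+\b".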

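from sklearn.feature_extraction.text import TfidfVectorizer
# Build the corpus [English]
content = ['this is the first document.', 'this is the second second document.', 'and the third one.', 'is this the first document? i x y']
# Create the vectorizer instance
# min_df=1 (the default): a word must occur in at least one document to enter the vocabulary
# stop_words: unimportant words to remove
tf_vet = TfidfVectorizer(stop_words=['is', 'and'])
# Extract the words and weight them
x = tf_vet.fit_transform(content)
print(x)  # an entry "(0, 1) w" reads (document index, word index) -> importance (TF-IDF weight) w of that word in that document; x is a sparse matrix
print(x.toarray())  # convert the sparse matrix to a dense array
# Get the extracted words
names = tf_vet.get_feature_names_out()
print(names)  # the extracted words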
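The importance weights that TfidfVectorizer prints can be reproduced by hand. Under scikit-learn's defaults (smooth_idf=True, sublinear_tf=False, norm='l2'), the weight of word t in document d is its raw count in d multiplied by idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is how many of them contain t; each document's row is then scaled to unit Euclidean length. The sketch below is a simplified illustration of that formula, not scikit-learn's implementation; tfidf_vectorize is a hypothetical helper name, and it reuses the default token pattern.

```python
import math
import re
from collections import Counter

# Same default token pattern as CountVectorizer / TfidfVectorizer
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tfidf_vectorize(docs, stop_words=()):
    """Hypothetical helper: TF-IDF rows with smoothed idf and L2 normalization."""
    stop = set(stop_words)
    tokenized = [
        [tok for tok in TOKEN_PATTERN.findall(doc.lower()) if tok not in stop]
        for doc in docs
    ]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {word: math.log((1 + n_docs) / (1 + df[word])) + 1 for word in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # Raw term count times idf
        row = [counts[word] * idf[word] for word in vocab]
        # Scale the row to unit Euclidean (L2) length; leave all-zero rows unchanged
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row] if norm else row)
    return vocab, rows


content = ['this is the first document.', 'this is the second second document.',
           'and the third one.', 'is this the first document? i x y']
vocab, rows = tfidf_vectorize(content, stop_words=['is', 'and'])
print(vocab)  # ['document', 'first', 'one', 'second', 'the', 'third', 'this']
for row in rows:
    print([round(v, 3) for v in row])
```

A word that occurs in every document, such as "the" here, gets the smallest possible idf, ln(5/5) + 1 = 1, so it carries little weight; "second", which occurs twice but only in the second document (index 1), ends up with the largest weight in that row.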
