NLTK Usage Summary

Install nltk with pip as shown below (note that this is only the first step):

pip install nltk

Get the directories in which nltk looks for its data; the code is shown below:

import nltk
print(nltk.data.path)
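Since nltk.data.path is a list of candidate directories, it can help to check which of them actually exist before unpacking any data. The following is a minimal sketch using only the standard library; the helper name existing_dirs is hypothetical, and the example paths are placeholders for the list that nltk.data.path would return.

```python
import os

def existing_dirs(candidates):
    """Return the candidate directories that exist on disk, in order."""
    return [path for path in candidates if os.path.isdir(path)]

# Placeholder input: with nltk installed, pass nltk.data.path instead.
candidates = [os.getcwd(), os.path.join(os.getcwd(), "no_such_nltk_data")]
print(existing_dirs(candidates))
```

Any directory this returns is a possible target for the unpacked data.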

unzip nltk_data.zip
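The same unpacking step can be done from Python, which is convenient on systems without an unzip command. This is a sketch under assumptions: nltk_data.zip was obtained manually, the target is one of the directories printed by nltk.data.path, and the helper name extract_archive and both paths in the usage comment are hypothetical.

```python
import os
import zipfile

def extract_archive(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the member names."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()

# Hypothetical usage; both paths are placeholders:
# extract_archive("nltk_data.zip", "path/to/one/of/the/nltk/data/directories")
```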

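For example, an error occurs when running the code below:

Downloading the data over the network may also fail with the error "An existing connection was forcibly closed by the remote host"; in that case another approach is needed.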

from nltk import sent_tokenize
sents = sent_tokenize('zhangsan is a boy. and lisi is a girl')
print(sents)

Note that sentences are only split where the period is followed by a space.
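The effect of the space after the period can be illustrated with a deliberately simplified splitter. This is not how sent_tokenize is implemented; it is a stand-in regular expression, and the function name naive_sent_split is hypothetical.

```python
import re

def naive_sent_split(text):
    """Split after '.', '!' or '?' only when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# A space follows the period, so the text is split in two.
print(naive_sent_split("zhangsan is a boy. and lisi is a girl"))
# No space follows the period, so the text stays in one piece.
print(naive_sent_split("zhangsan is a boy.and lisi is a girl"))
```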

from nltk import word_tokenize
tokenized_word = word_tokenize('i love a good boy')
print(tokenized_word)

You can split the text into sentences first and then tokenize each sentence into words.
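The two-stage idea can be sketched as a small pipeline that returns one token list per sentence. To keep the sketch runnable without the nltk data, the default splitters are simplified regular-expression stand-ins, and all three function names are hypothetical.

```python
import re

def split_sentences(text):
    # Simplified stand-in for nltk's sent_tokenize.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(sentence):
    # Simplified stand-in for nltk's word_tokenize: words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

def tokenize_document(text, sent_fn=split_sentences, word_fn=split_words):
    """Return one list of tokens per sentence."""
    return [word_fn(sentence) for sentence in sent_fn(text)]

print(tokenize_document("zhangsan is a boy. and lisi is a girl"))
```

With nltk and its data installed, passing sent_fn=sent_tokenize and word_fn=word_tokenize swaps in the real tokenizers without changing the pipeline.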

from nltk.corpus import stopwords
stop_words = set(stopwords.words("english"))
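Once a stop word set is available, filtering a token list is a simple membership test. In the sketch below a tiny hand-written set stands in for set(stopwords.words("english")) so that it runs without the corpus; that set and the helper name remove_stop_words are assumptions for illustration only.

```python
# Assumption: a tiny hand-written stop list replaces the real English
# stop word corpus so that this sketch runs without any nltk data.
stop_words = {"i", "a", "the", "is", "and"}

def remove_stop_words(tokens, stop_words):
    """Keep the tokens whose lowercase form is not in stop_words."""
    return [token for token in tokens if token.lower() not in stop_words]

tokens = ["i", "love", "a", "good", "boy"]
print(remove_stop_words(tokens, stop_words))
```

Using a set rather than a list keeps each membership test constant-time, which matters on long documents.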

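Lemmatization is similar to stemming, but stemming often produces words that do not exist, whereas the result of lemmatization is always a real word. For that reason only lemmatization is covered here. Lemmatization in turn depends on the part of speech, so it needs the results of part-of-speech tagging.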

import nltk
text = nltk.word_tokenize('what does the fox say')
print(text)
print(nltk.pos_tag(text))
The result is:
['what', 'does', 'the', 'fox', 'say']
The output is a list of tuples; the first element of each tuple is the word and the second is its part-of-speech tag
[('what', 'WDT'), ('does', 'VBZ'), ('the', 'DT'), ('fox', 'NNS'), ('say', 'VBP')]

Tag    Meaning               Examples
ADJ    adjective             new, good, high, special, big
ADV    adverb                really, already, still, early, now
CNJ    conjunction           and, or, but, if, while
DET    determiner            the, a, some, most, every
EX     existential           there, there's
FW     foreign word          dolce, ersatz, esprit, quo, maitre
MOD    modal verb            will, can, would, may, must
N      noun                  year, home, costs, time
NP     proper noun           Alison, Africa, April, Washington
NUM    number                twenty-four, fourth, 1991, 14:24
PRO    pronoun               he, their, her, its, my, I, us
P      preposition           on, of, at, with, by, into, under
TO     the word to           to
UH     interjection          ah, bang, ha, whee, hmpf, oops
V      verb                  is, has, get, do, make, see, run
VD     past tense            said, took, told, made, asked
VG     present participle    going, playing, working
VN     past participle       given, taken, begun, sung
WH     wh determiner         who, which, when, what, where

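The tags can also be looked up with nltk.help.upenn_tagset(). (The above contains errors! Note that the table does not match the Penn Treebank tags, such as WDT and VBZ, returned by nltk.pos_tag.)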

from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
# The same word is lemmatized as a verb, noun, adjective and adverb in turn.
print(lemmatizer.lemmatize('playing', pos="v"))
print(lemmatizer.lemmatize('playing', pos="n"))
print(lemmatizer.lemmatize('playing', pos="a"))
print(lemmatizer.lemmatize('playing', pos="r"))
'''Result:
play
playing
playing
playing
'''
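Because the lemmatizer expects a WordNet part-of-speech letter while the tagger returns Penn Treebank tags, the two steps need a small mapping in between. The following is one common way to sketch it, not an nltk API; the function name penn_to_wordnet is hypothetical, and the tagged list is copied from the tagging output shown earlier.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet pos letter ('a', 'v', 'r' or 'n')."""
    if tag.startswith("J"):
        return "a"  # JJ, JJR, JJS: adjectives
    if tag.startswith("V"):
        return "v"  # VB, VBD, VBG, VBN, VBP, VBZ: verbs
    if tag.startswith("RB"):
        return "r"  # RB, RBR, RBS: adverbs
    return "n"      # everything else falls back to noun, the lemmatizer's default

tagged = [("what", "WDT"), ("does", "VBZ"), ("the", "DT"), ("fox", "NNS"), ("say", "VBP")]
print([(word, penn_to_wordnet(tag)) for word, tag in tagged])
```

Each resulting pair can then be passed on as lemmatizer.lemmatize(word, pos=pos).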

Since word2vec essentially learns word vectors sentence by sentence, the article needs to be split into sentences first.

from nltk.tokenize import sent_tokenize
text="""hello mr. smith, how are you doing today? the weather is great, and city is awesome.
the sky is pinkish-blue. you shouldn't eat cardboard"""
tokenized_text = sent_tokenize(text)
print(tokenized_text)
