NLTK Study Notes

2021-08-13 21:50:42

Reference book:

nltk.set_proxy("**.com:80")

nltk.download()

2. Calling sents(fileid) fails with the error: Resource 'tokenizers/punkt/english.pickle' not found. Please use the NLTK ********** to obtain the resource. Running the downloader fixes it:

import nltk

nltk.download()
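Calling nltk.download() with no arguments opens the interactive downloader. For scripts it can be handier to check for one resource and fetch only that. The following is a minimal sketch, not the only way to do it: has_resource and ensure_resource are hypothetical helper names, and the package id "punkt" is assumed from the path in the error message above (newer NLTK releases may ask for a differently named resource such as punkt_tab, in which case pass the name shown in the error).

```python
import nltk


def has_resource(resource_path):
    """Return True if NLTK can locate the resource in its data directories."""
    try:
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # nltk.data.find raises LookupError when the resource is missing.
        return False


def ensure_resource(resource_path, package_id):
    """Download package_id only if resource_path is not already installed.

    Both arguments are supplied by the caller; nothing is downloaded
    until this function is actually called.
    """
    if has_resource(resource_path):
        return True
    # quiet=True suppresses the progress output; the call returns a bool.
    return bool(nltk.download(package_id, quiet=True))
```

Under these assumptions, calling ensure_resource("tokenizers/punkt", "punkt") once before webtext.sents(fileid) should avoid the lookup error, provided the machine can reach the download server (through nltk.set_proxy if needed).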

3. Functions for accessing corpus elements

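from nltk.corpus import webtext

webtext.fileids()      # ids of all files in the corpus
webtext.raw(fileid)    # the full raw text of the given file, as one string
webtext.words(fileid)  # all words in the given file
webtext.sents(fileid)  # all sentences in the given file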

Example                      Description
fileids()                    The files of the corpus
fileids([categories])        The files of the corpus corresponding to these categories
categories()                 The categories of the corpus
categories([fileids])        The categories of the corpus corresponding to these files
raw()                        The raw content of the corpus
raw(fileids=[f1,f2,f3])      The raw content of the specified files
raw(categories=[c1,c2])      The raw content of the specified categories
words()                      The words of the whole corpus
words(fileids=[f1,f2,f3])    The words of the specified fileids
words(categories=[c1,c2])    The words of the specified categories
sents()                      The sentences of the whole corpus
sents(fileids=[f1,f2,f3])    The sentences of the specified fileids
sents(categories=[c1,c2])    The sentences of the specified categories
abspath(fileid)              The location of the given file on disk
encoding(fileid)             The encoding of the file (if known)
open(fileid)                 Open a stream for reading the given corpus file
root()                       The path to the root of the locally installed corpus
readme()                     The contents of the README file of the corpus
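To try these reader methods without downloading any corpus data, one option is to point a PlaintextCorpusReader at a small local directory. This is only a sketch under stated assumptions: the file names and contents are made up for illustration, and LineTokenizer is swapped in as the sentence tokenizer (one sentence per line) so that the Punkt model from section 2 is not required. It is not how webtext itself is loaded.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader
from nltk.tokenize import LineTokenizer

# Build a tiny throwaway corpus (hypothetical file names and contents).
corpus_dir = tempfile.mkdtemp()
samples = {
    "a.txt": "Hello world .\nNLTK reads plain text .\n",
    "b.txt": "Second file here .\n",
}
for name, content in samples.items():
    with open(os.path.join(corpus_dir, name), "w", encoding="utf8") as f:
        f.write(content)

# LineTokenizer treats each line as one sentence, so no Punkt data is needed.
reader = PlaintextCorpusReader(
    corpus_dir, r".*\.txt", sent_tokenizer=LineTokenizer()
)

file_ids = reader.fileids()                         # ids of all files
raw_a = reader.raw("a.txt")                         # raw text of one file
words_a = list(reader.words("a.txt"))               # words of one file
sents_a = [list(s) for s in reader.sents("a.txt")]  # sentences as word lists
location_a = str(reader.abspath("a.txt"))           # where the file lives on disk

print(file_ids)
print(words_a)
print(sents_a)
```

The same method names apply to webtext and the other bundled corpora once their data has been downloaded.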

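4. Some common text-processing functions

Assume text is a list of words. The methods called on text below that are not ordinary list methods (collocations(), concordance(), similar() and so on) require the list to be wrapped in an nltk.Text object.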

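from nltk import FreqDist, bigrams

len(text)         # number of words
set(text)         # remove duplicates
sorted(text)      # sort
text.count('a')   # count occurrences of the given word
text.index('a')   # position of the first occurrence of the given word
FreqDist(text)    # words and their frequencies; keys() gives the words, indexing with [key] gives the count
FreqDist(text).plot(50, cumulative=True)  # cumulative frequency plot of the 50 most common words
bigrams(text)     # all adjacent word pairs
text.collocations()          # frequently co-occurring adjacent word pairs in the text
text.concordance("word")     # every occurrence of the given word, with its context
text.similar("word")         # words that appear in contexts similar to the given word
text.common_contexts(["a", "b"])  # contexts shared by the two words
text.dispersion_plot(['a','b','c',...])  # plot comparing where the words occur in the text
text.generate()   # generate a random piece of text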
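The calls above can be tried on a small hand-made word list. The sentence below is invented for illustration. The sketch sticks to calls that need neither downloaded data nor a plotting backend, so collocations(), the plot methods and generate() are left out.

```python
from nltk import FreqDist, bigrams
from nltk.text import Text

# A made-up token list standing in for a real corpus.
words = "the cat sat on the mat and the dog sat on the log".split()

n_tokens = len(words)           # number of tokens
vocab = sorted(set(words))      # distinct words, sorted
the_count = words.count("the")  # occurrences of "the"
first_sat = words.index("sat")  # index of the first "sat"

fd = FreqDist(words)            # maps each word to its count
pairs = list(bigrams(words))    # adjacent word pairs, in order

# Wrapping the list in Text adds concordance(), similar(), and so on.
text = Text(words)
text.concordance("sat")         # prints each occurrence with its context

print(n_tokens, the_count, first_sat, fd["the"], pairs[:3])
```

Note that bigrams() returns a generator, which is why it is wrapped in list() before slicing.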

NLTK's conditional frequency distributions: commonly used methods and idioms for defining, accessing, and visualizing a conditional frequency distribution of counters.

Example                                Description
cfdist = ConditionalFreqDist(pairs)    Create a conditional frequency distribution from a list of pairs
cfdist.conditions()                    List of the conditions
cfdist[condition]                      The frequency distribution for this condition
cfdist[condition][sample]              Frequency for the given sample for this condition
cfdist.tabulate()                      Tabulate the conditional frequency distribution
cfdist.tabulate(samples, conditions)   Tabulation limited to the specified samples and conditions
cfdist.plot()                          Graphical plot of the conditional frequency distribution
cfdist.plot(samples, conditions)       Graphical plot limited to the specified samples and conditions
cfdist1 < cfdist2                      Test if samples in cfdist1 occur less frequently than in cfdist2
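As a small illustration of the table, the sketch below builds a conditional frequency distribution from a hand-written list of (condition, sample) pairs. The pairs are invented. In a real analysis they would typically be (category, word) pairs drawn from a categorized corpus.

```python
from nltk import ConditionalFreqDist

# Invented (condition, sample) pairs: condition = genre, sample = word.
pairs = [
    ("news", "the"),
    ("news", "market"),
    ("news", "the"),
    ("romance", "love"),
    ("romance", "the"),
]

cfd = ConditionalFreqDist(pairs)

conditions = sorted(cfd.conditions())  # the distinct conditions seen
news_fd = cfd["news"]                  # a FreqDist for one condition
news_the = cfd["news"]["the"]          # count of "the" under "news"

# Print a table restricted to chosen conditions and samples.
cfd.tabulate(conditions=["news", "romance"], samples=["the", "love"])
```

Here samples and conditions are passed to tabulate() as keyword arguments, which keeps it clear which list is which.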

To be continued.
