NLP Fundamentals, Tools Series: TextBlob

Install: pip install textblob

Install from a mirror inside mainland China: pip install textblob -i


from textblob import TextBlob

text = 'I love natural language processing! I am not like fish!'
blob = TextBlob(text)

1. Part-of-speech tagging

blob.tags

[('I', 'PRP'),
('love', 'VBP'),
('natural', 'JJ'),
('language', 'NN'),
('processing', 'NN'),
('I', 'PRP'),
('am', 'VBP'),
('not', 'RB'),
('like', 'IN'),
('fish', 'NN')]
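
Because the result is a plain list of (word, tag) pairs using Penn Treebank tags, it can be filtered with ordinary list operations. The following is a minimal sketch that works on a hand-written list in the same format as blob.tags; the helper name `words_with_prefix` is illustrative and not part of TextBlob.

```python
# Hand-written (word, tag) pairs in the format returned by blob.tags.
tags = [
    ('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'),
    ('processing', 'NN'), ('I', 'PRP'), ('am', 'VBP'), ('not', 'RB'),
    ('like', 'IN'), ('fish', 'NN'),
]


def words_with_prefix(tagged, prefix):
    """Return the words whose POS tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


nouns = words_with_prefix(tags, 'NN')  # matches NN, NNS, NNP, NNPS
verbs = words_with_prefix(tags, 'VB')  # matches VB, VBD, VBP, VBZ, ...
print(nouns)
print(verbs)
```

Matching on a tag prefix rather than an exact tag keeps singular, plural, and proper nouns together in one group.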

2. Noun phrase extraction

np = blob.noun_phrases
for w in np:
    print(w)

natural language processing

3. Computing sentence sentiment polarity

for sentence in blob.sentences:
    print(sentence + '------>' + str(sentence.sentiment.polarity))

I love natural language processing!------>0.3125
I am not like fish!------>0.0
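
The polarity score is a float between -1.0 and 1.0, with negative values indicating negative sentiment. One way to turn it into a coarse label is to threshold it. The sketch below is an assumption for illustration only: the function `polarity_label` and its 0.1 cut-off are not part of TextBlob.

```python
def polarity_label(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The threshold is an arbitrary choice made for this sketch.
    """
    if polarity > threshold:
        return 'positive'
    if polarity < -threshold:
        return 'negative'
    return 'neutral'


# The two scores printed in the example output.
for score in (0.3125, 0.0):
    print(score, polarity_label(score))
```

A symmetric dead zone around zero keeps sentences without any sentiment-bearing words from being labelled either way.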

4. Tokenization (splitting text into sentences or words)

token = blob.words
for w in token:
    print(w)

I
love
natural
language
processing
I
am
not
like
fish

sentence = blob.sentences
for s in sentence:
    print(s)

I love natural language processing!
I am not like fish!

5. Word inflection

token = blob.words
for w in token:
    # Pluralize
    print(w.pluralize())
    # Singularize
    print(w.singularize())

we
I
love
love
naturals
natural
languages
language
processings
processing
we
I
ams
am
nots
not
likes
like
fish
fish

6. Lemmatization

from textblob import Word

w = Word('went')
print(w.lemmatize('v'))
w = Word('octopi')
print(w.lemmatize())

go
octopus

7. WordNet integration

from textblob.wordnet import VERB

word = Word('octopus')
syn_word = word.synsets
for syn in syn_word:
    print(syn)

Synset('octopus.n.01')
Synset('octopus.n.02')

Restrict the returned synsets to verbs:

syn_word1 = Word("hack").get_synsets(pos=VERB)
for syn in syn_word1:
    print(syn)

Synset('chop.v.05')
Synset('hack.v.02')
Synset('hack.v.03')
Synset('hack.v.04')
Synset('hack.v.05')
Synset('hack.v.06')
Synset('hack.v.07')
Synset('hack.v.08')

View the definitions attached to a word's synsets:

Word("beautiful").definitions

['delighting the senses or exciting intellectual or emotional admiration',
'(of weather) highly enjoyable']

8. Spelling correction

sen = 'I lvoe naturl language processing!'
sen = TextBlob(sen)
print(sen.correct())

I love nature language processing!

Word.spellcheck() returns a list of spelling suggestions, each paired with a confidence score:

w1 = Word('good')
w2 = Word('god')
w3 = Word('gd')
print(w1.spellcheck())
print(w2.spellcheck())
print(w3.spellcheck())

[('good', 1.0)]
[('god', 1.0)]
[('go', 0.586139896373057), ('god', 0.23510362694300518), ('d', 0.11658031088082901), ('g', 0.03626943005181347), ('ed', 0.009067357512953367), ('rd', 0.006476683937823834), ('nd', 0.0038860103626943004), ('gr', 0.0025906735751295338), ('sd', 0.0006476683937823834), ('md', 0.0006476683937823834), ('id', 0.0006476683937823834), ('gdp', 0.0006476683937823834), ('ga', 0.0006476683937823834), ('ad', 0.0006476683937823834)]
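
TextBlob's documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below illustrates that general idea and is not TextBlob's actual implementation: the word counts in `WORD_COUNTS` are invented, only candidates one edit away are considered, and the function names are hypothetical. It shows how confidences like those above can arise from normalizing candidate frequencies so that they sum to 1.

```python
import string

# Toy word-frequency table; the counts are invented for this sketch.
WORD_COUNTS = {'go': 60, 'god': 30, 'good': 50, 'gdp': 10}


def edits1(word):
    """Return all strings one delete, transpose, replace, or insert away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def spellcheck(word):
    """Return (candidate, confidence) pairs, most likely first."""
    if word in WORD_COUNTS:
        return [(word, 1.0)]  # a known word is its own best suggestion
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return []  # this sketch gives up beyond one edit
    total = sum(WORD_COUNTS[w] for w in candidates)
    scored = [(w, WORD_COUNTS[w] / total) for w in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(spellcheck('good'))
print(spellcheck('gd'))
```

With this toy table, 'good' is not suggested for 'gd' because it is two edits away, which is why a frequent short word such as 'go' can outrank it.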

9. Parsing

text = TextBlob('I lvoe naturl language processing!')
print(text.parse())

I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O language/NN/I-NP/O processing/NN/I-NP/O !/./O/O
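
Each token in the parse output carries four slash-separated fields which, as far as I understand the default format, are the word, its POS tag, its chunk tag, and its prepositional-noun-phrase (PNP) tag. A minimal sketch for splitting such a string into tuples follows; `split_parse` is a hypothetical helper, not a TextBlob function.

```python
def split_parse(parsed):
    """Split 'word/POS/chunk/PNP' tokens into 4-tuples."""
    rows = []
    for token in parsed.split():
        # rsplit from the right so a '/' inside the word itself survives.
        word, pos, chunk, pnp = token.rsplit('/', 3)
        rows.append((word, pos, chunk, pnp))
    return rows


parsed = ('I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O '
          'language/NN/I-NP/O processing/NN/I-NP/O !/./O/O')
rows = split_parse(parsed)
for row in rows:
    print(row)
```

Here B-NP marks the first token of a noun phrase, I-NP a token inside one, and O a token outside any chunk.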

10. N-grams

text = TextBlob('I lvoe naturl language processing!')
print(text.ngrams(n=2))

[WordList(['I', 'lvoe']), WordList(['lvoe', 'naturl']), WordList(['naturl', 'language']), WordList(['language', 'processing'])]
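
Conceptually, an n-gram list is a sliding window of n consecutive words over the token list. The plain-Python sketch below reproduces that idea on a hand-written word list; it is an illustration, not TextBlob's own code, and it returns ordinary lists rather than WordList objects.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words as a list."""
    if n <= 0:
        return []
    return [words[i:i + n] for i in range(len(words) - n + 1)]


# Word tokens of the example sentence, punctuation already removed.
words = ['I', 'lvoe', 'naturl', 'language', 'processing']
bigrams = ngrams(words, n=2)
print(bigrams)
```

A list of k words yields k - n + 1 n-grams, so the five words here give four bigrams, and asking for more words than exist gives an empty list.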
