Natural Language Processing Basics, Tools Series: TextBlob

2021-09-07 10:20:34

Installation: pip install textblob

Installation from a mirror inside China: pip install textblob -i

Reference:

from textblob import TextBlob

# Sample text reused by the sections below
text = 'I love natural language processing! I am not like fish!'
blob = TextBlob(text)

1. Part-of-speech tagging
blob.tags

[('I', 'PRP'),
 ('love', 'VBP'),
 ('natural', 'JJ'),
 ('language', 'NN'),
 ('processing', 'NN'),
 ('I', 'PRP'),
 ('am', 'VBP'),
 ('not', 'RB'),
 ('like', 'IN'),
 ('fish', 'NN')]
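The tag list is ordinary Python data (a list of word/tag pairs), so it can be filtered directly. The following is a minimal sketch that assumes the output above has been pasted in as a literal; the helper name nouns_from_tags is hypothetical and not part of TextBlob.

```python
# Tagged output copied from the example above (Penn Treebank style tags).
tags = [('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'),
        ('language', 'NN'), ('processing', 'NN'), ('I', 'PRP'),
        ('am', 'VBP'), ('not', 'RB'), ('like', 'IN'), ('fish', 'NN')]


def nouns_from_tags(tagged):
    """Return the words whose tag starts with 'NN' (NN, NNS, NNP, NNPS)."""
    return [word for word, tag in tagged if tag.startswith('NN')]


print(nouns_from_tags(tags))
```

The same pattern works for other tag families, for example tag.startswith('VB') for verbs.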

2. Noun phrase extraction
np = blob.noun_phrases

for w in np:
    print(w)

natural language processing

3. Computing sentence sentiment
for sentence in blob.sentences:
    # polarity ranges from -1.0 (negative) to 1.0 (positive)
    print(sentence + '------>' + str(sentence.sentiment.polarity))

I love natural language processing!------>0.3125
I am not like fish!------>0.0
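A polarity score is usually turned into a coarse label before it is used downstream. The following sketch shows one way to do that; the function name polarity_label and the 0.1 threshold are assumptions chosen for illustration, not anything TextBlob defines.

```python
def polarity_label(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The threshold is an arbitrary illustrative cut-off, not a TextBlob default.
    """
    if polarity > threshold:
        return 'positive'
    if polarity < -threshold:
        return 'negative'
    return 'neutral'


# Scores taken from the two example sentences above.
for score in (0.3125, 0.0):
    print(score, polarity_label(score))
```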

4. Tokenization (splitting text into sentences or words)
token = blob.words

for w in token:
    print(w)

I
love
natural
language
processing
I
am
not
like
fish

sentence = blob.sentences

for s in sentence:
    print(s)

I love natural language processing!
I am not like fish!
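To make the two kinds of splitting concrete, here is a rough regex-based stand-in written with the standard library only. It is a simplification: TextBlob relies on NLTK's tokenizers, which handle abbreviations, contractions and other edge cases that these two patterns do not.

```python
import re

text = 'I love natural language processing! I am not like fish!'

# Sentence split: break on whitespace that follows '.', '!' or '?'.
sentences = re.split(r'(?<=[.!?])\s+', text.strip())

# Word split: keep runs of letters, digits and apostrophes; drop punctuation.
words = re.findall(r"[A-Za-z0-9']+", text)

print(sentences)
print(words)
```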

5. Word inflection
token = blob.words

for w in token:
    # Plural form
    print(w.pluralize())
    # Singular form
    print(w.singularize())

we
I
love
love
naturals
natural
languages
language
processings
processing
we
I
ams
am
nots
not
likes
like
fish
fish

6. Lemmatization
from textblob import Word

w = Word('went')
# 'v' tells the lemmatizer to treat the word as a verb
print(w.lemmatize('v'))

w = Word('octopi')
print(w.lemmatize())

go
octopus

7. WordNet integration
from textblob.wordnet import VERB

word = Word('octopus')
syn_word = word.synsets

for syn in syn_word:
    print(syn)

Synset('octopus.n.01')
Synset('octopus.n.02')

Restrict the returned synsets to verbs:

syn_word1 = Word("hack").get_synsets(pos=VERB)

for syn in syn_word1:
    print(syn)

Synset('chop.v.05')
Synset('hack.v.02')
Synset('hack.v.03')
Synset('hack.v.04')
Synset('hack.v.05')
Synset('hack.v.06')
Synset('hack.v.07')
Synset('hack.v.08')

View the definitions of a word's synsets:

Word("beautiful").definitions

['delighting the senses or exciting intellectual or emotional admiration',
 '(of weather) highly enjoyable']

8. Spelling correction
sen = 'I lvoe naturl language processing!'
sen = TextBlob(sen)

print(sen.correct())

I love nature language processing!

Word.spellcheck() returns spelling suggestions together with their confidence scores:

w1 = Word('good')
w2 = Word('god')
w3 = Word('gd')

print(w1.spellcheck())
print(w2.spellcheck())
print(w3.spellcheck())

[('good', 1.0)]

[('god', 1.0)]

[('go', 0.586139896373057), ('god', 0.23510362694300518), ('d', 0.11658031088082901), ('g', 0.03626943005181347), ('ed', 0.009067357512953367), ('rd', 0.006476683937823834), ('nd', 0.0038860103626943004), ('gr', 0.0025906735751295338), ('sd', 0.0006476683937823834), ('md', 0.0006476683937823834), ('id', 0.0006476683937823834), ('gdp', 0.0006476683937823834), ('ga', 0.0006476683937823834), ('ad', 0.0006476683937823834)]
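The shape of this output (candidates one edit away from 'gd', with scores that sum to 1) can be reproduced with a small toy corrector in the style of Peter Norvig's spelling corrector, the approach TextBlob's documentation credits. The sketch below is not TextBlob's code: the COUNTS table is made-up data and toy_spellcheck is a hypothetical name, so the scores differ from those above.

```python
import string

# Made-up word counts for illustration; a real corrector learns them from a corpus.
COUNTS = {'go': 50, 'god': 20, 'good': 30}


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def toy_spellcheck(word):
    """Return (candidate, confidence) pairs, most frequent candidate first."""
    if word in COUNTS:
        return [(word, 1.0)]  # known word: keep it with full confidence
    candidates = [w for w in edits1(word) if w in COUNTS]
    if not candidates:
        return [(word, 0.0)]  # nothing within one edit: give up
    total = sum(COUNTS[w] for w in candidates)
    ranked = sorted(candidates, key=lambda w: COUNTS[w], reverse=True)
    # Confidence is each candidate's share of the candidates' total count.
    return [(w, COUNTS[w] / total) for w in ranked]


print(toy_spellcheck('good'))
print(toy_spellcheck('gd'))
```

With this toy table, 'good' is not suggested for 'gd' because it is two edits away, which matches its absence from the real output above.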

9. Parsing
text = TextBlob('I lvoe naturl language processing!')

print(text.parse())

I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O language/NN/I-NP/O processing/NN/I-NP/O !/./O/O

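Each token in the parse string carries four slash-separated fields, which appear to be the word, its part-of-speech tag, its chunk tag and a prepositional-phrase tag. The following sketch splits the string into tuples for easier inspection; split_parse is a hypothetical helper, and the field interpretation is an assumption based on the output above.

```python
# Parse output copied from the example above.
parsed = ('I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O '
          'language/NN/I-NP/O processing/NN/I-NP/O !/./O/O')


def split_parse(parse_string):
    """Split 'word/POS/chunk/PNP' tokens into 4-tuples."""
    rows = []
    for token in parse_string.split():
        # rsplit from the right so a '/' inside the word itself is preserved
        word, pos, chunk, pnp = token.rsplit('/', 3)
        rows.append((word, pos, chunk, pnp))
    return rows


for row in split_parse(parsed):
    print(row)
```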
10. N-grams
text = TextBlob('I lvoe naturl language processing!')

print(text.ngrams(n=2))

[WordList(['I', 'lvoe']), WordList(['lvoe', 'naturl']), WordList(['naturl', 'language']), WordList(['language', 'processing'])]
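An n-gram list is simply every run of n consecutive words. The following plain-Python sketch reproduces the bigrams above from an already tokenized word list; it illustrates the idea rather than TextBlob's implementation, and it returns plain lists instead of WordList objects.

```python
def ngrams(words, n=2):
    """Return every run of n consecutive words as a list."""
    if n <= 0:
        return []
    return [words[i:i + n] for i in range(len(words) - n + 1)]


# Tokens of the example sentence, punctuation already removed.
words = ['I', 'lvoe', 'naturl', 'language', 'processing']
print(ngrams(words, n=2))
```

A sentence with k words yields k - n + 1 n-grams, so asking for more words than the sentence contains gives an empty list.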