Python Pitfalls Guide (Season 2)

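This installment uses jieba to walk through a real problem I ran into. Within a single service there are two interfaces, a and b, that both use jieba for word segmentation. The difference is that each one needs its own dictionary. By coincidence, the two dictionaries contain the following entries:

Dictionary a: "干拌面"

Dictionary b: "干拌", "面"

When the service starts, dictionary a is loaded first. When dictionary b is loaded afterwards, it turns out that it does not take effect: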

In interface a:

jieba.load_userdict("a.txt")

In interface b:

jieba.load_userdict("b.txt")

As a result, interface b still fails to split 干拌面 into 干拌 and 面. Every time jieba is loaded, it prints the following info messages, which are worth a look:

Building prefix dict from the default dictionary ...
Dumping model to file cache /var/folders/hv/kfb7n4lj06590hqxjv6f3dd00000gn/t/jieba.cache
Loading model cost 0.824 seconds.
Prefix dict has been built succesfully.

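The log shows the sequence: jieba first builds the prefix dict, then dumps the model to a file cache, and later loads read from that cache. In addition, jieba.load_userdict is a module-level function that adds words to jieba's single default tokenizer, so the entries from a.txt and b.txt end up in one shared dictionary. That shared state is what causes the problem above.

Here is how I handled it: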

In interface a:

jieba1 = jieba.Tokenizer(dictionary="a.txt")

In interface b:

jieba2 = jieba.Tokenizer(dictionary="b.txt")

An example session:

In [1]: import jieba
In [2]: jieba1 = jieba.Tokenizer(dictionary="a.txt")
In [3]: jieba2 = jieba.Tokenizer(dictionary="b.txt")
In [4]: jieba1.lcut("干拌面")
Building prefix dict from /users/slade/desktop/a.txt ...
Dumping model to file cache /var/folders/hv/kfb7n4lj06590hqxjv6f3dd00000gn/t/jieba.u5221c1b70f06b36e44bc519f39715c96.cache
Loading model cost 0.006 seconds.
Prefix dict has been built succesfully.
Out[4]: ['干拌面']
In [5]: jieba2.lcut("干拌面")
Building prefix dict from /users/slade/desktop/b.txt ...
Dumping model to file cache /var/folders/hv/kfb7n4lj06590hqxjv6f3dd00000gn/t/jieba.uc4f38d90bf7ce748744ff94fb2863fe4.cache
Loading model cost 0.003 seconds.
Prefix dict has been built succesfully.
Out[5]: ['干拌', '面']
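In a long-running service it can help to create each dedicated tokenizer once and reuse it, instead of building a new one per request. The following is only a minimal sketch of that idea: the TokenizerRegistry class and the _jieba_factory helper are hypothetical names introduced for illustration, and the only jieba call assumed is jieba.Tokenizer(dictionary=...) as used above.

```python
import threading


def _jieba_factory(dict_path):
    # Hypothetical default factory. jieba is imported lazily so that this
    # module can be loaded (and unit-tested with a fake factory) without it.
    import jieba
    return jieba.Tokenizer(dictionary=dict_path)


class TokenizerRegistry:
    """Hypothetical helper: keeps one tokenizer instance per dictionary path."""

    def __init__(self, factory=_jieba_factory):
        self._factory = factory
        self._tokenizers = {}
        self._lock = threading.Lock()

    def get(self, dict_path):
        # Create each tokenizer once; later calls return the same instance,
        # so interfaces with different dictionaries never share state.
        with self._lock:
            if dict_path not in self._tokenizers:
                self._tokenizers[dict_path] = self._factory(dict_path)
            return self._tokenizers[dict_path]
```

Interface a would then call something like registry.get("a.txt").lcut(text), and interface b registry.get("b.txt").lcut(text), with registry being a single TokenizerRegistry created at service startup.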

One thing to note: if you look at the Tokenizer source, it reads the dictionary with the following method:

def gen_pfdict(self, f):
    lfreq = {}
    ltotal = 0
    f_name = resolve_filename(f)
    for lineno, line in enumerate(f, 1):
        try:
            line = line.strip().decode('utf-8')
            word, freq = line.split(' ')[:2]
            freq = int(freq)
            lfreq[word] = freq
            ltotal += freq
            for ch in xrange(len(word)):
                wfrag = word[:ch + 1]
                if wfrag not in lfreq:
                    lfreq[wfrag] = 0
        except ValueError:
            raise ValueError(
                'invalid dictionary entry in %s at line %s: %s' % (f_name, lineno, line))
    f.close()
    return lfreq, ltotal
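To see what this routine produces, here is a small Python 3 sketch of the same logic that works on a list of already-decoded lines. It is a simplified rewrite for illustration, not jieba's own code, and the function name build_prefix_dict is hypothetical.

```python
def build_prefix_dict(lines):
    """Build a word -> frequency map that also contains every word prefix.

    Each line must look like "word freq"; prefixes that are not words
    themselves are stored with frequency 0.
    """
    lfreq = {}
    ltotal = 0
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        try:
            # Fails with ValueError if the frequency column is missing
            # or is not an integer.
            word, freq = line.split(' ')[:2]
            freq = int(freq)
        except ValueError:
            raise ValueError(
                'invalid dictionary entry at line %s: %r' % (lineno, line))
        lfreq[word] = freq
        ltotal += freq
        # Register every prefix of the word without overwriting real entries.
        for end in range(1, len(word) + 1):
            lfreq.setdefault(word[:end], 0)
    return lfreq, ltotal


lfreq_a, total_a = build_prefix_dict(["干拌面 1"])
lfreq_b, total_b = build_prefix_dict(["干拌 1", "面 1"])
```

With dictionary a, the full word 干拌面 carries a non-zero frequency, while with dictionary b only 干拌 and 面 do. A line that has no frequency column raises ValueError, as in the original method.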

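With load_userdict, the word frequency in a dictionary entry may be omitted. Here, however, the line word, freq = line.split(' ')[:2] means that every entry must include a frequency. This behavior depends on the jieba version, and I have not tested other versions.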

a.txt:

干拌面 1

b.txt:

干拌 1
面 1
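If an existing load_userdict-style file leaves out frequencies, one option is to fill them in before passing the file to Tokenizer(dictionary=...). The sketch below assumes the user dictionary layout of word, optional frequency, optional tag, separated by spaces. The function name normalize_dict_lines and the default frequency of 1 are assumptions made for illustration.

```python
def normalize_dict_lines(lines, default_freq=1):
    """Return dictionary lines in which every entry has a frequency column."""
    normalized = []
    for line in lines:
        parts = line.split()
        if not parts:
            # Skip blank lines; they would be rejected as invalid entries.
            continue
        word, rest = parts[0], parts[1:]
        if not rest or not rest[0].isdigit():
            # No frequency present: insert the default before any tag.
            rest = [str(default_freq)] + rest
        normalized.append(' '.join([word] + rest))
    return normalized


fixed = normalize_dict_lines(["干拌面", "干拌 3", "面 n", ""])
```

The normalized lines can then be written back to a file, one entry per line and encoded as UTF-8, and used as the dictionary argument.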
