Loading GloVe and Word2Vec Models

2021-08-26 08:38:13

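1. Google used word2vec to pretrain 300-dimensional word vectors on a news corpus and released them as GoogleNews-vectors-negative300.bin, which is 3.39 GB after decompression.

They can be loaded with gensim, but the machine needs enough memory.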
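As a rough illustration of why memory matters: the GoogleNews vectors cover about 3 million words and phrases, and gensim keeps each vector as 300 float32 values (4 bytes each). The back-of-the-envelope sketch below uses a hypothetical helper, `estimate_vector_memory`, and deliberately ignores the word-to-index dictionary and other Python overhead, so the real footprint is somewhat larger.

```python
def estimate_vector_memory(vocab_size: int, dim: int, bytes_per_value: int = 4) -> int:
    """Rough size in bytes of a vocab_size x dim matrix (float32 by default).

    Ignores the vocabulary dictionary and Python object overhead,
    so the actual memory used by a loaded model is larger.
    """
    return vocab_size * dim * bytes_per_value


if __name__ == "__main__":
    n_bytes = estimate_vector_memory(3_000_000, 300)
    print(n_bytes)                    # 3600000000
    print(round(n_bytes / 2**30, 2))  # 3.35 (GiB)
```

If memory is tight, gensim's `load_word2vec_format` also accepts a `limit` argument that reads only the first N vectors; since the word2vec tool typically writes more frequent words first, this keeps the most common part of the vocabulary.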

```python
# Load the word vectors trained by Google
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

print(model['love'])
```
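To make `binary=True` more concrete, here is a minimal pure-Python sketch of the kind of layout that `load_word2vec_format` parses. It assumes the usual word2vec binary layout: a text header `<vocab_size> <dim>`, then for each word the UTF-8 word, a space, and `dim` raw little-endian float32 values, optionally followed by a newline. `read_word2vec_binary` is a hypothetical helper for illustration only; it is not part of gensim and is far slower than gensim's own loader.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Optional


def read_word2vec_binary(f: BinaryIO, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Illustrative reader for the word2vec binary format (assumed layout).

    Header: "<vocab_size> <dim>\\n"; then per word: word bytes, b" ",
    dim little-endian float32 values, and an optional b"\\n".
    """
    header = f.readline().decode("utf8").split()
    vocab_size, dim = int(header[0]), int(header[1])
    if limit is not None:
        vocab_size = min(vocab_size, limit)  # read only the first `limit` vectors
    vectors: Dict[str, List[float]] = {}
    for _ in range(vocab_size):
        # Read the word up to the separating space, skipping any newline
        # left over after the previous vector.
        word_bytes = bytearray()
        while True:
            ch = f.read(1)
            if ch == b" ":
                break
            if ch == b"":
                raise EOFError("unexpected end of file while reading a word")
            if ch != b"\n":
                word_bytes += ch
        raw = f.read(4 * dim)  # dim float32 values, 4 bytes each
        vectors[word_bytes.decode("utf8")] = list(struct.unpack("<%df" % dim, raw))
    return vectors


if __name__ == "__main__":
    data = (b"2 3\n"
            + b"love " + struct.pack("<3f", 0.5, -1.0, 2.0) + b"\n"
            + b"hate " + struct.pack("<3f", 1.0, 0.0, -0.25) + b"\n")
    vecs = read_word2vec_binary(io.BytesIO(data))
    print(vecs["love"])  # [0.5, -1.0, 2.0]
```

The `limit` parameter here only mirrors the idea of gensim's own `limit` argument; it stops after the first N entries instead of reading the whole file.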

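2. Word vectors pretrained with GloVe can also be loaded with gensim, but one extra step is needed before loading; see ** for reference.

The 300-dimensional GloVe vectors take up 5.25 GB.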
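The word2vec text format that gensim reads starts with a header line `<vocab_size> <dim>`, followed by one `word v1 v2 ... vN` line per word; a GloVe text file has the same per-word lines but no header. The in-memory sketch below (the helper `glove_to_word2vec_text` is hypothetical) builds that header from the data itself instead of hard-coding the dimension, assuming that words contain no spaces.

```python
from typing import Iterable, List


def glove_to_word2vec_text(glove_lines: Iterable[str]) -> List[str]:
    """Prepend the "<vocab_size> <dim>" header expected by word2vec text format.

    Each GloVe line is assumed to be "word v1 v2 ... vN" with no spaces inside
    the word, so the dimension is the number of fields minus one.
    """
    lines = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not lines:
        return ["0 0"]
    dim = len(lines[0].split(" ")) - 1
    return ["{} {}".format(len(lines), dim)] + lines


if __name__ == "__main__":
    sample = ["the 0.1 0.2 0.3\n", "cat 0.4 0.5 0.6\n"]
    print("\n".join(glove_to_word2vec_text(sample)))
    # 2 3
    # the 0.1 0.2 0.3
    # cat 0.4 0.5 0.6
```

Inferring the dimension this way also covers smaller GloVe releases (for example 50- or 100-dimensional files) without editing a constant.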

```python
# To open GloVe vectors with gensim, a line must be added at the top of the file:
# <total number of words> <vector dimension>
import shutil
from sys import platform

from gensim.models import KeyedVectors


# Count the lines, i.e. the number of words
def getfilelinenums(filename):
    with open(filename, 'r', encoding='utf8') as f:
        return sum(1 for _ in f)


# Open the word-vector file on Linux or Windows and add a line at the beginning
def prepend_line(infile, outfile, line):
    with open(infile, 'r', encoding='utf8') as old, \
            open(outfile, 'w', encoding='utf8') as new:
        new.write(str(line) + "\n")
        # Copy the rest of the file in chunks
        shutil.copyfileobj(old, new)


def prepend_slow(infile, outfile, line):
    with open(infile, 'r', encoding='utf8') as fin, \
            open(outfile, 'w', encoding='utf8') as fout:
        fout.write(line + "\n")
        # Copy the original file line by line
        for row in fin:
            fout.write(row)


def load(filename):
    num_lines = getfilelinenums(filename)
    gensim_file = 'glove_model.txt'
    gensim_first_line = "{} {}".format(num_lines, 300)  # 300 = vector dimension
    # Prepend the header line.
    if platform.startswith("linux"):
        prepend_line(filename, gensim_file, gensim_first_line)
    else:
        prepend_slow(filename, gensim_file, gensim_first_line)
    model = KeyedVectors.load_word2vec_format(gensim_file)
    return model


model = load('glove.840B.300d.txt')
```

The resulting glove_model.txt is a model file that gensim can open directly.
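Before loading a multi-gigabyte file, it may be worth checking that the prepended header actually matches the body, since a mismatch (for example, a stray blank line counted as a word) may only surface as a loading error. The sketch below uses a hypothetical helper, `check_word2vec_text_header`, and assumes words contain no whitespace. Separately, gensim 4.0 and later accept `no_header=True` in `KeyedVectors.load_word2vec_format` for text files, which can read a GloVe file without the prepend step; whether that option exists depends on your installed gensim version.

```python
import os
import tempfile
from typing import Tuple


def check_word2vec_text_header(path: str) -> Tuple[bool, int, int]:
    """Check that a word2vec-format text file's header matches its body.

    Returns (ok, declared_vocab, declared_dim). ok is True when the number of
    non-blank vector lines equals declared_vocab and every line has exactly
    declared_dim values after the word (words assumed to contain no whitespace).
    """
    with open(path, "r", encoding="utf8") as f:
        declared_vocab, declared_dim = map(int, f.readline().split())
        count = 0
        ok = True
        for line in f:
            if not line.strip():
                continue  # blank lines are not vectors
            count += 1
            if len(line.split()) - 1 != declared_dim:
                ok = False
    return ok and count == declared_vocab, declared_vocab, declared_dim


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf8") as fh:
        fh.write("2 3\nthe 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
    print(check_word2vec_text_header(path))  # (True, 2, 3)
    os.remove(path)
```

Running such a check on glove_model.txt streams the file once more, which is cheap compared with a failed attempt to load it into memory.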
