models.word2vec_inner – Cython routines for training Word2Vec models
Optimized Cython functions for training a Word2Vec model.
gensim.models.word2vec_inner.init()
Precompute the function sigmoid(x) = 1 / (1 + exp(-x)) for x values discretized into the table EXP_TABLE.
Also calculate log(sigmoid(x)) into LOG_TABLE.
- Returns
Enumeration signifying the underlying data type returned by the BLAS dot product calculation: 0 signifies double, 1 signifies float, and 2 signifies that custom Cython loops were used instead of BLAS.
- Return type
{0, 1, 2}
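As a rough illustration of the lookup-table idea described above, the following pure-Python sketch precomputes sigmoid(x) and log(sigmoid(x)) over a fixed grid. The table size, the clipping bound MAX_EXP, and the helper name sigmoid_lookup are assumptions chosen for illustration, not values confirmed by this page.

```python
import math

EXP_TABLE_SIZE = 1000  # assumed number of grid points
MAX_EXP = 6.0          # assumed bound: x is discretized over [-MAX_EXP, MAX_EXP)

# Grid point i corresponds to x_i = (2 * i / EXP_TABLE_SIZE - 1) * MAX_EXP.
EXP_TABLE = []
LOG_TABLE = []
for i in range(EXP_TABLE_SIZE):
    x = (2.0 * i / EXP_TABLE_SIZE - 1.0) * MAX_EXP
    s = 1.0 / (1.0 + math.exp(-x))
    EXP_TABLE.append(s)
    LOG_TABLE.append(math.log(s))


def sigmoid_lookup(x):
    """Return the tabulated sigmoid for x, or None if x is outside the grid."""
    if x < -MAX_EXP or x >= MAX_EXP:
        return None
    index = int((x + MAX_EXP) * EXP_TABLE_SIZE / (2.0 * MAX_EXP))
    return EXP_TABLE[index]
```

A table lookup replaces a call to exp() in the inner training loop with an index computation, at the cost of a small discretization error that shrinks as the table grows.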
gensim.models.word2vec_inner.score_sentence_cbow(model, sentence, _work, _neu1)
Obtain the likelihood score for a single sentence in a fitted CBOW representation.
Notes
This scoring function is only implemented for hierarchical softmax (model.hs == 1). The model should have been trained using the CBOW algorithm (model.sg == 0).
- Parameters
model (Word2Vec) – The trained model. It MUST have been trained using hierarchical softmax and the CBOW algorithm.
sentence (list of str) – The words comprising the sentence to be scored.
_work (np.ndarray) – Private working memory for each worker.
_neu1 (np.ndarray) – Private working memory for each worker.
- Returns
The probability assigned to this sentence by the CBOW model.
- Return type
float
gensim.models.word2vec_inner.score_sentence_sg(model, sentence, _work)
Obtain the likelihood score for a single sentence in a fitted skip-gram representation.
Notes
This scoring function is only implemented for hierarchical softmax (model.hs == 1). The model should have been trained using the skip-gram model (model.sg == 1).
- Parameters
model (Word2Vec) – The trained model. It MUST have been trained using hierarchical softmax and the skip-gram algorithm.
sentence (list of str) – The words comprising the sentence to be scored.
_work (np.ndarray) – Private working memory for each worker.
- Returns
The probability assigned to this sentence by the Skip-Gram model.
- Return type
float
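To make the hierarchical-softmax requirement concrete, here is a minimal sketch of how a log-likelihood for one (input word, predicted word) pair can be computed under hierarchical softmax: each predicted word is a leaf of a binary tree, and its probability is a product of sigmoids along the path from the root. The function name, the data layout (path_vectors, code), and the sign convention are assumptions for illustration and are not taken from this page.

```python
import math


def hs_log_prob(input_vector, path_vectors, code):
    """Log-probability of reaching one leaf of a binary tree.

    input_vector: vector of the input word.
    path_vectors: one vector per inner node on the root-to-leaf path.
    code: one 0/1 branching decision per inner node on that path.
    """
    total = 0.0
    for node_vector, bit in zip(path_vectors, code):
        dot = sum(a * b for a, b in zip(input_vector, node_vector))
        sign = 1.0 if bit == 0 else -1.0  # assumed convention: bit 0 -> +dot
        total += math.log(1.0 / (1.0 + math.exp(-sign * dot)))
    return total


# Toy tree with a single inner node and two leaves (codes [0] and [1]).
v = [0.5, -1.0]
node = [[0.2, 0.3]]
p_left = math.exp(hs_log_prob(v, node, [0]))
p_right = math.exp(hs_log_prob(v, node, [1]))
```

Because the two branch probabilities at every inner node sum to one, the leaf probabilities in the toy tree also sum to one. A sentence score would accumulate terms of this kind over the word pairs in the sentence.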
gensim.models.word2vec_inner.train_batch_cbow(model, sentences, alpha, _work, _neu1, compute_loss)
Update the CBOW model by training on a batch of sentences.
Called internally from train().
- Parameters
model (Word2Vec) – The Word2Vec model instance to train.
sentences (iterable of list of str) – The corpus used to train the model.
alpha (float) – The learning rate.
_work (np.ndarray) – Private working memory for each worker.
_neu1 (np.ndarray) – Private working memory for each worker.
compute_loss (bool) – Whether or not the training loss should be computed in this batch.
- Returns
Number of words actually used for training: those that already existed in the vocabulary and were not discarded by the downsampling of frequent words.
- Return type
int
gensim.models.word2vec_inner.train_batch_sg(model, sentences, alpha, _work, compute_loss)
Update the skip-gram model by training on a batch of sentences.
Called internally from train().
- Parameters
model (Word2Vec) – The Word2Vec model instance to train.
sentences (iterable of list of str) – The corpus used to train the model.
alpha (float) – The learning rate.
_work (np.ndarray) – Private working memory for each worker.
compute_loss (bool) – Whether or not the training loss should be computed in this batch.
- Returns
Number of words actually used for training: those that already existed in the vocabulary and were not discarded by the downsampling of frequent words.
- Return type
int
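The return value described above can be illustrated with a small sketch that counts the tokens a batch would actually train on. It assumes a hypothetical keep_prob mapping from each vocabulary word to the probability of keeping one occurrence; the function name and that mapping are illustrative only and are not part of the gensim API.

```python
import random


def count_effective_words(sentences, keep_prob, seed=0):
    """Count tokens that are in the vocabulary and survive random discarding.

    keep_prob: hypothetical dict mapping each known word to the probability
    of keeping one occurrence of it (1.0 means never discard).
    """
    rng = random.Random(seed)
    effective = 0
    for sentence in sentences:
        for token in sentence:
            if token not in keep_prob:
                continue  # out-of-vocabulary tokens are skipped
            if rng.random() >= keep_prob[token]:
                continue  # this occurrence is randomly discarded
            effective += 1
    return effective


batch = [["the", "cat", "sat"], ["the", "unknown", "dog"]]
always_keep = {"the": 1.0, "cat": 1.0, "sat": 1.0, "dog": 1.0}
n_used = count_effective_words(batch, always_keep)
```

With every keep probability set to 1.0, only the out-of-vocabulary token is dropped, so the count equals the number of in-vocabulary tokens in the batch.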