models.word2vec_inner – Cython routines for training Word2Vec models

Optimized Cython functions for training the Word2Vec model.

gensim.models.word2vec_inner.init()
Precompute the sigmoid function, sigmoid(x) = 1 / (1 + exp(-x)), for x values discretized into the table EXP_TABLE.

Also calculate log(sigmoid(x)) into LOG_TABLE.

Returns

Enumeration to signify the underlying data type returned by the BLAS dot product calculation. 0 signifies double precision (double), 1 signifies single precision (float), and 2 signifies that custom Cython loops were used instead of BLAS.

Return type

{0, 1, 2}
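
For intuition, here is a minimal NumPy sketch of the tables this routine precomputes. The EXP_TABLE_SIZE and MAX_EXP values shown match the constants in the Cython source; the code is an illustration, not the actual implementation.

    import numpy as np
    from gensim.models.word2vec_inner import init

    EXP_TABLE_SIZE = 1000  # number of discretization buckets
    MAX_EXP = 6            # sigmoid is tabulated for x in [-MAX_EXP, MAX_EXP)

    # Map bucket i to x in [-MAX_EXP, MAX_EXP), then tabulate sigmoid and its log.
    x = (np.arange(EXP_TABLE_SIZE) / EXP_TABLE_SIZE * 2.0 - 1.0) * MAX_EXP
    EXP_TABLE = 1.0 / (1.0 + np.exp(-x))
    LOG_TABLE = np.log(EXP_TABLE)

    # The real init() additionally probes BLAS behaviour and reports the result.
    blas_mode = init()  # 0: double BLAS, 1: float BLAS, 2: custom Cython loops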

gensim.models.word2vec_inner.score_sentence_cbow(model, sentence, _work, _neu1)

Obtain likelihood score for a single sentence in a fitted CBOW representation.

Notes

This scoring function is only implemented for hierarchical softmax (model.hs == 1). The model should have been trained using the CBOW algorithm (model.sg == 0).

Parameters
  • model (Word2Vec) – The trained model. It MUST have been trained using hierarchical softmax and the CBOW algorithm.

  • sentence (list of str) – The words comprising the sentence to be scored.

  • _work (np.ndarray) – Private working memory for each worker.

  • _neu1 (np.ndarray) – Private working memory for each worker.

Returns

The log probability assigned to this sentence by the CBOW model.

Return type

float
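
A brief usage sketch through the public API, which calls this routine internally. The toy corpus and parameter values are illustrative, and vector_size assumes gensim 4.x (earlier releases use size instead).

    from gensim.models import Word2Vec

    sentences = [["cat", "say", "meow"], ["dog", "say", "woof"]]

    # CBOW (sg=0) with hierarchical softmax enabled, as scoring requires.
    model = Word2Vec(sentences, vector_size=24, sg=0, hs=1, negative=0, min_count=1)

    # Word2Vec.score() dispatches to score_sentence_cbow for each sentence.
    log_probs = model.score([["cat", "say", "woof"]], total_sentences=1)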

gensim.models.word2vec_inner.score_sentence_sg(model, sentence, _work)

Obtain likelihood score for a single sentence in a fitted skip-gram representation.

Notes

This scoring function is only implemented for hierarchical softmax (model.hs == 1). The model should have been trained using the skip-gram algorithm (model.sg == 1).

Parameters
  • model (Word2Vec) – The trained model. It MUST have been trained using hierarchical softmax and the skip-gram algorithm.

  • sentence (list of str) – The words comprising the sentence to be scored.

  • _work (np.ndarray) – Private working memory for each worker.

Returns

The log probability assigned to this sentence by the skip-gram model.

Return type

float
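
The skip-gram variant is likewise reachable through Word2Vec.score(); the direct call below is only a sketch of the internal calling convention, with the work buffer sized the way gensim's worker threads size it.

    import numpy as np
    from gensim.models import Word2Vec
    from gensim.models.word2vec_inner import score_sentence_sg

    sentences = [["cat", "say", "meow"], ["dog", "say", "woof"]]
    model = Word2Vec(sentences, vector_size=24, sg=1, hs=1, negative=0, min_count=1)

    # One float32 scratch buffer of layer1_size elements per worker.
    work = np.zeros(model.layer1_size, dtype=np.float32)
    log_prob = score_sentence_sg(model, ["dog", "say", "meow"], work)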

gensim.models.word2vec_inner.train_batch_cbow(model, sentences, alpha, _work, _neu1, compute_loss)

Update CBOW model by training on a batch of sentences.

Called internally from Word2Vec.train().

Parameters
  • model (Word2Vec) – The Word2Vec model instance to train.

  • sentences (iterable of list of str) – The corpus used to train the model.

  • alpha (float) – The learning rate.

  • _work (np.ndarray) – Private working memory for each worker.

  • _neu1 (np.ndarray) – Private working memory for each worker.

  • compute_loss (bool) – Whether or not the training loss should be computed in this batch.

Returns

Number of words in the vocabulary actually used for training (i.e. words that already existed in the vocabulary and were not discarded by frequent-word downsampling).

Return type

int
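
A sketch of the calling convention used by Word2Vec.train()'s worker threads, with the two per-worker float32 scratch buffers sized to the hidden layer. This simplifies gensim's internal thread-memory setup and is not a supported public entry point.

    import numpy as np
    from gensim.models import Word2Vec
    from gensim.models.word2vec_inner import train_batch_cbow

    sentences = [["cat", "say", "meow"], ["dog", "say", "woof"]]
    model = Word2Vec(sentences, vector_size=24, sg=0, min_count=1)

    # Per-worker scratch memory: _work and _neu1, each layer1_size floats.
    work = np.zeros(model.layer1_size, dtype=np.float32)
    neu1 = np.zeros(model.layer1_size, dtype=np.float32)

    effective_words = train_batch_cbow(model, sentences, 0.025, work, neu1, False)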

gensim.models.word2vec_inner.train_batch_sg(model, sentences, alpha, _work, compute_loss)

Update skip-gram model by training on a batch of sentences.

Called internally from Word2Vec.train().

Parameters
  • model (Word2Vec) – The Word2Vec model instance to train.

  • sentences (iterable of list of str) – The corpus used to train the model.

  • alpha (float) – The learning rate.

  • _work (np.ndarray) – Private working memory for each worker.

  • compute_loss (bool) – Whether or not the training loss should be computed in this batch.

Returns

Number of words in the vocabulary actually used for training (i.e. words that already existed in the vocabulary and were not discarded by frequent-word downsampling).

Return type

int
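
Correspondingly for skip-gram, here with compute_loss enabled so the accumulated loss can be read back afterwards. Again a sketch of the internal calling convention rather than a public API.

    import numpy as np
    from gensim.models import Word2Vec
    from gensim.models.word2vec_inner import train_batch_sg

    sentences = [["cat", "say", "meow"], ["dog", "say", "woof"]]
    model = Word2Vec(sentences, vector_size=24, sg=1, min_count=1, compute_loss=True)

    work = np.zeros(model.layer1_size, dtype=np.float32)
    effective_words = train_batch_sg(model, sentences, 0.025, work, True)

    print(model.get_latest_training_loss())  # loss accumulated so far, including this batch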