`similarities.annoy` – Approximate Vector Search using Annoy

This module integrates Spotify’s Annoy (Approximate Nearest Neighbors Oh Yeah) library with Gensim’s Word2Vec, Doc2Vec, FastText and KeyedVectors word embeddings.

Important

To use this module, you must have the annoy library installed. To install it, run pip install annoy.
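A minimal sketch of checking for the dependency before building an index. The helper name `annoy_available` is hypothetical and not part of Gensim; it only uses the standard library to test whether the `annoy` package can be imported.

```python
import importlib.util


def annoy_available() -> bool:
    """Return True if the `annoy` package can be imported in this environment."""
    # find_spec looks the package up without actually importing it.
    return importlib.util.find_spec("annoy") is not None


if not annoy_available():
    print("annoy is not installed; run: pip install annoy")
```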

class gensim.similarities.annoy.AnnoyIndexer(model=None, num_trees=None)

This class allows the use of Annoy for fast (approximate) vector retrieval in most_similar() calls of Word2Vec, Doc2Vec, FastText and KeyedVectors models.

Parameters

model (trained model, optional) – Use vectors from this model as the source for the index.
num_trees (int, optional) – Number of trees for Annoy indexer.

Examples

>>> from gensim.similarities.annoy import AnnoyIndexer
>>> from gensim.models import Word2Vec
>>>
>>> sentences = [['cute', 'cat', 'say', 'meow'], ['cute', 'dog', 'say', 'woof']]
>>> model = Word2Vec(sentences, min_count=1, seed=1)
>>>
>>> indexer = AnnoyIndexer(model, 2)
>>> model.wv.most_similar("cat", topn=2, indexer=indexer)
[('cat', 1.0), ('dog', 0.32011348009109497)]

build_from_doc2vec()

Build an Annoy index using document vectors from a Doc2Vec model.

build_from_keyedvectors()

Build an Annoy index using word vectors from a KeyedVectors model.

build_from_word2vec()

Build an Annoy index using word vectors from a Word2Vec model.

load(fname)

Load an AnnoyIndexer instance from disk.

Parameters: fname (str) – The path as previously used by save().

Examples

>>> from gensim.similarities.annoy import AnnoyIndexer
>>> from gensim.models import Word2Vec
>>> from tempfile import mkstemp
>>>
>>> sentences = [['cute', 'cat', 'say', 'meow'], ['cute', 'dog', 'say', 'woof']]
>>> model = Word2Vec(sentences, min_count=1, seed=1, epochs=10)
>>>
>>> indexer = AnnoyIndexer(model, 2)
>>> _, temp_fn = mkstemp()
>>> indexer.save(temp_fn)
>>>
>>> new_indexer = AnnoyIndexer()
>>> new_indexer.load(temp_fn)
>>> new_indexer.model = model

most_similar(vector, num_neighbors)

Find num_neighbors most similar items.

Parameters

vector (numpy.array) – Vector for word/document.
num_neighbors (int) – Number of most similar items to return.

Returns

List of most similar items in format [(item, cosine_similarity), … ]

Return type

list of (str, float)
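The scores in the class example (1.0 for the query word itself) behave like similarities: higher means closer. As a hedged illustration of where such a score can come from, the sketch below assumes the index uses Annoy's angular metric, which Annoy documents as sqrt(2 * (1 - cos(u, v))), and converts that distance back to cosine similarity. The helper names are hypothetical, and whether AnnoyIndexer applies exactly this conversion depends on the installed Gensim version.

```python
import math


def cosine_similarity(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angular_distance(u, v):
    """Annoy-style angular distance: sqrt(2 * (1 - cos(u, v)))."""
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine_similarity(u, v))))


def similarity_from_angular(distance):
    """Invert the formula above: cos = 1 - d**2 / 2 (assumed conversion)."""
    return 1.0 - distance ** 2 / 2.0


u = [1.0, 0.0]
v = [1.0, 1.0]
d = angular_distance(u, v)
print(d, similarity_from_angular(d), cosine_similarity(u, v))
```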

save(fname, protocol=2)

Save AnnoyIndexer instance to disk.

Parameters

fname (str) – Path to the output file. Two files are produced: fname (parameters) and fname.d (AnnoyIndex).
protocol (int, optional) – Protocol for pickle.

Notes

This method saves only the index. The trained model isn’t preserved.
