similarities.annoy – Approximate Vector Search using Annoy
This module integrates Spotify’s Annoy (Approximate Nearest Neighbors Oh Yeah) library with Gensim’s Word2Vec, Doc2Vec, FastText and KeyedVectors word embeddings.
Important
To use this module, you must have the annoy library installed. To install it, run pip install annoy.
- class gensim.similarities.annoy.AnnoyIndexer(model=None, num_trees=None)
This class allows the use of Annoy for fast (approximate) vector retrieval in most_similar() calls of Word2Vec, Doc2Vec, FastText and Word2VecKeyedVectors models.
- Parameters
model (trained model, optional) – Use vectors from this model as the source for the index.
num_trees (int, optional) – Number of trees for Annoy indexer.
Examples
>>> from gensim.similarities.annoy import AnnoyIndexer
>>> from gensim.models import Word2Vec
>>>
>>> sentences = [['cute', 'cat', 'say', 'meow'], ['cute', 'dog', 'say', 'woof']]
>>> model = Word2Vec(sentences, min_count=1, seed=1)
>>>
>>> # Build an index with 2 trees, then query through the model's word vectors.
>>> indexer = AnnoyIndexer(model, 2)
>>> model.wv.most_similar("cat", topn=2, indexer=indexer)
[('cat', 1.0), ('dog', 0.32011348009109497)]
- build_from_doc2vec()
Build an Annoy index using document vectors from a Doc2Vec model.
- build_from_keyedvectors()
Build an Annoy index using word vectors from a KeyedVectors model.
- build_from_word2vec()
Build an Annoy index using word vectors from a Word2Vec model.
- load(fname)
Load an AnnoyIndexer instance from disk.
- Parameters
fname (str) – The path as previously used by save().
Examples
>>> from gensim.similarities.annoy import AnnoyIndexer
>>> from gensim.models import Word2Vec
>>> from tempfile import mkstemp
>>>
>>> sentences = [['cute', 'cat', 'say', 'meow'], ['cute', 'dog', 'say', 'woof']]
>>> model = Word2Vec(sentences, min_count=1, seed=1, epochs=10)
>>>
>>> # Build the index and save it to a temporary path.
>>> indexer = AnnoyIndexer(model, 2)
>>> _, temp_fn = mkstemp()
>>> indexer.save(temp_fn)
>>>
>>> # Load into a fresh indexer; the model is not saved, so reattach it.
>>> new_indexer = AnnoyIndexer()
>>> new_indexer.load(temp_fn)
>>> new_indexer.model = model
- most_similar(vector, num_neighbors)
Find the num_neighbors most similar items.
- Parameters
vector (numpy.array) – Vector for word/document.
num_neighbors (int) – Number of most similar items.
- Returns
List of most similar items in format [(item, cosine_distance), … ]
- Return type
list of (str, float)
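For intuition about what an approximate search is approximating, the following is a minimal sketch of an exact, exhaustive lookup that returns results in the same (item, score) list shape. It is not how Annoy or AnnoyIndexer works internally; the names cosine_similarity, exact_most_similar and toy_vectors are hypothetical and chosen only for illustration, and the vectors are made-up toy data.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def exact_most_similar(vectors, query, num_neighbors):
    """Score every stored item against `query` and keep the best matches.

    `vectors` is a hypothetical dict mapping item label -> vector.
    Returns [(label, score), ...] with the highest score first.
    """
    scored = [(label, cosine_similarity(vec, query)) for label, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num_neighbors]


# Toy data, invented for this sketch.
toy_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.6],
    "car": [0.0, 1.0],
}
neighbors = exact_most_similar(toy_vectors, toy_vectors["cat"], 2)
```

The exhaustive version compares the query against every stored vector, so its cost grows linearly with the number of items. An approximate index gives up some accuracy to avoid that full scan.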
- save(fname, protocol=2)
Save AnnoyIndexer instance to disk.
- Parameters
fname (str) – Path to output file, will produce 2 files: fname - parameters and fname.d - AnnoyIndex.
protocol (int, optional) – Protocol for pickle.
Notes
This method saves only the index. The trained model isn’t preserved.
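Because a saved indexer consists of two files, it can be useful to confirm that both are present before calling load(). The sketch below assumes the file layout described for save() (fname and fname.d). The helper missing_index_files is hypothetical and not part of gensim, and the demonstration uses empty placeholder files, not a real index.

```python
import os
import tempfile


def missing_index_files(fname):
    """List which of the two expected files are absent for a saved indexer.

    Assumes the layout described for save(): `fname` plus `fname + ".d"`.
    This helper is hypothetical and is not part of gensim.
    """
    expected = [fname, fname + ".d"]
    return [path for path in expected if not os.path.isfile(path)]


# Demonstration with empty placeholder files; no real index is written here.
with tempfile.TemporaryDirectory() as tmp_dir:
    base = os.path.join(tmp_dir, "my_index")
    open(base, "wb").close()          # only the first file exists so far
    missing_before = [os.path.basename(p) for p in missing_index_files(base)]
    open(base + ".d", "wb").close()   # now the companion file exists too
    missing_after = missing_index_files(base)
```

If the returned list is non-empty, the path is probably not the one passed to save(), or one of the two files was moved or deleted. Remember that the trained model still needs to be reattached after loading.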