4.0

API Reference

Modules:

interfaces – Core gensim interfaces
utils – Various utility functions
matutils – Math utils
_matutils – Compiled extension for math utils
downloader – Downloader API for gensim
corpora.bleicorpus – Corpus in Blei’s LDA-C format
corpora.csvcorpus – Corpus in CSV format
corpora.dictionary – Construct word<->id mappings
corpora.hashdictionary – Construct word<->id mappings
corpora.indexedcorpus – Random access to corpus documents
corpora.lowcorpus – Corpus in GibbsLda++ format
corpora.malletcorpus – Corpus in Mallet format
corpora.mmcorpus – Corpus in Matrix Market format
corpora._mmreader – Read corpus in the Matrix Market format
corpora.sharded_corpus – Corpus stored in separate files
corpora.svmlightcorpus – Corpus in SVMlight format
corpora.textcorpus – Tools for building corpora with dictionaries
corpora.ucicorpus – Corpus in UCI format
corpora.wikicorpus – Corpus from a Wikipedia dump
models.ldamodel – Latent Dirichlet Allocation
- Usage examples
models.ldamulticore – Parallelized Latent Dirichlet Allocation
- Usage examples
models.nmf – Non-Negative Matrix Factorization
models.lsimodel – Latent Semantic Indexing
models.ldaseqmodel – Dynamic Topic Modeling in Python
models.tfidfmodel – TF-IDF model
models.rpmodel – Random Projections
models.hdpmodel – Hierarchical Dirichlet Process
models.logentropy_model – LogEntropy model
models.normmodel – Normalization model
models.translation_matrix – Translation Matrix model
- How to translate between two sets of word vectors
- How to translate between two Doc2Vec models
models.lsi_dispatcher – Dispatcher for distributed LSI
- How to use distributed LSI
- Command line arguments
models.lsi_worker – Worker for distributed LSI
- How to use distributed LSI
- Command line arguments
models.lda_dispatcher – Dispatcher for distributed LDA
- How to use distributed LdaModel
- Command line arguments
models.lda_worker – Worker for distributed LDA
- How to use distributed LdaModel
- Command line arguments
models.atmodel – Author-topic models
models.word2vec – Word2vec embeddings
- Introduction
- Other embeddings
- Usage examples
models.keyedvectors – Store and query word vectors
- Why use KeyedVectors instead of a full model?
- How to obtain word vectors?
- What can I do with word vectors?
models.doc2vec – Doc2vec paragraph embeddings
- Introduction
- Usage examples
models.fasttext – FastText model
- Introduction
- Usage examples
- Implementation Notes
models._fasttext_bin – Facebook’s fastText I/O
models.phrases – Phrase (collocation) detection
models.poincare – Train and use Poincare embeddings
viz.poincare – Visualize Poincare embeddings
models.coherencemodel – Topic coherence pipeline
models.basemodel – Core TM interface
models.callbacks – Callbacks for tracking and visualizing the LDA training process
- Usage examples
models.word2vec_inner – Cython routines for training Word2Vec models
models.doc2vec_inner – Cython routines for training Doc2Vec models
models.fasttext_inner – Cython routines for training FastText models
models.wrappers.ldamallet – Latent Dirichlet Allocation via Mallet
- Installation
models.wrappers.dtmmodel – Dynamic Topic Models (DTM) and Dynamic Influence Models (DIM)
- Installation
models.wrappers.ldavowpalwabbit – Latent Dirichlet Allocation via Vowpal Wabbit
- Installation
models.wrappers.wordrank – Word Embeddings from WordRank
- Installation
models.wrappers.varembed – VarEmbed Word Embeddings
similarities.docsim – Document similarity queries
- How It Works
similarities.termsim – Term similarity queries
similarities.annoy – Approximate Vector Search using Annoy
similarities.nmslib – Approximate Vector Search using NMSLIB
- Example usage
- Load and save example
- What is NMSLIB
- Why use NMSLIB?
sklearn_api.atmodel – Scikit-learn wrapper for Author-topic model
sklearn_api.d2vmodel – Scikit-learn wrapper for paragraph2vec model
sklearn_api.hdp – Scikit-learn wrapper for Hierarchical Dirichlet Process model
sklearn_api.ldamodel – Scikit-learn wrapper for Latent Dirichlet Allocation
sklearn_api.ldaseqmodel – Scikit-learn wrapper for LdaSeq model
sklearn_api.lsimodel – Scikit-learn wrapper for Latent Semantic Indexing
sklearn_api.phrases – Scikit-learn wrapper for phrase (collocation) detection
sklearn_api.rpmodel – Scikit-learn wrapper for Random Projection model
sklearn_api.text2bow – Scikit-learn wrapper for word<->id mapping
sklearn_api.tfidf – Scikit-learn wrapper for TF-IDF model
sklearn_api.w2vmodel – Scikit-learn wrapper for word2vec model
test.utils – Internal testing functions
topic_coherence.aggregation – Aggregation module
topic_coherence.direct_confirmation_measure – Direct confirmation measure module
topic_coherence.indirect_confirmation_measure – Indirect confirmation measure module
topic_coherence.probability_estimation – Probability estimation module
topic_coherence.segmentation – Segmentation module
topic_coherence.text_analysis – Analyzing the texts of a corpus to accumulate statistical information about word occurrences
scripts.package_info – Information about gensim package
scripts.glove2word2vec – Convert GloVe format to word2vec
- How to use
- Command line arguments
scripts.make_wikicorpus – Convert articles from a Wikipedia dump to vectors
scripts.word2vec_standalone – Train word2vec on text file CORPUS
scripts.make_wiki_online – Convert articles from a Wikipedia dump
scripts.make_wiki_online_lemma – Convert articles from a Wikipedia dump
scripts.make_wiki_online_nodebug – Convert articles from a Wikipedia dump
scripts.word2vec2tensor – Convert the word2vec format to TensorFlow 2D tensor
- How to use
- Command line arguments
scripts.segment_wiki – Convert Wikipedia dump to JSON-lines format
- How to use
- Command line arguments
parsing.porter – Porter Stemming Algorithm
parsing.preprocessing – Functions to preprocess raw text
summarization.bm25 – BM25 ranking function
summarization.commons – Graph functions used in TextRank summarization
summarization.graph – Graph used in TextRank summarization
summarization.keywords – Keywords for TextRank summarization algorithm
summarization.mz_entropy – Keywords for the Montemurro and Zanette entropy algorithm
summarization.pagerank_weighted – Weighted PageRank algorithm
summarization.summarizer – TextRank Summarizer
summarization.syntactic_unit – Syntactic Unit class
summarization.textcleaner – Preprocessing for TextRank summarization

© Copyright 2009-now, Radim Řehůřek. Last updated on Sep 25, 2020.
Radim Řehůřek – Machine learning and data mining expert