Library includes
Scalable statistical semantics
Analyze plain-text documents for semantic structure
Retrieve semantically similar documents
Features of the Gensim Python library
Scalability
Gensim can process large, web-scale corpora, using incremental online training algorithms. There is no need for the whole input corpus to reside fully in RAM at any one time.
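Here is a minimal plain-Python sketch of the streaming idea. Gensim's algorithms accept any iterable of sparse bag-of-words vectors (lists of `(term_id, count)` pairs), so a corpus object can re-read its source on every pass instead of loading it into RAM. The one-document-per-line file layout and the `StreamedCorpus` class are assumptions made for illustration. In practice you would usually build the vocabulary with Gensim's own `corpora.Dictionary`.

```python
import os
import tempfile
from collections import Counter
from typing import Dict, Iterator, List, Tuple

# A sparse bag-of-words vector: (term_id, count) pairs sorted by term_id.
BowVector = List[Tuple[int, int]]


class StreamedCorpus:
    """Hypothetical streamed corpus: one whitespace-tokenized document per line.

    Only the vocabulary and the current document are held in memory; the
    documents themselves are re-read from disk on every iteration.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.vocab: Dict[str, int] = {}
        # One streaming pass to assign an integer id to every distinct token.
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                for token in line.lower().split():
                    self.vocab.setdefault(token, len(self.vocab))

    def __iter__(self) -> Iterator[BowVector]:
        # Each call re-opens the file, so multi-pass algorithms can iterate repeatedly.
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                counts = Counter(self.vocab[token] for token in line.lower().split())
                yield sorted(counts.items())


if __name__ == "__main__":
    # Usage: write a tiny two-document file, then stream it twice (two "epochs").
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
        tmp.write("human computer interface\ncomputer system\n")
    corpus = StreamedCorpus(tmp.name)
    for epoch in range(2):
        for bow in corpus:
            print(epoch, bow)
    os.remove(tmp.name)
```

Because `__iter__` reopens the file on each pass, memory use grows with the vocabulary size, not with the number of documents.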
Platform independent
Being pure Python, gensim runs on Linux, Windows and OS X, as well as any other platform that supports Python and NumPy.
Robust
Gensim has been used in various systems by many people and organizations for over 4 years. It's well past the initial “look mom, I published a script” stage of open-source projects.
Open source
The GNU LGPL license allows both personal and commercial use, provided any modifications to gensim itself are in turn open-sourced. Other modes (dual licensing) are also possible.
Efficient implementations
The core algorithms in gensim use highly optimized math routines. Gensim also contains a distributed version of several algorithms, intended to speed up processing and retrieval on machine clusters.
Converters & I/O formats
Gensim contains memory-efficient implementations of several popular data formats: Matrix Market, SVMlight, Blei's LDA-C... These can be used for input, for output, or to convert from one format to another.
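To make the format idea concrete, here is a minimal plain-Python sketch that writes a bag-of-words corpus to the Matrix Market coordinate format and streams it back. It is not Gensim's implementation, which is available as `corpora.MmCorpus` (for example `MmCorpus.serialize(path, corpus)`). The function names `write_mm` and `read_mm` are hypothetical. The sketch assumes one matrix row per document and one column per term id, with entries written in document order.

```python
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple

BowVector = List[Tuple[int, float]]


def write_mm(path: str, corpus: Iterable[BowVector], num_terms: int) -> None:
    """Write a re-iterable corpus as a Matrix Market coordinate matrix (documents x terms)."""
    # Pass 1: the header needs the dimensions, so count without materializing the corpus.
    num_docs = num_nnz = 0
    for doc in corpus:
        num_docs += 1
        num_nnz += len(doc)
    # Pass 2: stream the entries; Matrix Market indices are 1-based.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("%%MatrixMarket matrix coordinate real general\n")
        fh.write(f"{num_docs} {num_terms} {num_nnz}\n")
        for doc_no, doc in enumerate(corpus, start=1):
            for term_id, weight in doc:
                fh.write(f"{doc_no} {term_id + 1} {weight}\n")


def read_mm(path: str) -> Iterator[BowVector]:
    """Stream documents back one at a time, assuming entries are grouped by row in order."""
    with open(path, encoding="utf-8") as fh:
        entries = (line.split() for line in fh if not line.startswith("%"))
        num_docs = int(next(entries)[0])
        doc_no, doc = 1, []
        for row, col, value in entries:
            # Emit every finished document (including empty ones) before this entry's row.
            while doc_no < int(row):
                yield doc
                doc_no, doc = doc_no + 1, []
            doc.append((int(col) - 1, float(value)))
        # Flush the last document plus any trailing empty documents.
        while doc_no <= num_docs:
            yield doc
            doc_no, doc = doc_no + 1, []


if __name__ == "__main__":
    corpus = [[(0, 1), (1, 2)], [], [(2, 0.5)]]
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "corpus.mm")
        write_mm(path, corpus, num_terms=3)
        print(list(read_mm(path)))  # [[(0, 1.0), (1, 2.0)], [], [(2, 0.5)]]
```

Counting in a separate first pass keeps memory flat. The trade-off is that the corpus must be iterable twice, as a list or a re-readable streamed corpus is.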
Similarity queries
As a natural next step to topic modelling, gensim also contains code for fast indexing of documents in their semantic representation, and retrieval of topically similar documents.
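As a rough illustration of what a similarity query computes, the sketch below ranks indexed documents by cosine similarity to a query vector. Cosine similarity is the measure used by Gensim's in-memory `similarities.MatrixSimilarity` index. This is a plain-Python stand-in, not Gensim's API: `SimilarityIndex` and `query` are hypothetical names. The vectors are assumed to be sparse `(term_id, weight)` pairs that already live in the same semantic space, for example the output of a topic model.

```python
import math
from typing import Dict, List, Sequence, Tuple

SparseVector = Sequence[Tuple[int, float]]


def _unit(vec: SparseVector) -> Dict[int, float]:
    """Return the vector as a dict scaled to unit length (empty if the vector is all zeros)."""
    norm = math.sqrt(sum(weight * weight for _, weight in vec))
    return {term_id: weight / norm for term_id, weight in vec} if norm else {}


class SimilarityIndex:
    """Hypothetical in-memory index that ranks documents by cosine similarity."""

    def __init__(self, corpus: Sequence[SparseVector]) -> None:
        # Normalize once at indexing time so each query only needs dot products.
        self.docs = [_unit(doc) for doc in corpus]

    def query(self, vec: SparseVector, topn: int = 10) -> List[Tuple[int, float]]:
        q = _unit(vec)
        scores = [
            (doc_no, sum(weight * doc.get(term_id, 0.0) for term_id, weight in q.items()))
            for doc_no, doc in enumerate(self.docs)
        ]
        # Highest similarity first; ties keep document order because the sort is stable.
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:topn]


if __name__ == "__main__":
    index = SimilarityIndex([[(0, 1.0), (1, 1.0)], [(1, 1.0), (2, 1.0)], [(3, 1.0)]])
    print(index.query([(0, 1.0), (1, 1.0)], topn=2))
```

Normalizing documents at indexing time follows the usual design for cosine-similarity indexes. Each query then costs one dot product per indexed document.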
Support
Gensim is supported and maintained through community effort. See the support page for information on the mailing list, tutorials, FAQ, code hosting, and instructions for contributors.
Installation
Quick install
Run in your terminal (recommended):
pip install --upgrade gensim
or, alternatively for conda environments:
conda install -c conda-forge gensim
That's it! Congratulations, you can proceed to the tutorials.
If that failed, make sure you are installing into a writable location.
Code dependencies
Gensim runs on Linux, Windows and Mac OS X, and should run on any other platform that supports Python 2.7 or 3.5+ and NumPy. Gensim depends on the following software:
- Python, tested with versions 2.7, 3.5, 3.6 and 3.7.
- NumPy for number crunching.
- smart_open for transparently opening files on remote storages or compressed files.
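Before running the tutorials, a small check like the following can confirm that the packages listed above are importable in the active environment. It is an illustrative helper: the name `check_dependencies` is hypothetical. It is built on the standard library's `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util
from typing import Dict, Iterable

# Module names taken from the dependency list above, plus gensim itself.
REQUIRED_MODULES = ("numpy", "smart_open", "gensim")


def check_dependencies(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found on the import path."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any entry reported as missing can be installed with the `pip` or `conda` commands from the Quick install section.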
Testing Gensim
Problems?
Use the Gensim discussion group for questions and troubleshooting. See the support page for commercial support.
Who is using Gensim?
Doing something interesting with Gensim? Ask to be featured here.
Forever-free open-source
Gensim is licensed under the OSI-approved GNU LGPLv2.1 license and can be downloaded either from its GitHub repository or from the Python Package Index.