models.rpmodel – Random Projections
Random Projections (also known as Random Indexing).
For theoretical background on Random Projections, see [1].
Examples
>>> from gensim.models import RpModel
>>> from gensim.corpora import Dictionary
>>> from gensim.test.utils import common_texts, temporary_file
>>>
>>> dictionary = Dictionary(common_texts) # fit dictionary
>>> corpus = [dictionary.doc2bow(text) for text in common_texts] # convert texts to BoW format
>>>
>>> model = RpModel(corpus, id2word=dictionary) # fit model
>>> result = model[corpus[3]] # apply model to document, result is vector in BoW format
>>>
>>> with temporary_file("model_file") as fname:
...     model.save(fname)  # save model to file
...     loaded_model = RpModel.load(fname)  # load model
References
- [1] Kanerva et al., 2000, Random indexing of text samples for Latent Semantic Analysis, https://cloudfront.escholarship.org/dist/prd/content/qt5644k0w6/qt5644k0w6.pdf
-
class gensim.models.rpmodel.RpModel(corpus, id2word=None, num_topics=300)
Bases: gensim.interfaces.TransformationABC
- Parameters
corpus (iterable of iterable of (int, int)) – Input corpus.
id2word ({dict of (int, str), Dictionary}, optional) – Mapping token_id -> token; determined from corpus if id2word is None.
num_topics (int, optional) – Number of topics, i.e. the dimensionality of the projected vectors.
-
__getitem__(bow)
Get random-projection representation of the input vector or corpus.
- Parameters
bow ({list of (int, int), iterable of list of (int, int)}) – Input document or corpus.
- Returns
list of (int, float) – if bow is a document, OR TransformedCorpus – if bow is a corpus.
Examples
>>> from gensim.models import RpModel
>>> from gensim.corpora import Dictionary
>>> from gensim.test.utils import common_texts
>>>
>>> dictionary = Dictionary(common_texts)  # fit dictionary
>>> corpus = [dictionary.doc2bow(text) for text in common_texts]  # convert texts to BoW format
>>>
>>> model = RpModel(corpus, id2word=dictionary)  # fit model
>>>
>>> # apply model to document, result is vector in BoW format, i.e. [(1, 0.3), ... ]
>>> result = model[corpus[0]]
-
initialize(corpus)
Initialize the random projection matrix.
- Parameters
corpus (iterable of iterable of (int, int)) – Input corpus.
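To make the idea of a random projection matrix concrete, here is a minimal standard-library sketch. It is not gensim's implementation: the helper names `make_projection` and `project`, the choice of entries drawn from {-1, +1}, and the 1/sqrt(num_topics) scaling are all assumptions made for illustration.

```python
import math
import random


def make_projection(num_topics, num_terms, seed=0):
    """Build a num_topics x num_terms matrix with entries drawn from {-1, +1}.

    Hypothetical helper; the entry distribution is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [
        [rng.choice((-1.0, 1.0)) for _ in range(num_terms)]
        for _ in range(num_topics)
    ]


def project(bow, projection):
    """Map a sparse BoW vector [(term_id, count), ...] to (topic_id, weight) pairs."""
    # Scaling by 1/sqrt(num_topics) is an assumption of this sketch.
    scale = 1.0 / math.sqrt(len(projection))
    return [
        (topic_id, scale * sum(row[term_id] * count for term_id, count in bow))
        for topic_id, row in enumerate(projection)
    ]


# Project a 6-term document down to 4 dimensions.
projection = make_projection(num_topics=4, num_terms=6)
doc = [(0, 2), (3, 1), (5, 1)]
vector = project(doc, projection)
```

The output has one entry per row of the matrix, so `num_topics` fixes the length of every projected vector regardless of vocabulary size.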
-
classmethod load(fname, mmap=None)
Load an object previously saved using save() from a file.
- Parameters
fname (str) – Path to file that contains needed object.
mmap (str, optional) – Memory-map option. If the object was saved with large arrays stored separately, you can load these arrays via mmap (shared memory) using mmap='r'. If the file being loaded is compressed (either '.gz' or '.bz2'), then mmap=None must be set.
See also
save()
Save object to file.
- Returns
Object loaded from fname.
- Return type
object
- Raises
AttributeError – When called on an object instance instead of class (this is a class method).
-
save(fname_or_handle, separately=None, sep_limit=10485760, ignore=frozenset({}), pickle_protocol=2)
Save the object to a file.
- Parameters
fname_or_handle (str or file-like) – Path to output file or already opened file-like object. If a file handle is passed, no special array handling is performed; all attributes are saved to the same file.
separately (list of str or None, optional) –
If None, automatically detect large numpy/scipy.sparse arrays in the object being stored, and store them in separate files. This prevents memory errors for large objects, and also allows the large arrays to be memory-mapped for efficient loading and shared in RAM between multiple processes.
If list of str: store these attributes into separate files. The automated size check is not performed in this case.
sep_limit (int, optional) – Don’t store arrays smaller than this separately. In bytes.
ignore (frozenset of str, optional) – Attributes that shouldn’t be stored at all.
pickle_protocol (int, optional) – Protocol number for pickle.
See also
load()
Load object from file.