sklearn_api.rpmodel – Scikit-learn wrapper for Random Projection model

Scikit-learn interface for RpModel.

Follows scikit-learn API conventions to facilitate using gensim along with scikit-learn.

Examples

>>> from gensim.sklearn_api.rpmodel import RpTransformer
>>> from gensim.test.utils import common_dictionary, common_corpus
>>>
>>> # Initialize and fit the model.
>>> model = RpTransformer(id2word=common_dictionary).fit(common_corpus)
>>>
>>> # Use the trained model to transform a document.
>>> result = model.transform(common_corpus[3])
class gensim.sklearn_api.rpmodel.RpTransformer(id2word=None, num_topics=300)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Base Random Projection module, wraps RpModel.

For more information, see Random projection.

Parameters
  • id2word (Dictionary, optional) – Mapping token_id -> token. Determined from the corpus if id2word is None.

  • num_topics (int, optional) – Number of dimensions.
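
To make the role of num_topics concrete, the following is a minimal sketch of a random projection applied to one BOW document. It is not the RpModel implementation: the helper names make_projection and project are hypothetical, and the dense +1/-1 matrix with a 1/sqrt(num_topics) scale is an assumption chosen for illustration.

```python
import math
import random


def make_projection(num_topics, num_terms, seed=0):
    """Build a num_topics x num_terms matrix of random +1/-1 entries (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(num_terms)]
            for _ in range(num_topics)]


def project(bow, projection):
    """Project one BOW document, a list of (token_id, weight), to a dense vector."""
    num_topics = len(projection)
    scale = 1.0 / math.sqrt(num_topics)  # assumed normalization
    return [scale * sum(row[token_id] * weight for token_id, weight in bow)
            for row in projection]


# A vocabulary of 6 terms reduced to 4 dimensions.
projection = make_projection(num_topics=4, num_terms=6)
doc = [(0, 1), (3, 2), (5, 1)]
vector = project(doc, projection)  # one float per output dimension
```

The output length depends only on num_topics, not on the vocabulary size, which is why the parameter is described as the number of dimensions.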

fit(X, y=None)

Fit the model according to the given training data.

Parameters

X (iterable of list of (int, number)) – Input corpus in BOW format.

Returns

The trained model.

Return type

RpTransformer
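
The BOW format expected by fit is a list of (token_id, count) pairs per document. The sketch below builds such a corpus by hand from tokenized text; build_vocabulary and to_bow are hypothetical helpers written for illustration. In practice gensim's Dictionary.doc2bow produces this representation.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Assign consecutive integer ids to tokens in order of first appearance."""
    token2id = {}
    for doc in tokenized_docs:
        for token in doc:
            token2id.setdefault(token, len(token2id))
    return token2id


def to_bow(tokens, token2id):
    """Convert a token list to a sorted list of (token_id, count) pairs."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


texts = [["human", "computer", "interface"], ["graph", "trees", "graph"]]
token2id = build_vocabulary(texts)
corpus = [to_bow(doc, token2id) for doc in texts]
# corpus == [[(0, 1), (1, 1), (2, 1)], [(3, 2), (4, 1)]]
```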

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) – Input samples.

  • y (ndarray of shape (n_samples,), default=None) – Target values.

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any
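
As a rough illustration of what get_params returns, the sketch below reads the constructor argument names and looks up the attribute of the same name on the instance. TinyEstimator is a hypothetical class, not part of gensim or scikit-learn, and the deep flag is accepted but ignored because this toy class has no nested estimators.

```python
import inspect


class TinyEstimator:
    """Hypothetical estimator with the same constructor arguments as RpTransformer."""

    def __init__(self, id2word=None, num_topics=300):
        self.id2word = id2word
        self.num_topics = num_topics

    def get_params(self, deep=True):
        # Constructor argument names double as parameter names.
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}


params = TinyEstimator(num_topics=50).get_params()
# params == {"id2word": None, "num_topics": 50}
```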

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object
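
The <component>__<parameter> naming can be sketched as follows. Component and Container are hypothetical stand-ins for an estimator and a pipeline-like object; the sketch only shows how a nested key is split on the first double underscore and forwarded, and it assumes every key passed to Container is nested.

```python
class Component:
    """Hypothetical simple estimator."""

    def __init__(self, num_topics=300):
        self.num_topics = num_topics

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class Container:
    """Hypothetical nested object holding named components."""

    def __init__(self, **components):
        self.components = components

    def set_params(self, **params):
        for key, value in params.items():
            # "rp__num_topics" -> component "rp", parameter "num_topics"
            name, delim, sub_key = key.partition("__")
            if not delim:
                raise ValueError("expected <component>__<parameter>, got %r" % key)
            self.components[name].set_params(**{sub_key: value})
        return self


pipe = Container(rp=Component())
returned = pipe.set_params(rp__num_topics=50)
```

Returning self from set_params allows calls to be chained, which matches the documented return value.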

transform(docs)

Find the Random Projection factors for docs.

Parameters

docs ({iterable of iterable of (int, int), list of (int, number)}) – Document or documents to be transformed in BOW format.

Returns

RP representation for each input document.

Return type

numpy.ndarray of shape [len(docs), num_topics]
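
Because docs may be either a single document or a collection of documents, the first dimension of the result depends on which was passed. The sketch below shows one way such input could be normalized to a batch; as_batch is a hypothetical helper, and checking whether the first element is a tuple is an assumption about how a single document can be told apart from a collection.

```python
def as_batch(docs):
    """Return docs as a list of BOW documents, wrapping a single document."""
    docs = list(docs)
    # A single BOW document is a list of (token_id, weight) tuples.
    # An empty input is treated as an empty batch.
    if docs and isinstance(docs[0], tuple):
        return [docs]
    return docs


single = [(0, 1), (3, 2)]
batch = [[(0, 1)], [(1, 1), (2, 1)]]

rows_for_single = len(as_batch(single))  # 1 row expected in the output
rows_for_batch = len(as_batch(batch))    # 2 rows expected in the output
```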