models.lsi_worker – Worker for distributed LSI
Worker (“slave”) process used in computing distributed Latent Semantic Indexing (LSI, LsiModel) models.
Run this script on every node in your cluster. You may also run it several times on a single machine to make better use of multiple cores, but be aware that the memory footprint grows linearly with the number of workers.
How to use distributed LSI
Install needed dependencies (Pyro4)
pip install gensim[distributed]
Set up serialization (on each machine)
export PYRO_SERIALIZERS_ACCEPTED=pickle
export PYRO_SERIALIZER=pickle
Run nameserver
python -m Pyro4.naming -n 0.0.0.0 &
Run workers (on each machine)
python -m gensim.models.lsi_worker &
Run dispatcher
python -m gensim.models.lsi_dispatcher &
Run LsiModel in distributed mode:
>>> from gensim.test.utils import common_corpus, common_dictionary
>>> from gensim.models import LsiModel
>>>
>>> model = LsiModel(common_corpus, id2word=common_dictionary, distributed=True)
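The setup steps above can also be driven from Python. The following is only a sketch: the environment variables and module invocations are copied from the steps above, while the helper names (`build_env`, `build_command`, `launch`) and the role labels are hypothetical and not part of gensim.

```python
import os
import subprocess
import sys

# Serializer settings copied from the setup step above.
PYRO_ENV = {
    "PYRO_SERIALIZERS_ACCEPTED": "pickle",
    "PYRO_SERIALIZER": "pickle",
}

# Module invocations copied from the steps above. The dict keys are
# illustrative labels chosen for this sketch, not gensim names.
COMMANDS = {
    "nameserver": ["-m", "Pyro4.naming", "-n", "0.0.0.0"],
    "worker": ["-m", "gensim.models.lsi_worker"],
    "dispatcher": ["-m", "gensim.models.lsi_dispatcher"],
}


def build_env(base=None):
    """Return a copy of the environment with the Pyro serializer settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(PYRO_ENV)
    return env


def build_command(role):
    """Return the full argv for one of the roles in COMMANDS."""
    return [sys.executable] + COMMANDS[role]


def launch(role):
    """Start one background process for the given role (hypothetical helper)."""
    return subprocess.Popen(build_command(role), env=build_env())


if __name__ == "__main__":
    # Only print what would be run; nothing is started here.
    for role in COMMANDS:
        print(role, " ".join(build_command(role)))
```

Calling `launch("worker")` several times on one machine would correspond to running the worker script multiple times, with the memory cost noted at the top of this page.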
Command line arguments
...
optional arguments:
-h, --help show this help message and exit
class gensim.models.lsi_worker.Worker
Bases: object
Partly initialize the model.
A full initialization requires a call to initialize().

exit()
Terminate the worker.

getstate()
Log and get the LSI model’s current projection.
- Returns
The current projection.
- Return type

initialize(myid, dispatcher, **model_params)
Fully initialize the worker.
- Parameters
myid (int) – An ID number used to identify this worker in the dispatcher object.
dispatcher (Dispatcher) – The dispatcher responsible for scheduling this worker.
**model_params – Keyword parameters to initialize the inner LSI model, see LsiModel.

processjob(job)
Incrementally process the job and potentially log progress.
- Parameters
job (iterable of list of (int, float)) – Corpus in BoW format.
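To make the expected job format concrete, here is a small sketch that builds a chunk of documents in BoW format (a list of `(token_id, weight)` pairs per document) using only the standard library. The helpers `build_vocabulary` and `to_bow` are hypothetical and written for this illustration; in practice the conversion is usually done with gensim's `Dictionary.doc2bow`.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Assign consecutive integer ids to tokens in order of first appearance."""
    vocab = {}
    for doc in tokenized_docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(tokens, vocab):
    """Convert one tokenized document to a sorted list of (token_id, count) pairs."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted((token_id, float(n)) for token_id, n in counts.items())


docs = [["human", "computer", "interface"], ["graph", "trees", "graph"]]
vocab = build_vocabulary(docs)

# A "job" in the sense used above: an iterable of BoW documents.
job = [to_bow(doc, vocab) for doc in docs]
print(job)
# [[(0, 1.0), (1, 1.0), (2, 1.0)], [(3, 2.0), (4, 1.0)]]
```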

requestjob()
Request jobs from the dispatcher, in a perpetual loop until getstate() is called.
- Raises
RuntimeError – If self.model is None (i.e. worker not initialized).
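The control flow described for requestjob() can be illustrated with a toy example. This is not gensim's implementation: `ToyWorker` is a hypothetical stand-in, a `queue.Queue` plays the role of the dispatcher, and the "model" is just a list that collects documents. It only shows the pattern of looping until getstate() is called and raising RuntimeError when the worker has not been initialized.

```python
import queue
import threading


class ToyWorker:
    """Hypothetical stand-in for illustration; not gensim's Worker."""

    def __init__(self):
        self.model = None       # stays None until initialize() is called
        self.finished = False   # flipped by getstate() to stop the loop
        self.jobsdone = 0

    def initialize(self):
        self.model = []         # toy "model": a list of documents seen so far

    def requestjob(self, jobs):
        if self.model is None:
            raise RuntimeError("worker must be initialized before receiving jobs")
        while not self.finished:
            try:
                job = jobs.get(timeout=0.05)
            except queue.Empty:
                continue        # nothing queued yet; poll again
            self.model.extend(job)
            self.jobsdone += 1
            jobs.task_done()

    def getstate(self):
        self.finished = True    # ends the loop in requestjob()
        return list(self.model)


def run_demo():
    jobs = queue.Queue()        # stand-in for the dispatcher's job queue
    worker = ToyWorker()
    worker.initialize()
    thread = threading.Thread(target=worker.requestjob, args=(jobs,))
    thread.start()
    jobs.put([[(0, 1.0)], [(1, 2.0)]])
    jobs.put([[(2, 1.0)]])
    jobs.join()                 # block until both jobs have been processed
    state = worker.getstate()
    thread.join()
    return worker.jobsdone, state


if __name__ == "__main__":
    print(run_demo())
```

The flag-based exit keeps the loop simple: the worker never needs to know how many jobs remain, it just keeps asking until its state is collected.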

reset()
Reset the worker by deleting its current projection.