`models.wrappers.dtmmodel` – Dynamic Topic Models (DTM) and Document Influence Model (DIM)¶

Python wrapper for Dynamic Topic Models (DTM) and the Document Influence Model (DIM).

Installation¶

There are two ways to obtain the DTM binaries:

Use precompiled binaries for your OS version from /magsilva/dtm/
Compile the binaries manually from /blei-lab/dtm (the original instructions are available at https://github.com/blei-lab/dtm/blob/master/README.md), or run the following commands:
```
git clone https://github.com/blei-lab/dtm.git
sudo apt-get install libgsl0-dev
cd dtm/dtm
make
```

Examples

>>> from gensim.test.utils import common_corpus, common_dictionary
>>> from gensim.models.wrappers import DtmModel
>>>
>>> path_to_dtm_binary = "/path/to/dtm/binary"
>>> model = DtmModel(
...     path_to_dtm_binary, corpus=common_corpus, id2word=common_dictionary,
...     time_slices=[1] * len(common_corpus)
... )

class gensim.models.wrappers.dtmmodel.DtmModel(dtm_path, corpus=None, time_slices=None, mode='fit', model='dtm', num_topics=100, id2word=None, prefix=None, lda_sequence_min_iter=6, lda_sequence_max_iter=20, lda_max_em_iter=10, alpha=0.01, top_chain_var=0.005, rng_seed=0, initialize_lda=True)¶

Bases: gensim.utils.SaveLoad

Python wrapper around the DTM implementation.

Communication between DTM and Python takes place by passing around data files on disk and executing the DTM binary as a subprocess.
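
The file-based exchange described above can be illustrated with a minimal sketch. It is not gensim's actual code: the helper name `run_via_files`, the file names, and the use of the Python interpreter as a stand-in for the DTM binary are all assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_via_files(lines):
    """Write input to disk, run an external program on it, read its output file.

    Hypothetical helper: a real wrapper would invoke the DTM binary instead of
    the Python interpreter used here as a stand-in.
    """
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / "input.txt"
        out_path = Path(tmp) / "output.txt"
        in_path.write_text("\n".join(lines) + "\n", encoding="utf8")

        # Stand-in "binary": a one-line script that upper-cases the input file.
        script = (
            "import sys; "
            "data = open(sys.argv[1], encoding='utf8').read(); "
            "open(sys.argv[2], 'w', encoding='utf8').write(data.upper())"
        )
        # check=True raises CalledProcessError if the subprocess fails.
        subprocess.run(
            [sys.executable, "-c", script, str(in_path), str(out_path)],
            check=True,
        )
        return out_path.read_text(encoding="utf8").splitlines()
```

The point of the pattern is that the two processes share nothing but files: the caller serializes its input, waits for the subprocess to exit, then parses whatever the subprocess wrote.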

Warning

This is only a Python wrapper for the DTM implementation: you need to install the original implementation first and pass the path to its binary as dtm_path.
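
Since the wrapper depends on an external binary, it can help to verify the path before constructing the model. The helper below is a hypothetical convenience and not part of gensim; it only checks that the path points to an existing, executable file.

```python
import os


def is_executable_file(path):
    """Return True if `path` is an existing regular file with execute permission."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Placeholder path, as in the example above; replace it with your own binary.
path_to_dtm_binary = "/path/to/dtm/binary"
if not is_executable_file(path_to_dtm_binary):
    print(f"DTM binary not found or not executable: {path_to_dtm_binary}")
```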

Parameters

dtm_path (str) – Path to the dtm binary, e.g. /home/username/dtm/dtm/main.
corpus (iterable of iterable of (int, int)) – Collection of texts in BoW format.
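time_slices (list of int) – Number of documents in each time slice, in chronological order; the values should sum to the number of documents in corpus.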
mode ({'fit', 'time'}, optional) – Controls the mode of the model: ‘fit’ is for training, ‘time’ is for analyzing documents through time according to a DTM, basically a held-out set.
model ({'fixed', 'dtm'}, optional) – Controls which model will be run: ‘fixed’ is for DIM and ‘dtm’ is for DTM.
num_topics (int, optional) – Number of topics.
id2word (Dictionary, optional) – Mapping between token ids and words from the corpus; if not specified, it will be inferred from the corpus.
prefix (str, optional) – Prefix for produced temporary files.
lda_sequence_min_iter (int, optional) – Minimum number of LDA iterations.
lda_sequence_max_iter (int, optional) – Maximum number of LDA iterations.
lda_max_em_iter (int, optional) – Maximum number of EM optimization iterations in LDA.
alpha (float, optional) – Hyperparameter that affects the sparsity of the document-topic distributions for the LDA models in each time slice.
top_chain_var (float, optional) – Hyperparameter that controls a key aspect of topic evolution: the speed at which topics evolve. A smaller top_chain_var leads to similar word distributions over multiple time slices.
rng_seed (int, optional) – Random seed.
initialize_lda (bool, optional) – If True, initialize DTM with LDA.
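
A minimal sketch of how time_slices might be built from per-document period labels. It assumes, consistent with the example above where time_slices=[1] * len(common_corpus), that each entry is the number of documents falling into one time slice. The helper name `build_time_slices` and the sample years are hypothetical.

```python
from collections import Counter


def build_time_slices(doc_periods):
    """Count documents per period, with periods in ascending order.

    `doc_periods` holds one period label (e.g. a year) per document; the corpus
    itself must be sorted by the same labels for the counts to line up.
    """
    counts = Counter(doc_periods)
    return [counts[period] for period in sorted(counts)]


# Hypothetical example: six documents published over three years.
years = [2019, 2019, 2020, 2020, 2020, 2021]
time_slices = build_time_slices(years)
print(time_slices)  # [2, 3, 1]
```

Under this assumption the entries add up to the number of documents, which is a quick sanity check before training.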

convert_input(corpus, time_slices)¶

Convert the corpus into LDA-C format using BleiCorpus and save it to a temporary file. The time slices are written to the temporary file whose path is returned by ftimeslices().

Parameters

corpus (iterable of iterable of (int, float)) – Corpus in BoW format.
time_slices (list of int) – Sequence of timestamps.
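
To make the on-disk layout more concrete, the sketch below renders a small BoW corpus as LDA-C lines and a list of time slices as text lines. It does not call gensim. The helper names are hypothetical, and the time-slice layout (slice count first, then one document count per line) is an assumption about what the DTM binary expects.

```python
def to_ldac_lines(corpus):
    """Render a BoW corpus as LDA-C lines: "<n_unique_terms> id:count id:count ..."."""
    lines = []
    for doc in corpus:
        pairs = " ".join(f"{term_id}:{int(count)}" for term_id, count in doc)
        # rstrip() removes the trailing space left by an empty document.
        lines.append(f"{len(doc)} {pairs}".rstrip())
    return lines


def to_time_slice_lines(time_slices):
    """Render time slices as: number of slices first, then one count per line."""
    return [str(len(time_slices))] + [str(n) for n in time_slices]


corpus = [[(0, 1), (2, 3)], [(1, 2)]]
print(to_ldac_lines(corpus))        # ['2 0:1 2:3', '1 1:2']
print(to_time_slice_lines([1, 1]))  # ['2', '1', '1']
```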

dtm_coherence(time, num_words=20)¶

Get all topics of a particular time slice without probability values, for use with either “u_mass” or “c_v” coherence.

Parameters

num_words (int) – Number of words.
time (int) – Index of the time slice.

Returns

coherence_topics – All topics of a particular time slice, as lists of words without probability values.

Return type

list of list of str

Warning

TODO: because of the print format, this can currently only return topics for the first time slice. Should we fix the coherence printing, or change the print statements to mirror DTM python?
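
The shape of the return value (list of list of str) can be illustrated with a small sketch that drops the probabilities from a set of topics. The input layout of (probability, word) pairs and the helper name `strip_probabilities` are assumptions for illustration; the sample topics are made up.

```python
def strip_probabilities(topics):
    """Keep only the words of each topic, dropping the probability values."""
    return [[word for _prob, word in topic] for topic in topics]


# Hypothetical topics for one time slice, as (probability, word) pairs.
topics_with_probs = [
    [(0.30, "system"), (0.20, "user")],
    [(0.25, "graph"), (0.15, "trees")],
]
coherence_topics = strip_probabilities(topics_with_probs)
print(coherence_topics)  # [['system', 'user'], ['graph', 'trees']]
```

A list in this form is what gensim's CoherenceModel accepts through its topics argument.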

dtm_vis(corpus, time)¶

Get data specified by pyLDAvis format.

Parameters

corpus (iterable of iterable of (int, float)) – Collection of texts in BoW format.
time (int) – Index of the time slice.

Notes

All of the returned values are needed to visualise DTM topics for a particular time slice via pyLDAvis.

Returns

doc_topic (numpy.ndarray) – Document-topic proportions.
topic_term (numpy.ndarray) – Topic-term distributions in a format suitable for pyLDAvis.
doc_lengths (list of int) – Length of each document in the corpus.
term_frequency (numpy.ndarray) – Frequency of each word from vocab.
vocab (list of str) – List of words from the corpus.
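
Three of the returned values depend only on the corpus and the vocabulary, so they can be sketched without a fitted model. The helper below is hypothetical and uses plain lists rather than numpy arrays. It assumes that `id2word` is a plain dict and that token ids run contiguously from 0. doc_topic and topic_term come from the trained model and are not reproduced here.

```python
def corpus_statistics(corpus, id2word):
    """Compute doc_lengths, term_frequency and vocab from a BoW corpus.

    Assumes `id2word` maps token id -> word and ids are 0..len(id2word)-1.
    """
    vocab = [id2word[i] for i in sorted(id2word)]
    # Document length = total token count, not the number of unique terms.
    doc_lengths = [sum(count for _id, count in doc) for doc in corpus]
    term_frequency = [0] * len(vocab)
    for doc in corpus:
        for term_id, count in doc:
            term_frequency[term_id] += count
    return doc_lengths, term_frequency, vocab


corpus = [[(0, 1), (1, 2)], [(1, 1), (2, 4)]]
id2word = {0: "human", 1: "computer", 2: "graph"}
doc_lengths, term_frequency, vocab = corpus_statistics(corpus, id2word)
print(doc_lengths)     # [3, 5]
print(term_frequency)  # [1, 3, 4]
```

The five values correspond to the topic_term_dists, doc_topic_dists, doc_lengths, vocab and term_frequency arguments of pyLDAvis.prepare.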

fcorpus()¶

Get path to corpus file.

Returns: Path to corpus file.
Return type: str

fcorpustxt()¶

Get path to temporary file.

Returns: Path to multiple train binary file.
Return type: str

fem_steps()¶

Get path to temporary em_step data file.

Returns: Path to em_step data file.
Return type: str

finit_alpha()¶

Get path to initially trained lda alpha file.

Returns: Path to initially trained lda alpha file.
Return type: str

finit_beta()¶

Get path to initially trained lda beta file.

Returns: Path to initially trained lda beta file.
Return type: str

flda_ss()¶

Get path to initial lda binary file.

Returns: Path to initial lda binary file.
Return type: str

fout_gamma()¶

Get path to temporary gamma data file.

Returns: Path to gamma data file.
Return type: str

fout_influence()¶

Get template of path to temporary file.

Returns: Path to file.
Return type: str

fout_liklihoods()¶

Get path to temporary lhood data file.

Returns: Path to lhood data file.
Return type: str

fout_observations()¶

Get template of path to temporary file.

Returns: Path to file.
Return type: str

fout_prob()¶

Get template of path to temporary file.

Returns: Path to file.
Return type: str

foutname()¶

Get path to temporary file.

Returns: Path to file.
Return type: str

ftimeslices()¶

Get path to time slices binary file.

Returns: Path to time slices binary file.
Return type: str

classmethod load(fname, mmap=None)¶

Load an object previously saved using save() from a file.

Parameters

fname (str) – Path to file that contains needed object.
mmap (str, optional) – Memory-map option. If the object was saved with large arrays stored separately, you can load these arrays via mmap (shared memory) using mmap=’r’. If the file being loaded is compressed (either ‘.gz’ or ‘.bz2’), then mmap=None must be set.
