summarization.summarizer – TextRank Summarizer
This module provides functions for summarizing texts. Summarization works by ranking the sentences of a text with a variation of the TextRank algorithm [1].
[1] Federico Barrios, Federico López, Luis Argerich, Rosita Wachenchauzer (2016). Variations of the Similarity Function of TextRank for Automated Summarization, https://arxiv.org/abs/1602.03606
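To make the ranking idea concrete, here is a minimal pure-Python sketch of TextRank-style sentence scoring. It is not gensim's implementation: the function name rank_sentences, the regex tokenizer, the damping value 0.85 and the fixed iteration count are assumptions chosen for illustration. The similarity is the word-overlap measure from the original TextRank formulation, simplified to distinct words, rather than the variation described in the reference above.

```python
import math
import re


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration (illustrative only)."""
    n = len(sentences)
    if n == 0:
        return []

    # Assumption: a crude tokenizer that keeps lowercase words as a set.
    tokens = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]

    def similarity(a, b):
        # Shared words, normalized by the log of both sentence sizes.
        # Sentences with fewer than two distinct words are skipped so the
        # denominator can never be zero.
        if len(a) < 2 or len(b) < 2:
            return 0.0
        return len(a & b) / (math.log(len(a)) + math.log(len(b)))

    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [
        [0.0 if i == j else similarity(tokens[i], tokens[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]

    # Power iteration: each sentence receives score from its neighbours in
    # proportion to the edge weight between them.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

Sentences that share vocabulary with many others end up with higher scores, while a sentence with no overlap falls back to the baseline (1 - damping) / n.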
Example
>>> from gensim.summarization.summarizer import summarize
>>> text = '''Rice Pudding - Poem by Alan Alexander Milne
... What is the matter with Mary Jane?
... She's crying with all her might and main,
... And she won't eat her dinner - rice pudding again -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... I've promised her dolls and a daisy-chain,
... And a book about animals - all in vain -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... She's perfectly well, and she hasn't a pain;
... But, look at her, now she's beginning again! -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... I've promised her sweets and a ride in the train,
... And I've begged her to stop for a bit and explain -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... She's perfectly well and she hasn't a pain,
... And it's lovely rice pudding for dinner again!
... What is the matter with Mary Jane?'''
>>> print(summarize(text))
And she won't eat her dinner - rice pudding again -
I've promised her dolls and a daisy-chain,
I've promised her sweets and a ride in the train,
And it's lovely rice pudding for dinner again!
gensim.summarization.summarizer.summarize(text, ratio=0.2, word_count=None, split=False)
Get a summarized version of the given text.
The output summary will consist of the most representative sentences and will be returned as a string, divided by newlines.
Note
The input should be a string and must be longer than INPUT_MIN_LENGTH sentences for the summary to make sense. The text will be split into sentences using the split_sentences method in the gensim.summarization.textcleaner module. Note that newlines divide sentences.
- Parameters
text (str) – Given text.
ratio (float, optional) – Number between 0 and 1 that determines the proportion of the number of sentences of the original text to be chosen for the summary.
word_count (int or None, optional) – Determines how many words the output will contain. If both ratio and word_count are provided, ratio will be ignored.
split (bool, optional) – If True, a list of sentences will be returned. Otherwise, a single joined string will be returned.
- Returns
list of str – Most representative sentences of the given text, if split is True, OR
str – Most representative sentences of the given text, joined by newlines, otherwise.
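The way ratio, word_count and split interact can be sketched with a small stand-in. This is a hedged illustration, not the library's code: select_summary is a hypothetical helper, the scores are assumed to come from some ranking step, and the word budget is a simplified greedy rule that stops as soon as the next sentence would exceed word_count.

```python
def select_summary(sentences, scores, ratio=0.2, word_count=None, split=False):
    """Pick top-scoring sentences and return them in document order."""
    # Sentence indices ordered from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    if word_count is None:
        # ratio decides how many sentences are kept.
        chosen = ranked[: int(len(sentences) * ratio)]
    else:
        # word_count takes precedence, so ratio is ignored here.
        # Assumption: a simple greedy budget over whitespace-separated words.
        chosen, used = [], 0
        for i in ranked:
            length = len(sentences[i].split())
            if used + length > word_count:
                break
            chosen.append(i)
            used += length

    # Restore the original order before building the result.
    picked = [sentences[i] for i in sorted(chosen)]
    return picked if split else "\n".join(picked)
```

With split=False the selected sentences come back as one newline-separated string. With split=True the same sentences come back as a list.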
gensim.summarization.summarizer.summarize_corpus(corpus, ratio=0.2)
Get a list of the most important documents of a corpus using a variation of the TextRank algorithm [1].
Used as a helper for summarize().
Note
The input must have at least INPUT_MIN_LENGTH documents for the summary to make sense.
- Parameters
corpus (list of list of (int, int)) – Given corpus.
ratio (float, optional) – Number between 0 and 1 that determines the proportion of documents in the corpus to be chosen for the summary.
- Returns
Most important documents of given corpus sorted by the document score, highest first.
- Return type
list of str
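The corpus argument is a bag-of-words corpus: each document is a list of (token_id, count) pairs. In gensim this format is normally produced with gensim.corpora.Dictionary.doc2bow. The pure-Python sketch below only illustrates the shape of the data, and build_bow_corpus is a hypothetical helper rather than part of the library.

```python
from collections import Counter


def build_bow_corpus(tokenized_docs):
    """Convert tokenized documents to lists of (token_id, count) pairs."""
    token_ids = {}  # token -> integer id, assigned in order of first appearance
    corpus = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        for token in counts:
            token_ids.setdefault(token, len(token_ids))
        # One (token_id, count) pair per distinct token, sorted by id.
        corpus.append(sorted((token_ids[t], c) for t, c in counts.items()))
    return corpus, token_ids


docs = [["rice", "pudding", "rice"], ["mary", "jane", "pudding"]]
corpus, token_ids = build_bow_corpus(docs)
print(corpus)
```

When summarize_corpus is used as a helper for summarize, each sentence of the text plays the role of one document in such a corpus.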