summarization.keywords – Keywords for TextRank summarization algorithm
This module contains functions to find keywords within a text.
Examples
>>> from gensim.summarization import keywords
>>> text = '''Challenges in natural language processing frequently involve
... speech recognition, natural language understanding, natural language
... generation (frequently from formal, machine-readable logical forms),
... connecting language and machine perception, dialog systems, or some
... combination thereof.'''
>>> keywords(text).split('\n')
[u'natural language', u'machine', u'frequently']
gensim.summarization.keywords.get_graph(text)
Create and return a graph from the given text. The text is cleaned and tokenized before the graph is built.
- Parameters
text (str) – Input text from which the graph is built.
- Returns
Created graph.
- Return type
gensim.summarization.graph.Graph
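To make the idea of "cleaning, tokenizing, and building a graph" concrete, here is a minimal standard-library sketch of a word co-occurrence graph. It is not gensim's implementation: the function name `build_cooccurrence_graph`, the tiny `STOPWORDS` set, and the `window` parameter are all assumptions made for illustration, and the real module applies its own cleaning and filtering.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "in", "of", "or", "the", "to"}


def build_cooccurrence_graph(text, window=2):
    """Return an undirected graph as {word: set of neighbouring words}."""
    # Clean and tokenize: lowercase, keep alphabetic runs, drop stopwords.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        graph[word]  # touching the key makes every word a node
        # Link the word to the `window - 1` tokens that follow it.
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


graph = build_cooccurrence_graph(
    "natural language processing and natural language generation"
)
print(sorted(graph["language"]))
```

A ranking algorithm such as TextRank would then score the nodes of a graph like this one; words with many well-connected neighbours tend to rank higher.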
gensim.summarization.keywords.keywords(text, ratio=0.2, words=None, split=False, scores=False, pos_filter=('NN', 'JJ'), lemmatize=False, deacc=True)
Get the highest-ranked words of the provided text and/or their combinations.
- Parameters
text (str) – Input text.
ratio (float, optional) – If “words” is not given, the fraction of candidate words to keep as keywords; otherwise the ratio is ignored.
words (int, optional) – Number of returned words.
split (bool, optional) – If True, return the keywords as a list instead of a single string.
scores (bool, optional) – If True, return each keyword together with its score.
pos_filter (tuple, optional) – Part-of-speech filters.
lemmatize (bool, optional) – If True, lemmatize words.
deacc (bool, optional) – If True, remove accentuation.
- Returns
result (list of (str, float)) – If scores is True, keywords with their scores, OR
result (list of str) – If split is True, keywords only, OR
result (str) – Keywords joined by newline characters.
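The interaction between ratio, words, split, and scores can be sketched in plain Python. The helper below is hypothetical (`select_keywords` is not part of gensim) and assumes the words have already been ranked; it only illustrates how the keyword count is chosen and which of the three return shapes is produced.

```python
def select_keywords(ranked, ratio=0.2, words=None, split=False, scores=False):
    """Pick top keywords from (word, score) pairs and format the result.

    Hypothetical helper for illustration; not gensim's internal code.
    """
    # Highest score first.
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    # `words` takes precedence; `ratio` is used only when `words` is None.
    count = int(len(ordered) * ratio) if words is None else words
    top = ordered[:count]
    if scores:
        return top  # list of (str, float)
    names = [word for word, _ in top]
    if split:
        return names  # list of str
    return "\n".join(names)  # single newline-joined str


ranked = [("machine", 0.7), ("language", 0.9), ("dialog", 0.3), ("frequently", 0.5)]
as_text = select_keywords(ranked, ratio=0.5)
as_list = select_keywords(ranked, words=3, split=True)
with_scores = select_keywords(ranked, words=1, scores=True)
```

In this sketch scores is checked before split, which matches the order of the return cases listed above.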