1.4. uniseg.sentencebreak — Sentence break

Unicode sentence boundaries.

UAX #29: Unicode Text Segmentation http://www.unicode.org/reports/tr29/tr29-15.html

uniseg.sentencebreak.sentence_break(c, index=0)

Return the Sentence_Break property value of c.

c must be a single Unicode code point string.

>>> print(sentence_break(u'\x0d'))
CR
>>> print(sentence_break(u' '))
Sp
>>> print(sentence_break(u'a'))
Lower

If index is specified, this function treats c as a Unicode string and returns the Sentence_Break property of the code point at c[index].

>>> print(sentence_break(u'a\x0d', 1))
CR

uniseg.sentencebreak.sentence_breakables(s)

Iterate over the sentence breaking opportunities for every position of s.

Each item is 1 for “break” or 0 for “do not break”. The iteration yields exactly len(s) items.

>>> s = 'He said, “Are you going?” John shook his head.'
>>> list(sentence_breakables(s))
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
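
The flags above mark where each sentence starts. As a minimal sketch of how such a flag sequence relates to boundary indices, the helper below collects every position flagged 1 and then appends the end of the string. The name boundaries_from_flags is hypothetical and not part of uniseg, and this is not necessarily how the library computes its results.

```python
def boundaries_from_flags(flags):
    """Convert per-position break flags (1 = break, 0 = do not break)
    into a list of boundary indices.

    Hypothetical helper for illustration; not part of uniseg.
    """
    flags = list(flags)
    # Every position flagged 1 starts a new sentence.
    indices = [i for i, flag in enumerate(flags) if flag]
    # The end of a non-empty string closes the last sentence.
    if flags:
        indices.append(len(flags))
    return indices


# Same flag pattern as the example output above:
# a break at position 0 and another at position 26, 46 flags in total.
flags = [1] + [0] * 25 + [1] + [0] * 19
print(boundaries_from_flags(flags))  # [0, 26, 46]
```

An empty flag sequence gives an empty list, so an empty string has no boundaries at all in this sketch.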

uniseg.sentencebreak.sentence_boundaries(s, tailor=None)

Iterate over the indices of the sentence boundaries of s.

This function yields every boundary index, starting at 0 and ending at the end of the string (== len(s)).

>>> list(sentence_boundaries(u'ABC'))
[0, 3]
>>> s = 'He said, “Are you going?” John shook his head.'
>>> list(sentence_boundaries(s))
[0, 26, 46]
>>> list(sentence_boundaries(u''))
[]
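
Consecutive boundary indices delimit one sentence each. The sketch below slices a string at a given list of indices to show that relationship. The name slice_at_boundaries is hypothetical and not part of uniseg, and the indices are copied from the example above rather than computed.

```python
def slice_at_boundaries(s, boundaries):
    """Cut s into pieces at the given boundary indices.

    Hypothetical helper for illustration; not part of uniseg.
    """
    boundaries = list(boundaries)
    # Pair each index with its successor: (0, 26), (26, 46), ...
    return [s[start:end] for start, end in zip(boundaries, boundaries[1:])]


s = 'He said, “Are you going?” John shook his head.'
# Indices taken from the documented output for this string.
print(slice_at_boundaries(s, [0, 26, 46]))
# ['He said, “Are you going?” ', 'John shook his head.']
```

The space after the closing quotation mark stays with the first piece, because the boundary at index 26 falls after that space.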

uniseg.sentencebreak.sentences(s, tailor=None)

Iterate over every sentence of s.

>>> s = 'He said, “Are you going?” John shook his head.'
>>> list(sentences(s)) == ['He said, “Are you going?” ', 'John shook his head.']
True