1.5. uniseg.linebreak — Line break

Unicode line breaking algorithm

UAX #14: Unicode Line Breaking Algorithm
http://www.unicode.org/reports/tr14/tr14-24.html

uniseg.linebreak.line_break(c, index=0)

Return the Line_Break property of c

c must be a single Unicode code point string.

>>> print(line_break('\x0d'))
CR
>>> print(line_break(' '))
SP
>>> print(line_break('1'))
NU

If index is specified, this function treats c as a Unicode string and returns the Line_Break property of the code point at c[index].

>>> print(line_break(u'a\x0d', 1))
CR

uniseg.linebreak.line_break_breakables(s, legacy=False)

Iterate line breaking opportunities for every position of s

1 means “break” and 0 means “do not break” BEFORE the position. The iteration yields exactly len(s) values, one per position of s.

>>> list(line_break_breakables('ABC'))
[0, 0, 0]
>>> list(line_break_breakables('Hello, world.'))
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
>>> list(line_break_breakables(u''))
[]
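
As a small illustration of how to read such a table, the sketch below takes the flags shown above for 'Hello, world.' and finds where a break is allowed. It does not call the library; the table is copied by hand from the doctest, and the variable names are only illustrative.

```python
# Flags copied from the doctest above for 'Hello, world.' (not computed here).
s = 'Hello, world.'
breakables = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# Each flag refers to the position BEFORE the character at the same index.
break_positions = [i for i, flag in enumerate(breakables) if flag]
print(break_positions)            # [7]

# Splitting at index 7 keeps the trailing space with the first piece.
head, tail = s[:7], s[7:]
print(repr(head), repr(tail))     # 'Hello, ' 'world.'
```

The only break opportunity falls before 'w', that is, after the space that follows the comma.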

uniseg.linebreak.line_break_boundaries(s, legacy=False, tailor=None)

Iterate indices of the line breaking boundaries of s

The yielded indices lie in the range from 0 to the end of the string (== len(s)).
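
One plausible way to relate boundary indices to a table of break flags is sketched below. The helper name boundaries_from_table is hypothetical and is not part of uniseg. The rule it applies (report every flagged index, then the end index for a non-empty table) is an assumption made for illustration, so the library's actual output for a given string may differ.

```python
def boundaries_from_table(breakables):
    """Yield each index flagged 1, then the end index if the table is non-empty.

    Hypothetical helper for illustration only; not uniseg's implementation.
    """
    flags = list(breakables)
    for i, flag in enumerate(flags):
        if flag:
            yield i
    if flags:
        # The end of the sequence always closes the last unit.
        yield len(flags)


# Flags for 'Hello, world.' as listed for line_break_breakables.
table = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(list(boundaries_from_table(table)))   # [7, 13]
print(list(boundaries_from_table([])))      # []
```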

uniseg.linebreak.line_break_units(s, legacy=False, tailor=None)

Iterate every line breaking token of s

>>> s = 'The quick (“brown”) fox can’t jump 32.3 feet, right?'
>>> '|'.join(line_break_units(s)) == 'The |quick |(“brown”) |fox |can’t |jump |32.3 |feet, |right?'
True
>>> list(line_break_units(u''))
[]
>>> list(line_break_units('αα')) == [u'αα']
True
>>> list(line_break_units(u'αα', True)) == [u'α', u'α']
True
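
Line breaking units are the natural input for a simple line wrapper. The sketch below packs units greedily into lines of a maximum width. It does not call the library: the units are rebuilt from the '|'-joined string in the doctest above, and wrap_units is a hypothetical helper, not a uniseg function. Width is counted in code points, which is a simplification that ignores wide and combining characters.

```python
def wrap_units(units, width):
    """Greedily pack units into lines no wider than `width`.

    Trailing whitespace is not counted against the width and is stripped.
    """
    lines, current = [], ''
    for unit in units:
        if current and len((current + unit).rstrip()) > width:
            # The unit does not fit: close the current line and start a new one.
            lines.append(current.rstrip())
            current = unit
        else:
            current += unit
    if current:
        lines.append(current.rstrip())
    return lines


# Units taken from the doctest above (split back out of the joined string).
units = 'The |quick |(“brown”) |fox |can’t |jump |32.3 |feet, |right?'.split('|')
wrapped = wrap_units(units, 20)
for line in wrapped:
    print(line)
```

With a width of 20 this prints three lines: 'The quick (“brown”)', 'fox can’t jump 32.3' and 'feet, right?'. A unit longer than the width is placed on a line of its own rather than split.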