1.6. uniseg.wrap — Text wrapping

Unicode-aware text wrapping

class uniseg.wrap.Wrapper

Text wrapping engine

Usually, you don’t need to create an instance of the class directly. Use wrap() instead.

wrap(formatter, s, cur=0, offset=0, char_wrap=None)

Wrap string s with formatter and invoke its handlers

The optional argument cur is the starting position of the string in logical length. offset is the left-side offset of the wrapping area in logical length; for now it is used only to calculate tab-stop positions.

If char_wrap is set to True, the text is wrapped at grapheme cluster boundaries instead of line break boundaries. This may be helpful when you don’t want word wrapping in your application.

This function returns the total count of wrapped lines.

  • Changed in version 0.7: The order of the parameters was changed.
  • Changed in version 0.7.1: It returns the count of lines now.
uniseg.wrap.wrap(formatter, s, cur=0, offset=0, char_wrap=None)

Wrap string s with formatter using the module’s static Wrapper instance

See Wrapper.wrap() for further details of the parameters.

  • Changed in version 0.7.1: It returns the count of lines now.
class uniseg.wrap.Formatter

The abstract base class for formatters invoked by a Wrapper object

This class is provided only for convenience and does nothing by itself. You don’t have to design your own formatter as a subclass of it, although doing so is not deprecated either.

Your formatters should have the methods and properties this class has. A Wrapper object invokes them to determine the logical widths of texts and to give you ways to handle them, such as rendering them.

handle_new_line()

The handler method which is invoked when the current line is over and a new line begins

handle_text(text, extents)

The handler method which is invoked when text should be put on the current position with extents

reset()

Reset all states of the formatter

tab_width

The logical width of tab forwarding

This property value is used by a Wrapper object to determine the actual forwarding extent of a tab at each position.
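
As a rough illustration of how a tab's forwarding extent can depend on the position, the sketch below uses conventional tab-stop arithmetic. The helper name tab_advance is hypothetical, and the formula is an assumption about the usual behavior rather than a quotation of the Wrapper implementation.

```python
def tab_advance(cur, tab_width=8, offset=0):
    """Return how far a tab moves the cursor to reach the next tab stop.

    Hypothetical helper: assumes tab stops sit at multiples of tab_width,
    counted from the left edge of the wrapping area shifted by offset.
    """
    return tab_width - (offset + cur) % tab_width


print(tab_advance(0))             # at a tab stop: a full tab_width
print(tab_advance(3))             # mid-field: the remaining columns
print(tab_advance(3, offset=2))   # offset shifts where the tab stops fall
```

Under this assumption, the same tab character can occupy anywhere from 1 to tab_width columns depending on where it appears.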

text_extents(s)

Return a list of logical lengths from the start of the string to each of the characters in s

wrap_width

The logical width of text wrapping

Note that returning None (which is the default) means “do not wrap” while returning 0 means “wrap as narrowly as possible.”
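
Because subclassing Formatter is optional, a formatter can be any object that exposes the members listed above. The following is a minimal sketch of such a duck-typed class; the name ListFormatter and its one-column-per-character width rule are assumptions made for illustration, and the handlers are driven by hand here rather than by a Wrapper.

```python
class ListFormatter:
    """Hypothetical duck-typed formatter that collects lines in a list.

    It treats every character as one column wide, which is an assumption
    made only to keep the sketch short.
    """

    def __init__(self, wrap_width=None, tab_width=8):
        self._wrap_width = wrap_width
        self._tab_width = tab_width
        self.reset()

    @property
    def wrap_width(self):
        # None means "do not wrap"; 0 means "wrap as narrowly as possible".
        return self._wrap_width

    @property
    def tab_width(self):
        return self._tab_width

    def reset(self):
        # Start over with a single empty line.
        self.lines = [""]

    def text_extents(self, s):
        # Cumulative widths: 1, 2, 3, ... for one-column characters.
        return list(range(1, len(s) + 1))

    def handle_text(self, text, extents):
        # Append the text to the line currently being built.
        self.lines[-1] += text

    def handle_new_line(self):
        # Close the current line and begin a new one.
        self.lines.append("")


# Drive the handlers by hand, in the order a wrapping engine might call them.
fmt = ListFormatter(wrap_width=5)
fmt.handle_text("hello", fmt.text_extents("hello"))
fmt.handle_new_line()
fmt.handle_text("world", fmt.text_extents("world"))
print(fmt.lines)
```

An instance of a class shaped like this could then be passed as the formatter argument of wrap().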

class uniseg.wrap.TTFormatter(wrap_width, tab_width=8, tab_char=u' ', ambiguous_as_wide=False)

A fixed-width text wrapping formatter

ambiguous_as_wide

Treat code points whose East_Asian_Width property is ‘A’ (ambiguous) like those with ‘W’ (wide), that is, as being twice as wide as alphanumerics

handle_new_line()

The handler which is invoked when the current line is over and a new line begins

handle_text(text, extents)

The handler which is invoked when text should be put on the current position

lines()

Iterate over every wrapped line string

reset()

Reset all states of the formatter

tab_char

Character to fill tab spaces with

tab_width

Forwarding size of tabs

text_extents(s)

Return a list of logical lengths from the start of the string to each of the characters in s

wrap_width

Wrapping width

uniseg.wrap.tt_width(s, index=0, ambiguous_as_wide=False)

Return logical width of the grapheme cluster at s[index] on fixed-width typography

Return value will be 1 (halfwidth) or 2 (fullwidth).

Generally, the width of a grapheme cluster is determined by its leading code point.

>>> tt_width('A')
1
>>> tt_width('\u8240')     # U+8240: CJK UNIFIED IDEOGRAPH-8240
2
>>> tt_width('g\u0308')    # U+0308: COMBINING DIAERESIS
1
>>> tt_width('\U00029e3d') # U+29E3D: CJK UNIFIED IDEOGRAPH-29E3D
2

If ambiguous_as_wide is set to True, some characters such as Greek letters are treated as fullwidth, just as ideographs are.

>>> tt_width('\u03b1')     # U+03B1: GREEK SMALL LETTER ALPHA
1
>>> tt_width('\u03b1', ambiguous_as_wide=True)
2
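
The width rule described above can be approximated with the standard library's unicodedata module. This is only a sketch for a single code point: the helper name approx_width is hypothetical, and mapping the East_Asian_Width categories 'W' and 'F' to 2 is an assumption, not necessarily how tt_width() is implemented.

```python
import unicodedata


def approx_width(ch, ambiguous_as_wide=False):
    """Approximate the column width of a single code point.

    Hypothetical helper: Wide ('W') and Fullwidth ('F') count as 2,
    Ambiguous ('A') counts as 2 only on request, everything else as 1.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A" and ambiguous_as_wide:
        return 2
    return 1


# East_Asian_Width category of GREEK SMALL LETTER ALPHA
print(unicodedata.east_asian_width("\u03b1"))
print(approx_width("\u03b1"))
print(approx_width("\u03b1", ambiguous_as_wide=True))
```
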
uniseg.wrap.tt_text_extents(s, ambiguous_as_wide=False)

Return a list of logical widths from the start of s to each of the characters (not code points) on fixed-width typography

>>> tt_text_extents('')
[]
>>> tt_text_extents('abc')
[1, 2, 3]
>>> tt_text_extents('\u3042\u3044\u3046')
[2, 4, 6]
>>> import sys
>>> s = '\U00029e3d'   # test a code point out of BMP
>>> actual = tt_text_extents(s)
>>> expect = [2] if sys.maxunicode > 0xffff else [2, 2]
>>> len(s) == len(expect)
True
>>> actual == expect
True

The meaning of ambiguous_as_wide is the same as that of tt_width().
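
Because the returned list is cumulative, it is sorted in ascending order, so a binary search can tell how many leading characters fit within a given width. The sketch below assumes that property; the helper name chars_that_fit is hypothetical, and the extents list is copied from the documented example rather than computed.

```python
from bisect import bisect_right


def chars_that_fit(extents, width):
    """Return how many leading characters fit within width columns.

    Hypothetical helper: extents is a cumulative list such as the one
    tt_text_extents() is documented to return.
    """
    return bisect_right(extents, width)


extents = [2, 4, 6]   # documented result for '\u3042\u3044\u3046'
print(chars_that_fit(extents, 5))   # two fullwidth characters fit in 5 columns
```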

uniseg.wrap.tt_wrap(s, wrap_width, tab_width=8, tab_char=u' ', ambiguous_as_wide=False, cur=0, offset=0, char_wrap=False)

Wrap s with given parameters and return a list of wrapped lines

See TTFormatter for wrap_width, tab_width and tab_char, and wrap() for cur, offset and char_wrap.
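
A typical call might look like the sketch below. It assumes the third-party uniseg package is installed and falls back to an empty result otherwise; the sample text and the width of 20 are arbitrary choices, and no particular output is claimed here.

```python
try:
    from uniseg.wrap import tt_wrap
except ImportError:   # uniseg is a third-party package
    tt_wrap = None

text = "The quick brown fox jumps over the lazy dog."

# list() is used so the result can be reused whether tt_wrap
# hands back a list or an iterator.
lines = list(tt_wrap(text, 20)) if tt_wrap is not None else []

for line in lines:
    print(repr(line))
```

Printing with repr() makes any trailing whitespace on the wrapped lines visible.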