1.1. uniseg.codepoint: Unicode code point
uniseg.codepoint.ord(c, index=None)
Return the integer value of the Unicode code point c.
NOTE: Some Unicode code points may be expressed as a pair of other code points (a “surrogate pair”). This function treats a surrogate pair as a representation of the original code point; e.g. ord(u'\ud842\udf9f') returns 134047 (0x20b9f). u'\ud842\udf9f' is the surrogate-pair expression of u'\U00020b9f'.
>>> ord('a')
97
>>> ord('\u3042')
12354
>>> ord('\U00020b9f')
134047
>>> ord('abc')
Traceback (most recent call last):
  ...
TypeError: need a single Unicode code point as parameter
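The arithmetic that maps the surrogate pair in the note to 134047 can be sketched as follows. This illustrates the standard UTF-16 decoding rule only; it is not taken from uniseg's source, and the helper name combine_surrogates is hypothetical.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into one code point integer.

    Hypothetical helper for illustration; not part of uniseg.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("high surrogate must be in U+D800..U+DBFF")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("low surrogate must be in U+DC00..U+DFFF")
    # Each surrogate carries 10 bits of the offset above U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(combine_surrogates(0xD842, 0xDF9F))       # 134047
print(hex(combine_surrogates(0xD842, 0xDF9F)))  # 0x20b9f
```

The high surrogate contributes the upper 10 bits and the low surrogate the lower 10 bits, which is why the pair u'\ud842\udf9f' and the single code point u'\U00020b9f' denote the same character.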
For compatibility, this function returns the result of the built-in ord() when c is a single str object:
>>> ord('a')
97
When the index argument is specified (i.e. it is not None), this function treats c as a Unicode string and returns the integer value of the code point at c[index] (or possibly c[index:index+2], when a surrogate pair starts there):
>>> ord('hello', 0)
104
>>> ord('hello', 1)
101
>>> ord('a\U00020b9f', 1)
134047
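The indexed lookup described above might be approximated in plain Python as shown here. This is a minimal sketch under the assumption that a high surrogate followed by a low surrogate should be read as one code point; the function name code_point_at is hypothetical and the real uniseg implementation may differ.

```python
def code_point_at(s, index):
    """Return the integer code point that starts at s[index].

    Hypothetical stand-in for ord(c, index); not uniseg's own code.
    """
    if index < 0:
        index += len(s)  # normalise negative indices
    first = ord(s[index])
    # A high surrogate followed by a low surrogate forms one code point.
    if 0xD800 <= first <= 0xDBFF and index + 1 < len(s):
        second = ord(s[index + 1])
        if 0xDC00 <= second <= 0xDFFF:
            return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00)
    return first


print(code_point_at('hello', 1))             # 101
print(code_point_at('a\U00020b9f', 1))       # 134047
print(code_point_at('a\ud842\udf9f', 1))     # 134047
```

The last two calls give the same result whether the character is stored as one code point or as an explicit surrogate pair, which is the behaviour the c[index:index+2] remark refers to.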
uniseg.codepoint.unichr(cp)
Return the unicode object that represents the code point integer cp.
>>> unichr(0x61) == 'a'
True
Note that in narrow-build Python some Unicode code points are expressed as a pair of other code points (a “surrogate pair”). In those cases, this function returns a unicode object whose length is greater than one; e.g. unichr(0x20b9f) returns u'\U00020b9f', while the built-in unichr() may raise ValueError.
>>> unichr(0x20b9f) == '\U00020b9f'
True
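To show why the result can have a length greater than one, the following sketch splits a code point above U+FFFF into its two surrogates using the standard UTF-16 encoding rule. The helper name to_surrogate_pair is hypothetical and is not part of uniseg.

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point into (high, low) surrogates.

    Hypothetical helper for illustration; not part of uniseg.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)   # upper 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # lower 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x20B9F)
print(hex(high), hex(low))  # 0xd842 0xdf9f
```

On a narrow build, a two-element string made of these surrogates is what represents U+20B9F.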
uniseg.codepoint.code_points(s)
Iterate over every Unicode code point of the unicode string s.
>>> s = 'hello'
>>> list(code_points(s)) == ['h', 'e', 'l', 'l', 'o']
True
The number of iterations may differ from len(s), because some code points may be represented as a pair of other code points (a “surrogate pair”) in narrow-build Python.
>>> s = 'abc\U00020b9f\u3042'
>>> list(code_points(s)) == ['a', 'b', 'c', '\U00020b9f', '\u3042']
True
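A surrogate-aware iteration of this kind could look roughly like the sketch below. It assumes that each high surrogate immediately followed by a low surrogate is yielded as one two-element string; the name iter_code_points is hypothetical, and the code is an illustration rather than uniseg's implementation.

```python
def iter_code_points(s):
    """Yield each code point of s as a string, keeping surrogate pairs whole.

    Hypothetical stand-in for code_points(s); not uniseg's own code.
    """
    i = 0
    n = len(s)
    while i < n:
        is_pair = (
            0xD800 <= ord(s[i]) <= 0xDBFF
            and i + 1 < n
            and 0xDC00 <= ord(s[i + 1]) <= 0xDFFF
        )
        step = 2 if is_pair else 1
        yield s[i:i + step]
        i += step


s = 'a\ud842\udf9fb'  # the pair is written out explicitly
print(len(s))                          # 4
print(len(list(iter_code_points(s))))  # 3
```

Here len(s) is 4 but only three items are produced, which is the difference between string length and iteration count mentioned above.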