| LMLML |
functor UTF16CodecPrimArgBase(P : sig val isLEOpt : bool option end) =
struct
val names
val minOrdw
val maxOrdw
val decode
val encodeChar
val encode
val convert
end
The argument structure P has an isLEOpt field that indicates the byte order (endianness) in which 16-bit integers are serialized.
For UTF-16LE, isLEOpt should be SOME true.
For UTF-16BE, isLEOpt should be SOME false.
For UTF-16, isLEOpt should be NONE.
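The effect of the byte order on how serialized bytes become 16-bit words can be sketched in Standard ML. The helpers word16FromBytes and words16 are hypothetical illustrations, not part of LMLML. They assume the input is a Word8Vector.vector and hold each 16-bit word in a Word.word.

```sml
(* Hypothetical helper, not LMLML's API: combine two serialized bytes
   into one 16-bit value according to the byte order (true = little-endian). *)
fun word16FromBytes (isLE : bool) (b0 : Word8.word, b1 : Word8.word) : Word.word =
    let
      val w0 = Word.fromLarge (Word8.toLarge b0)
      val w1 = Word.fromLarge (Word8.toLarge b1)
    in
      if isLE
      then Word.orb (Word.<< (w1, 0w8), w0)  (* UTF-16LE: low byte first *)
      else Word.orb (Word.<< (w0, 0w8), w1)  (* UTF-16BE: high byte first *)
    end

(* Hypothetical helper: split a byte vector into 16-bit words.
   A trailing odd byte is ignored in this sketch. *)
fun words16 (isLE : bool) (bytes : Word8Vector.vector) : Word.word list =
    let
      val n = Word8Vector.length bytes
      fun loop i acc =
          if i + 1 >= n then rev acc
          else loop (i + 2)
                    (word16FromBytes isLE (Word8Vector.sub (bytes, i),
                                           Word8Vector.sub (bytes, i + 1))
                     :: acc)
    in
      loop 0 []
    end
```

For example, words16 true applied to the bytes [0x41, 0x00] and words16 false applied to [0x00, 0x41] both yield [0wx41], the word for U+0041.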
| Value detail |
|---|
val names
val minOrdw
val maxOrdw
val decode
If the first 16-bit word is between 0wxD800 and 0wxDBFF, it is the first (high) word of a surrogate pair, and the 16-bit word that follows it should be between 0wxDC00 and 0wxDFFF. Together, the two 16-bit words form a surrogate pair.
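The following two 16-bit words
1101 10qq xxxx xxxx 1101 11yy zzzz zzzz
are decoded into a 32-bit code point by concatenating their 20 payload bits and adding 0x10000, as UTF-16 (RFC 2781) specifies:
0000 0000 0000 qqxx xxxx xxyy zzzz zzzz + 0x10000
The result lies in the range 0x10000 to 0x10FFFF.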
Note:
1101 1000 = 0xD8
1101 1011 = 0xDB
1101 1100 = 0xDC
1101 1111 = 0xDF
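Surrogate-pair decoding can be sketched in Standard ML as below, following the UTF-16 definition in RFC 2781, which adds 0x10000 to the 20 concatenated payload bits. The names isHighSurrogate, isLowSurrogate and decodeSurrogatePair are hypothetical, not LMLML's API, and code points are assumed to be held in a Word.word.

```sml
(* Hypothetical helpers, not LMLML's API. Code points are held in a
   Word.word, which is wide enough for 0x10FFFF on common implementations. *)

(* High (leading) surrogate: 0xD800 .. 0xDBFF, i.e. 1101 10qq xxxx xxxx *)
fun isHighSurrogate (w : Word.word) = 0wxD800 <= w andalso w <= 0wxDBFF

(* Low (trailing) surrogate: 0xDC00 .. 0xDFFF, i.e. 1101 11yy zzzz zzzz *)
fun isLowSurrogate (w : Word.word) = 0wxDC00 <= w andalso w <= 0wxDFFF

(* Concatenate the 10 payload bits of each word (qqxxxxxxxx, yyzzzzzzzz)
   and add 0x10000, as RFC 2781 specifies. *)
fun decodeSurrogatePair (hi : Word.word, lo : Word.word) : Word.word =
    if isHighSurrogate hi andalso isLowSurrogate lo
    then Word.orb (Word.<< (Word.andb (hi, 0wx3FF), 0w10),
                   Word.andb (lo, 0wx3FF)) + 0wx10000
    else raise Fail "decodeSurrogatePair: not a valid surrogate pair"
```

For example, decodeSurrogatePair (0wxD83D, 0wxDE00) gives 0wx1F600 (U+1F600), and the largest pair, (0wxDBFF, 0wxDFFF), gives 0wx10FFFF.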
For UTF-16 (isLEOpt = NONE), the input may begin with a byte order mark (BOM), U+FEFF. If the first two bytes are [FF, FE], the byte order is little-endian; if they are [FE, FF], it is big-endian. Otherwise, the string is treated as big-endian.
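The byte order detection rule for plain UTF-16 can be sketched as follows. detectByteOrder is a hypothetical helper, not LMLML's API. It returns the resolved byte order (true meaning little-endian) together with the offset at which the text starts. Skipping the BOM is a choice made in this sketch only; whether decode consumes the BOM is not stated here.

```sml
(* Hypothetical helper, not LMLML's API: inspect a leading byte order mark.
   Returns (isLE, offset), where offset is where the text begins. *)
fun detectByteOrder (bytes : Word8Vector.vector) : bool * int =
    if Word8Vector.length bytes >= 2 then
      case (Word8Vector.sub (bytes, 0), Word8Vector.sub (bytes, 1)) of
          (0wxFF, 0wxFE) => (true, 2)    (* U+FEFF serialized little-endian *)
        | (0wxFE, 0wxFF) => (false, 2)   (* U+FEFF serialized big-endian *)
        | _ => (false, 0)                (* no BOM: default to big-endian *)
    else (false, 0)                      (* too short for a BOM *)
```

Defaulting to big-endian when no BOM is present matches the rule above; the returned offset lets a caller start decoding after the mark.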
val encodeChar
val encode
val convert
| LMLML: Library of MultiLingualization for ML |