LMLML

UTF16CodecPrimArgBase


functor UTF16CodecPrimArgBase(P : sig val isLEOpt : bool option end) =
struct
  val names
  val minOrdw
  val maxOrdw
  val decode
  val encodeChar
  val encode
  val convert
end

Implementation of the UTF-16 codec variants UTF-16LE, UTF-16BE, and UTF-16. The argument structure has a single field, isLEOpt, which indicates the byte order in which 16-bit integers are serialized. For UTF-16LE, isLEOpt should be SOME true. For UTF-16BE, isLEOpt should be SOME false. For UTF-16, isLEOpt should be NONE.
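The three settings described above can be written as argument structures. This is a minimal sketch: the structure names LEArg, BEArg, and AutoArg are hypothetical and not part of LMLML, and the functor applications are shown only in a comment because they require the library to be loaded.

```sml
(* Hypothetical argument structures; only the isLEOpt field comes from
   the functor's parameter signature. *)
structure LEArg   = struct val isLEOpt : bool option = SOME true  end  (* UTF-16LE *)
structure BEArg   = struct val isLEOpt : bool option = SOME false end  (* UTF-16BE *)
structure AutoArg = struct val isLEOpt : bool option = NONE       end  (* UTF-16   *)

(* With LMLML loaded, each structure would be passed to the functor, e.g.
     structure UTF16LEPrim = UTF16CodecPrimArgBase(LEArg)
   The result structure name UTF16LEPrim is also an assumption. *)
```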

       
Value detail

names

val names


minOrdw

val minOrdw


maxOrdw

val maxOrdw


decode

val decode

Decodes a code point from a byte sequence.

If the first 16-bit word is between 0wxD800 and 0wxDBFF, it is the first word (the high surrogate) of a surrogate pair. The 16-bit word that follows it should then be between 0wxDC00 and 0wxDFFF (the low surrogate). Together, the two 16-bit words constitute a surrogate pair.

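The following two 16-bit words

   1101 10qq xxxx xxxx
   1101 11yy zzzz zzzz
 
carry the 20-bit value

   0000 0000 0000 qqxx xxxx xxyy zzzz zzzz

Per the UTF-16 definition, the decoded code point is this value plus 0x10000, so a surrogate pair represents a code point from U+10000 to U+10FFFF.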
 

Note:

   1101 1000 = 0xD8
   1101 1011 = 0xDB
   1101 1100 = 0xDC
   1101 1111 = 0xDF
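
The surrogate-pair layout above can be checked with a small worked sketch. It follows the standard UTF-16 definition, in which 0x10000 is added to the 20-bit value taken from the two words. The helper names isHighSurrogate, isLowSurrogate, and combineSurrogates are hypothetical and are not claimed to match how decode is implemented in LMLML.

```sml
(* Hypothetical helpers illustrating the bit layout; not LMLML's API. *)
fun isHighSurrogate (w : word) = 0wxD800 <= w andalso w <= 0wxDBFF
fun isLowSurrogate  (w : word) = 0wxDC00 <= w andalso w <= 0wxDFFF

(* Combine a surrogate pair into a code point.
   Returns NONE when the two words do not form a valid pair. *)
fun combineSurrogates (hi : word, lo : word) : word option =
  if isHighSurrogate hi andalso isLowSurrogate lo
  then
    let
      (* low 10 bits of each word: qqxxxxxxxx and yyzzzzzzzz *)
      val upper = Word.<< (Word.andb (hi, 0wx3FF), 0w10)
      val lower = Word.andb (lo, 0wx3FF)
      val payload = Word.orb (upper, lower)
    in
      (* standard UTF-16 offset for supplementary code points *)
      SOME (payload + 0wx10000)
    end
  else NONE
```

For example, the pair 0wxD83D, 0wxDE00 has the 20-bit value 0xF600, which gives the code point U+1F600 once the offset is added.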
 

For the UTF-16 encoding (isLEOpt is NONE), the string may begin with a byte order mark (BOM), U+FEFF. If the first two bytes are [FF, FE], the byte order is little endian. If the first two bytes are [FE, FF], the byte order is big endian. Otherwise, no byte order mark is present and the string is treated as big endian.
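
The byte order rule above can be sketched as follows. The function names detectIsLE and word16 are hypothetical, the input is modelled as a plain Word8.word list rather than whatever byte sequence type the library uses, and the sketch does not say whether decode consumes the byte order mark.

```sml
(* Hypothetical helper: true if the input starts with the little-endian
   byte order mark [FF, FE]; false for [FE, FF] or when no mark is
   present, since the default is big endian. *)
fun detectIsLE (bytes : Word8.word list) : bool =
  case bytes of
    b0 :: b1 :: _ => b0 = 0wxFF andalso b1 = 0wxFE
  | _ => false

(* Hypothetical helper: assemble one 16-bit word from two bytes in the
   given byte order. *)
fun word16 (isLE : bool) (b0 : Word8.word, b1 : Word8.word) : word =
  let
    val w0 = Word.fromInt (Word8.toInt b0)
    val w1 = Word.fromInt (Word8.toInt b1)
  in
    if isLE
    then Word.orb (Word.<< (w1, 0w8), w0)   (* second byte is the high byte *)
    else Word.orb (Word.<< (w0, 0w8), w1)   (* first byte is the high byte *)
  end
```

Reading the bytes [FE, FF] with word16 false yields 0wxFEFF, the byte order mark itself, while reading [FF, FE] with word16 true yields the same value.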


encodeChar

val encodeChar


encode

val encode


convert

val convert

 


LMLML: Library of MultiLingualization for ML