Title: UTF-8, a transformation format of Unicode and ISO 10646
Author(s): F. Yergeau.
Status: INFORMATIONAL
Date: Oct 1996
Length: 11932
Obsoleted by: RFC2279
The Unicode Standard, version 1.1, and ISO/IEC 10646-1:1993 jointly define a 16-bit character set which encompasses most of the world's writing systems. 16-bit characters, however, are not compatible with many current applications and protocols, and this has led to the development of a few so-called UCS transformation formats (UTF), each with different characteristics. UTF-8, the object of this memo, has the characteristic of preserving the full US-ASCII range: US-ASCII characters are encoded in one octet having the usual US-ASCII value, and any octet with such a value can only be a US-ASCII character. This provides compatibility with file systems, parsers and other software that rely on US-ASCII values but are transparent to other values.
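The US-ASCII property described above can be checked with a short sketch. It is illustrative only and is not part of the memo: it relies on Python's built-in UTF-8 codec, which follows the later, stricter definition of UTF-8 and is assumed here to agree with this memo for the characters tested. The function name is_ascii_transparent is hypothetical.

```python
def is_ascii_transparent(text: str) -> bool:
    """Check, for one string, the two US-ASCII properties of UTF-8.

    Hypothetical helper for illustration; uses Python's built-in codec.
    """
    for ch in text:
        octets = ch.encode("utf-8")
        if ord(ch) < 0x80:
            # A US-ASCII character is one octet with its usual value.
            if octets != bytes([ord(ch)]):
                return False
        else:
            # Every octet of any other character has the high bit set,
            # so it can never be mistaken for a US-ASCII character.
            if any(o < 0x80 for o in octets):
                return False
    return True


sample = "path/to/é€.txt"
encoded = sample.encode("utf-8")

# The only 0x2F ("/") octets in the output come from the real slashes,
# which is why a parser that splits on "/" is not confused.
print(is_ascii_transparent(sample))
print(encoded.count(b"/") == sample.count("/"))
```

A parser that treats only US-ASCII values specially (for example, a path separator) can therefore pass the remaining octets through untouched.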