UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.
UTF-8 is the most widely used character encoding and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems. Because it was designed to be backwards-compatible with ASCII, any ASCII text is already valid UTF-8: the first 128 code points encode as the same single bytes.
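As a small illustration of the variable-length property and the ASCII compatibility, the following sketch uses Python's built-in `str.encode`. The helper name `utf8_hex` and the sample characters are assumptions chosen for this example only.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a string as hex (hypothetical helper)."""
    return ch.encode("utf-8").hex()

# One sample character from each length class (1 to 4 bytes).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {utf8_hex(ch)}")

# Pure ASCII text yields identical bytes under both encodings.
assert "hello".encode("utf-8") == "hello".encode("ascii")
```

The four samples encode to one, two, three, and four bytes respectively.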
The algorithm for encoding code points in UTF-8 is described in RFC 3629.
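The bit layout can be sketched by hand as follows. This is a minimal illustration, not a replacement for a library encoder; the function name `encode_code_point` is hypothetical, and the range checks reflect the limits UTF-8 places on code points (nothing above U+10FFFF, no surrogates).

```python
def encode_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (illustrative sketch)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

In practice the built-in `chr(cp).encode("utf-8")` does the same job; the manual version is only meant to make the lead-byte and continuation-byte patterns visible.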
Related tags
- The character-encoding tag discusses the general concept of character-set encodings
- The unicode character set can be represented in a variety of encodings, one of which is UTF-8
- The ascii character set and encoding, which UTF-8 extends
- Other UTFs: utf-16, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36
- utf8mb4, the MySQL character set name for full four-byte UTF-8