Comprehensive list of character encoding formats
Here is a comprehensive list of character encoding formats, organized by categories. These formats are used to represent text in computers and digital communication systems, allowing for consistent storage, display, and transmission of textual data across different platforms and languages.
1. Standard ASCII and Extensions
ASCII (American Standard Code for Information Interchange): A 7-bit encoding covering basic English characters, control codes, and symbols.
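Extended ASCII (8-bit): An informal name for the many 8-bit encodings that keep ASCII in the lower 128 positions and add 128 more characters (256 total). The added symbols and accented characters differ from one code page to another, so "extended ASCII" is not a single standard.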
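As a rough illustration of the 7-bit limit and of why 8-bit extensions vary by region, the following Python sketch uses only the standard library codecs. The sample strings and variable names are chosen here for illustration and are not part of any standard.

```python
# ASCII assigns code points 0-127, so every encoded byte has its high bit clear.
text = "Hello, ASCII!"
ascii_bytes = text.encode("ascii")
all_seven_bit = all(b < 0x80 for b in ascii_bytes)

# A character outside that range cannot be encoded as ASCII.
try:
    "café".encode("ascii")
    non_ascii_rejected = False
except UnicodeEncodeError:
    non_ascii_rejected = True

# The same byte above 0x7F means different things in different 8-bit tables.
byte = b"\xe9"
as_latin1 = byte.decode("iso-8859-1")    # Western European table
as_cyrillic = byte.decode("iso-8859-5")  # Cyrillic table

print(all_seven_bit, non_ascii_rejected, as_latin1, as_cyrillic)
```

The last two lines decode one identical byte with two different tables, which is the practical reason an 8-bit file needs its encoding stated alongside it.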
2. ISO-8859 Series (ISO/IEC 8859)
ISO-8859-1: Latin-1, for Western European languages.
ISO-8859-2: Latin-2, for Central and Eastern European languages.
ISO-8859-3: Latin-3, for South European languages.
ISO-8859-4: Latin-4, for North European languages.
ISO-8859-5: Cyrillic, for Slavic languages using Cyrillic scripts.
ISO-8859-6: Arabic, for Arabic script.
ISO-8859-7: Greek, for Greek script.
ISO-8859-8: Hebrew, for Hebrew script.
ISO-8859-9: Latin-5, for Turkish.
ISO-8859-10: Latin-6, for Nordic languages.
ISO-8859-11: Thai, for Thai script.
ISO-8859-13: Latin-7, for Baltic languages.
ISO-8859-14: Latin-8, for Celtic languages.
ISO-8859-15: Latin-9, an updated version of Latin-1 with the Euro symbol.
ISO-8859-16: Latin-10, for Southeast European languages.
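The difference between Latin-1 and Latin-9 noted above can be checked directly. This is a small sketch using Python's built-in codecs; the byte 0xA4 is picked because it is one of the positions that ISO-8859-15 reassigns.

```python
# Byte 0xA4 is the generic currency sign in ISO-8859-1
# and the Euro sign in ISO-8859-15.
raw = b"\xa4"
in_latin1 = raw.decode("iso-8859-1")
in_latin9 = raw.decode("iso-8859-15")

# Latin-1 has no Euro sign at all, so encoding it fails.
try:
    "€".encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False

print(repr(in_latin1), repr(in_latin9), euro_in_latin1)
```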
3. Unicode and Its Encodings
UTF-8 (Unicode Transformation Format-8): Variable-length encoding (1-4 bytes per character) compatible with ASCII. Widely used on the internet.
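UTF-16: Variable-length encoding using 2 or 4 bytes per character: one 16-bit code unit for characters in the Basic Multilingual Plane and a surrogate pair for all others. Often more compact than UTF-8 for text dominated by East Asian scripts.
UTF-32: Fixed-length encoding using 4 bytes per character; every Unicode code point maps directly to one 32-bit unit.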
UTF-7: 7-bit encoding designed for compatibility with systems that only support 7-bit data, now mostly obsolete.
UCS-2 (Universal Character Set-2): Fixed-width 16-bit encoding for Unicode, predecessor to UTF-16 but without support for supplementary characters (those outside the Basic Multilingual Plane).
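To make the byte counts above concrete, here is a minimal sketch that measures how many bytes a few sample characters take in each Unicode encoding form. The helper name encoded_lengths and the sample characters are illustrative choices; the little-endian codec variants are used so that no byte order mark is added to the count.

```python
def encoded_lengths(ch):
    """Return the encoded size in bytes of one character per encoding form."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# One character each from the 1-, 2-, 3-, and 4-byte UTF-8 ranges.
for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))

# A supplementary character becomes a surrogate pair in UTF-16:
# U+1F600 is stored as the two 16-bit units D83D and DE00.
surrogate_pair = "😀".encode("utf-16-be")

# ASCII text is byte-for-byte identical in UTF-8.
ascii_compatible = "plain text".encode("utf-8") == "plain text".encode("ascii")
```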
4. Legacy Encoding Systems
EBCDIC (Extended Binary Coded Decimal Interchange Code): An 8-bit encoding system used primarily on IBM mainframes.
KOI8-R: Encoding for Cyrillic scripts, commonly used in Russia.
KOI8-U: Similar to KOI8-R but includes additional Ukrainian characters.
MacRoman: Character encoding used on classic Macintosh systems for Western European languages.
Windows-1252 (CP-1252): A character encoding for Western European languages, commonly used in Microsoft Windows.
Windows-1251: Encoding for Cyrillic scripts, also used on Windows systems.
Windows-1253: Encoding for Greek characters.
Windows-1255: Encoding for Hebrew characters.
Windows-1256: Encoding for Arabic characters.
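The practical risk with these legacy encodings is decoding bytes with the wrong table. The sketch below shows three cases with Python's standard codecs; it assumes that the codec name cp037 (one common EBCDIC code page) is an acceptable stand-in for "EBCDIC", which in reality is a family of code pages.

```python
# Windows-1252 places printable characters where ISO-8859-1 has control codes.
byte = b"\x80"
in_cp1252 = byte.decode("cp1252")      # Euro sign
in_latin1 = byte.decode("iso-8859-1")  # C1 control character U+0080

# EBCDIC is not ASCII-compatible: the letter "A" is 0xC1 in code page 037.
ebcdic_a = "A".encode("cp037")
ascii_a = "A".encode("ascii")

# Cyrillic text written as KOI8-R but read as Windows-1251 turns into mojibake.
original = "Привет"
koi8_bytes = original.encode("koi8-r")
misread = koi8_bytes.decode("cp1251")
recovered = koi8_bytes.decode("koi8-r")

print(repr(in_cp1252), repr(in_latin1), ebcdic_a, ascii_a, misread, recovered)
```

Decoding with the wrong Cyrillic table does not raise an error here; it silently yields different letters, which is why the encoding label has to travel with the data.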
5. Encodings for Specific Languages and Regions
Big5: Encoding for Traditional Chinese, mainly used in Taiwan and Hong Kong.
GB2312: Encoding for Simplified Chinese, used in China.
GBK: An extended version of GB2312, covering additional Chinese characters.
GB18030: A Chinese national standard that extends GBK and can encode every Unicode code point, using 1, 2, or 4 bytes per character.
Shift JIS: Encoding for Japanese characters, combining single-byte and double-byte characters.
EUC-JP (Extended Unix Code for Japanese): Another encoding for Japanese, widely used in Unix systems.
ISO-2022-JP: A 7-bit encoding for Japanese that uses escape sequences to switch between ASCII and Japanese character sets.
ISO-2022-KR: Encoding for Korean, similar to ISO-2022-JP.
EUC-KR (Extended Unix Code for Korean): Encoding used for Korean characters.
TIS-620: Encoding for Thai characters.
VISCII: Encoding for Vietnamese, includes characters specific to the Vietnamese language.
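A short sketch of how some of these regional encodings behave, using codecs that ship with Python. The sample strings are illustrative, and only properties that follow from the descriptions above are shown (double-byte storage, escape-sequence switching, and the wider coverage of GB18030).

```python
word = "日本語"

# Shift JIS and EUC-JP both store each of these kanji in two bytes;
# UTF-8 needs three bytes per kanji.
sizes = {enc: len(word.encode(enc)) for enc in ("shift_jis", "euc_jp", "utf-8")}

# ISO-2022-JP stays 7-bit and announces the switch to the Japanese
# character set with an escape sequence (ESC $ B).
iso2022 = word.encode("iso2022_jp")
starts_with_escape = iso2022.startswith(b"\x1b$B")
is_seven_bit = all(b < 0x80 for b in iso2022)

# GB2312, GBK, and GB18030 agree on a common character such as 中 ...
same_bytes = "中".encode("gb2312") == "中".encode("gbk") == "中".encode("gb18030")

# ... but only GB18030 can represent a character outside the older sets.
emoji_gb18030 = "😀".encode("gb18030")
try:
    "😀".encode("gb2312")
    emoji_in_gb2312 = True
except UnicodeEncodeError:
    emoji_in_gb2312 = False

print(sizes, starts_with_escape, is_seven_bit, same_bytes, len(emoji_gb18030))
```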
6. Encodings for Middle Eastern Languages
ISO-8859-6: Arabic character encoding.
Windows-1256: Arabic character encoding on Windows systems.
MacArabic: Arabic encoding used on Macintosh systems.
ISO-8859-8: Hebrew character encoding.
Windows-1255: Hebrew encoding on Windows systems.
MacHebrew: Hebrew encoding used on Macintosh systems.
7. Specialized and Obsolete Encodings
MIME transfer encodings (Multipurpose Internet Mail Extensions), which represent arbitrary bytes as text rather than define a character set:
Base64: Encoding that represents binary data in ASCII for transmission over text-based protocols.
Quoted-Printable: Encoding for text that is mostly ASCII with occasional non-ASCII bytes, each of which is written as "=" followed by two hexadecimal digits; used in email.
UUEncode: An early encoding used to send binary files over email and text-only networks.
BinHex: Encoding used on Macintosh systems for encoding binary files as ASCII text.
yEnc: An encoding with lower size overhead than UUEncode and BinHex, used in Usenet newsgroups for binary data.
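The binary-to-text encodings in this section can be tried with Python's standard library. This is a minimal sketch, not a mail implementation: it uses base64, quopri, and binascii directly, and the sample payloads are invented for illustration.

```python
import base64
import binascii
import quopri

# Base64 maps every 3 input bytes to 4 ASCII characters, padding with "=".
payload = b"hello"
b64 = base64.b64encode(payload)
b64_roundtrip = base64.b64decode(b64)

# Quoted-Printable leaves ASCII readable and escapes other bytes as =XX.
utf8_text = "café".encode("utf-8")
qp = quopri.encodestring(utf8_text)
qp_roundtrip = quopri.decodestring(qp)

# UUEncode works one line at a time (up to 45 input bytes per line).
uu_line = binascii.b2a_uu(payload)
uu_roundtrip = binascii.a2b_uu(uu_line)

print(b64, qp, uu_line)
```

Note that none of these change what the bytes mean as characters. The Quoted-Printable output still has to be decoded as UTF-8 afterwards to recover the original text.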
8. Binary Encodings and Compression-Oriented Encodings
Brotli: Compression format used for web content, available as an HTTP content encoding.
Gzip: Compression encoding, commonly used to compress HTTP responses.
Deflate: Another compression encoding used for HTTP responses, combining LZ77 and Huffman coding.
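For comparison with the character encodings earlier in the list, here is a brief sketch of the compression formats using Python's gzip and zlib modules. Brotli is left out because it is not in the standard library. The sample text is arbitrary, and zlib.compress is used here on the assumption that the zlib-wrapped form of Deflate is the one of interest.

```python
import gzip
import zlib

# Compression operates on bytes, so text is first encoded (here as UTF-8).
text = "character encoding " * 50
raw = text.encode("utf-8")

# Gzip output starts with the magic bytes 1F 8B.
gz = gzip.compress(raw)
gz_roundtrip = gzip.decompress(gz).decode("utf-8")

# zlib.compress produces Deflate data inside a zlib wrapper.
deflated = zlib.compress(raw)
deflate_roundtrip = zlib.decompress(deflated).decode("utf-8")

print(len(raw), len(gz), len(deflated))
```

The character encoding and the compression step are independent layers: the receiver decompresses first and then decodes the resulting bytes as text.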
Summary of Key Encodings
This list covers both legacy and modern character encodings, including Unicode and its variations (UTF-8, UTF-16, UTF-32), regional encodings (like Big5, GB2312, Shift JIS), and specialized encodings used in specific applications (like MIME/Base64 for email).
Unicode, especially UTF-8, has become the standard for representing characters across platforms and languages due to its ability to cover a vast range of symbols and scripts.