Plane (Unicode)

Main article: Universal Character Set characters

In the Unicode standard, a plane is a continuous group of 65,536 (= 2¹⁶) code points. There are 17 planes, identified by the numbers 0 to 16_decimal, which corresponds with the possible values 00–10_hexadecimal of the first two positions in six position format (hhhhhh). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly-used characters. The higher planes 1 through 16 are called "supplementary planes",^[1] or humorously "astral planes". As of Unicode version 9.0, six of the planes have assigned code points (characters), and four are named.

The limit of 17 (which is not a power of 2) is due to the design of UTF-16, which can encode a maximum value of 0x10FFFF,^[2] the last code point in plane 16. The encoding scheme used by UTF-8 was designed with a much larger limit of 2³¹ code points (32,768 planes), and can encode 2²¹ code points (32 planes) even if limited to 4 bytes.^[3] Since Unicode limits the code points to the 17 planes that can be encoded by UTF-16, code points above 0x10FFFF are invalid in UTF-8 and UTF-32.

The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are surrogates, 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment.

Planes are further subdivided into Unicode blocks, which, unlike planes, do not have a fixed size. The 273 blocks defined in Unicode 9.0 cover 24% of the possible code point space, and range in size from a minimum of 16 code points (twelve blocks) to a maximum of 65,536 code points (Supplementary Private Use Area-A and -B, which constitute the entirety of planes 15 and 16). For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems.^[4]

Overview

Unicode planes and used code point ranges
Basic		Supplementary
Plane 0		Plane 1		Plane 2		Planes 3–13	Plane 14	Planes 15–16
0000–FFFF		10000–1FFFF		20000–2FFFF		30000–DFFFF	E0000–EFFFF	F0000–10FFFF
Basic Multilingual Plane		Supplementary Multilingual Plane		Supplementary Ideographic Plane		unassigned	Supplementary Special-purpose Plane	Supplementary Private Use Area planes
BMP		SMP		SIP		—	SSP	SPUA-A/B
0000–0FFF 1000–1FFF 2000–2FFF 3000–3FFF 4000–4FFF 5000–5FFF 6000–6FFF 7000–7FFF	8000–8FFF 9000–9FFF A000–AFFF B000–BFFF C000–CFFF D000–DFFF E000–EFFF F000–FFFF	10000–10FFF 11000–11FFF 12000–12FFF 13000–13FFF 14000–14FFF 16000–16FFF 17000–17FFF	18000–18FFF 1B000–1BFFF 1D000–1DFFF 1E000–1EFFF 1F000–1FFFF	20000–20FFF 21000–21FFF 22000–22FFF 23000–23FFF 24000–24FFF 25000–25FFF 26000–26FFF 27000–27FFF	28000–28FFF 29000–29FFF 2A000–2AFFF 2B000–2BFFF 2C000–2CFFF 2F000–2FFFF		E0000–E0FFF	15: SPUA-A F0000–FFFFF 16: SPUA-B 100000–10FFFF

Assigned characters as of Unicode version 9.0
Plane	Allocated code points^{[note 1]}	Assigned characters^{[note 2]}
0 BMP	65,408	55,237
1 SMP	21,520	19,277
2 SIP	53,424	53,386
14 SSP	368	337
15 SPUA-A	65,536
16 SPUA-B	65,536
Totals	271,792	128,237

↑ Code points which have been allocated to a Unicode block.
↑ The total number of graphic, format and control characters (i.e., excluding private-use characters, noncharacters and surrogate code points).

Basic Multilingual Plane

A map of the Basic Multilingual Plane. Each numbered box represents 256 code points.

The first plane, plane 0, the Basic Multilingual Plane (BMP) contains characters for almost all modern languages, and a large number of symbols. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

The High Surrogates (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character.

65,408 of the 65,536 code points in this plane have been allocated to a Unicode block, leaving just 128 code points in unallocated ranges (64 code points at 0860..089F, 48 code points at 1C90..1CBF and 16 code points at 2FE0..2FEF).

As of Unicode 9.0, the BMP comprises the following 161 blocks:

C0 Controls and Basic Latin (ASCII) (0000–007F)
C1 Controls and Latin-1 Supplement (Extended ASCII) (0080–00FF)
Latin Extended-A (0100–017F)
Latin Extended-B (0180–024F)
IPA Extensions (0250–02AF)
Spacing Modifier Letters (02B0–02FF)
Combining Diacritical Marks (0300–036F)
Greek and Coptic (0370–03FF)
Cyrillic (0400–04FF)
Cyrillic Supplement (0500–052F)
Armenian (0530–058F)
Aramaic Scripts:
- Hebrew (0590–05FF)
- Arabic (0600–06FF)
- Syriac (0700–074F)
- Arabic Supplement (0750–077F)
- Thaana (0780–07BF)
- N'Ko (07C0–07FF)
- Samaritan (0800–083F)
- Mandaic (0840–085F)
- Arabic Extended-A (08A0–08FF)
Brahmic scripts:
- Devanagari (0900–097F)
- Bengali (0980–09FF)
- Gurmukhi (0A00–0A7F)
- Gujarati (0A80–0AFF)
- Oriya (0B00–0B7F)
- Tamil (0B80–0BFF)
- Telugu (0C00–0C7F)
- Kannada (0C80–0CFF)
- Malayalam (0D00–0D7F)
- Sinhala (0D80–0DFF)
- Thai (0E00–0E7F)
- Lao (0E80–0EFF)
- Tibetan (0F00–0FFF)
- Myanmar (1000–109F)
Georgian (10A0–10FF)
Hangul Jamo (1100–11FF)
Ethiopic (1200–137F)
Ethiopic Supplement (1380–139F)
Cherokee (13A0–13FF)
Unified Canadian Aboriginal Syllabics (1400–167F)
Ogham (1680–169F)
Runic (16A0–16FF)
Philippine scripts:
- Tagalog (1700–171F)
- Hanunoo (1720–173F)
- Buhid (1740–175F)
- Tagbanwa (1760–177F)
Khmer (1780–17FF)
Mongolian (1800–18AF)
Unified Canadian Aboriginal Syllabics Extended (18B0–18FF)
Limbu (1900–194F)
Tai scripts:
- Tai Le (1950–197F)
- Tai Lue (1980–19DF)
- Khmer Symbols (19E0–19FF)
- Buginese (1A00–1A1F)
- Tai Tham (1A20–1AAF)
Combining Diacritical Marks Extended (1AB0-1AFF)
Balinese (1B00–1B7F)
Sundanese (1B80–1BBF)
Batak (1BC0–1BFF)
Lepcha (1C00–1C4F)
Ol Chiki (1C50–1C7F)
Cyrillic Extended-C (1C80–1C8F)
Sundanese Supplement (1CC0–1CCF)
Vedic Extensions (1CD0–1CFF)
Latin-2 supplement:
- Phonetic Extensions (1D00–1D7F)
- Phonetic Extensions Supplement (1D80–1DBF)
- Combining Diacritical Marks Supplement (1DC0–1DFF)
- Latin Extended Additional (1E00–1EFF)
Greek Extended (1F00–1FFF)
Symbols:
- General Punctuation (2000–206F)
- Superscripts and Subscripts (2070–209F)
- Currency Symbols (20A0–20CF)
- Combining Diacritical Marks for Symbols (20D0–20FF)
- Letterlike Symbols (2100–214F)
- Number Forms (2150–218F)
- Arrows (2190–21FF)
- Mathematical Operators (2200–22FF)
- Miscellaneous Technical (2300–23FF)
- Control Pictures (2400–243F)
- Optical Character Recognition (2440–245F)
- Enclosed Alphanumerics (2460–24FF)
- Box Drawing (2500–257F)
- Block Elements (2580–259F)
- Geometric Shapes (25A0–25FF)
- Miscellaneous Symbols (2600–26FF)
- Dingbats (2700–27BF)
- Miscellaneous Mathematical Symbols-A (27C0–27EF)
- Supplemental Arrows-A (27F0–27FF)
- Braille Patterns (2800–28FF)
- Supplemental Arrows-B (2900–297F)
- Miscellaneous Mathematical Symbols-B (2980–29FF)
- Supplemental Mathematical Operators (2A00–2AFF)
- Miscellaneous Symbols and Arrows (2B00–2BFF)
Glagolitic (2C00–2C5F)
Latin Extended-C (2C60–2C7F)
Coptic (2C80–2CFF)
Georgian Supplement (2D00–2D2F)
Tifinagh (2D30–2D7F)
Ethiopic Extended (2D80–2DDF)
Cyrillic Extended-A (2DE0–2DFF)
Supplemental Punctuation (2E00–2E7F)
CJK scripts and symbols:
- CJK Radicals Supplement (2E80–2EFF)
- Kangxi Radicals (2F00–2FDF)
- Ideographic Description Characters (2FF0–2FFF)
- CJK Symbols and Punctuation (3000–303F)
- Hiragana (3040–309F)
- Katakana (30A0–30FF)
- Bopomofo (3100–312F)
- Hangul Compatibility Jamo (3130–318F)
- Kanbun (3190–319F)
- Bopomofo Extended (31A0–31BF)
- CJK Strokes (31C0–31EF)
- Katakana Phonetic Extensions (31F0–31FF)
- Enclosed CJK Letters and Months (3200–32FF)
- CJK Compatibility (3300–33FF)
- CJK Unified Ideographs Extension A (3400–4DBF)
- Yijing Hexagram Symbols (4DC0–4DFF)
- CJK Unified Ideographs (4E00–9FFF)
Yi Syllables (A000–A48F)
Yi Radicals (A490–A4CF)
Lisu (A4D0–A4FF)
Vai (A500–A63F)
Cyrillic Extended-B (A640–A69F)
Bamum (A6A0–A6FF)
Modifier Tone Letters (A700–A71F)
Latin Extended-D (A720–A7FF)
Syloti Nagri (A800–A82F)
Common Indic Number Forms (A830–A83F)
Phags-pa (A840–A87F)
Saurashtra (A880–A8DF)
Devanagari Extended (A8E0–A8FF)
Kayah Li (A900–A92F)
Rejang (A930–A95F)
Hangul Jamo Extended-A (A960–A97F)
Javanese (A980–A9DF)
Myanmar Extended-B (A9E0-A9FF)
Cham (AA00–AA5F)
Myanmar Extended-A (AA60–AA7F)
Tai Viet (AA80–AADF)
Meetei Mayek Extensions (AAE0–AAFF)
Ethiopic Extended-A (AB00–AB2F)
Latin Extended-E (AB30-AB6F)
Cherokee Supplement (AB70-ABBF)
Meetei Mayek (ABC0–ABFF)
Hangul Syllables (AC00–D7AF)
Hangul Jamo Extended-B (D7B0–D7FF)
Surrogates:
- High Surrogates (D800–DBFF)
- Low Surrogates (DC00–DFFF)
Private Use Area (E000–F8FF)
CJK Compatibility Ideographs (F900–FAFF)
Alphabetic Presentation Forms (FB00–FB4F)
Arabic Presentation Forms-A (FB50–FDFF)
Variation Selectors (FE00–FE0F)
Vertical Forms (FE10–FE1F)
Combining Half Marks (FE20–FE2F)
CJK Compatibility Forms (FE30–FE4F)
Small Form Variants (FE50–FE6F)
Arabic Presentation Forms-B (FE70–FEFF)
Halfwidth and Fullwidth Forms (FF00–FFEF)
Specials (FFF0–FFFF)

Supplementary Multilingual Plane

A map of the Supplementary Multilingual Plane. Each numbered box represents 256 code points.

Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), and symbols and notation used within certain fields. Scripts include Linear B, Egyptian hieroglyphs, and cuneiform scripts, and also reform orthographies like Shavian and Deseret. Symbols and notations include historic and modern musical notation; mathematical alphanumerics; Emoji and other pictographic sets; and game symbols for playing cards, Mah Jongg, and dominoes.

As of Unicode 9.0, the SMP comprises the following 103 blocks:

Archaic Greek and Other Left-to-right scripts:
- Linear B (10000–100FF)
- Aegean Numbers (10100–1013F)
- Ancient Greek Numbers (10140–1018F)
- Ancient Symbols (10190–101CF)
- Phaistos Disc (101D0–101FF)
- Lycian (10280–1029F)
- Carian (102A0–102DF)
- Coptic Epact Numbers (102E0-102FF)
- Old Italic (10300–1032F)
- Gothic (10330–1034F)
- Old Permic (10350-1037F)
- Ugaritic (10380–1039F)
- Old Persian (103A0–103DF)
- Deseret (10400–1044F)
- Shavian (10450–1047F)
- Osmanya (10480–104AF)
- Osage (104B0–104FF)
- Elbasan (10500-1052F)
- Caucasian Albanian (10530-1056F)
- Linear A (10600-1077F)
Right-to-left scripts:
- Cypriot Syllabary (10800–1083F)
- Imperial Aramaic (10840–1085F)
- Palmyrene (10860-1087F)
- Nabataean (10880-108AF)
- Hatran (108E0-108FF)
- Phoenician (10900–1091F)
- Lydian (10920–1093F)
- Meroitic Hieroglyphs (10980–1099F)
- Meroitic Cursive (109A0–109FF)
- Kharoshthi (10A00–10A5F)
- Old South Arabian (10A60–10A7F)
- Old North Arabian (10A80-10A9F)
- Manichaean (10AC0-10AFF)
- Avestan (10B00–10B3F)
- Inscriptional Parthian (10B40–10B5F)
- Inscriptional Pahlavi (10B60–10B7F)
- Psalter Pahlavi (10B80-10BAF)
- Old Turkic (10C00–10C4F)
- Old Hungarian (10C80-10CFF)
- Rumi Numeral Symbols (10E60–10E7F)
Brahmic scripts:
- Brahmi (11000–1107F)
- Kaithi (11080–110CF)
- Sora Sompeng (110D0–110FF)
- Chakma (11100–1114F)
- Mahajani (11150-1117F)
- Sharada (11180–111DF)
- Sinhala Archaic Numbers (111E0-111FF)
- Khojki (11200-1124F)
- Multani (11280-112AF)
- Khudawadi (112B0-112FF)
- Grantha (11300-1137F)
- Newa (11400-1147F)
- Tirhuta (11480-114DF)
- Siddham (11580-115FF)
- Modi (11600-1165F)
- Takri (11680–116CF)
- Ahom (11700-1173F)
- Warang Citi (118A0-118FF)
- Pau Cin Hau (11AC0-11AFF)
- Bhaiksuki (11C00-11C6F)
- Marchen (11C70-11CBF)
Mongolian Supplement (11660-1167F)
Cuneiform (12000–123FF)
Cuneiform Numbers and Punctuation (12400–1247F)
Early Dynastic Cuneiform (12480-1254F)
Egyptian Hieroglyphs (13000–1342F)
Anatolian Hieroglyphs (14400-1467F)
Bamum Supplement (16800–16A3F)
Mro (16A40-16A6F)
Bassa Vah (16AD0-16AFF)
Pahawh Hmong (16B00-16B8F)
Miao (16F00–16F9F)
Ideographic Symbols and Punctuation (16FE0-16FFF)
Tangut (17000–187FF)
Tangut Components (18800–18AFF)
Kana Supplement (1B000–1B0FF)
Duployan shorthand (1BC00-1BCAF)
Supplementary symbols:
- Musical notation:
  - Byzantine Musical Symbols (1D000–1D0FF)
  - Musical Symbols (1D100–1D1FF)
  - Ancient Greek Musical Notation (1D200–1D24F)
- Mathematical symbols:
  - Tai Xuan Jing Symbols (1D300–1D35F)
  - Counting Rod Numerals (1D360–1D37F)
  - Mathematical Alphanumeric Symbols (1D400–1D7FF)
- Sutton SignWriting (1D800-1DAAF)
- Glagolitic Supplement (1E000-1E02F)
- Mende Kikakui (1E800-1E8DF)
- Adlam (1E900-1E95F)
- Arabic Mathematical Alphabetic Symbols (1EE00–1EEFF)
- Game tiles and cards:
  - Mahjong Tiles (1F000–1F02F)
  - Domino Tiles (1F030–1F09F)
  - Playing Cards (1F0A0–1F0FF)
- Enclosed Alphanumeric Supplement (1F100–1F1FF)
- Enclosed Ideographic Supplement (1F200–1F2FF)
- Miscellaneous Symbols and Pictographs (1F300–1F5FF)
- Emoticons (1F600–1F64F)
- Ornamental Dingbats (1F650-1F67F)
- Transport and Map Symbols (1F680–1F6FF)
- Alchemical symbols (1F700–1F77F)
- Geometric Shapes Extended (1F780-1F7FF)
- Supplemental Arrows-C (1F800-1F8FF)
- Supplemental Symbols and Pictographs (1F900-1F9FF)

Supplementary Ideographic Plane

A map of the Supplementary Ideographic Plane. Each numbered box represents 256 code points.

Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK Ideographs, mostly CJK Unified Ideographs, that were not included in earlier character encoding standards.

As of Unicode 9.0, the SIP comprises the following five blocks:

CJK Unified Ideographs Extension B (20000–2A6DF)
CJK Unified Ideographs Extension C (2A700–2B73F)
CJK Unified Ideographs Extension D (2B740–2B81F)
CJK Unified Ideographs Extension E (2B820-2CEAF)
CJK Compatibility Ideographs Supplement (2F800–2FA1F)

Unassigned planes

Planes 3 to 13 (planes 3 to D in hexadecimal): No characters have yet been assigned to Planes 3 through 13. Plane 3 is tentatively named the Tertiary Ideographic Plane (TIP), but as of version 9.0 there are no characters assigned to it.^[5] It is reserved for Oracle Bone script, Bronze Script, Small Seal Script, additional CJK unified ideographs, and other historic ideographic scripts.^[6]

It is not anticipated that all these planes will be used in the foreseeable future, given the total sizes of the known writing systems left to be encoded. The number of possible symbol characters that could arise outside of the context of writing systems is potentially huge. At the moment, these 11 planes out of 17 are unused.

Supplementary Special-purpose Plane

A map of the Supplementary Special-purpose Plane. Each numbered box represents 256 code points.

Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains non-graphical characters. The first block is for special use tag characters. The other block contains glyph variation selectors to indicate an alternate glyph for a character that cannot be determined by context.

As of Unicode 9.0, the SSP comprises the following two blocks:

Tags (E0000–E007F)
Variation Selectors Supplement (E0100–E01EF)

Private Use Area planes

The two planes 15 and 16 (planes F and 10 in hexadecimal), are designated as "private use planes". They contain blocks called Supplementary Private Use Area-A and -B which are available for character assignment by parties outside the ISO and the Unicode Consortium. They are used by fonts internally to refer to auxiliary glyphs, for example, ligatures and building blocks for other glyphs. Such characters will have limited interoperability. Software and fonts that support Unicode will not necessarily support character assignments by other parties.

References

↑ Unicode Consortium Glossary—Supplementary Planes
↑ See Table 3.5 "UTF-16 Bit Distribution" in the Unicode Standard http://www.unicode.org/versions/Unicode6.0.0/UnicodeStandard-6.0.pdf
↑ See Table 3.6 "UTF-8 Bit Distribution" in the Unicode Standard http://www.unicode.org/versions/Unicode6.0.0/UnicodeStandard-6.0.pdf
↑ Unicode roadmaps
↑ "Unicode Data". Retrieved 17 June 2015.
↑ Roadmap to the TIP

Unicode

Code points

Characters

Special purpose	BOM Combining Grapheme Joiner Left-to-right mark / Right-to-left mark Soft hyphen Word joiner Zero-width joiner Zero-width non-joiner Zero-width space

Lists	Characters CJK Unified Ideographs Combining character Duplicate characters Numerals Scripts Spaces Symbols Halfwidth and fullwidth

Processing

Algorithms	Bi-directional text Collation ISO 14651 Equivalence Variation sequences

Comparison	BOCU-1 CESU-8 Punycode SCSU UTF-1 UTF-7 UTF-8 UTF-9/UTF-18 UTF-16/UCS-2 UTF-32/UCS-4 UTF-EBCDIC

On pairs of
code points

Usage

Related standards