Basic Latin (Unicode block)
C0 Controls and Basic Latin | |
---|---|
Range | U+0000..U+007F (128 code points) |
Plane | BMP |
Scripts | Latin (52 char.), Common (76 char.) |
Major alphabets | English, French, Spanish, German, Vietnamese |
Symbol sets | Arabic numerals, Punctuation |
Assigned | 128 code points (33 Control or Format) |
Unused | 0 reserved code points |
Source standards | ISO/IEC 8859, ISO 646 |
Unicode version history | |
1.0.0 | 128 (+128) |
Note: [1][2] |
The Basic Latin or C0 Controls and Basic Latin Unicode block is the first block of the Unicode standard, and the only block which is encoded in one byte in UTF-8. The block contains all the letters and control codes of the ASCII encoding.
The Basic Latin block was included in its present form from version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.[3]
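The one-byte property mentioned above can be checked directly. The following is a minimal sketch in Python; the helper name `utf8_length` is hypothetical and only serves the illustration.

```python
def utf8_length(code_point: int) -> int:
    """Return the number of bytes UTF-8 uses for one code point."""
    return len(chr(code_point).encode("utf-8"))


# Every code point in U+0000..U+007F encodes to exactly one byte,
# and that byte's value equals the code point itself.
basic_latin_is_single_byte = all(
    chr(cp).encode("utf-8") == bytes([cp]) for cp in range(0x00, 0x80)
)

# The first code point after the block (U+0080) needs two bytes.
first_outside_length = utf8_length(0x80)
```

Because each byte equals its code point, ASCII-only text is byte-for-byte identical when read as UTF-8.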
Table of characters
Code | Result | Description | Acronym |
---|---|---|---|
C0 controls | |||
U+0000 | | Null character | NUL |
U+0001 | | Start of Heading | SOH |
U+0002 | | Start of Text | STX |
U+0003 | | End-of-text character | ETX |
U+0004 | | End-of-transmission character | EOT |
U+0005 | | Enquiry character | ENQ |
U+0006 | | Acknowledge character | ACK |
U+0007 | | Bell character | BEL |
U+0008 | | Backspace | BS |
U+0009 | | Horizontal tab | HT |
U+000A | | Line feed | LF |
U+000B | | Vertical tab | VT |
U+000C | | Form feed | FF |
U+000D | | Carriage return | CR |
U+000E | | Shift Out | SO |
U+000F | | Shift In | SI |
U+0010 | | Data Link Escape | DLE |
U+0011 | | Device Control 1 | DC1 |
U+0012 | | Device Control 2 | DC2 |
U+0013 | | Device Control 3 | DC3 |
U+0014 | | Device Control 4 | DC4 |
U+0015 | | Negative-acknowledge character | NAK |
U+0016 | | Synchronous Idle | SYN |
U+0017 | | End of Transmission Block | ETB |
U+0018 | | Cancel character | CAN |
U+0019 | | End of Medium | EM |
U+001A | | Substitute character | SUB |
U+001B | | Escape character | ESC |
U+001C | | File Separator | FS |
U+001D | | Group Separator | GS |
U+001E | | Record Separator | RS |
U+001F | | Unit Separator | US |
ASCII punctuation and symbols | |||
U+0020 | | Space | SP |
U+0021 | ! | Exclamation mark | |
U+0022 | " | Quotation mark | |
U+0023 | # | Number sign | |
U+0024 | $ | Dollar sign | |
U+0025 | % | Percent sign | |
U+0026 | & | Ampersand | |
U+0027 | ' | Apostrophe | |
U+0028 | ( | Left parenthesis | |
U+0029 | ) | Right parenthesis | |
U+002A | * | Asterisk | |
U+002B | + | Plus sign | |
U+002C | , | Comma | |
U+002D | - | Hyphen-minus | |
U+002E | . | Full stop or period | |
U+002F | / | Solidus or Slash | |
ASCII digits | |||
U+0030 | 0 | Digit Zero | |
U+0031 | 1 | Digit One | |
U+0032 | 2 | Digit Two | |
U+0033 | 3 | Digit Three | |
U+0034 | 4 | Digit Four | |
U+0035 | 5 | Digit Five | |
U+0036 | 6 | Digit Six | |
U+0037 | 7 | Digit Seven | |
U+0038 | 8 | Digit Eight | |
U+0039 | 9 | Digit Nine | |
ASCII punctuation and symbols | |||
U+003A | : | Colon | |
U+003B | ; | Semicolon | |
U+003C | < | Less-than sign | |
U+003D | = | Equal sign | |
U+003E | > | Greater-than sign | |
U+003F | ? | Question mark | |
U+0040 | @ | At sign or Commercial at | |
Uppercase Latin alphabet | |||
U+0041 | A | Latin Capital Letter A | |
U+0042 | B | Latin Capital Letter B | |
U+0043 | C | Latin Capital Letter C | |
U+0044 | D | Latin Capital Letter D | |
U+0045 | E | Latin Capital Letter E | |
U+0046 | F | Latin Capital Letter F | |
U+0047 | G | Latin Capital Letter G | |
U+0048 | H | Latin Capital Letter H | |
U+0049 | I | Latin Capital Letter I | |
U+004A | J | Latin Capital Letter J | |
U+004B | K | Latin Capital Letter K | |
U+004C | L | Latin Capital Letter L | |
U+004D | M | Latin Capital Letter M | |
U+004E | N | Latin Capital Letter N | |
U+004F | O | Latin Capital Letter O | |
U+0050 | P | Latin Capital Letter P | |
U+0051 | Q | Latin Capital Letter Q | |
U+0052 | R | Latin Capital Letter R | |
U+0053 | S | Latin Capital Letter S | |
U+0054 | T | Latin Capital Letter T | |
U+0055 | U | Latin Capital Letter U | |
U+0056 | V | Latin Capital Letter V | |
U+0057 | W | Latin Capital Letter W | |
U+0058 | X | Latin Capital Letter X | |
U+0059 | Y | Latin Capital Letter Y | |
U+005A | Z | Latin Capital Letter Z | |
ASCII punctuation and symbols | |||
U+005B | [ | Left Square Bracket | |
U+005C | \ | Backslash [A] | |
U+005D | ] | Right Square Bracket | |
U+005E | ^ | Circumflex accent | |
U+005F | _ | Low line | |
U+0060 | ` | Grave accent | |
Lowercase Latin alphabet | |||
U+0061 | a | Latin Small Letter A | |
U+0062 | b | Latin Small Letter B | |
U+0063 | c | Latin Small Letter C | |
U+0064 | d | Latin Small Letter D | |
U+0065 | e | Latin Small Letter E | |
U+0066 | f | Latin Small Letter F | |
U+0067 | g | Latin Small Letter G | |
U+0068 | h | Latin Small Letter H | |
U+0069 | i | Latin Small Letter I | |
U+006A | j | Latin Small Letter J | |
U+006B | k | Latin Small Letter K | |
U+006C | l | Latin Small Letter L | |
U+006D | m | Latin Small Letter M | |
U+006E | n | Latin Small Letter N | |
U+006F | o | Latin Small Letter O | |
U+0070 | p | Latin Small Letter P | |
U+0071 | q | Latin Small Letter Q | |
U+0072 | r | Latin Small Letter R | |
U+0073 | s | Latin Small Letter S | |
U+0074 | t | Latin Small Letter T | |
U+0075 | u | Latin Small Letter U | |
U+0076 | v | Latin Small Letter V | |
U+0077 | w | Latin Small Letter W | |
U+0078 | x | Latin Small Letter X | |
U+0079 | y | Latin Small Letter Y | |
U+007A | z | Latin Small Letter Z | |
ASCII punctuation and symbols | |||
U+007B | { | Left Curly Bracket | |
U+007C | \| | Vertical bar | |
U+007D | } | Right Curly Bracket | |
U+007E | ~ | Tilde | |
Control character | |||
U+007F | | Delete | DEL |
- [A] The character U+005C (\) may be displayed as a yen sign or a won sign by Japanese or Korean fonts that treat Unicode text (especially UTF-8) as a legacy character set in which the backslash was replaced by these signs.[4]
Subheadings
The C0 Controls and Basic Latin block contains six subheadings.[5]
C0 controls
The C0 controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard.[5]
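As an illustration, the control code points of the block can be identified through their general category. This is a minimal sketch using Python's standard `unicodedata` module; the variable names are hypothetical.

```python
import unicodedata

# C0 controls occupy U+0000..U+001F; DEL (U+007F) sits at the end of the block.
c0_controls = [chr(cp) for cp in range(0x00, 0x20)]

# General category "Cc" (Other, control) is what the Unicode Character
# Database assigns to these code points.
all_c0_are_cc = all(unicodedata.category(ch) == "Cc" for ch in c0_controls)

# Count every control code point in the whole block (C0 controls plus DEL).
control_count = sum(
    1 for cp in range(0x00, 0x80) if unicodedata.category(chr(cp)) == "Cc"
)
```

The count covers the 32 C0 controls and DEL, which matches the 33 "Control or Format" code points listed for the block.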
ASCII punctuation and symbols
This subheading refers to standard punctuation characters, simple mathematical operators, and symbols like the dollar sign, percent, ampersand, underscore, and pipe.[5]
ASCII digits
The ASCII Digits subheading contains the standard European number characters 1–9 and 0.[5]
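Because the ten digits are contiguous from U+0030 to U+0039, a digit's numeric value can be obtained by subtracting 0x30 from its code point. The sketch below assumes a hypothetical helper named `digit_value`.

```python
def digit_value(ch: str) -> int:
    """Return the numeric value of an ASCII digit (U+0030..U+0039)."""
    cp = ord(ch)
    if not 0x30 <= cp <= 0x39:
        raise ValueError(f"not an ASCII digit: {ch!r}")
    # U+0030 is DIGIT ZERO, so the offset from it is the value.
    return cp - 0x30


values = [digit_value(ch) for ch in "0123456789"]
```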
Uppercase Latin alphabet
The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the majuscule.[5]
Lowercase Latin alphabet
The Lowercase Latin Alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the minuscule.[5]
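In the character table, each lowercase letter sits exactly 0x20 above its uppercase counterpart (U+0041 A, U+0061 a). A minimal sketch of case swapping based on that offset follows; `CASE_BIT` and `swap_ascii_case` are hypothetical names, and the approach applies only to the 52 unaccented letters of this block.

```python
CASE_BIT = 0x20  # U+0061 ('a') minus U+0041 ('A')


def swap_ascii_case(ch: str) -> str:
    """Swap the case of a Basic Latin letter; return other characters unchanged."""
    cp = ord(ch)
    is_upper = 0x41 <= cp <= 0x5A
    is_lower = 0x61 <= cp <= 0x7A
    if is_upper or is_lower:
        # Flipping the 0x20 bit moves between the two alphabets.
        return chr(cp ^ CASE_BIT)
    return ch
```

For general text, a language's built-in case conversion is the safer choice, since letters outside this block do not follow the same offset.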
Control character
The Control Character subheading contains the "Delete" character.[5]
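The six subheadings partition the 128 code points into fixed ranges, which can be read off the character table. The following sketch maps a code point to its subheading; the function name `subheading` is hypothetical.

```python
from collections import Counter


def subheading(cp: int) -> str:
    """Return the subheading of a code point in U+0000..U+007F."""
    if not 0x00 <= cp <= 0x7F:
        raise ValueError(f"U+{cp:04X} is outside the Basic Latin block")
    if cp <= 0x1F:
        return "C0 controls"
    if cp == 0x7F:
        return "Control character"
    if 0x30 <= cp <= 0x39:
        return "ASCII digits"
    if 0x41 <= cp <= 0x5A:
        return "Uppercase Latin alphabet"
    if 0x61 <= cp <= 0x7A:
        return "Lowercase Latin alphabet"
    # Everything left: U+0020..2F, U+003A..40, U+005B..60, U+007B..7E.
    return "ASCII punctuation and symbols"


# Tally how many code points fall under each subheading.
counts = Counter(subheading(cp) for cp in range(0x80))
```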
Compact table
C0 Controls and Basic Latin[1] (Official Unicode Consortium code chart, PDF)

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+000x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
U+001x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
U+002x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
U+003x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
U+004x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
U+005x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
U+006x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
U+007x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
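The compact table arranges the block in rows of sixteen, so a code point's row and column are its upper and lower hexadecimal digits. A minimal sketch, assuming a hypothetical helper `table_position`:

```python
def table_position(cp: int) -> tuple[str, str]:
    """Return the (row label, column label) of a code point in the compact table."""
    if not 0x00 <= cp <= 0x7F:
        raise ValueError(f"U+{cp:04X} is outside the Basic Latin block")
    # Integer division by 16 gives the row; the remainder gives the column.
    row, column = divmod(cp, 16)
    return f"U+{row:03X}x", f"{column:X}"
```

For example, `table_position(ord("A"))` yields the row `U+004x` and the column `1`.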
Emoji
The Basic Latin block contains twelve emoji: U+0023, U+002A and U+0030–U+0039.[6][7] They are keycap base characters; for example, #️⃣ is the sequence U+0023 NUMBER SIGN, U+FE0F VS16, U+20E3 COMBINING ENCLOSING KEYCAP.
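A keycap sequence of this kind can be assembled by concatenating the three code points. The sketch below is illustrative only; the constant and function names are hypothetical, and whether the result is drawn as a keycap depends on the font and renderer.

```python
VS16 = "\uFE0F"    # VARIATION SELECTOR-16 (emoji presentation)
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP
KEYCAP_BASES = "#*0123456789"  # U+0023, U+002A, U+0030..U+0039


def keycap(base: str) -> str:
    """Build the keycap emoji sequence for one of the twelve base characters."""
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError(f"not a keycap base character: {base!r}")
    return base + VS16 + KEYCAP


number_sign_keycap = keycap("#")
```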
A standardized variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0︀).
The block has 24 standardized variants defined to specify emoji-style (U+FE0F VS16) or text presentation (U+FE0E VS15) for the following twelve base characters: U+0023, U+002A and U+0030–U+0039.[8]
All of these base characters default to a text presentation.
U+ | 0023 | 002A | 0030 | 0031 | 0032 | 0033 | 0034 | 0035 | 0036 | 0037 | 0038 | 0039 |
base code point | # | * | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
base+VS15 (text) | #︎ | *︎ | 0︎ | 1︎ | 2︎ | 3︎ | 4︎ | 5︎ | 6︎ | 7︎ | 8︎ | 9︎ |
base+VS16 (emoji) | #️ | *️ | 0️ | 1️ | 2️ | 3️ | 4️ | 5️ | 6️ | 7️ | 8️ | 9️ |
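The rows of the table above can be generated by appending a variation selector to each base character. The following is a minimal sketch with hypothetical variable names; it builds the sequences only and makes no claim about how a given system renders them.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15 (text presentation)
VS16 = "\uFE0F"  # VARIATION SELECTOR-16 (emoji presentation)

# The twelve base characters: U+0023, U+002A and U+0030..U+0039.
bases = ["#", "*"] + [chr(cp) for cp in range(0x30, 0x3A)]

# One text-style and one emoji-style sequence per base character.
sequences = {
    base: {"text": base + VS15, "emoji": base + VS16} for base in bases
}

total_variants = sum(len(styles) for styles in sequences.values())
```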
References
- "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
- "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
- The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
- "When is a backslash not a backslash?". Sorting it all Out.
- "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 2013-04-01.
- "UTR #51: Unicode Emoji". Unicode Consortium. 2016-11-22.
- "UCD: Emoji Data for UTR #51". Unicode Consortium. 2016-11-14.
- "Unicode Character Database: Standardized Variation Sequences". The Unicode Consortium.