While most people think of numbers as the primary grist for the computing mill, more character data is stored in computers than any other kind. Character data in a computer is always stored as a simple substitution cipher: each character is assigned a binary number which represents the character inside the computer. Over the years, a number of character codes have been used. These include BCD (Binary Coded Decimal), Fieldata, ASCII (American Standard Code for Information Interchange), EBCDIC (Extended Binary Coded Decimal Interchange Code), Unicode and others. Of these, ASCII has become the de facto standard because of its widespread use in personal computers.
The following table gives the hexadecimal character codes used by ASCII to represent characters inside the computer:
Codes with a hexadecimal value less than 20₁₆ are called "control characters". These were used in the past for teletype and pre-LAN communications protocols, and only a few have much relevance today. Of these, the (carriage) return ("CR") and the line feed ("LF"), or new line, are of particular interest.
00 = NUL    01 = SOH    02 = STX    03 = ETX    04 = EOT    05 = ENQ    06 = ACK    07 = BEL
08 = BS     09 = HT     0A = LF     0B = VT     0C = FF     0D = CR     0E = SO     0F = SI
10 = DLE    11 = DC1    12 = DC2    13 = DC3    14 = DC4    15 = NAK    16 = SYN    17 = ETB
18 = CAN    19 = EM     1A = SUB    1B = ESC    1C = FS     1D = GS     1E = RS     1F = US
20 = space  21 = !      22 = "      23 = #      24 = $      25 = %      26 = &      27 = '
28 = (      29 = )      2A = *      2B = +      2C = ,      2D = -      2E = .      2F = /
30 = 0      31 = 1      32 = 2      33 = 3      34 = 4      35 = 5      36 = 6      37 = 7
38 = 8      39 = 9      3A = :      3B = ;      3C = <      3D = =      3E = >      3F = ?
40 = @      41 = A      42 = B      43 = C      44 = D      45 = E      46 = F      47 = G
48 = H      49 = I      4A = J      4B = K      4C = L      4D = M      4E = N      4F = O
50 = P      51 = Q      52 = R      53 = S      54 = T      55 = U      56 = V      57 = W
58 = X      59 = Y      5A = Z      5B = [      5C = \      5D = ]      5E = ^      5F = _
60 = `      61 = a      62 = b      63 = c      64 = d      65 = e      66 = f      67 = g
68 = h      69 = i      6A = j      6B = k      6C = l      6D = m      6E = n      6F = o
70 = p      71 = q      72 = r      73 = s      74 = t      75 = u      76 = v      77 = w
78 = x      79 = y      7A = z      7B = {      7C = |      7D = }      7E = ~      7F = DEL
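As a quick way to check entries in the table, here is a minimal Python sketch that looks up the hexadecimal codes of a few characters. The helper name ascii_hex and the sample characters are assumptions chosen for illustration; only Python's built-in str.encode, format and chr are used.

```python
# Hypothetical helper (not from the original text): list the ASCII code of
# each character in a string as a two-digit upper-case hexadecimal string.
def ascii_hex(text):
    # encode("ascii") raises UnicodeEncodeError for any non-ASCII character
    return [format(code, "02X") for code in text.encode("ascii")]

print(ascii_hex("A"))   # code for upper case A
print(ascii_hex("a"))   # code for lower case a
print(chr(0x41))        # the reverse lookup: code to character
```

Comparing the output for "A" and "a" against the table shows that the upper and lower case forms of a letter have different codes.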
Despite the proliferation of proprietary and internationalized document formats, ASCII text remains the only
universally understood character data format. Even so, designers of computer operating systems such as UNIX,
Macintosh and Windows cannot seem to agree on something as simple as how to designate the end of a line of text
in a file. UNIX systems denote the separation between lines of text with a new line character; classic Macintosh
systems use a return, and Windows systems use both (a return followed by a new line). This means that when
transferring text files between various systems,
one sometimes finds that the destination system interprets the file as containing a single line. Of course,
filtering programs have been written to account for the differences, but the problem remains an eloquent argument
that a little standardization can be a good thing.
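A filtering program of the kind mentioned above can be very small. The following Python sketch is one possible approach, not a description of any particular tool; the function name normalize_newlines is an assumption made for illustration. It converts return/new line pairs and lone returns to a single new line.

```python
# Hypothetical filter (name chosen for illustration): convert all three
# line-ending conventions to the UNIX one.
def normalize_newlines(data: bytes) -> bytes:
    # Replace the two-byte CR LF sequence first, so that its CR is not
    # converted a second time by the lone-CR replacement that follows.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

sample = b"first line\r\nsecond line\rthird line\n"
print(normalize_newlines(sample).split(b"\n"))
```

Working on bytes rather than text keeps the sketch independent of how the file is otherwise encoded, as long as 0D and 0A are used only as line separators.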
It is important to note that ASCII is a case-sensitive code: there are separate character codes for upper and lower case characters. As a result, the phrase
I will always pay attention to case when I use ASCII!
is encoded in hexadecimal as:
49 20 77 69 6C 6C 20 61 6C 77 61 79 73 20 70 61 79 20 61 74 74 65 6E 74 69 6F 6E
20 74 6F 20 63 61 73 65 20 77 68 65 6E 20 49 20 75 73 65 20 41 53 43 49 49 21
and not as:
49 20 57 49 4C 4C 20 41 4C 57 41 59 53 20 50 41 59 20 41 54 54 45 4E 54 49 4F 4E
20 54 4F 20 43 41 53 45 20 57 48 45 4E 20 49 20 55 53 45 20 41 53 43 49 49 21
(which of course was in all upper case: check it!).
You should also be aware that the key codes generated by a keyboard as you type are not the same as the ASCII codes above. The codes generated by your keyboard are translated into ASCII by the keyboard handling hardware and software in your computer.
As you can see from the ASCII table, only 7 bits are used in the hexadecimal character codes: they range in value from 0 to 7F₁₆. In contrast, BCD is a 5 bit code, Fieldata is a 6 bit code, EBCDIC is an 8 bit code and Unicode was originally designed as a 16 bit code. Since computers typically store ASCII character data using one byte for each character, when ASCII is stored the most significant bit of each byte is 0. This bit is sometimes used for a rudimentary form of error checking when ASCII data is transferred between computers.
Parity
The "parity" of a byte of data is defined as "odd" or "even" depending on whether the number of bits in the byte which have a value of 1 is odd or even. When transferring data between computers using ASCII, the most significant bit of each byte can be set to either 0 or 1 in order to force each byte to have odd or even parity. As long as both computers agree on which type of parity is being used, transmission errors in which only one bit of data is transferred incorrectly can be detected by checking the parity of each byte transferred. So for example, if the two computers agree that all data is to have odd parity and one computer sends an upper case A
41₁₆ (0100 0001₂),
the actual value sent will be
C1₁₆ (1100 0001₂).
If instead the two computers agree that all data is to have even parity, the upper case A will be sent unchanged.
It is important to note that parity is not a very good error detection mechanism. If an even number of bits are corrupted during transmission the error will not be detected. For instance, if our upper case A with odd parity is received as
01₁₆ (0000 0001₂)
or as
C7₁₆ (1100 0111₂),
the incorrectly transferred byte still has odd parity and will be accepted as correct. More robust error detection mechanisms include checksums, CRC (Cyclic Redundancy Check) and ECC (Error Correction Codes), all of which are outside the scope of this text.
This concludes the chapter on data representations and computer arithmetic. We turn now to several topics which have little or nothing to do with numbers, beginning with logic.
©2002, Kenneth R. Koehler. All Rights Reserved. This document may be freely reproduced provided that this copyright notice is included.
Please send comments or suggestions to the author.
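To make the parity discussion in this chapter concrete, here is a minimal Python sketch of setting and checking the parity bit of a 7 bit ASCII code. The function names count_ones, add_parity and parity_ok are assumptions chosen for illustration, and the sketch models only the bit manipulation, not any real communications protocol.

```python
# Illustrative sketch; all names here are hypothetical.
def count_ones(byte):
    """Number of bits in the byte that have a value of 1."""
    return bin(byte).count("1")

def add_parity(code, odd=True):
    """Set the most significant bit of a 7 bit code if that is needed
    to give the whole byte the agreed (odd or even) parity."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7 bit ASCII code")
    wanted = 1 if odd else 0
    if count_ones(code) % 2 != wanted:
        code |= 0x80
    return code

def parity_ok(byte, odd=True):
    """True if the received byte has the agreed parity."""
    return count_ones(byte) % 2 == (1 if odd else 0)

sent = add_parity(0x41, odd=True)       # upper case A with odd parity
print(format(sent, "02X"))
print(parity_ok(sent ^ 0x01))           # one corrupted bit: detected
print(parity_ok(0x01), parity_ok(0xC7)) # two corrupted bits: accepted
```

The last line reproduces the weakness described in the text: both corrupted values differ from the byte that was sent in two bit positions, so they still pass the odd parity check.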