Base32 Encoding

Douglas Crockford

 

Symbol
Value
Decode
Symbol
Encode
Symbol
0 0 O o 0
1 1 I i L l 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 A a A
11 B b B
12 C c C
13 D d D
14 E e E
15 F f F
16 G g G
17 H h H
18 J j J
19 K k K
20 M m M
21 N n N
22 P p P
23 Q q Q
24 R r R
25 S s S
26 T t T
27 V v V
28 W w W
29 X x X
30 Y y Y
31 Z z Z

This document describes a 32-symbol notation for expressing numbers in a form that can be conveniently and accurately transmitted between humans and computer systems.

Requirements

The encoding scheme is required to

Base

Base 10 is well known and well accepted, but it produces strings that will be unacceptably long. Base 16 is only slightly more compact.

Base 64 encoding uses a large symbol set containing both upper and lower case letters and many special characters. It is more compact than Base 16, but it is difficult to type and difficult to pronounce.

Base 32 seems the best balance between compactness and error resistance. Each symbol carries 5 bits.

Symbols

The Base32 symbol set is a superset of the Base16 symbol set.

We chose a symbol set of 10 digits and 22 letters. We exclude 4 of the 26 letters: I L O U.

Excluded Letters
I Can be confused with 1
L Can be confused with 1
O Can be confused with 0
U Accidental obscenity

When decoding, upper and lower case letters are accepted, and i and l will be treated as 1 and o will be treated as 0. When encoding, only upper case letters are used.

If the bit-length of the number to be encoded is not a multiple of 5 bits, then zero-extend the number to make its bit-length a multiple of 5.

Hyphens (-) can be inserted into symbol strings. This can partition a string into manageable pieces, improving readability by helping to prevent confusion. Hyphens are ignored during decoding. An application may look for hyphens to assure symbol string correctness.

Check

Symbol
Value
Decode
Symbol
Encode
Symbol
32 * *
33 ~ ~
34 $ $
35 = =
36 U u U

An application may append a check symbol to a symbol string. This check symbol can be used to detect wrong-symbol and transposed-symbol errors. This allows for detecting transmission and entry errors early and inexpensively.

The check symbol encodes the number modulo 37, 37 being the least prime number greater than 32. We introduce 5 additional symbols that are used only for encoding or decoding the check symbol.

The additional symbols were selected to not be confused with punctuation or with URL formatting.

0123456789ABCDEFGHJKMNPQRSTVWXYZ *~$=U 2002-11-02