Base64 Data Encoding Rules, Character Tables, and Implementation Code

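Base64 is an encoding that represents 8-bit binary data (including non-printable characters) using 64 printable characters.

The full definition of Base64 appears in PEM (Privacy Enhancement for Internet Electronic Mail) and MIME (Multipurpose Internet Mail Extensions).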

Base64 stores every 3 bytes of binary data in 4 characters, so the encoded length is 1/3 larger than the original.
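The 3-to-4 ratio can be checked with a few lines of Python. This is a minimal sketch; the helper name `encoded_length` is an assumption made for illustration, and the standard `base64` module is used only to confirm the arithmetic.

```python
import base64


def encoded_length(n: int) -> int:
    """Length of padded Base64 output for n input bytes: 4 chars per 3-byte group."""
    return 4 * ((n + 2) // 3)


# Compare the formula with the standard library for a few input sizes.
for n in (0, 1, 2, 3, 4, 57):
    assert encoded_length(n) == len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n))
```

Inputs whose length is not a multiple of 3 are rounded up to the next full group, which is why 1, 2, and 3 bytes all produce 4 characters.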

Base64 uses the following characters:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

or, in the URL- and filename-safe variant:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_

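Encoding rules

Base64 converts every three 8-bit bytes into four 6-bit groups (3*8 = 4*6 = 24), then prepends two high-order zero bits to each 6-bit group (2⁶ = 64), which yields four 8-bit bytes.

Before	After	Decimal	Base64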
10101101 10111010 01110110	00101011 00011011 00101001 00110110	43 27 41 54	r b p 2
01110011 00110001 00110011	00011100 00110011 00000100 00110011	28 51 04 51	c z E z

These 4 numbers are used as indexes into the character table; the 4 characters found there form the encoded string.
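The split-and-look-up step can be sketched in Python as follows. The names `ALPHABET` and `encode_triplet` are illustrative assumptions, and the two calls reuse the byte values from the rows of the table above.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_triplet(b0: int, b1: int, b2: int) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    # Pack the three bytes into one 24-bit integer.
    n = (b0 << 16) | (b1 << 8) | b2
    # Slice it into four 6-bit indexes, most significant group first.
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)


print(encode_triplet(0b10101101, 0b10111010, 0b01110110))  # rbp2
print(encode_triplet(0b01110011, 0b00110001, 0b00110011))  # czEz
```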

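If the length of the binary data is not a multiple of 3, so that 1 or 2 bytes are left over at the end, Base64 fills the last group with \x00 bytes and then marks the filler in the output with = signs: two = signs when 1 byte is left over, one = sign when 2 bytes are left over.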
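The padding rule can be illustrated with a small encoder. This is a sketch under the assumption that the input is a `bytes` object; `encode_with_padding` is a hypothetical name, and the result is compared with the standard `base64` module.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_with_padding(data: bytes) -> str:
    """Encode bytes of any length, zero-filling the tail and marking it with '='."""
    missing = (3 - len(data) % 3) % 3      # 0, 1 or 2 filler bytes needed
    padded = data + b"\x00" * missing
    chars = []
    for i in range(0, len(padded), 3):
        n = int.from_bytes(padded[i:i + 3], "big")
        chars.extend(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))
    if missing:
        # The last `missing` characters carry only filler bits; replace them.
        chars[-missing:] = "=" * missing
    return "".join(chars)


for sample in (b"a", b"ab", b"abc", b"abcd"):
    assert encode_with_padding(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, encode_with_padding(sample))
```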

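MIME limits each line of Base64-encoded data to 76 characters. MIME inherited the encoding from PEM, but PEM uses a line length of 64 characters. Both limits stem from restrictions in SMTP.
Unless the referring specification explicitly directs encoders to add line breaks after a specific number of characters, implementations must not add line breaks to encoded data.
The = padding at the end of encoded data is needed only when the length of the transported data cannot otherwise be determined; excess trailing pad characters, as in ===, may be ignored by the decoder.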

Base64 character table

Value	Encoding	Value	Encoding	Value	Encoding	Value	Encoding
0	A	17	R	34	i	51	z
1	B	18	S	35	j	52	0
2	C	19	T	36	k	53	1
3	D	20	U	37	l	54	2
4	E	21	V	38	m	55	3
5	F	22	W	39	n	56	4
6	G	23	X	40	o	57	5
7	H	24	Y	41	p	58	6
8	I	25	Z	42	q	59	7
9	J	26	a	43	r	60	8
10	K	27	b	44	s	61	9
11	L	28	c	45	t	62	+
12	M	29	d	46	u	63	/
13	N	30	e	47	v	(pad)	=
14	O	31	f	48	w
15	P	32	g	49	x
16	Q	33	h	50	y
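The table above does not have to be typed by hand. As a minimal sketch, it can be assembled from the constants in Python's `string` module; the names `STANDARD`, `ENCODE`, and `DECODE` are assumptions chosen for this example.

```python
import string

# Standard alphabet: uppercase, lowercase, digits, then '+' and '/'.
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# value -> character and character -> value lookups
ENCODE = dict(enumerate(STANDARD))
DECODE = {ch: value for value, ch in ENCODE.items()}

assert len(STANDARD) == 64
print(ENCODE[0], ENCODE[26], ENCODE[52], ENCODE[62], ENCODE[63])  # A a 0 + /
print(DECODE["z"])  # 51
```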

Base64 safe character table (for URLs and file names)

Value	Encoding	Value	Encoding	Value	Encoding	Value	Encoding
0	A	17	R	34	i	51	z
1	B	18	S	35	j	52	0
2	C	19	T	36	k	53	1
3	D	20	U	37	l	54	2
4	E	21	V	38	m	55	3
5	F	22	W	39	n	56	4
6	G	23	X	40	o	57	5
7	H	24	Y	41	p	58	6
8	I	25	Z	42	q	59	7
9	J	26	a	43	r	60	8
10	K	27	b	44	s	61	9
11	L	28	c	45	t	62	-
12	M	29	d	46	u	63	_
13	N	30	e	47	v	(pad)	=
14	O	31	f	48	w
15	P	32	g	49	x
16	Q	33	h	50	y
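The two tables differ only at values 62 and 63, so text in one alphabet can be converted to the other by swapping two characters. The sketch below assumes Python 3 and uses `bytes.maketrans`; the names `TO_URLSAFE` and `TO_STANDARD` are illustrative.

```python
import base64

# Map '+' -> '-' and '/' -> '_'; every other character is shared by both tables.
TO_URLSAFE = bytes.maketrans(b"+/", b"-_")
TO_STANDARD = bytes.maketrans(b"-_", b"+/")

raw = b"i\xb7\x1d\xfb\xef\xff"
standard = base64.b64encode(raw)            # b'abcd++//'
urlsafe = standard.translate(TO_URLSAFE)    # b'abcd--__'

assert urlsafe == base64.urlsafe_b64encode(raw)
assert urlsafe.translate(TO_STANDARD) == standard
print(standard, urlsafe)
```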

MIME

In MIME-formatted email, Base64 encodes a binary byte sequence as text made up of ASCII characters.

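In email, MIME (RFC 2045) limits each encoded line to 76 characters, so a carriage return and line feed (CRLF) must be added after every 76 characters. The encoded data is then roughly 135.1% of the original length if each line break is counted as one character (4/3 × 77/76), or about 136.8% if the CRLF is counted as two bytes (4/3 × 78/76).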
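The line wrapping and the resulting overhead can be reproduced with a short sketch. The function `mime_wrap` and its parameters are assumptions made for this example, not part of any mail library.

```python
import base64


def mime_wrap(data: bytes, width: int = 76, newline: str = "\r\n") -> str:
    """Base64-encode data and break the text into lines of at most `width` chars."""
    text = base64.b64encode(data).decode("ascii")
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    return newline.join(lines) + newline if lines else ""


payload = bytes(57 * 1000)   # 57 bytes fill exactly one 76-character line
for newline in ("\n", "\r\n"):
    ratio = len(mime_wrap(payload, newline=newline)) / len(payload)
    print(repr(newline), round(ratio * 100, 1))
```

With a one-character line break the ratio comes out at 135.1%; with a two-byte CRLF it is 136.8%.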

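Special characters

Standard Base64 is not suitable for direct use in a URL, because URL encoders convert the / and + characters of standard Base64 into the %XX form, and the % sign then needs a further conversion when the value is stored in a database (ANSI SQL uses % as a wildcard).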

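One option is Base62. Another is a URL-oriented Base64 variant that omits the trailing = padding and replaces the + and / of standard Base64 with - and _ respectively. This avoids the conversions during URL encoding/decoding and database storage, does not increase the data length, and unifies the format of identifiers in databases, forms, and similar places.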
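The effect of percent-encoding, and the URL-safe variant without padding, can be sketched with the standard library. The helper names `urlsafe_encode` and `urlsafe_decode` are assumptions for illustration; `urllib.parse.quote` stands in for a generic URL encoder.

```python
import base64
from urllib.parse import quote


def urlsafe_encode(data: bytes) -> str:
    """URL-safe Base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def urlsafe_decode(text: str) -> bytes:
    """Restore the padding to a multiple of 4 characters, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


raw = b"i\xb7\x1d\xfb\xef\xff"
print(quote(base64.b64encode(raw).decode("ascii"), safe=""))  # abcd%2B%2B%2F%2F
print(quote(urlsafe_encode(raw), safe=""))                    # abcd--__
assert urlsafe_decode(urlsafe_encode(b"abcd")) == b"abcd"     # "YWJjZA" round-trips
```

The standard form grows from 8 to 16 characters once quoted, while the URL-safe form passes through unchanged.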

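Notes:

In regular expressions, + and / can have special meaning.

Using + and / in programming-language identifiers or keywords causes errors.

Traditional text search and indexing tools treat + and / as word breaks.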

Python implementation

Every character in string must fall within the ASCII character set.

def base(string: str) -> str:
    """Encode an ASCII string as standard Base64 (teaching implementation)."""
    base64_table = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                    "abcdefghijklmnopqrstuvwxyz"
                    "0123456789+/")

    # Convert every character to its 8-bit binary form, zero-padded on the left.
    bits = "".join("{:08b}".format(ord(ch)) for ch in string)

    # Split the bit string into groups of 6; right-pad the last group with zeros.
    groups = [bits[j:j + 6].ljust(6, "0") for j in range(0, len(bits), 6)]

    # Look up each 6-bit value in the table and join the characters.
    encoded = "".join(base64_table[int(group, 2)] for group in groups)

    # Append "==" when 1 byte is left over, "=" when 2 bytes are left over.
    if len(string) % 3 == 1:
        encoded += "=="
    elif len(string) % 3 == 2:
        encoded += "="
    return encoded
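The page gives only the encoding direction in Python. A matching decoder can be sketched with the same bit-string approach; the name `debase` is hypothetical, and the sketch assumes well-formed, padded standard Base64 whose decoded bytes are ASCII.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def debase(encoded: str) -> str:
    """Decode padded standard Base64 text back to a string of 8-bit characters."""
    stripped = encoded.rstrip("=")
    # Turn every character into its 6-bit table index.
    bits = "".join("{:06b}".format(ALPHABET.index(ch)) for ch in stripped)
    # Drop the filler bits at the end, then read the rest 8 bits at a time.
    usable = len(bits) - len(bits) % 8
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))


print(debase("YWJjZA=="))  # abcd
print(debase("TWFu"))      # Man
```

Characters outside the table raise `ValueError` from `str.index`, which is enough for a demonstration but not a substitute for real input validation.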

JavaScript implementation

if (!Shotgun)
    var Shotgun = {};
if (!Shotgun.Js)
    Shotgun.Js = {};
Shotgun.Js.Base64 = {
    _table: [
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
        'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
        'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
        'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/'
    ],
    encode: function (bin) {
        var codes = [];
        var un = 0;
        un = bin.length % 3;
        if (un == 1)
            bin.push(0, 0);
        else if (un == 2)
            bin.push(0);
        for (var i = 2; i < bin.length; i += 3) {
            var c = bin[i - 2] << 16;
            c |= bin[i - 1] << 8;
            c |= bin[i];
            codes.push(this._table[c >> 18 & 0x3f]);
            codes.push(this._table[c >> 12 & 0x3f]);
            codes.push(this._table[c >> 6 & 0x3f]);
            codes.push(this._table[c & 0x3f]);
        }
        if (un >= 1) {
            codes[codes.length - 1] = "=";
            bin.pop();
        }
        if (un == 1) {
            codes[codes.length - 2] = "=";
            bin.pop();
        }
        return codes.join("");
    },
    decode: function (base64Str) {
        var i = 0;
        var bin = [];
        var x = 0, code = 0, eq = 0;
        while (i < base64Str.length) {
            var c = base64Str.charAt(i++);
            var idx = this._table.indexOf(c);
            if (idx == -1) {
                switch (c) {
                    case '=': idx = 0; eq++; break;
                    case ' ':
                    case '\n':
                    case "\r":
                    case '\t':
                        continue;
                    default:
                        throw new Error("base64.the-x.cn Error: invalid encoding: " + c);
                }
            }
            // once padding has started, only further '=' characters are allowed
            if (eq > 0 && c != '=')
                throw new Error("base64.the-x.cn Error: malformed encoding!");
            code = code << 6 | idx;
            if (++x != 4)
                continue;
            bin.push(code >> 16);
            bin.push(code >> 8 & 0xff);
            bin.push(code & 0xff);
            code = x = 0;
        }
        // a trailing group of fewer than 4 characters means the input was truncated
        if (x != 0)
            throw new Error("base64.the-x.cn Error: invalid encoded data length");
        if (eq == 1)
            bin.pop();
        else if (eq == 2) {
            bin.pop();
            bin.pop();
        } else if (eq > 2)
            throw new Error("base64.the-x.cn Error: malformed encoding!");
        return bin;
    }
};

Encoding examples

Input data	Hexadecimal	Binary	6-bit groups	Decimal	Output
0x14fb9c03d97e	1 4 f b 9 c 0 3 d 9 7 e	00010100 11111011 10011100 00000011 11011001 01111110	000101 001111 101110 011100 000000 111101 100101 111110	5 15 46 28 0 61 37 62	F P u c A 9 l +
0x14fb9c03d9	1 4 f b 9 c 0 3 d 9	00010100 11111011 10011100 00000011 11011001	000101 001111 101110 011100 000000 111101 100100	5 15 46 28 0 61 36	F P u c A 9 k =
0x14fb9c03	1 4 f b 9 c 0 3	00010100 11111011 10011100 00000011	000101 001111 101110 011100 000000 110000	5 15 46 28 0 48	F P u c A w = =
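The rows of this table can be regenerated with a short script. The helper `six_bit_values` is a name assumed for this sketch; the final output column comes from the standard `base64` module.

```python
import base64


def six_bit_values(raw: bytes) -> list:
    """Return the 6-bit group values of raw, zero-filling the last group."""
    bits = "".join(f"{byte:08b}" for byte in raw)
    # Right-pad to a multiple of 6 bits, as the encoder does for the last group.
    bits += "0" * (-len(bits) % 6)
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]


for hex_input in ("14fb9c03d97e", "14fb9c03d9", "14fb9c03"):
    raw = bytes.fromhex(hex_input)
    print(hex_input, six_bit_values(raw), base64.b64encode(raw).decode("ascii"))
```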

Usage examples

Quick interactive tests in a Python shell:

# Python2.7
>>> import base64
>>> base64.b64encode("binary\x00string")
'YmluYXJ5AHN0cmluZw=='
>>> base64.b64decode("YmluYXJ5AHN0cmluZw==")
'binary\x00string'
 
# Python3.6
>>> base64.b64encode("binary\x00string".encode("utf-8"))
b'YmluYXJ5AHN0cmluZw=='
>>> base64.b64decode(b"YmluYXJ5AHN0cmluZw==")
b'binary\x00string'
 
# Python3.6
>>> base64.b64encode("简体中文".encode("utf-8"))
b'566A5L2T5Lit5paH'
>>> base64.b64decode(b"566A5L2T5Lit5paH")
b'\xe7\xae\x80\xe4\xbd\x93\xe4\xb8\xad\xe6\x96\x87'
>>> base64.b64decode(b"566A5L2T5Lit5paH").decode("utf-8")
'简体中文'
 
>>> base64.b64encode("简体中文".encode("gb18030"))
b'vPLM5dbQzsQ='

# Python2.7
>>> base64.b64encode("i\xb7\x1d\xfb\xef\xff")
'abcd++//'
>>> base64.urlsafe_b64encode("i\xb7\x1d\xfb\xef\xff")
'abcd--__'
>>> base64.urlsafe_b64decode("abcd--__")
'i\xb7\x1d\xfb\xef\xff'

Because the = character can appear in Base64 output but is ambiguous in URLs and cookies, the = padding can be stripped and restored before decoding:

# Python2.7
>>> base64.b64decode("YWJjZA==")
'abcd'
>>> base64.b64decode("YWJjZA")
Traceback (most recent call last):
  ...
TypeError: Incorrect padding
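>>> def safe_b64decode(s):  # hypothetical helper, not part of the base64 module
...     return base64.b64decode(s + "=" * (-len(s) % 4))  # restore stripped padding
...
>>> safe_b64decode("YWJjZA")
'abcd'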

# Python3.6
>>> string = "Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure."
>>> from base64 import encodebytes  # unlike b64encode, wraps output at 76 characters
>>> print(encodebytes(string.encode("utf-8")).decode("ascii"), end="")
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

See also:

Domain Name System Security Extensions

MIME (Multipurpose Internet Mail Extensions)

The Base16, Base32, and Base64 Data Encodings

PEM (Privacy Enhancement for Internet Electronic Mail)