MessagePack specification
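MessagePack is an object serialization specification like JSON.
MessagePack has two concepts: type system and formats.
Serialization is conversion from application objects into MessagePack formats via the MessagePack type system.
Deserialization is conversion from MessagePack formats into application objects via the MessagePack type system.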
Serialization:
Application objects
--> MessagePack type system
--> MessagePack formats (byte array)
Deserialization:
MessagePack formats (byte array)
--> MessagePack type system
--> Application objects
This document describes the MessagePack type system, the MessagePack formats, and the conversions between them.
Type system
- Types
- Integer represents an integer
- Nil represents nil
- Boolean represents true or false
- Float represents an IEEE 754 double precision floating point number including NaN and Infinity
- Raw
- String extending Raw type represents a UTF-8 string
- Binary extending Raw type represents a byte array
- Array represents a sequence of objects
- Map represents key-value pairs of objects
- Extension represents a tuple of type information and a byte array where type information is an integer whose meaning is defined by applications or MessagePack specification
- Timestamp represents an instantaneous point on the time-line in the world that is independent from time zones or calendars. Maximum precision is nanoseconds.
Limitation
- a value of an Integer object is limited from -(2^63) up to (2^64)-1
- maximum length of a Binary object is (2^32)-1
- maximum byte size of a String object is (2^32)-1
- String objects may contain invalid byte sequences; how a deserializer behaves when it receives one depends on the actual implementation
- Deserializers should provide functionality to get the original byte array so that applications can decide how to handle the object
- maximum number of elements of an Array object is (2^32)-1
- maximum number of key-value associations of a Map object is (2^32)-1
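The String limitation above recommends keeping the original bytes reachable when a payload is not valid UTF-8. The following Python sketch shows one way a deserializer might do that. The function name `decode_string_payload` and the choice to return `None` for undecodable text are assumptions made for illustration; the specification does not prescribe an API.

```python
from typing import Optional, Tuple


def decode_string_payload(raw: bytes) -> Tuple[Optional[str], bytes]:
    """Try to decode a String payload as UTF-8.

    Returns (text, raw). text is None when raw is not valid UTF-8, so the
    application can still inspect the original byte array and decide what to do.
    """
    try:
        return raw.decode("utf-8"), raw
    except UnicodeDecodeError:
        return None, raw
```

A caller could then reject the object, substitute replacement characters, or pass the raw bytes through unchanged.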
Extension types
MessagePack allows applications to define application-specific types using the Extension type.
An Extension type consists of an integer and a byte array, where the integer identifies the kind of type and the byte array holds the data.
Applications can assign 0 to 127 to store application-specific type information. An example usage is that an application defines type = 0 as the application's unique type system, and stores the name of a type and the values of the type in the payload.
MessagePack reserves -1 to -128 for future extension to add predefined types. These types will be added to exchange more types without using pre-shared statically-typed schema across different programming environments.
[0, 127]: application-specific types
[-128, -1]: reserved for predefined types
Because extension types are intended to be added over time, old applications may not implement all of them. However, they can still handle an unrecognized type as a generic Extension value. Applications can therefore decide whether to reject unknown Extension types, accept them as opaque data, or forward them to another application without touching their payload.
Here is the list of predefined extension types. Formats of the types are defined at Formats section.
Name | Type |
---|---|
Timestamp | -1 |
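As a small illustration of the two type-code ranges and the predefined table above, the Python sketch below classifies an extension type code. The function name `describe_ext_type` and the returned strings are hypothetical; only the ranges and the Timestamp assignment come from this document.

```python
# Predefined extension types listed in this document.
PREDEFINED_EXT_TYPES = {-1: "Timestamp"}


def describe_ext_type(type_code: int) -> str:
    """Classify an extension type code (a signed 8-bit integer)."""
    if not -128 <= type_code <= 127:
        raise ValueError("extension type must fit in a signed 8-bit integer")
    if type_code >= 0:
        return "application-specific"
    name = PREDEFINED_EXT_TYPES.get(type_code)
    if name is not None:
        return "predefined: " + name
    return "reserved"
```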
Formats
Overview
format name | first byte (in binary) | first byte (in hex) |
---|---|---|
positive fixint | 0xxxxxxx | 0x00 - 0x7f |
fixmap | 1000xxxx | 0x80 - 0x8f |
fixarray | 1001xxxx | 0x90 - 0x9f |
fixstr | 101xxxxx | 0xa0 - 0xbf |
nil | 11000000 | 0xc0 |
(never used) | 11000001 | 0xc1 |
false | 11000010 | 0xc2 |
true | 11000011 | 0xc3 |
bin 8 | 11000100 | 0xc4 |
bin 16 | 11000101 | 0xc5 |
bin 32 | 11000110 | 0xc6 |
ext 8 | 11000111 | 0xc7 |
ext 16 | 11001000 | 0xc8 |
ext 32 | 11001001 | 0xc9 |
float 32 | 11001010 | 0xca |
float 64 | 11001011 | 0xcb |
uint 8 | 11001100 | 0xcc |
uint 16 | 11001101 | 0xcd |
uint 32 | 11001110 | 0xce |
uint 64 | 11001111 | 0xcf |
int 8 | 11010000 | 0xd0 |
int 16 | 11010001 | 0xd1 |
int 32 | 11010010 | 0xd2 |
int 64 | 11010011 | 0xd3 |
fixext 1 | 11010100 | 0xd4 |
fixext 2 | 11010101 | 0xd5 |
fixext 4 | 11010110 | 0xd6 |
fixext 8 | 11010111 | 0xd7 |
fixext 16 | 11011000 | 0xd8 |
str 8 | 11011001 | 0xd9 |
str 16 | 11011010 | 0xda |
str 32 | 11011011 | 0xdb |
array 16 | 11011100 | 0xdc |
array 32 | 11011101 | 0xdd |
map 16 | 11011110 | 0xde |
map 32 | 11011111 | 0xdf |
negative fixint | 111xxxxx | 0xe0 - 0xff |
Notation in diagrams
one byte:
+--------+
| |
+--------+
a variable number of bytes:
+========+
| |
+========+
variable number of objects stored in MessagePack format:
+~~~~~~~~~~~~~~~~~+
| |
+~~~~~~~~~~~~~~~~~+
X, Y, Z and A are the symbols that will be replaced by an actual bit.
nil format
Nil format stores nil in 1 byte.
nil:
+--------+
| 0xc0 |
+--------+
bool format family
Bool format family stores false or true in 1 byte.
false:
+--------+
| 0xc2 |
+--------+
true:
+--------+
| 0xc3 |
+--------+
int format family
Int format family stores an integer in 1, 2, 3, 5, or 9 bytes.
positive fixint stores 7-bit positive integer
+--------+
|0XXXXXXX|
+--------+
negative fixint stores 5-bit negative integer
+--------+
|111YYYYY|
+--------+
* 0XXXXXXX is 8-bit unsigned integer
* 111YYYYY is 8-bit signed integer
uint 8 stores an 8-bit unsigned integer
+--------+--------+
| 0xcc |ZZZZZZZZ|
+--------+--------+
uint 16 stores a 16-bit big-endian unsigned integer
+--------+--------+--------+
| 0xcd |ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+
uint 32 stores a 32-bit big-endian unsigned integer
+--------+--------+--------+--------+--------+
| 0xce |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+
uint 64 stores a 64-bit big-endian unsigned integer
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xcf |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
int 8 stores an 8-bit signed integer
+--------+--------+
| 0xd0 |ZZZZZZZZ|
+--------+--------+
int 16 stores a 16-bit big-endian signed integer
+--------+--------+--------+
| 0xd1 |ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+
int 32 stores a 32-bit big-endian signed integer
+--------+--------+--------+--------+--------+
| 0xd2 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+
int 64 stores a 64-bit big-endian signed integer
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xd3 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
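To make the int format family concrete, here is a Python sketch that picks the smallest format for a given integer using the standard `struct` module. The function name `pack_int` is an assumption, and preferring uint formats for non-negative values is one reasonable choice rather than a rule stated in this document.

```python
import struct


def pack_int(n: int) -> bytes:
    """Encode an integer in the smallest int-family format that fits."""
    if 0 <= n <= 0x7F:
        return bytes([n])                          # positive fixint 0XXXXXXX
    if -32 <= n < 0:
        return struct.pack("b", n)                 # negative fixint 111YYYYY
    if n > 0:
        if n <= 0xFF:
            return b"\xcc" + struct.pack(">B", n)  # uint 8
        if n <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", n)  # uint 16
        if n <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", n)  # uint 32
        if n <= 0xFFFFFFFFFFFFFFFF:
            return b"\xcf" + struct.pack(">Q", n)  # uint 64
    else:
        if n >= -(1 << 7):
            return b"\xd0" + struct.pack(">b", n)  # int 8
        if n >= -(1 << 15):
            return b"\xd1" + struct.pack(">h", n)  # int 16
        if n >= -(1 << 31):
            return b"\xd2" + struct.pack(">i", n)  # int 32
        if n >= -(1 << 63):
            return b"\xd3" + struct.pack(">q", n)  # int 64
    raise OverflowError("Integer must be in -(2^63) .. (2^64)-1")
```

The resulting lengths are 1, 2, 3, 5, or 9 bytes, matching the sizes listed for this family.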
float format family
Float format family stores a floating point number in 5 bytes or 9 bytes.
float 32 stores a floating point number in IEEE 754 single precision floating point number format:
+--------+--------+--------+--------+--------+
| 0xca |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+
float 64 stores a floating point number in IEEE 754 double precision floating point number format:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xcb |YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
where
* XXXXXXXX_XXXXXXXX_XXXXXXXX_XXXXXXXX is a big-endian IEEE 754 single precision floating point number.
Extension of precision from single-precision to double-precision does not lose precision.
* YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY is a big-endian IEEE 754 double precision floating point number
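The float layouts above map directly onto the big-endian `>f` and `>d` codes of Python's `struct` module. The sketch below is illustrative only; the function names are assumptions.

```python
import struct


def pack_float32(x: float) -> bytes:
    """float 32: 0xca followed by a big-endian IEEE 754 single."""
    return b"\xca" + struct.pack(">f", x)


def pack_float64(x: float) -> bytes:
    """float 64: 0xcb followed by a big-endian IEEE 754 double."""
    return b"\xcb" + struct.pack(">d", x)


def unpack_float(data: bytes) -> float:
    """Decode a float 32 or float 64 object into a Python float (a double)."""
    if data[0] == 0xCA:
        return struct.unpack(">f", data[1:5])[0]
    if data[0] == 0xCB:
        return struct.unpack(">d", data[1:9])[0]
    raise ValueError("not a float format")
```

Decoding a float 32 widens the value to a double, which is lossless, so re-encoding the decoded value as float 32 gives back the same bytes.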
str format family
Str format family stores a byte array using 1, 2, 3, or 5 extra bytes in addition to the size of the byte array.
fixstr stores a byte array whose length is up to 31 bytes:
+--------+========+
|101XXXXX| data |
+--------+========+
str 8 stores a byte array whose length is up to (2^8)-1 bytes:
+--------+--------+========+
| 0xd9 |YYYYYYYY| data |
+--------+--------+========+
str 16 stores a byte array whose length is up to (2^16)-1 bytes:
+--------+--------+--------+========+
| 0xda |ZZZZZZZZ|ZZZZZZZZ| data |
+--------+--------+--------+========+
str 32 stores a byte array whose length is up to (2^32)-1 bytes:
+--------+--------+--------+--------+--------+========+
| 0xdb |AAAAAAAA|AAAAAAAA|AAAAAAAA|AAAAAAAA| data |
+--------+--------+--------+--------+--------+========+
where
* XXXXX is a 5-bit unsigned integer which represents N
* YYYYYYYY is an 8-bit unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ is a 16-bit big-endian unsigned integer which represents N
* AAAAAAAA_AAAAAAAA_AAAAAAAA_AAAAAAAA is a 32-bit big-endian unsigned integer which represents N
* N is the length of data
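A Python sketch of the str format family follows. It encodes the text as UTF-8 and prefixes the shortest header whose length field can hold N. The function name `pack_str` is an assumption for illustration.

```python
import struct


def pack_str(text: str) -> bytes:
    """Encode a string as fixstr, str 8, str 16, or str 32."""
    data = text.encode("utf-8")
    n = len(data)                                 # N is a byte count, not a character count
    if n <= 31:
        header = bytes([0xA0 | n])                # fixstr 101XXXXX
    elif n <= 0xFF:
        header = b"\xd9" + struct.pack(">B", n)   # str 8
    elif n <= 0xFFFF:
        header = b"\xda" + struct.pack(">H", n)   # str 16
    elif n <= 0xFFFFFFFF:
        header = b"\xdb" + struct.pack(">I", n)   # str 32
    else:
        raise ValueError("String is longer than (2^32)-1 bytes")
    return header + data
```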
bin format family
Bin format family stores a byte array using 2, 3, or 5 extra bytes in addition to the size of the byte array.
bin 8 stores a byte array whose length is up to (2^8)-1 bytes:
+--------+--------+========+
| 0xc4 |XXXXXXXX| data |
+--------+--------+========+
bin 16 stores a byte array whose length is up to (2^16)-1 bytes:
+--------+--------+--------+========+
| 0xc5 |YYYYYYYY|YYYYYYYY| data |
+--------+--------+--------+========+
bin 32 stores a byte array whose length is up to (2^32)-1 bytes:
+--------+--------+--------+--------+--------+========+
| 0xc6 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| data |
+--------+--------+--------+--------+--------+========+
where
* XXXXXXXX is an 8-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the length of data
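The bin format family has no fixed-size short form, so even an empty byte array takes 2 bytes. A Python sketch, with `pack_bin` as an assumed name:

```python
import struct


def pack_bin(data: bytes) -> bytes:
    """Encode a byte array as bin 8, bin 16, or bin 32."""
    n = len(data)
    if n <= 0xFF:
        header = b"\xc4" + struct.pack(">B", n)   # bin 8
    elif n <= 0xFFFF:
        header = b"\xc5" + struct.pack(">H", n)   # bin 16
    elif n <= 0xFFFFFFFF:
        header = b"\xc6" + struct.pack(">I", n)   # bin 32
    else:
        raise ValueError("Binary is longer than (2^32)-1 bytes")
    return header + data
```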
array format family
Array format family stores a sequence of elements in 1, 3, or 5 bytes of extra bytes in addition to the elements.
fixarray stores an array whose length is up to 15 elements:
+--------+~~~~~~~~~~~~~~~~~+
|1001XXXX| N objects |
+--------+~~~~~~~~~~~~~~~~~+
array 16 stores an array whose length is up to (2^16)-1 elements:
+--------+--------+--------+~~~~~~~~~~~~~~~~~+
| 0xdc |YYYYYYYY|YYYYYYYY| N objects |
+--------+--------+--------+~~~~~~~~~~~~~~~~~+
array 32 stores an array whose length is up to (2^32)-1 elements:
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
| 0xdd |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| N objects |
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
where
* XXXX is a 4-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the size of an array
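An array is a header carrying N followed by N already-encoded objects. The Python sketch below assumes the elements have been serialized beforehand and only builds the header; the name `pack_array` is hypothetical.

```python
import struct
from typing import Sequence


def pack_array(encoded_items: Sequence[bytes]) -> bytes:
    """Prefix already-encoded elements with a fixarray/array 16/array 32 header."""
    n = len(encoded_items)
    if n <= 15:
        header = bytes([0x90 | n])                # fixarray 1001XXXX
    elif n <= 0xFFFF:
        header = b"\xdc" + struct.pack(">H", n)   # array 16
    elif n <= 0xFFFFFFFF:
        header = b"\xdd" + struct.pack(">I", n)   # array 32
    else:
        raise ValueError("Array has more than (2^32)-1 elements")
    return header + b"".join(encoded_items)
```

For example, the array [1, 2, 3] built from three positive fixints is the four bytes 0x93 0x01 0x02 0x03.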
map format family
Map format family stores a sequence of key-value pairs using 1, 3, or 5 extra bytes in addition to the key-value pairs.
fixmap stores a map whose length is up to 15 elements
+--------+~~~~~~~~~~~~~~~~~+
|1000XXXX| N*2 objects |
+--------+~~~~~~~~~~~~~~~~~+
map 16 stores a map whose length is up to (2^16)-1 elements
+--------+--------+--------+~~~~~~~~~~~~~~~~~+
| 0xde |YYYYYYYY|YYYYYYYY| N*2 objects |
+--------+--------+--------+~~~~~~~~~~~~~~~~~+
map 32 stores a map whose length is up to (2^32)-1 elements
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
| 0xdf |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| N*2 objects |
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
where
* XXXX is a 4-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the size of a map
* odd elements in objects are keys of a map
* the next element of a key is its associated value
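A map header carries N, the number of key-value pairs, and is followed by N*2 objects with each key immediately followed by its value. The Python sketch below assumes keys and values are already encoded; `pack_map` is a hypothetical name.

```python
import struct
from typing import Sequence, Tuple


def pack_map(encoded_pairs: Sequence[Tuple[bytes, bytes]]) -> bytes:
    """Prefix already-encoded (key, value) pairs with a map header."""
    n = len(encoded_pairs)                        # N counts pairs, not objects
    if n <= 15:
        header = bytes([0x80 | n])                # fixmap 1000XXXX
    elif n <= 0xFFFF:
        header = b"\xde" + struct.pack(">H", n)   # map 16
    elif n <= 0xFFFFFFFF:
        header = b"\xdf" + struct.pack(">I", n)   # map 32
    else:
        raise ValueError("Map has more than (2^32)-1 associations")
    body = b"".join(key + value for key, value in encoded_pairs)
    return header + body
```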
ext format family
Ext format family stores a tuple of an integer and a byte array.
fixext 1 stores an integer and a byte array whose length is 1 byte
+--------+--------+--------+
| 0xd4 | type | data |
+--------+--------+--------+
fixext 2 stores an integer and a byte array whose length is 2 bytes
+--------+--------+--------+--------+
| 0xd5 | type | data |
+--------+--------+--------+--------+
fixext 4 stores an integer and a byte array whose length is 4 bytes
+--------+--------+--------+--------+--------+--------+
| 0xd6 | type | data |
+--------+--------+--------+--------+--------+--------+
fixext 8 stores an integer and a byte array whose length is 8 bytes
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xd7 | type | data |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
fixext 16 stores an integer and a byte array whose length is 16 bytes
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xd8 | type | data
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
+--------+--------+--------+--------+--------+--------+--------+--------+
data (cont.) |
+--------+--------+--------+--------+--------+--------+--------+--------+
ext 8 stores an integer and a byte array whose length is up to (2^8)-1 bytes:
+--------+--------+--------+========+
| 0xc7 |XXXXXXXX| type | data |
+--------+--------+--------+========+
ext 16 stores an integer and a byte array whose length is up to (2^16)-1 bytes:
+--------+--------+--------+--------+========+
| 0xc8 |YYYYYYYY|YYYYYYYY| type | data |
+--------+--------+--------+--------+========+
ext 32 stores an integer and a byte array whose length is up to (2^32)-1 bytes:
+--------+--------+--------+--------+--------+--------+========+
| 0xc9 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| type | data |
+--------+--------+--------+--------+--------+--------+========+
where
* XXXXXXXX is an 8-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a big-endian 32-bit unsigned integer which represents N
* N is the length of data
* type is a signed 8-bit integer
* type < 0 is reserved for future extension including 2-byte type information
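The ext format family can be sketched in Python as follows: payloads of exactly 1, 2, 4, 8, or 16 bytes use a fixext format, and every other length uses ext 8/16/32 with an explicit N placed before the type byte. The name `pack_ext` is an assumption for illustration.

```python
import struct

# Payload length -> first byte of the matching fixext format.
_FIXEXT = {1: 0xD4, 2: 0xD5, 4: 0xD6, 8: 0xD7, 16: 0xD8}


def pack_ext(type_code: int, data: bytes) -> bytes:
    """Encode an extension value (signed 8-bit type plus byte array)."""
    if not -128 <= type_code <= 127:
        raise ValueError("type must be a signed 8-bit integer")
    type_byte = struct.pack("b", type_code)
    n = len(data)
    if n in _FIXEXT:
        return bytes([_FIXEXT[n]]) + type_byte + data            # fixext 1/2/4/8/16
    if n <= 0xFF:
        return b"\xc7" + struct.pack(">B", n) + type_byte + data  # ext 8
    if n <= 0xFFFF:
        return b"\xc8" + struct.pack(">H", n) + type_byte + data  # ext 16
    if n <= 0xFFFFFFFF:
        return b"\xc9" + struct.pack(">I", n) + type_byte + data  # ext 32
    raise ValueError("payload is longer than (2^32)-1 bytes")
```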
Timestamp extension type
Timestamp extension type is assigned to extension type -1. It defines 3 formats: 32-bit format, 64-bit format, and 96-bit format.
timestamp 32 stores the number of seconds that have elapsed since 1970-01-01 00:00:00 UTC in a 32-bit unsigned integer:
+--------+--------+--------+--------+--------+--------+
| 0xd6 | -1 | seconds in 32-bit unsigned int |
+--------+--------+--------+--------+--------+--------+
timestamp 64 stores the number of seconds and nanoseconds that have elapsed since 1970-01-01 00:00:00 UTC in a 30-bit unsigned integer (nanoseconds) and a 34-bit unsigned integer (seconds):
+--------+--------+--------+--------+--------+------|-+--------+--------+--------+--------+
| 0xd7 | -1 | nanosec. in 30-bit unsigned int | seconds in 34-bit unsigned int |
+--------+--------+--------+--------+--------+------^-+--------+--------+--------+--------+
timestamp 96 stores the number of seconds and nanoseconds that have elapsed since 1970-01-01 00:00:00 UTC in a 64-bit signed integer and a 32-bit unsigned integer:
+--------+--------+--------+--------+--------+--------+--------+
| 0xc7 | 12 | -1 |nanoseconds in 32-bit unsigned int |
+--------+--------+--------+--------+--------+--------+--------+
+--------+--------+--------+--------+--------+--------+--------+--------+
seconds in 64-bit signed int |
+--------+--------+--------+--------+--------+--------+--------+--------+
- Timestamp 32 format can represent a timestamp in [1970-01-01 00:00:00 UTC, 2106-02-07 06:28:16 UTC) range. Nanoseconds part is 0.
- Timestamp 64 format can represent a timestamp in [1970-01-01 00:00:00.000000000 UTC, 2514-05-30 01:53:04.000000000 UTC) range.
- Timestamp 96 format can represent a timestamp in [-292277022657-01-27 08:29:52 UTC, 292277026596-12-04 15:30:08.000000000 UTC) range.
- In timestamp 64 and timestamp 96 formats, nanoseconds must not be larger than 999999999.
Pseudo code for serialization:
struct timespec {
    long tv_sec;  // seconds
    long tv_nsec; // nanoseconds
} time;
if ((time.tv_sec >> 34) == 0) {
    // seconds are non-negative and fit in 34 bits
    uint64_t data64 = ((uint64_t)time.tv_nsec << 34) | (uint64_t)time.tv_sec;
    // parentheses are required: == binds tighter than &
    if ((data64 & 0xffffffff00000000L) == 0) {
        // timestamp 32: nanoseconds are 0 and seconds fit in 32 bits
        uint32_t data32 = data64;
        serialize(0xd6, -1, data32);
    }
    else {
        // timestamp 64
        serialize(0xd7, -1, data64);
    }
}
else {
    // timestamp 96
    serialize(0xc7, 12, -1, time.tv_nsec, time.tv_sec);
}
Pseudo code for deserialization:
ExtensionValue value = deserialize_ext_type();
struct timespec result;
switch(value.length) {
case 4: {
    // timestamp 32
    uint32_t data32 = value.payload;
    result.tv_nsec = 0;
    result.tv_sec = data32;
    break;
}
case 8: {
    // timestamp 64: upper 30 bits are nanoseconds, lower 34 bits are seconds
    uint64_t data64 = value.payload;
    result.tv_nsec = data64 >> 34;
    result.tv_sec = data64 & 0x00000003ffffffffL;
    break;
}
case 12: {
    // timestamp 96: 32-bit unsigned nanoseconds, then 64-bit signed seconds
    uint32_t data32 = value.payload;
    int64_t data64 = value.payload + 4;
    result.tv_nsec = data32;
    result.tv_sec = data64;
    break;
}
default:
    // error
}
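For readers who want something executable, the pseudo code above can be sketched in Python with the `struct` module. The function names `pack_timestamp` and `unpack_timestamp_payload` are assumptions for illustration; the decoder takes only the extension payload, that is, the bytes after the type byte.

```python
import struct
from typing import Tuple


def pack_timestamp(seconds: int, nanoseconds: int = 0) -> bytes:
    """Encode a timestamp using the smallest of the three formats."""
    if not 0 <= nanoseconds <= 999_999_999:
        raise ValueError("nanoseconds must be in 0..999999999")
    if 0 <= seconds < (1 << 34):
        data64 = (nanoseconds << 34) | seconds
        if data64 >> 32 == 0:
            # timestamp 32: nanoseconds are 0 and seconds fit in 32 bits
            return b"\xd6\xff" + struct.pack(">I", data64)
        # timestamp 64: 30-bit nanoseconds, 34-bit seconds
        return b"\xd7\xff" + struct.pack(">Q", data64)
    # timestamp 96: ext 8 with N = 12, unsigned nanoseconds then signed seconds
    return b"\xc7\x0c\xff" + struct.pack(">Iq", nanoseconds, seconds)


def unpack_timestamp_payload(payload: bytes) -> Tuple[int, int]:
    """Decode an extension payload of type -1 into (seconds, nanoseconds)."""
    if len(payload) == 4:
        (seconds,) = struct.unpack(">I", payload)
        return seconds, 0
    if len(payload) == 8:
        (data64,) = struct.unpack(">Q", payload)
        return data64 & 0x3FFFFFFFF, data64 >> 34
    if len(payload) == 12:
        nanoseconds, seconds = struct.unpack(">Iq", payload)
        return seconds, nanoseconds
    raise ValueError("unexpected timestamp payload length")
```

Negative seconds (times before 1970) and seconds of 2^34 or more fall through to timestamp 96, as in the pseudo code.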
Serialization: type to format conversion
MessagePack serializers convert MessagePack types into formats as follows:
source types | output format |
---|---|
Integer | int format family (positive fixint, negative fixint, int 8/16/32/64 or uint 8/16/32/64) |
Nil | nil |
Boolean | bool format family (false or true) |
Float | float format family (float 32/64) |
String | str format family (fixstr or str 8/16/32) |
Binary | bin format family (bin 8/16/32) |
Array | array format family (fixarray or array 16/32) |
Map | map format family (fixmap or map 16/32) |
Extension | ext format family (fixext or ext 8/16/32) |
If an object can be represented in multiple possible output formats, serializers SHOULD use the format which represents the data in the smallest number of bytes.
Deserialization: format to type conversion
MessagePack deserializers convert MessagePack formats into types as follows:
source formats | output type |
---|---|
positive fixint, negative fixint, int 8/16/32/64 and uint 8/16/32/64 | Integer |
nil | Nil |
false and true | Boolean |
float 32/64 | Float |
fixstr and str 8/16/32 | String |
bin 8/16/32 | Binary |
fixarray and array 16/32 | Array |
fixmap and map 16/32 | Map |
fixext and ext 8/16/32 | Extension |
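The conversion table above can be exercised with a small recursive decoder. The Python sketch below is a simplified illustration and not a complete implementation: it omits the ext family, does no bounds checking on truncated input, decodes Map into a `dict` (so keys must be hashable), and raises on invalid UTF-8 instead of exposing the raw bytes. All names are hypothetical.

```python
import struct

# Fixed-width numbers: first byte -> big-endian struct format.
_NUMERIC = {
    0xCA: ">f", 0xCB: ">d",
    0xCC: ">B", 0xCD: ">H", 0xCE: ">I", 0xCF: ">Q",
    0xD0: ">b", 0xD1: ">h", 0xD2: ">i", 0xD3: ">q",
}
# Length-prefixed formats: first byte -> (struct format of N, kind).
_SIZED = {
    0xC4: (">B", "bin"), 0xC5: (">H", "bin"), 0xC6: (">I", "bin"),
    0xD9: (">B", "str"), 0xDA: (">H", "str"), 0xDB: (">I", "str"),
    0xDC: (">H", "array"), 0xDD: (">I", "array"),
    0xDE: (">H", "map"), 0xDF: (">I", "map"),
}


def _read_body(kind, buf, pos, n):
    """Read the part that follows the header, given N."""
    if kind == "bin":
        return bytes(buf[pos:pos + n]), pos + n
    if kind == "str":
        return bytes(buf[pos:pos + n]).decode("utf-8"), pos + n
    if kind == "array":
        items = []
        for _ in range(n):
            item, pos = unpack(buf, pos)
            items.append(item)
        return items, pos
    result = {}
    for _ in range(n):                  # map: key, then its value
        key, pos = unpack(buf, pos)
        value, pos = unpack(buf, pos)
        result[key] = value
    return result, pos


def unpack(buf, pos=0):
    """Decode one object starting at buf[pos]; return (value, next position)."""
    first = buf[pos]
    pos += 1
    if first <= 0x7F:
        return first, pos                               # positive fixint
    if first >= 0xE0:
        return first - 0x100, pos                       # negative fixint
    if first <= 0x8F:
        return _read_body("map", buf, pos, first & 0x0F)
    if first <= 0x9F:
        return _read_body("array", buf, pos, first & 0x0F)
    if first <= 0xBF:
        return _read_body("str", buf, pos, first & 0x1F)
    if first == 0xC0:
        return None, pos
    if first in (0xC2, 0xC3):
        return first == 0xC3, pos
    if first in _NUMERIC:
        fmt = _NUMERIC[first]
        return struct.unpack_from(fmt, buf, pos)[0], pos + struct.calcsize(fmt)
    if first in _SIZED:
        fmt, kind = _SIZED[first]
        n = struct.unpack_from(fmt, buf, pos)[0]
        return _read_body(kind, buf, pos + struct.calcsize(fmt), n)
    raise ValueError("unsupported first byte 0x%02x" % first)
```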
Future discussion
Profile
A profile is the idea that applications restrict the semantics of MessagePack, while keeping the same syntax, to adapt MessagePack to certain use cases.
For example, applications may remove Binary type, restrict keys of map objects to be String type, and put some restrictions to make the semantics compatible with JSON. Applications which use schema may remove String and Binary types and deal with byte arrays as Raw type. Applications which use hash (digest) of serialized data may sort keys of maps to make the serialized data deterministic.
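As one illustration of the last example, a profile could make map output deterministic by emitting pairs in a fixed order. The Python sketch below sorts pairs by the encoded bytes of their keys. That ordering is an assumption chosen for simplicity (this document does not define a canonical order), and the name `pack_map_deterministic` is hypothetical.

```python
import struct
from typing import Iterable, Tuple


def pack_map_deterministic(encoded_pairs: Iterable[Tuple[bytes, bytes]]) -> bytes:
    """Encode a map with pairs sorted by their encoded key bytes.

    Keys and values are assumed to be already serialized.
    """
    pairs = sorted(encoded_pairs, key=lambda pair: pair[0])
    n = len(pairs)
    if n <= 15:
        header = bytes([0x80 | n])                # fixmap
    elif n <= 0xFFFF:
        header = b"\xde" + struct.pack(">H", n)   # map 16
    elif n <= 0xFFFFFFFF:
        header = b"\xdf" + struct.pack(">I", n)   # map 32
    else:
        raise ValueError("Map has more than (2^32)-1 associations")
    return header + b"".join(key + value for key, value in pairs)
```

With this ordering, two maps holding the same pairs serialize to identical bytes regardless of insertion order, so a hash of the output is stable.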
Implementation guidelines
Upgrading MessagePack specification
MessagePack specification is changed at this time.
Here is a guideline to upgrade existing MessagePack implementations:
- In a minor release, deserializers support the bin format family and str 8 format. The type of deserialized objects should be the same as raw 16 (== str 16) or raw 32 (== str 32)
- In a major release, serializers distinguish Binary type and String type using bin format family and str format family
- At the same time, serializers should offer “compatibility mode” which doesn’t use bin format family and str 8 format
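One possible reading of the "compatibility mode" guideline is sketched below in Python. In compatibility mode the serializer avoids the bin format family and str 8, so both strings and binary data fall back to fixstr, str 16, and str 32, which occupy the same first bytes as the older raw formats. The function name `pack_raw` and its flags are assumptions for illustration, not an API defined by this document.

```python
import struct


def pack_raw(data: bytes, is_string: bool, compat: bool = False) -> bytes:
    """Encode bytes as str or bin; compat avoids bin formats and str 8."""
    n = len(data)
    if n > 0xFFFFFFFF:
        raise ValueError("payload is longer than (2^32)-1 bytes")
    if is_string or compat:
        if n <= 31:
            return bytes([0xA0 | n]) + data                # fixstr
        if n <= 0xFF and not compat:
            return b"\xd9" + struct.pack(">B", n) + data   # str 8 (new format)
        if n <= 0xFFFF:
            return b"\xda" + struct.pack(">H", n) + data   # str 16
        return b"\xdb" + struct.pack(">I", n) + data       # str 32
    if n <= 0xFF:
        return b"\xc4" + struct.pack(">B", n) + data       # bin 8
    if n <= 0xFFFF:
        return b"\xc5" + struct.pack(">H", n) + data       # bin 16
    return b"\xc6" + struct.pack(">I", n) + data           # bin 32
```

Output produced with `compat=True` no longer distinguishes Binary from String, which is the trade-off the guideline accepts for older deserializers.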
MessagePack specification
Last modified at 2017-08-09 22:42:07 -0700
Sadayuki Furuhashi © 2013-04-21 21:52:33 -0700