Thrift Binary protocol encoding
===============================

<!--
--------------------------------------------------------------------

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

--------------------------------------------------------------------
-->

This document describes the wire encoding for RPC using the older Thrift *binary protocol*.

The information here is _mostly_ based on the Java implementation in the Apache Thrift library (versions 0.9.1 and
0.9.3). Other implementations, however, should behave the same.

For background on Thrift see the [Thrift whitepaper (pdf)](https://thrift.apache.org/static/files/thrift-20070401.pdf).

# Contents

* Binary protocol
  * Base types
  * Message
  * Struct
  * List and Set
  * Map
* BNF notation used in this document

# Binary protocol

## Base types

### Integer encoding

In the _binary protocol_ integers are encoded with the most significant byte first (big endian byte order, aka network
order). An `int8` needs 1 byte, an `int16` 2, an `int32` 4 and an `int64` needs 8 bytes.

The CPP version has the option to use the binary protocol with little endian order.
Little endian gives a small but
noticeable performance boost because most contemporary CPUs use little endian when storing integers to RAM.

### Enum encoding

The generated code encodes `Enum`s by taking the ordinal value and then encoding that as an int32.

### Binary encoding

Binary is sent as follows:

```
Binary protocol, binary data, 4+ bytes:
+--------+--------+--------+--------+--------+...+--------+
| byte length                       | bytes               |
+--------+--------+--------+--------+--------+...+--------+
```

Where:

* `byte length` is the length of the byte array, a signed 32 bit integer encoded in network (big endian) order (must be >= 0).
* `bytes` are the bytes of the byte array.

### String encoding

*String*s are first encoded to UTF-8, and then sent as binary.

### Double encoding

Values of type `double` are first converted to an int64 according to the IEEE 754 floating-point "double format" bit
layout. Most run-times provide a library to make this conversion. Both the binary protocol and the compact protocol then
encode the int64 in 8 bytes in big endian order.

### Boolean encoding

Values of `bool` type are first converted to an int8. True is converted to `1`, false to `0`.

### Universal unique identifier encoding

Values of `uuid` type are expected as 16-byte binary in big endian (or "network") order. Byte order conversion
might be necessary on certain platforms, e.g. Windows holds GUIDs in a complex record-like structure whose
memory layout differs.

*Note*: Since the length is fixed, no `byte length` prefix is necessary and the field is always 16 bytes long.
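
The base type rules above can be sketched in a few lines of Python using the standard `struct` module. This is only an illustrative sketch, not the code of any Thrift library; the helper names (`encode_i32`, `encode_binary`, and so on) are hypothetical and chosen for this example.

```python
import struct
import uuid


def encode_i32(value: int) -> bytes:
    """Signed 32-bit integer, most significant byte first (big endian)."""
    return struct.pack(">i", value)


def encode_binary(data: bytes) -> bytes:
    """Byte array prefixed with its length as a big-endian int32."""
    return encode_i32(len(data)) + data


def encode_string(text: str) -> bytes:
    """Strings are encoded to UTF-8 and then sent as binary."""
    return encode_binary(text.encode("utf-8"))


def encode_double(value: float) -> bytes:
    """IEEE 754 double bit layout, written as 8 big-endian bytes."""
    return struct.pack(">d", value)


def encode_bool(value: bool) -> bytes:
    """True becomes the int8 value 1, False becomes 0."""
    return struct.pack(">b", 1 if value else 0)


def encode_uuid(value: uuid.UUID) -> bytes:
    """16 bytes in network order, with no length prefix."""
    return value.bytes
```

For instance, `encode_string("hi")` yields the four length bytes `00 00 00 02` followed by the two UTF-8 bytes `68 69`.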


## Message

A `Message` can be encoded in two different ways:

```
Binary protocol Message, strict encoding, 12+ bytes:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+
|1vvvvvvv|vvvvvvvv|unused  |00000mmm| name length                       | name                | seq id                            |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+
```

Where:

* `vvvvvvvvvvvvvvv` is the version, an unsigned 15 bit number fixed to `1` (in binary: `000 0000 0000 0001`).
  The leading bit is `1`.
* `unused` is an ignored byte.
* `mmm` is the message type, an unsigned 3 bit integer. The 5 leading bits must be `0` as some clients (checked for
  Java in 0.9.1) take the whole byte.
* `name length` is the byte length of the name field, a signed 32 bit integer encoded in network (big endian) order (must be >= 0).
* `name` is the method name, a UTF-8 encoded string.
* `seq id` is the sequence id, a signed 32 bit integer encoded in network (big endian) order.

The second, older encoding (aka non-strict) is:

```
Binary protocol Message, old encoding, 9+ bytes:
+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+--------+
| name length                       | name                |00000mmm| seq id                            |
+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+--------+
```

Where `name length`, `name`, `mmm`, `seq id` are as above.

Because `name length` must be non-negative (therefore its first bit is always `0`), the first bit allows the receiver to see
whether the strict format or the old format is used. Therefore a server and client using different variants of the
binary protocol can transparently talk with each other. However, when strict mode is enforced, the old format is
rejected.
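
As a rough illustration of the two message layouts, the following Python sketch builds both headers and shows how a receiver could tell them apart from the first bit. The function names and the `VERSION_1` constant are hypothetical names for this example; the message type value `1` for a call is taken from the list of message types in this section.

```python
import struct

# Leading bit set, 15-bit version fixed to 1, unused byte left as zero.
VERSION_1 = 0x80010000
CALL = 1  # message type value for a call


def encode_message_header_strict(name: str, message_type: int, seq_id: int) -> bytes:
    """Strict layout: version and type word, name length, name, seq id."""
    name_bytes = name.encode("utf-8")
    return (
        struct.pack(">I", VERSION_1 | message_type)
        + struct.pack(">i", len(name_bytes))
        + name_bytes
        + struct.pack(">i", seq_id)
    )


def encode_message_header_old(name: str, message_type: int, seq_id: int) -> bytes:
    """Old layout: name length, name, one type byte, seq id."""
    name_bytes = name.encode("utf-8")
    return (
        struct.pack(">i", len(name_bytes))
        + name_bytes
        + struct.pack(">bi", message_type, seq_id)
    )


def is_strict(header: bytes) -> bool:
    """The strict layout sets the first bit; a name length never does."""
    return (header[0] & 0x80) != 0
```

With these helpers, `encode_message_header_strict("ping", CALL, 7)` starts with the bytes `80 01 00 01`, while the old layout for the same call starts with the name length `00 00 00 04`.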

Message types are encoded with the following values:

* _Call_: 1
* _Reply_: 2
* _Exception_: 3
* _Oneway_: 4

## Struct

A *Struct* is a sequence of zero or more fields, followed by a stop field. Each field starts with a field header and
is followed by the encoded field value. The encoding can be summarized by the following BNF:

```
struct        ::= ( field-header field-value )* stop-field
field-header  ::= field-type field-id
```

Because each field header contains the field-id (as defined by the Thrift IDL file), the fields can be encoded in any
order. Thrift's type system is not extensible; you can only encode the primitive types and structs. Therefore it is also
possible to handle unknown fields while decoding; these are simply ignored. While decoding, the field type can be used to
determine how to decode the field value.

Note that the field name is not encoded, so field renames in the IDL do not affect forward and backward compatibility.

The default Java implementation (Apache Thrift 0.9.1) has undefined behavior when it tries to decode a field that has
another field-type than what is expected. Theoretically, this could be detected at the cost of some additional checking.
Other implementations may perform this check and then either ignore the field, or return a protocol exception.

A *Union* is encoded exactly the same as a struct with the additional restriction that at most 1 field may be encoded.

An *Exception* is encoded exactly the same as a struct.

### Struct encoding

In the binary protocol field headers and the stop field are encoded as follows:

```
Binary protocol field header and field value:
+--------+--------+--------+--------+...+--------+
|tttttttt| field id        | field value         |
+--------+--------+--------+--------+...+--------+

Binary protocol stop field:
+--------+
|00000000|
+--------+
```

Where:

* `tttttttt` is the field-type, a signed 8 bit integer.
* `field id` is the field-id, a signed 16 bit integer in big endian order.
* `field value` is the encoded field value.

The following field-types are used:

* `BOOL`, encoded as `2`
* `I8`, encoded as `3`
* `DOUBLE`, encoded as `4`
* `I16`, encoded as `6`
* `I32`, encoded as `8`
* `I64`, encoded as `10`
* `BINARY`, used for binary and string fields, encoded as `11`
* `STRUCT`, used for structs and union fields, encoded as `12`
* `MAP`, encoded as `13`
* `SET`, encoded as `14`
* `LIST`, encoded as `15`
* `UUID`, encoded as `16`

## List and Set

Lists and sets are encoded the same: a header indicating the size and the element-type of the elements, followed by the
encoded elements.

```
Binary protocol list (5+ bytes) and elements:
+--------+--------+--------+--------+--------+--------+...+--------+
|tttttttt| size                              | elements            |
+--------+--------+--------+--------+--------+--------+...+--------+
```

Where:

* `tttttttt` is the element-type, encoded as an int8
* `size` is the size, encoded as an int32, non-negative values only
* `elements` are the encoded element values

The element-type values are the same as field-types. The full list is included in the struct section above.

The maximum list/set size is configurable. By default, there is no limit (meaning the limit is the maximum int32 value:
2147483647).
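
To make the field header, stop field and list header concrete, here is a small Python sketch that encodes a struct with one `I32` field and one list-of-`I32` field. The struct shape (field ids 1 and 2) and all function names are assumptions made up for this example; only the type codes and byte layouts come from the sections above.

```python
import struct

# Field-type codes from the list in the struct section.
T_STOP, T_I32, T_LIST = 0, 8, 15


def field_header(field_type: int, field_id: int) -> bytes:
    """One type byte followed by a big-endian int16 field id."""
    return struct.pack(">bh", field_type, field_id)


def encode_i32_list(values: list) -> bytes:
    """List header (element type, int32 size) followed by the elements."""
    header = struct.pack(">bi", T_I32, len(values))
    return header + b"".join(struct.pack(">i", v) for v in values)


def encode_example_struct(user_id: int, scores: list) -> bytes:
    """Hypothetical struct: field 1 is an i32, field 2 is a list of i32."""
    return (
        field_header(T_I32, 1) + struct.pack(">i", user_id)
        + field_header(T_LIST, 2) + encode_i32_list(scores)
        + struct.pack(">b", T_STOP)  # stop field terminates the struct
    )
```

Calling `encode_example_struct(42, [1, 2])` produces 24 bytes: two 3-byte field headers, the 4-byte value, the 5-byte list header, two 4-byte elements and the single stop byte.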

## Map

Maps are encoded with a header indicating the size, the element-type of the keys and the element-type of the values,
followed by the encoded key value pairs. The encoding follows this BNF:

```
map  ::=  key-element-type value-element-type size ( key value )*
```

```
Binary protocol map (6+ bytes) and key value pairs:
+--------+--------+--------+--------+--------+--------+--------+...+--------+
|kkkkkkkk|vvvvvvvv| size                              | key value pairs     |
+--------+--------+--------+--------+--------+--------+--------+...+--------+
```

Where:

* `kkkkkkkk` is the key element-type, encoded as an int8
* `vvvvvvvv` is the value element-type, encoded as an int8
* `size` is the size of the map, encoded as an int32, non-negative values only
* `key value pairs` are the encoded keys and values

The element-type values are the same as field-types. The full list is included in the struct section above.

The maximum map size is configurable. By default, there is no limit (meaning the limit is the maximum int32 value:
2147483647).

# BNF notation used in this document

The following BNF notation is used:

* a plus `+` appended to an item represents repetition; the item is repeated 1 or more times
* a star `*` appended to an item represents optional repetition; the item is repeated 0 or more times
* a pipe `|` between items represents choice; the first matching item is selected
* parentheses `(` and `)` are used for grouping multiple items
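
As a worked reading of this notation, the `map` rule says that the two element-type bytes and the size are followed by the `( key value )` group repeated zero or more times. The Python sketch below applies that rule to a map from strings to `I32` values; the function name and the choice of key and value types are assumptions for this example only.

```python
import struct

# Field-type codes reused as element-type codes.
T_I32, T_BINARY = 8, 11


def encode_string_to_i32_map(mapping: dict) -> bytes:
    """Header (key type, value type, int32 size), then each key and value."""
    out = struct.pack(">bbi", T_BINARY, T_I32, len(mapping))
    for key, value in mapping.items():
        key_bytes = key.encode("utf-8")
        out += struct.pack(">i", len(key_bytes)) + key_bytes  # key as binary
        out += struct.pack(">i", value)  # value as big-endian int32
    return out
```

An empty map encodes to just the 6 header bytes, which matches the `6+ bytes` in the map diagram.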