2.16. Protocol Buffers (protobuf)

Protocol Buffers is a framework that generates serialization code for messages described in a platform-independent message description language. Language bindings are provided for multiple languages, including C++, Java, Python, and JavaScript.

Protocol Buffers exists in two major versions, 2 and 3, which differ in how messages are structured. Version 2 supports messages with both required and optional fields. Fields can have custom default values, and missing fields are distinguished from fields with default values. Version 3 considers all fields optional. Fields have a standard default value of zero or null, and a field that holds its default value is serialized only when explicit presence tracking is enabled for it.

2.16.1. Message Description Language

The message description language defines each message as a set of fields. Each field has a type, a name, and a key, which identifies the field inside serialized messages. Standard fields are present in a message at most once, with explicit presence tracking when defined as optional. Repeated fields can be present an arbitrary number of times.
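To make the role of the field key concrete, the following minimal sketch encodes one integer field by hand. It assumes the standard wire format, where the key is the field number shifted left by three bits and combined with a wire type, and where both the key and the value are written as base-128 varints. The helper names are illustrative and not part of any protobuf library.

```python
WIRE_VARINT = 0  # wire type used by variable length integers, bool and enum


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low-order group first."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # continuation bit: more groups follow
        else:
            out.append(group)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """The key stores the field number together with the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one non-negative integer field as key followed by value."""
    return encode_key(field_number, WIRE_VARINT) + encode_varint(value)


encoded = encode_int_field(1, 150)
print(encoded.hex())  # 089601
```

Only the key and the value appear in the output. The field name exists only in the message description, which is why field identifiers must stay stable across message versions.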

Protocol Buffers Message Specification Example

syntax = "proto3";

// File level options supported.
option optimize_for = SPEED;

message SomeMessage {

    // Field identifiers reserved after message changes.
    reserved 8, 100;

    // Many integer types with specific encodings.
    int32 aMostlyPositiveInteger = 1;
    sint64 aSignedInteger = 2;
    uint64 anUnsignedInteger = 3;
    fixed32 anOftenBigUnsignedInteger = 4;
    sfixed32 anOftenBigSignedInteger = 5;

    // String always with UTF 8 encoding.
    string aString = 10;

    // Another message type.
    AnotherMessage aMessage = 111;

    // Explicit presence tracking is optional.
    optional float aFloatWithPresenceTracking = 222;

    // Variable length content supported.
    repeated string aStringList = 333;
    map <int32, string> aMap = 444;

    // Field level options supported.
    int32 aDeprecatedInteger = 666 [deprecated = true];

    // Extension field range (accepted with proto2 syntax only, proto3 permits extensions solely for custom options).
    extensions 1234 to 5678;
}

extend SomeMessage {
    // Extension field in extension field range.
    int32 anExtensionField = 1234;
}
  • A spectrum of basic types

  • Packages and nested types

  • Fields can be repeated

  • Fields can have presence tracked

  • Explicit field identifiers for versioning

  • Options tune code generation

  • Extensions reserve fields

For historical reasons, the optional modifier is somewhat misnamed. In version 2, fields were either required or optional. In version 3, fields are always optional, and the modifier merely indicates that the field presence is tracked.
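The difference between the two kinds of version 3 fields can be illustrated with a toy model. The sketch below is not generated protobuf code. It is a hypothetical Python class that only mimics the rule for deciding whether a field is written: a field without presence tracking is written when it differs from its default value, and a field with presence tracking is written whenever it was set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SampleModel:
    # Models `float plain = 1;` (no presence tracking, default value 0.0).
    plain: float = 0.0
    # Models `optional float tracked = 2;` where None stands for "not set".
    tracked: Optional[float] = None

    def fields_on_wire(self) -> List[str]:
        """Return the names of the fields that would be serialized."""
        names = []
        if self.plain != 0.0:
            names.append("plain")  # written only when different from default
        if self.tracked is not None:
            names.append("tracked")  # written whenever set, even to 0.0
        return names


print(SampleModel().fields_on_wire())                        # []
print(SampleModel(plain=0.0, tracked=0.0).fields_on_wire())  # ['tracked']
print(SampleModel(plain=1.5).fields_on_wire())               # ['plain']
```

In this model a receiver cannot tell whether `plain` was left unset or set to zero, but it can tell for `tracked`.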

Protocol Buffers Primitive Field Types

Integer Types. 

(s)fixed(32|64)

Integers with fixed length encoding

(u)int(32|64)

Integers with variable length encoding

sint(32|64)

Integers with sign optimized variable length encoding

Floating Point Types. 

float

IEEE 754 32 bit float

double

IEEE 754 64 bit float

Additional Primitive Types. 

bool

Boolean

bytes

Arbitrary sequence of bytes

string

Arbitrary sequence of UTF-8 characters
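The practical difference between the integer encodings is the encoded size. The following sketch reimplements the value encodings in plain Python, assuming the standard wire format: variable length integers are base-128 varints, negative int32 values are sign extended to 64 bits, sint32 values are zigzag mapped before varint encoding, and fixed and floating point values are little endian. The function names are illustrative.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    # Negative values are sign extended to 64 bits, hence always 10 bytes.
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value: int) -> bytes:
    # Zigzag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small magnitudes stay short.
    return encode_varint((value << 1) ^ (value >> 31))


def encode_sfixed32(value: int) -> bytes:
    return struct.pack("<i", value)  # always 4 bytes, little endian


def encode_float(value: float) -> bytes:
    return struct.pack("<f", value)  # IEEE 754 32 bit, little endian


for value in (1, -1, 300, -300):
    sizes = (len(encode_int32(value)), len(encode_sint32(value)), len(encode_sfixed32(value)))
    print(value, sizes)
# The value -1 takes 10 bytes as int32, 1 byte as sint32 and 4 bytes as sfixed32.
```

The sizes suggest the usual rule of thumb: int32 for values that are mostly positive, sint32 for values that are often negative, and the fixed types for values that are often large.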

Protocol Buffers More Field Types

Oneof Type. 

message AnExampleMessage {
    oneof some_oneof_field {
        int32 some_integer = 1;
        string some_string = 2;
    }
}
  • Assigning one field clears others
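The clearing behavior can be pictured with a small model. The class below is a hypothetical stand-in for the generated code of AnExampleMessage, written only to show the semantics. Real generated classes expose the fields as attributes, and the Python binding reports the member that is set through WhichOneof.

```python
class OneofModel:
    """Toy model of `oneof some_oneof_field { int32 some_integer; string some_string; }`."""

    _DEFAULTS = {"some_integer": 0, "some_string": ""}

    def __init__(self):
        self._case = None   # name of the member that is currently set
        self._value = None  # value of that member

    def set(self, member, value):
        if member not in self._DEFAULTS:
            raise AttributeError(member)
        # Storing a single (case, value) pair is what clears the other member.
        self._case = member
        self._value = value

    def get(self, member):
        # A member that is not set reads as its default value.
        return self._value if self._case == member else self._DEFAULTS[member]

    def which_oneof(self):
        return self._case


message = OneofModel()
message.set("some_integer", 1234)
message.set("some_string", "hello")
print(message.which_oneof())        # some_string
print(message.get("some_integer"))  # 0
```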

Enum Type. 

enum AnEnum {
    INITIAL = 0;
    RED = 1;
    BLUE = 2;
    GREEN = 3;
    WHATEVER = 8;
}
  • Must include zero

Any Type. 

import "google/protobuf/any.proto";
message AnExampleMessage {
    repeated google.protobuf.Any whatever = 8;
}
  • Internally a type identifier and a value

  • Type identifier is URI string

  • Value is byte buffer
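The two parts of an Any value can be sketched by encoding them by hand. The example assumes the definition in any.proto, where the type identifier is string field 1 (type_url) and the value is bytes field 2 (value), and it assumes the default type.googleapis.com/ URI prefix. The message name example.Inner and the helper names are hypothetical.

```python
WIRE_LENGTH_DELIMITED = 2  # wire type used by string, bytes and nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Key, then payload length, then the payload bytes."""
    key = encode_varint((field_number << 3) | WIRE_LENGTH_DELIMITED)
    return key + encode_varint(len(payload)) + payload


def pack_any(type_name: str, serialized_message: bytes) -> bytes:
    """Wrap an already serialized message as an Any value."""
    type_url = "type.googleapis.com/" + type_name
    return (encode_length_delimited(1, type_url.encode("utf-8"))
            + encode_length_delimited(2, serialized_message))


inner = b"\x08\x96\x01"  # some serialized message, here field 1 with value 150
packed = pack_any("example.Inner", inner)
print(packed)
```

The receiver reads the type identifier first and uses it to decide which message type should parse the byte buffer.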

Map Type. 

message AnExampleMessage {
    map<int32, string> keywords = 8;
}
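On the wire, a map field behaves like a repeated field of small entry messages, each with the key in field 1 and the value in field 2. The sketch below encodes the keywords field from the example under that assumption. It handles only non-negative keys and is an illustration of the layout, not library code.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode_map_entry(key: int, value: str) -> bytes:
    """One entry message: int32 key in field 1, string value in field 2."""
    text = value.encode("utf-8")
    return (encode_varint((1 << 3) | 0) + encode_varint(key)
            + encode_varint((2 << 3) | 2) + encode_varint(len(text)) + text)


def encode_keywords(keywords: dict) -> bytes:
    """Encode `map<int32, string> keywords = 8;` as repeated length delimited entries."""
    out = bytearray()
    for key, value in keywords.items():
        entry = encode_map_entry(key, value)
        out += encode_varint((8 << 3) | 2) + encode_varint(len(entry)) + entry
    return bytes(out)


print(encode_keywords({1: "a"}).hex())  # 42050801120161
```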

See https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/descriptor.proto for the full list of available options that tune code generation. Options can be associated with a file, a type, or a field.

2.16.2. C++ Generated Code Basics

C++ Message Manipulation

Construction. 

AnExampleMessage message;
AnExampleMessage message (another_message);
message.CopyFrom (another_message);

Singular Fields. 

cout << message.some_integer ();
message.set_some_integer (1234);
if (message.has_optional_integer ()) {
    message.clear_optional_integer ();
}

Repeated Fields. 

int size = messages.messages_size ();
const AnExampleMessage &message = messages.messages (1234);
AnExampleMessage *message = messages.mutable_messages (1234);
AnExampleMessage *message = messages.add_messages ();

Byte Array Serialization. 

char buffer [BUFFER_SIZE];
// Parse exactly as many bytes as were written, not the whole buffer.
size_t size = message.ByteSizeLong ();
message.SerializeToArray (buffer, sizeof (buffer));
message.ParseFromArray (buffer, size);

Standard Stream Serialization. 

message.SerializeToOstream (&stream);
message.ParseFromIstream (&stream);

2.16.3. Java Generated Code Basics

Java Message Manipulation

Construction. 

AnExampleMessage.Builder messageBuilder;
messageBuilder = AnExampleMessage.newBuilder ();
messageBuilder = AnExampleMessage.newBuilder (another_message);
AnExampleMessage message = messageBuilder.build ();

Singular Fields. 

System.out.println (message.getSomeInteger ());
messageBuilder.setSomeInteger (1234);
if (message.hasOptionalInteger ()) {
    messageBuilder = message.toBuilder ();
    messageBuilder.clearOptionalInteger ();
}

Repeated Fields. 

int size = messages.getMessagesCount ();
AnExampleMessage message = messages.getMessages (1234);
List<AnExampleMessage> messageList = messages.getMessagesList ();
messagesBuilder.addMessages (messageBuilder);
messagesBuilder.addMessages (message);

Byte Array Serialization. 

byte [] buffer = message.toByteArray ();
try {
    AnExampleMessage message = AnExampleMessage.parseFrom (buffer);
} catch (InvalidProtocolBufferException e) {
    System.out.println (e);
}

Standard Stream Serialization. 

message.writeTo (stream);
AnExampleMessage message = AnExampleMessage.parseFrom (stream);

2.16.4. Python Generated Code Basics

Python Message Manipulation

Construction. 

message = AnExampleMessage ()
message.CopyFrom (another_message)

Singular Fields. 

print (message.some_integer)
message.some_integer = 1234
if message.HasField ('optional_integer'):
    message.ClearField ('optional_integer')

Repeated Fields. 

size = len (messages.messages)
message = messages.messages [1234]
message = messages.messages.add ()

Byte Array Serialization. 

buffer = message.SerializeToString ()
message.ParseFromString (buffer)
message = AnExampleMessage.FromString (buffer)

Standard Stream Serialization. 

# The file must be opened in binary mode ('wb' for writing, 'rb' for reading).
file.write (message.SerializeToString ())
message.ParseFromString (file.read ())
AnExampleMessage.FromString (file.read ())
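A serialized message carries no length or end marker, so reading the whole stream works only when the stream holds exactly one message. A common remedy is to frame each message with its length. The sketch below uses a 4 byte big endian length prefix, which is an assumption chosen for simplicity and not a protobuf API. The payloads stand in for the results of SerializeToString.

```python
import io
import struct


def write_frame(stream, payload: bytes) -> None:
    """Write one message preceded by its length."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_frame(stream):
    """Read one framed message, or return None at a clean end of stream."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        raise EOFError("truncated frame header")
    (size,) = struct.unpack(">I", header)
    payload = stream.read(size)
    if len(payload) < size:
        raise EOFError("truncated frame payload")
    return payload


stream = io.BytesIO()
write_frame(stream, b"\x08\x96\x01")  # first serialized message
write_frame(stream, b"\x08\x01")      # second serialized message
stream.seek(0)

while (payload := read_frame(stream)) is not None:
    print(payload.hex())  # each payload would go to message.ParseFromString
```

The Java binding offers writeDelimitedTo and parseDelimitedFrom for the same purpose, with a varint length prefix instead of a fixed one.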

2.16.5. References

  1. The protobuf Project Home Page. https://protobuf.dev