2.16. Protocol Buffers (protobuf)

Protocol Buffers is a framework that generates serialization code for messages described in a platform-independent message description language. Language bindings are provided for multiple languages, including C++, Java, Python, and JavaScript.

Protocol Buffers releases new features in editions. Each edition has a default feature configuration, which explicit options can override. Major features include the treatment of field presence, the encoding of repeated primitive fields, and the encoding of nested messages. Earlier, these behaviors were tied to versions of the message description language, namely proto2 and proto3.

2.16.1. Message Description Language

The message description language defines each message as a set of fields. Each field has a type, a name, and a numeric key (the field number), which identifies the field inside serialized messages. Singular fields are present in a message at most once, and in edition 2023 their presence is tracked explicitly by default. Repeated fields can be present an arbitrary number of times.
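To make the role of the key concrete, here is a minimal Python sketch of how the wire format stores it, assuming the documented encoding: the field number is shifted left by three bits, combined with a wire type that tells the parser how to read the value, and written as a varint (seven bits per byte, high bit set on every byte except the last). The helper names encode_varint and encode_key are illustrative, not protobuf library calls.

```python
# Sketch of the protobuf wire format key; helper names are hypothetical.
# Wire types defined by the protobuf encoding.
VARINT, I64, LEN, SGROUP, EGROUP, I32 = 0, 1, 2, 3, 4, 5


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low)  # last byte has the high bit clear
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Combine the field number and the wire type into the field key."""
    return encode_varint((field_number << 3) | wire_type)


# Field 1 (int32) uses VARINT; field 10 (string) and field 111 (message) use LEN.
print(encode_key(1, VARINT).hex())  # 08
print(encode_key(10, LEN).hex())    # 52
print(encode_key(111, LEN).hex())   # fa06
```

Because the key is a varint, field numbers 1 to 15 fit in a one-byte key, which is why frequently used fields are usually given small numbers.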

Protocol Buffers Message Specification Example

edition = "2023";

// File level options supported.
option optimize_for = SPEED;

// Referenced by a field of SomeMessage.
message AnotherMessage {}

message SomeMessage {

    // Field identifiers reserved after message changes.
    reserved 8, 100;

    // Many integer types with specific encodings.
    int32 aMostlyPositiveInteger = 1;
    sint64 aSignedInteger = 2;
    uint64 anUnsignedInteger = 3;
    fixed32 anOftenBigUnsignedInteger = 4;
    sfixed32 anOftenBigSignedInteger = 5;

    // Strings always use UTF-8 encoding.
    string aString = 10;

    // Another message type.
    AnotherMessage aMessage = 111;

    // Variable length content supported.
    repeated string aStringList = 333;
    map<int32, string> aMap = 444;

    // Field level options supported.
    int32 aDeprecatedInteger = 666 [deprecated = true];
    float aFloatWithoutPresenceTracking = 222 [features.field_presence = IMPLICIT];

    // Extension field range.
    extensions 1234 to 5678;
}

extend SomeMessage {
    // Extension field in extension field range.
    int32 anExtensionField = 1234;
}
  • A spectrum of basic types

  • Packages and nested types

  • Fields can be repeated

  • Fields have presence tracked unless disabled

  • Explicit field identifiers for versioning

  • Options tune code generation

  • Extension ranges let other files add fields

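Field presence has three choices, which mirror the behaviors of the earlier language versions. EXPLICIT presence tracking serializes a field whenever its value was set, even if it is the default value. IMPLICIT presence tracking serializes a field only when its value differs from the default, so setting a field to its default is indistinguishable from not setting it. LEGACY_REQUIRED presence tracking requires the field to be set, so it is always serialized, but it is considered problematic for evolving specifications because a required field can never be safely removed.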
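The difference between the three choices can be modeled for a single int32 field. This is a minimal Python sketch assuming the standard wire format (a varint key followed by a varint value); the function encode_int32_field and its is_set flag are hypothetical, since real generated code tracks presence internally.

```python
# Hypothetical model of field presence; not the protobuf library API.
MASK64 = (1 << 64) - 1


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def encode_int32_field(field_number, value, is_set, presence):
    """Serialize one singular int32 field under a given presence mode.

    is_set models whether the application assigned the field; under
    IMPLICIT presence this information does not exist, so it is ignored.
    """
    if presence == "EXPLICIT" and not is_set:
        return b""
    if presence == "IMPLICIT" and value == 0:
        return b""  # default value is indistinguishable from unset
    if presence == "LEGACY_REQUIRED" and not is_set:
        raise ValueError("required field is not set")
    key = encode_varint(field_number << 3)  # wire type 0 (VARINT)
    # Negative int32 values are sign-extended to 64 bits before encoding.
    return key + encode_varint(value & MASK64)


print(encode_int32_field(1, 0, True, "EXPLICIT"))   # b'\x08\x00'
print(encode_int32_field(1, 0, True, "IMPLICIT"))   # b''
print(encode_int32_field(1, 0, False, "EXPLICIT"))  # b''
```

A zero that was explicitly set survives serialization only under EXPLICIT presence; under IMPLICIT presence the reader simply sees the default.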

Protocol Buffers Primitive Field Types

Integer Types. 

(s)fixed(32|64)

Integers with fixed length encoding

(u)int(32|64)

Integers with variable length encoding; negative int values always take ten bytes

sint(32|64)

Integers with ZigZag variable length encoding, compact for negative values

Floating Point Types. 

float

IEEE 754 32 bit float

double

IEEE 754 64 bit float

Additional Primitive Types. 

bool

Boolean

bytes

Arbitrary sequence of bytes

string

Arbitrary sequence of UTF-8 characters
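The practical difference between these integer types shows up in the encoded size. The following Python sketch, using only the standard library and hypothetical helper names, encodes values the way the wire format specifies: varints for the int and sint types (with the ZigZag mapping for sint), little-endian fixed-width bytes for the fixed and floating point types.

```python
# Sketch of primitive value encodings; helper names are hypothetical.
import struct

MASK64 = (1 << 64) - 1


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def encode_int(value: int) -> bytes:
    """int32/int64/uint32/uint64: negative values use 64-bit two's complement."""
    return encode_varint(value & MASK64)


def encode_sint(value: int) -> bytes:
    """sint32/sint64: ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ..."""
    return encode_varint(((value << 1) ^ (value >> 63)) & MASK64)


def encode_fixed32(value: int) -> bytes:
    return struct.pack("<I", value)  # always 4 bytes, little-endian


def encode_sfixed32(value: int) -> bytes:
    return struct.pack("<i", value)  # always 4 bytes, two's complement


def encode_float(value: float) -> bytes:
    return struct.pack("<f", value)  # IEEE 754 single precision


def encode_double(value: float) -> bytes:
    return struct.pack("<d", value)  # IEEE 754 double precision


for v in (1, -1, 300):
    print(v, encode_int(v).hex(), encode_sint(v).hex(), encode_sfixed32(v).hex())
```

Read this way, -1 costs ten bytes as an int32 but a single byte as an sint32, while fixed32 always costs four bytes and becomes the smaller choice once values regularly reach 2^28.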

Protocol Buffers More Field Types

Oneof Type. 

message AnExampleMessage {
    oneof some_oneof_field {
        int32 some_integer = 1;
        string some_string = 2;
    }
}
  • Assigning one field clears others
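The oneof behavior can be modeled in a few lines of plain Python. This is a hypothetical sketch, not generated code: OneofModel and its methods are illustrative names, and where the model returns None, generated code would instead return the field's default value. Generated Python messages expose the same information through WhichOneof('some_oneof_field').

```python
# Hypothetical model of oneof semantics; not generated protobuf code.
class OneofModel:
    """At most one member of the group holds a value at any time."""

    def __init__(self, members):
        self._members = set(members)
        self._case = None   # name of the member that is currently set
        self._value = None

    def set(self, name, value):
        if name not in self._members:
            raise KeyError(name)
        # Assigning one member implicitly clears the previously set member.
        self._case, self._value = name, value

    def get(self, name):
        # Generated code would return the default value instead of None.
        return self._value if name == self._case else None

    def which(self):
        return self._case


some_oneof_field = OneofModel(["some_integer", "some_string"])
some_oneof_field.set("some_integer", 42)
some_oneof_field.set("some_string", "hello")
print(some_oneof_field.which())              # some_string
print(some_oneof_field.get("some_integer"))  # None
```

On the wire, oneof members are encoded as ordinary fields; if a parser meets several members of the same oneof, the last one it reads wins, which matches the assignment semantics modeled here.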

Enum Type. 

enum AnEnum {
    INITIAL = 0;
    RED = 1;
    BLUE = 2;
    GREEN = 3;
    WHATEVER = 8;
}
  • First value must be zero, which is the default
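Enum fields travel on the wire as int32 varints, and the zero value is what a reader sees for a field that was never set. The sketch below mirrors AnEnum in plain Python; the IntEnum mirror, encode_enum_field and decode_open_enum are illustrative assumptions. It models open enums, the edition 2023 default, where a number without a matching name is kept rather than discarded.

```python
# Hypothetical Python mirror of AnEnum; not generated protobuf code.
from enum import IntEnum

MASK64 = (1 << 64) - 1


class AnEnum(IntEnum):
    INITIAL = 0  # the zero value, also the default
    RED = 1
    BLUE = 2
    GREEN = 3
    WHATEVER = 8


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def encode_enum_field(field_number: int, value: int) -> bytes:
    """Enum fields use the VARINT wire type, exactly like int32."""
    return encode_varint(field_number << 3) + encode_varint(value & MASK64)


def decode_open_enum(number: int):
    """Open enums keep numbers that have no name instead of dropping them."""
    try:
        return AnEnum(number)
    except ValueError:
        return number


print(encode_enum_field(1, AnEnum.BLUE).hex())  # 0802
print(decode_open_enum(8).name)                 # WHATEVER
print(decode_open_enum(42))                     # 42
```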

Any Type. 

import "google/protobuf/any.proto";
message AnExampleMessage {
    repeated google.protobuf.Any whatever = 8;
}
  • Internally a type identifier and a value

  • Type identifier is a URL string ending in the full type name

  • Value is the serialized message as bytes
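Any is itself an ordinary message with two fields, string type_url = 1 and bytes value = 2. A minimal Python sketch of what a packed Any looks like on the wire, assuming the type.googleapis.com/ prefix that the official Pack methods use by default; pack_any, len_field and type_name_from_url are hypothetical helpers, not library calls.

```python
# Sketch of the Any wire layout; helper names are hypothetical.
TYPE_URL_PREFIX = "type.googleapis.com/"


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def len_field(field_number: int, data: bytes) -> bytes:
    """Encode a LEN (wire type 2) field; empty values are omitted."""
    if not data:
        return b""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def pack_any(full_type_name: str, payload: bytes) -> bytes:
    """Serialize an Any: field 1 holds type_url, field 2 the message bytes."""
    type_url = (TYPE_URL_PREFIX + full_type_name).encode("utf-8")
    return len_field(1, type_url) + len_field(2, payload)


def type_name_from_url(type_url: str) -> str:
    """The type name is everything after the last slash in the URL."""
    return type_url.rsplit("/", 1)[-1]


# Payload: a serialized message whose field 1 holds the int32 value 150.
encoded = pack_any("AnExampleMessage", b"\x08\x96\x01")
print(encoded)
```

The readable type name at the start of the encoded bytes is what lets a receiver pick the right message type before parsing the value.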

Map Type. 

message AnExampleMessage {
    map<int32, string> keywords = 8;
}
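On the wire, a map field is encoded as if it were a repeated field of entry messages, each with the key in field 1 and the value in field 2. A minimal Python sketch for map<int32, string>, assuming only that documented equivalence; encode_map_int32_string is a hypothetical helper, and it writes entries in dictionary order, whereas the order of map entries in real serialized data is not guaranteed.

```python
# Sketch of map encoding as repeated entry messages; helper names are hypothetical.
MASK64 = (1 << 64) - 1


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def encode_map_int32_string(field_number: int, mapping: dict) -> bytes:
    """Encode map<int32, string> as repeated entries (key = 1, value = 2)."""
    out = bytearray()
    for key, value in mapping.items():
        data = value.encode("utf-8")
        entry = (encode_varint(1 << 3) + encode_varint(key & MASK64)  # key: VARINT
                 + encode_varint((2 << 3) | 2) + encode_varint(len(data)) + data)  # value: LEN
        # Each entry is an embedded message under the map's own field number.
        out += encode_varint((field_number << 3) | 2) + encode_varint(len(entry)) + entry
    return bytes(out)


print(encode_map_int32_string(8, {1: "a"}).hex())  # 42050801120161
```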

See https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/descriptor.proto for the full list of available options that tune code generation. Options can be associated with a file, a type, or a field.

2.16.2. C++ Generated Code Basics

C++ Message Manipulation

Construction. 

AnExampleMessage message;
AnExampleMessage copy (another_message);
message.CopyFrom (another_message);

Singular Fields. 

std::cout << message.some_integer ();
message.set_some_integer (1234);
if (message.has_optional_integer ()) {
    message.clear_optional_integer ();
}

Repeated Fields. 

int size = messages.messages_size ();
const AnExampleMessage &element = messages.messages (1234);
AnExampleMessage *mutable_element = messages.mutable_messages (1234);
AnExampleMessage *new_element = messages.add_messages ();

Byte Array Serialization. 

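char buffer [BUFFER_SIZE];
size_t size = message.ByteSizeLong ();
// SerializeToArray returns false if the message does not fit.
bool ok = message.SerializeToArray (buffer, sizeof (buffer));
// Parse only the bytes actually written, not the whole buffer.
ok = ok && message.ParseFromArray (buffer, static_cast<int> (size));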

Standard Stream Serialization. 

message.SerializeToOstream (&stream);
message.ParseFromIstream (&stream);

2.16.3. Java Generated Code Basics

Java Message Manipulation

Construction. 

AnExampleMessage.Builder messageBuilder;
messageBuilder = AnExampleMessage.newBuilder ();
messageBuilder = AnExampleMessage.newBuilder (another_message);
AnExampleMessage message = messageBuilder.build ();

Singular Fields. 

System.out.println (message.getSomeInteger ());
messageBuilder.setSomeInteger (1234);
if (message.hasOptionalInteger ()) {
    messageBuilder = message.toBuilder ();
    messageBuilder.clearOptionalInteger ();
}

Repeated Fields. 

int size = messages.getMessagesCount ();
AnExampleMessage message = messages.getMessages (1234);
List<AnExampleMessage> messageList = messages.getMessagesList ();
messagesBuilder.addMessages (messageBuilder);
messagesBuilder.addMessages (message);

Byte Array Serialization. 

byte [] buffer = message.toByteArray ();
try {
    AnExampleMessage parsed = AnExampleMessage.parseFrom (buffer);
} catch (InvalidProtocolBufferException e) {
    System.out.println (e);
}

Standard Stream Serialization. 

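message.writeTo (outputStream);
// parseFrom reads to the end of the stream; writeDelimitedTo and
// parseDelimitedFrom frame several messages in one stream.
AnExampleMessage parsed = AnExampleMessage.parseFrom (inputStream);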

2.16.4. Python Generated Code Basics

Python Message Manipulation

Construction. 

message = AnExampleMessage ()
message.CopyFrom (another_message)

Singular Fields. 

print (message.some_integer)
message.some_integer = 1234
if message.HasField ('optional_integer'):
    message.ClearField ('optional_integer')

Repeated Fields. 

size = len (messages.messages)
message = messages.messages [1234]
message = messages.messages.add ()

Byte Array Serialization. 

buffer = message.SerializeToString ()
message.ParseFromString (buffer)
message = AnExampleMessage.FromString (buffer)

Standard Stream Serialization. 

# Open the file in binary mode ('wb' to write, 'rb' to read).
file.write (message.SerializeToString ())
message.ParseFromString (file.read ())
message = AnExampleMessage.FromString (file.read ())
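The stream example above handles a single message only, because serialized messages carry no end marker and ParseFromString consumes everything it is given. A common remedy is to prefix each message with its length as a varint, the same framing that Java's writeDelimitedTo and parseDelimitedFrom use. The helpers below are a hypothetical, standard-library-only sketch operating on serialized bytes; in real code the payloads would come from SerializeToString and go to FromString.

```python
# Sketch of length-delimited framing; helper names are hypothetical.
import io
from typing import BinaryIO, Iterator, Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one varint; return None at a clean end of stream."""
    result, shift = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    """Write the payload length as a varint, then the payload itself."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed payload until the stream ends."""
    while (size := read_varint(stream)) is not None:
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated message")
        yield payload


# In real code each payload would be message.SerializeToString ().
buffer = io.BytesIO()
for payload in (b"\x08\x01", b"", b"\x08\x96\x01"):
    write_delimited(buffer, payload)
buffer.seek(0)
print(list(read_delimited(buffer)))
```

An empty message serializes to zero bytes, so the length prefix is also what keeps it distinguishable from the end of the stream.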

2.16.5. Encoding

Protocol Buffers Playground

See https://www.protobufpal.com for a Protocol Buffers playground that can convert between textual and binary representations. Apart from experimenting with basic types of various sizes, here are some other things to try:

  • see how the same binary representation decodes to different values depending on type

  • see how legacy array encoding and packed array encoding differ

  • see how the Any type carries type name
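The first two suggestions can also be checked offline. A Python sketch, assuming only the documented wire format: the same varint byte reads as 1 when the field is declared int32 and as -1 when it is declared sint32, and a repeated int32 field is written either as one key/value pair per element (the legacy expanded encoding) or as a single length-delimited record (packed, the default in edition 2023). All function names are illustrative.

```python
# Sketch of decoding and repeated field encodings; helper names are hypothetical.
MASK64 = (1 << 64) - 1


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def as_int32(raw: int) -> int:
    """Interpret a decoded varint as int32 (two's complement of the low 32 bits)."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw >= (1 << 31) else raw


def as_sint32(raw: int) -> int:
    """Interpret a decoded varint as sint32 (undo the ZigZag mapping)."""
    return (raw >> 1) ^ -(raw & 1)


def encode_repeated_expanded(field_number: int, values) -> bytes:
    """Legacy encoding: a separate key/value pair for every element."""
    key = encode_varint(field_number << 3)  # wire type 0 (VARINT)
    return b"".join(key + encode_varint(v & MASK64) for v in values)


def encode_repeated_packed(field_number: int, values) -> bytes:
    """Packed encoding: one LEN record holding all element varints."""
    body = b"".join(encode_varint(v & MASK64) for v in values)
    return encode_varint((field_number << 3) | 2) + encode_varint(len(body)) + body


raw, _ = decode_varint(b"\x01")
print(as_int32(raw), as_sint32(raw))                 # 1 -1
print(encode_repeated_expanded(4, [1, 2, 3]).hex())  # 200120022003
print(encode_repeated_packed(4, [1, 2, 3]).hex())    # 2203010203
```

Packing saves one key per element, which is why it pays off for long lists of small numbers.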

2.16.6. References

  1. The protobuf Project Home Page. https://protobuf.dev