Protocol Buffers is a framework that generates serialization code for messages described in a platform-independent message description language. Language bindings are provided for multiple languages, including C++, Java, Python, and JavaScript.
Protocol Buffers releases new features in editions. Each edition has a default feature configuration, which explicit options can override. Major features include the treatment of field presence, the encoding of repeated primitive fields, and the encoding of nested messages. Before editions, features were tied to versions of the message description language (proto2 and proto3).
The message description language defines each message as a set of fields. Each field has a type, a name, and a key, which identifies the field inside serialized messages. Standard fields are present in a message at most once, with explicit presence tracking performed by default. Repeated fields can be present an arbitrary number of times.
    edition = "2023";

    // File level options supported.
    option optimize_for = SPEED;

    message SomeMessage {

        // Field identifiers reserved after message changes.
        reserved 8, 100;

        // Many integer types with specific encodings.
        int32 aMostlyPositiveInteger = 1;
        sint64 aSignedInteger = 2;
        uint64 anUnsignedInteger = 3;
        fixed32 anOftenBigUnsignedInteger = 4;
        sfixed32 anOftenBigSignedInteger = 5;

        // String always with UTF 8 encoding.
        string aString = 10;

        // Another message type.
        AnotherMessage aMessage = 111;

        // Variable length content supported.
        repeated string aStringList = 333;
        map <int32, string> aMap = 444;

        // Field level options supported.
        int32 aDeprecatedInteger = 666 [deprecated = true];
        float aFloatWithoutPresenceTracking = 222 [features.field_presence = IMPLICIT];

        // Extension field range.
        extensions 1234 to 5678;
    }

    extend SomeMessage {
        // Extension field in extension field range.
        int32 anExtensionField = 1234;
    }
A spectrum of basic types
Packages and nested types
Fields can be repeated
Fields have presence tracked unless disabled
Explicit field identifiers for versioning
Options tune code generation
Extensions reserve fields
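To make the role of the field identifier concrete, the following is a minimal sketch of how a field key is written on the wire, based on the publicly documented binary encoding: the key is the field number shifted left by three bits, combined with a wire type, and stored as a variable length integer. The helper names `encode_varint` and `encode_tag` are illustrative and are not part of any protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low groups first."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set, more bytes follow
        else:
            out.append(low7)
            return bytes(out)


# Wire types from the protobuf encoding documentation.
WIRE_VARINT = 0
WIRE_LEN = 2


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """The field key: field number and wire type packed into one varint."""
    return encode_varint((field_number << 3) | wire_type)


# Field 1 (an int32 such as aMostlyPositiveInteger) carrying the value 150.
record = encode_tag(1, WIRE_VARINT) + encode_varint(150)
print(record.hex())  # 089601

# Large field numbers such as 333 need a two byte key.
print(encode_tag(333, WIRE_LEN).hex())  # ea14
```

Because only the number appears in the serialized form, fields can be renamed freely, while numbers of removed fields should stay reserved.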
For historical reasons, field presence offers three choices. EXPLICIT presence tracking serializes every field whose value was set, even if the value equals the default. IMPLICIT presence tracking serializes only fields whose value differs from the default. LEGACY_REQUIRED presence tracking always serializes the field, but is considered problematic for evolving specifications.
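The three presence choices can be summarized as a decision on whether a field is written during serialization. The sketch below is a toy model, not library code: the function `should_serialize` is hypothetical, `None` stands for a field that was never set, and raising an error for an unset required field is a simplifying assumption about how implementations react.

```python
from typing import Optional


def should_serialize(value: Optional[int], presence: str, default: int = 0) -> bool:
    """Toy model: decide whether a singular integer field is written."""
    if presence == "EXPLICIT":
        # Written whenever it was set, even to the default value.
        return value is not None
    if presence == "IMPLICIT":
        # Written only when it differs from the default value.
        return value is not None and value != default
    if presence == "LEGACY_REQUIRED":
        # Always written; an unset required field is treated as an error here.
        if value is None:
            raise ValueError("required field was not set")
        return True
    raise ValueError("unknown presence setting: " + presence)


print(should_serialize(0, "EXPLICIT"))  # True
print(should_serialize(0, "IMPLICIT"))  # False
print(should_serialize(7, "IMPLICIT"))  # True
```

The practical consequence is that a receiver cannot distinguish an IMPLICIT field set to zero from a field that was never set.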
Integer Types.
Integers with fixed length encoding
Integers with variable length encoding
Integers with sign optimized variable length encoding
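The choice between the three integer families is mostly a choice of encoded size. The following sketch, written against the documented wire format rather than any library, compares the sizes for a few values. The helper names are illustrative only.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low groups first."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_int64(value: int) -> bytes:
    """int32 and int64 style: negatives become 64 bit two's complement."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint64(value: int) -> bytes:
    """sint32 and sint64 style: ZigZag maps 0, -1, 1, -2 to 0, 1, 2, 3."""
    return encode_varint((value << 1) ^ (value >> 63))


def encode_sfixed32(value: int) -> bytes:
    """sfixed32 style: always four bytes, little endian."""
    return struct.pack("<i", value)


for value in (1, -1, 300, -300):
    sizes = (len(encode_int64(value)), len(encode_sint64(value)), len(encode_sfixed32(value)))
    print(value, sizes)
```

Small negative numbers take ten bytes with the plain variable length encoding but only one or two bytes with the sign optimized encoding, while the fixed length encoding pays off only for values that are usually large.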
Floating Point Types.
IEEE 754 32 bit float
IEEE 754 64 bit float
Additional Primitive Types.
Boolean
Arbitrary sequence of bytes
Arbitrary sequence of UTF-8 characters
Oneof Type.
    message AnExampleMessage {
        oneof some_oneof_field {
            int32 some_integer = 1;
            string some_string = 2;
        }
    }
Assigning one field clears others
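The clearing behavior can be illustrated with a small stand-in class. This is a toy model and not generated code; the class `OneofSketch` and its methods are hypothetical. In the real Python binding, members are assigned as ordinary attributes and the active member is queried with `WhichOneof`.

```python
class OneofSketch:
    """Toy model of a message with a single oneof group."""

    _members = ("some_integer", "some_string")

    def __init__(self):
        self._case = None   # name of the member that is currently set
        self._value = None  # value of that member

    def set(self, name, value):
        if name not in self._members:
            raise AttributeError(name)
        # Storing one member overwrites whichever member was set before.
        self._case, self._value = name, value

    def which(self):
        return self._case

    def get(self, name):
        return self._value if self._case == name else None


message = OneofSketch()
message.set("some_integer", 1234)
message.set("some_string", "text")
print(message.which())              # some_string
print(message.get("some_integer"))  # None
```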
Enum Type.
    enum AnEnum {
        INITIAL = 0;
        RED = 1;
        BLUE = 2;
        GREEN = 3;
        WHATEVER = 8;
    }
Must include zero
Any Type.
    import "google/protobuf/any.proto";

    message AnExampleMessage {
        repeated google.protobuf.Any whatever = 8;
    }
Internally a type identifier and a value
Type identifier is URI string
Value is byte buffer
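As a rough illustration of the two parts, the sketch below builds the bytes of an Any value by hand. It relies on the definition in any.proto, where the type identifier is string field 1 and the value is bytes field 2, and on the conventional `type.googleapis.com/` prefix. The helper names and the packed message name are assumptions made for the example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low groups first."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """Length delimited field: key with wire type 2, length, then payload."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def pack_any(type_name: str, serialized: bytes) -> bytes:
    """Hand built Any: type URL in field 1, serialized message in field 2."""
    type_url = "type.googleapis.com/" + type_name
    return encode_len_field(1, type_url.encode("utf-8")) + encode_len_field(2, serialized)


# Hypothetical inner message whose int32 field 1 holds 150.
inner = bytes.fromhex("089601")
packed = pack_any("AnExampleMessage", inner)
print(packed)
```

The receiver reads the type identifier first and only then decides how to parse the value bytes.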
Map Type.
    message AnExampleMessage {
        map<int32, string> keywords = 8;
    }
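On the wire, a map field is documented to be equivalent to a repeated nested message whose field 1 is the key and field 2 is the value. The sketch below encodes one entry of the `keywords` map by hand under that rule. The helper names are illustrative and not part of any library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low groups first."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """Length delimited field: key with wire type 2, length, then payload."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_map_entry(field_number: int, key: int, value: str) -> bytes:
    """One map<int32, string> entry as a nested message with key=1, value=2."""
    entry = (
        encode_varint((1 << 3) | 0)                  # key field, varint wire type
        + encode_varint(key & 0xFFFFFFFFFFFFFFFF)    # int32 keys are sign extended
        + encode_len_field(2, value.encode("utf-8"))
    )
    return encode_len_field(field_number, entry)


print(encode_map_entry(8, 1, "a").hex())  # 42050801120161
```

A map with several entries simply repeats this record once per entry, in no guaranteed order.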
See https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/descriptor.proto for the full list of available options that tune code generation. Options can be associated with a file, a type, or a field.
Construction.
    AnExampleMessage message;
    AnExampleMessage message (another_message);
    message.CopyFrom (another_message);
Singular Fields.
    cout << message.some_integer ();
    message.set_some_integer (1234);
    if (message.has_optional_integer ()) {
        message.clear_optional_integer ();
    }
Repeated Fields.
    int size = messages.messages_size ();
    const AnExampleMessage &message = messages.messages (1234);
    AnExampleMessage *message = messages.mutable_messages (1234);
    AnExampleMessage *message = messages.add_messages ();
Byte Array Serialization.
    char buffer [BUFFER_SIZE];
    // Parsing needs the exact serialized size, not the buffer capacity.
    size_t size = message.ByteSizeLong ();
    message.SerializeToArray (buffer, sizeof (buffer));
    message.ParseFromArray (buffer, size);
Standard Stream Serialization.
    message.SerializeToOstream (&stream);
    message.ParseFromIstream (&stream);
Construction.
    AnExampleMessage.Builder messageBuilder;
    messageBuilder = AnExampleMessage.newBuilder ();
    messageBuilder = AnExampleMessage.newBuilder (another_message);
    AnExampleMessage message = messageBuilder.build ();
Singular Fields.
    System.out.println (message.getSomeInteger ());
    messageBuilder.setSomeInteger (1234);
    if (message.hasOptionalInteger ()) {
        messageBuilder = message.toBuilder ();
        messageBuilder.clearOptionalInteger ();
    }
Repeated Fields.
    int size = messages.getMessagesCount ();
    AnExampleMessage message = messages.getMessages (1234);
    List<AnExampleMessage> messageList = messages.getMessagesList ();
    messagesBuilder.addMessages (messageBuilder);
    messagesBuilder.addMessages (message);
Byte Array Serialization.
    byte [] buffer = message.toByteArray ();
    try {
        AnExampleMessage message = AnExampleMessage.parseFrom (buffer);
    } catch (InvalidProtocolBufferException e) {
        System.out.println (e);
    }
Standard Stream Serialization.
    message.writeTo (stream);
    AnExampleMessage message = AnExampleMessage.parseFrom (stream);
Construction.
    message = AnExampleMessage ()
    message.CopyFrom (another_message)
Singular Fields.
    print (message.some_integer)
    message.some_integer = 1234
    if message.HasField ('optional_integer'):
        message.ClearField ('optional_integer')
Repeated Fields.
    size = len (messages.messages)
    message = messages.messages [1234]
    message = messages.messages.add ()
Byte Array Serialization.
    buffer = message.SerializeToString ()
    message.ParseFromString (buffer)
    message = AnExampleMessage.FromString (buffer)
Standard Stream Serialization.
    file.write (message.SerializeToString ())
    message.ParseFromString (file.read ())
    AnExampleMessage.FromString (file.read ())
See https://www.protobufpal.com for a Protocol Buffers playground that converts between textual and binary representations. Apart from experimenting with basic types of various sizes, here are some other things to try:
see how the same binary representation decodes to different values depending on type
see how legacy array encoding and packed array encoding differ
see how the Any type carries type name
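The first two experiments listed above can also be reproduced offline. The sketch below follows the documented wire format and uses only the Python standard library. The repeated field number 4 and all helper names are assumptions made for the example.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low groups first."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint found at the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            break
    return result


def zigzag_decode(n: int) -> int:
    """Undo the sign optimized mapping used by sint32 and sint64."""
    return (n >> 1) ^ -(n & 1)


# Same four bytes, read as fixed32 and as float.
raw = bytes.fromhex("0000803f")
as_fixed32 = struct.unpack("<I", raw)[0]
as_float = struct.unpack("<f", raw)[0]
print(as_fixed32, as_float)  # 1065353216 1.0

# Same varint byte, read as int32 and as sint32.
as_int = decode_varint(b"\x03")
as_sint = zigzag_decode(as_int)
print(as_int, as_sint)  # 3 -2

# Repeated int32 field 4 holding 1, 2, 3 in legacy and in packed form.
values = [1, 2, 3]
legacy = b"".join(encode_varint((4 << 3) | 0) + encode_varint(v) for v in values)
payload = b"".join(encode_varint(v) for v in values)
packed = encode_varint((4 << 3) | 2) + encode_varint(len(payload)) + payload
print(legacy.hex(), packed.hex())  # 200120022003 2203010203
```

The legacy form repeats the field key before every element, while the packed form writes the key and a length once, which is why packing saves space for repeated primitive fields.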
The protobuf Project Home Page. https://protobuf.dev