1.2. Serialization

1.2.1. Textual

XML Object Serialization Example

Data Instance. 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<anExampleDataClass>
    <anIntField>123</anIntField>
    <aFloatField>12.34</aFloatField>
    <aDoubleField>1.234E57</aDoubleField>
    <aBoxedIntField>987</aBoxedIntField>

    <aRequiredStringField>a string</aRequiredStringField>
    <anArrayWithoutAWrapper>1</anArrayWithoutAWrapper>
    <anArrayWithoutAWrapper>2</anArrayWithoutAWrapper>
    <anArrayWithoutAWrapper>3</anArrayWithoutAWrapper>

    <anArrayWithAWrapper>
        <anArrayElement>12</anArrayElement>
        <anArrayElement>34</anArrayElement>
        <anArrayElement>56</anArrayElement>
    </anArrayWithAWrapper>

    <aListElement>
        <anIntField>0</anIntField>
        <aFloatField>0.0</aFloatField>
        <aDoubleField>0.0</aDoubleField>
    </aListElement>

    <aSetElement>
        <anIntField>0</anIntField>
        <aFloatField>0.0</aFloatField>
        <aDoubleField>0.0</aDoubleField>
    </aSetElement>

    <aMapElement>
        <entry>
            <key>456</key>
            <value>
                <anIntField>0</anIntField>
                <aFloatField>0.0</aFloatField>
                <aDoubleField>0.0</aDoubleField>
            </value>
        </entry>
        <entry>
            <key>123</key>
            <value>
                <anIntField>0</anIntField>
                <aFloatField>0.0</aFloatField>
                <aDoubleField>0.0</aDoubleField>
            </value>
        </entry>
    </aMapElement>
</anExampleDataClass>

Possible Schema. 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">

    <xs:element name="anExampleDataClass" type="anExampleDataClass"/>

    <xs:complexType name="anExampleDataClass">
        <xs:annotation>
            <xs:documentation>
                An example class.
                Contains various field types to illustrate the mapping.
            </xs:documentation>
        </xs:annotation>

        <xs:all>
            <xs:element name="anIntField" type="xs:int"/>
            <xs:element name="aFloatField" type="xs:float"/>
            <xs:element name="aDoubleField" type="xs:double"/>

            <xs:element minOccurs="0" name="aBoxedIntField" type="xs:int"/>
            <xs:element name="aRequiredStringField" type="xs:string"/>
            <xs:element minOccurs="0" name="anOptionalStringField" type="xs:string"/>
            <xs:element default="default" minOccurs="0" name="aStringFieldWithDefaultValue" type="xs:string"/>

            <xs:element maxOccurs="unbounded" minOccurs="0" name="anArrayWithoutAWrapper" type="xs:int"/>
            <xs:element minOccurs="0" name="anArrayWithAWrapper">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element maxOccurs="unbounded" minOccurs="0" name="anArrayElement" type="xs:int"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>

            <xs:element maxOccurs="unbounded" minOccurs="0" name="aListElement" type="anExampleDataClass"/>
            <xs:element maxOccurs="unbounded" minOccurs="0" name="aSetElement" type="anExampleDataClass"/>

            <xs:element name="aMapElement">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element maxOccurs="unbounded" minOccurs="0" name="entry">
                            <xs:complexType>
                                <xs:sequence>
                                    <xs:element minOccurs="0" name="key" type="xs:int"/>
                                    <xs:element minOccurs="0" name="value" type="anExampleDataClass"/>
                                </xs:sequence>
                            </xs:complexType>
                        </xs:element>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:all>
    </xs:complexType>
</xs:schema>
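
Reading the Data Instance in Python

As a minimal sketch, a shortened copy of the data instance above can be read with the standard xml.etree.ElementTree module. The variable names are illustrative only and no schema validation takes place. The sketch shows the difference between the two array mappings: an array without a wrapper appears as repeated sibling elements, while a wrapped array nests its elements under a single parent.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the data instance above, used only for illustration.
DOCUMENT = """
<anExampleDataClass>
    <anIntField>123</anIntField>
    <anArrayWithoutAWrapper>1</anArrayWithoutAWrapper>
    <anArrayWithoutAWrapper>2</anArrayWithoutAWrapper>
    <anArrayWithoutAWrapper>3</anArrayWithoutAWrapper>
    <anArrayWithAWrapper>
        <anArrayElement>12</anArrayElement>
        <anArrayElement>34</anArrayElement>
        <anArrayElement>56</anArrayElement>
    </anArrayWithAWrapper>
</anExampleDataClass>
""".strip()

root = ET.fromstring(DOCUMENT)

# Element content is always text, the conversion to int is up to the reader.
an_int_field = int(root.findtext("anIntField"))

# Without a wrapper, the array elements are direct children of the root.
unwrapped = [int(e.text) for e in root.findall("anArrayWithoutAWrapper")]

# With a wrapper, the array elements sit one level deeper.
wrapped = [int(e.text) for e in root.findall("anArrayWithAWrapper/anArrayElement")]

print(an_int_field, unwrapped, wrapped)
```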

1.2.1.1. References

  1. Eclipse: JAXB Reference Implementation. https://github.com/eclipse-ee4j/jaxb-ri

JSON Object Serialization Example

Data Instance. 

{
    "an_int_field" : 123,
    "a_float_field" : 12.34,
    "a_string_field" : "a string",
    "an_array" : [1, 2, 3]
}
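
Reading the Data Instance in Python

A minimal sketch of a round trip through the standard json module, assuming the data instance above is available as a string. JSON numbers, strings and arrays map directly to Python int, float, str and list values, so no schema is needed to get usable objects.

```python
import json

# The data instance above as a string.
TEXT = """
{
    "an_int_field" : 123,
    "a_float_field" : 12.34,
    "a_string_field" : "a string",
    "an_array" : [1, 2, 3]
}
"""

# Parsing yields a dict with native Python values.
data = json.loads(TEXT)

# Serializing again yields text that parses to an equal object.
round_trip = json.dumps(data, indent=4)

print(type(data["an_int_field"]).__name__, type(data["a_float_field"]).__name__)
print(round_trip)
```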

YAML Object Serialization Example

Data Instance. 

an_int_field: 123
a_float_field: 12.34
a_string_field: a string
an_array:
- 1
- 2
- 3
a_mapping_field:
    &some_name a_nested_field: a string
    a_reference: *some_name
...

Anchors and Aliases in YAML

Anchors and aliases in YAML are considered a serialization detail and are not generally preserved in the representation graph. An anchor can appear at any position before the node content; an alias appears instead of the node content. An alias refers to the most recent preceding anchor of that name, so names can be reused.
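
The following sketch illustrates this behavior, assuming the PyYAML package is installed as the yaml module. The document text and the variable names are made up for the illustration. The alias is loaded as a reference to the very same Python object as the anchored node, and the anchor name itself does not survive a load and dump cycle because PyYAML generates a name of its own when dumping.

```python
import yaml  # PyYAML, a third party package

# An anchor placed before a sequence node and an alias that refers to it.
TEXT = """
a_mapping_field:
    a_nested_field: &some_name [1, 2, 3]
    a_reference: *some_name
"""

data = yaml.safe_load(TEXT)
mapping = data["a_mapping_field"]

# Both keys refer to one shared list object, not to two equal copies.
same_object = mapping["a_nested_field"] is mapping["a_reference"]

# Dumping shared structure emits an anchor again, but with a generated name.
dumped = yaml.safe_dump(data)

print(same_object)
print(dumped)
```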

Native Object Serialization with YAML Tags

To represent native types, YAML relies on tags. YAML distinguishes local and global tags. Local tags are simply application-specific strings starting with an exclamation mark that can be attached to any node. For example, the following demonstrates the use of tags to distinguish Python tuples from Python lists, which would otherwise both end up as the same array:

> import yaml
> print (yaml.dump ((1, 2, 3)))
!!python/tuple
- 1
- 2
- 3
> print (yaml.dump ([1, 2, 3]))
- 1
- 2
- 3

Only trusted code should be allowed to use all serialization tags:

> import yaml
> yaml.unsafe_load ('!!python/object/apply:os.system ["echo Hello from shell !"]')
Hello from shell !
0

See the list of serialization tags in the module documentation. The !!python/object/apply:module.function tag expands into the result of calling the module function.
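
For untrusted input, the restricted loader is the usual alternative. The sketch below assumes the PyYAML package and reuses the document from the example above. The safe loader is expected to reject the Python specific tag with an error instead of calling the function, while standard YAML content still loads normally.

```python
import yaml  # PyYAML, a third party package

# The same document as above, now treated as untrusted input.
UNTRUSTED = '!!python/object/apply:os.system ["echo Hello from shell !"]'

try:
    yaml.safe_load(UNTRUSTED)
    rejected = False
except yaml.YAMLError as error:
    # The safe loader has no constructor registered for this tag.
    rejected = True
    print("rejected:", type(error).__name__)

# Content that uses only standard tags is not affected.
plain = yaml.safe_load("[1, 2, 3]")
print(plain)
```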

Serialization Security Issues

General object serialization mechanisms such as Java serialization or Python pickling should only be used with trusted content. To handle arbitrary object types, such mechanisms can sometimes execute user code as a part of the serialization process, and such code can sometimes be tricked into executing arbitrary commands.
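
A minimal sketch of the problem with Python pickling follows. The Payload class is hypothetical and deliberately harmless. Its __reduce__ method tells pickle how to rebuild the object, and pickle follows that recipe when loading, which here means calling eval on a string chosen by whoever produced the stream.

```python
import pickle


class Payload:
    """Hypothetical attacker controlled class, for illustration only."""

    def __reduce__(self):
        # The recipe says: rebuild this object by calling eval("6 * 7").
        # A real attack would name a function such as os.system instead.
        return (eval, ("6 * 7",))


# The attacker prepares the stream.
stream = pickle.dumps(Payload())

# The victim merely loads it, yet the expression chosen by the attacker runs.
result = pickle.loads(stream)
print(result)
```

Note that the loading side does not need the Payload class at all, the stream only names the function to call and its arguments.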

See https://github.com/frohoff/ysoserial for multiple examples of arbitrary code execution through Java serialization. The examples rely on Java serialization invoking readObject on the user type. In one of the examples, this type is AnnotationInvocationHandler, whose readObject method iterates over a collection making up its state. In turn, this collection can be an instance of LazyMap, an Apache Commons Collections class that invokes an item factory when iterated upon. This factory can be another Apache Commons Collections class, an InvokerTransformer, which can invoke arbitrary methods, such as the runtime exec method. See https://github.com/frohoff/ysoserial/blob/master/src/main/java/ysoserial/payloads/CommonsCollections1.java for this particular payload example.

1.2.1.2. References

  1. Colm O'Connor: The Norway Problem. https://hitchdev.com/strictyaml/why/implicit-typing-removed

  2. Chris Frohoff: The ysoserial Project Repository. https://github.com/frohoff/ysoserial

  3. Moritz Bechler: The marshalsec Project Repository. https://github.com/mbechler/marshalsec

1.2.2. Binary

1.2.2.1. Concise Binary Object Representation (CBOR)

The CBOR format stores basic types, arrays of basic types, and maps of basic types. Basic types are null, booleans, integers, floats, byte strings and text strings. An item can be wrapped in a tag that specifies additional information, which can identify date and time, bignum, URI and so on. References are not supported.

The CBOR data stream is a sequence of items. Each item starts with a single-byte header that carries the item type (3 bits) and an additional argument value (5 bits). The rest of the item data depends on the type and the value.
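
The header layout can be sketched in Python as follows. The split_header function name is made up for the illustration. The upper three bits of the header byte give the item type and the lower five bits give the argument value.

```python
def split_header(byte):
    """Split a CBOR header byte into the item type and the argument value."""
    item_type = byte >> 5      # upper 3 bits
    argument = byte & 0x1F     # lower 5 bits
    return item_type, argument


# 17h is 000-10111b, which is type 0 with argument 23.
print(split_header(0x17))

# 38h is 001-11000b, which is type 1 with argument 24.
print(split_header(0x38))
```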

CBOR Serialization Examples

Integer Data Items. 

00h ~ 000-00000b ~ positive integer type (0) value 0
01h ~ 000-00001b ~ positive integer type (0) value 1
17h ~ 000-10111b ~ positive integer type (0) value 23

18h ~ 000-11000b ~ positive integer type (0) value in next byte (24)
18h              ~ value 24 (18h)
18h ~ 000-11000b ~ positive integer type (0) value in next byte (24)
19h              ~ value 25 (19h)

19h ~ 000-11001b ~ positive integer type (0) value in next two bytes (25)
01h 00h          ~ value 256 (network order)

1Ah ~ 000-11010b ~ positive integer type (0) value in next four bytes (26)
00h 01h 00h 00h  ~ value 65536 (network order)

20h ~ 001-00000b ~ negative integer type (1) value -1
21h ~ 001-00001b ~ negative integer type (1) value -2

38h ~ 001-11000b ~ negative integer type (1) value in next byte (24)
FFh              ~ value -256
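
The encodings listed above can be reproduced with a short Python sketch. The encode_int function is hypothetical and handles only integers that fit the 64-bit argument. Argument values below 24 are stored directly in the header, while the values 24 to 27 announce that the actual value follows in the next 1, 2, 4 or 8 bytes in network order. A negative integer n is stored as -1 - n.

```python
import struct


def encode_int(value):
    """Encode a Python int as a CBOR integer item (sketch, 64 bits at most)."""
    item_type = 0 if value >= 0 else 1
    argument = value if value >= 0 else -1 - value

    # Small arguments fit directly into the 5 bits of the header.
    if argument < 24:
        return bytes([item_type << 5 | argument])

    # Larger arguments follow the header in 1, 2, 4 or 8 bytes, network order.
    sizes = ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
             (26, ">I", 1 << 32), (27, ">Q", 1 << 64))
    for info, fmt, limit in sizes:
        if argument < limit:
            return bytes([item_type << 5 | info]) + struct.pack(fmt, argument)

    raise OverflowError("value needs the bignum encoding")


for value in (0, 23, 24, 256, 65536, -1, -256):
    print(value, encode_int(value).hex())
```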

CBOR Playground

See https://cbor.me for a CBOR playground that can convert between textual and binary representations. Apart from experimenting with basic types of various sizes, these are some other values with interesting serialization:

  • 0("2020-01-01T00:00:00Z") for a string that contains date and time

  • 18446744073709551616 for the first integer big enough to use the bignum encoding

  • 4([-1, 1]) for value 0.1 encoded as decimal fraction

  • 5([-1, 1]) for value 1/2 encoded as binary fraction

  • [_ [1, 2], [3, 4, 5]] for an indefinite length array
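
Some of the values above can be checked with a short Python sketch. The function names are made up for the illustration. It assumes the tag definitions from RFC 8949, where tags 4 and 5 wrap an [exponent, mantissa] pair with base 10 and base 2 respectively, and tag 2 wraps a byte string holding a non-negative bignum in network order.

```python
from fractions import Fraction


def fraction_value(base, exponent, mantissa):
    """Value of an [exponent, mantissa] pair, computed exactly."""
    return Fraction(mantissa) * Fraction(base) ** exponent


def encode_bignum(value):
    """Encode a non-negative int as tag 2 wrapping a short byte string."""
    magnitude = value.to_bytes((value.bit_length() + 7) // 8, "big")
    if len(magnitude) >= 24:
        raise OverflowError("this sketch handles short byte strings only")
    # C2h is the tag 2 header, 40h plus length is the byte string header.
    return bytes([0xC2, 0x40 | len(magnitude)]) + magnitude


decimal_fraction = fraction_value(10, -1, 1)   # 4([-1, 1])
binary_fraction = fraction_value(2, -1, 1)     # 5([-1, 1])
bignum = encode_bignum(18446744073709551616)

print(decimal_fraction, binary_fraction, bignum.hex())
```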

1.2.2.1.1. References

  1. Carsten Bormann: Concise Binary Object Representation Website. https://cbor.io

  2. IETF: Concise Binary Object Representation (CBOR) (RFC 8949). https://tools.ietf.org/html/rfc8949

  3. IETF: Concise Data Definition Language (CDDL) (RFC 8610). https://tools.ietf.org/html/rfc8610