Data Instance.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<anExampleDataClass>
    <anIntField>123</anIntField>
    <aFloatField>12.34</aFloatField>
    <aDoubleField>1.234E57</aDoubleField>
    <aBoxedIntField>987</aBoxedIntField>
    <aRequiredStringField>a string</aRequiredStringField>
    <anArrayWithoutAWrapper>1</anArrayWithoutAWrapper>
    <anArrayWithoutAWrapper>2</anArrayWithoutAWrapper>
    <anArrayWithoutAWrapper>3</anArrayWithoutAWrapper>
    <anArrayWithAWrapper>
        <anArrayElement>12</anArrayElement>
        <anArrayElement>34</anArrayElement>
        <anArrayElement>56</anArrayElement>
    </anArrayWithAWrapper>
    <aListElement>
        <anIntField>0</anIntField>
        <aFloatField>0.0</aFloatField>
        <aDoubleField>0.0</aDoubleField>
    </aListElement>
    <aSetElement>
        <anIntField>0</anIntField>
        <aFloatField>0.0</aFloatField>
        <aDoubleField>0.0</aDoubleField>
    </aSetElement>
    <aMapElement>
        <entry>
            <key>456</key>
            <value>
                <anIntField>0</anIntField>
                <aFloatField>0.0</aFloatField>
                <aDoubleField>0.0</aDoubleField>
            </value>
        </entry>
        <entry>
            <key>123</key>
            <value>
                <anIntField>0</anIntField>
                <aFloatField>0.0</aFloatField>
                <aDoubleField>0.0</aDoubleField>
            </value>
        </entry>
    </aMapElement>
</anExampleDataClass>
Possible Schema.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">

    <xs:element name="anExampleDataClass" type="anExampleDataClass"/>

    <xs:complexType name="anExampleDataClass">
        <xs:annotation>
            <xs:documentation>
                An example class.
                Contains various field types to illustrate the mapping.
            </xs:documentation>
        </xs:annotation>
        <xs:all>
            <xs:element name="anIntField" type="xs:int"/>
            <xs:element name="aFloatField" type="xs:float"/>
            <xs:element name="aDoubleField" type="xs:double"/>
            <xs:element minOccurs="0" name="aBoxedIntField" type="xs:int"/>
            <xs:element name="aRequiredStringField" type="xs:string"/>
            <xs:element minOccurs="0" name="anOptionalStringField" type="xs:string"/>
            <xs:element default="default" minOccurs="0" name="aStringFieldWithDefaultValue" type="xs:string"/>
            <xs:element maxOccurs="unbounded" minOccurs="0" name="anArrayWithoutAWrapper" type="xs:int"/>
            <xs:element minOccurs="0" name="anArrayWithAWrapper">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element maxOccurs="unbounded" minOccurs="0" name="anArrayElement" type="xs:int"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
            <xs:element maxOccurs="unbounded" minOccurs="0" name="aListElement" type="anExampleDataClass"/>
            <xs:element maxOccurs="unbounded" minOccurs="0" name="aSetElement" type="anExampleDataClass"/>
            <xs:element name="aMapElement">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element maxOccurs="unbounded" minOccurs="0" name="entry">
                            <xs:complexType>
                                <xs:sequence>
                                    <xs:element minOccurs="0" name="key" type="xs:int"/>
                                    <xs:element minOccurs="0" name="value" type="anExampleDataClass"/>
                                </xs:sequence>
                            </xs:complexType>
                        </xs:element>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:all>
    </xs:complexType>
</xs:schema>
Eclipse: JAXB Reference Implementation. https://github.com/eclipse-ee4j/jaxb-ri
Data Instance.
{
    "an_int_field" : 123,
    "a_float_field" : 12.34,
    "a_string_field" : "a string",
    "an_array" : [1, 2, 3]
}
Data Instance.
an_int_field: 123
a_float_field: 12.34
a_string_field: a string
an_array:
  - 1
  - 2
  - 3
a_mapping_field: &some_name
  a_nested_field: a string
a_reference: *some_name
...
Anchors and aliases in YAML are considered a serialization detail and are not generally preserved in the representation graph. An anchor can appear at any position before the node content, whereas an alias appears instead of the node content. An alias refers to the last preceding anchor of that name, so names can be reused.
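A minimal sketch of how an alias looks after loading, assuming the PyYAML package (the yaml module) is installed. The document text and the variable names are illustrative only. With PyYAML, the alias ends up referring to the very same Python object as the anchored node, and the anchor name itself is not visible in the loaded data.

```python
import yaml

DOCUMENT = """
a_mapping_field: &some_name
  a_nested_field: a string
a_reference: *some_name
"""

data = yaml.safe_load(DOCUMENT)

# Both keys hold equal content ...
same_content = data["a_reference"] == data["a_mapping_field"]
# ... and, with PyYAML, the alias resolves to the very same object.
same_object = data["a_reference"] is data["a_mapping_field"]

# The anchor name is a serialization detail and does not show up as a key.
keys = sorted(data.keys())

print(same_content, same_object, keys)
```

Other YAML libraries may choose to copy the aliased node instead of sharing it, so code that depends on object identity after loading is tied to the behavior of a particular library.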
To represent native types, YAML relies on tags. YAML distinguishes local and global tags. Local tags are simply application-specific strings starting with an exclamation mark that can be attached to any node. For example, the following uses tags to distinguish Python tuples from Python lists, which would otherwise both end up as the same array:
> import yaml
> print (yaml.dump ((1, 2, 3)))
!!python/tuple
- 1
- 2
- 3

> print (yaml.dump ([1, 2, 3]))
- 1
- 2
- 3
Only trusted content should be allowed to use all serialization tags:
> import yaml
> yaml.unsafe_load ('!!python/object/apply:os.system ["echo Hello from shell !"]')
Hello from shell !
0
See the list of serialization tags in the module documentation.
The !!python/object/apply:module.function tag expands into the result of calling the module function.
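For contrast with the unsafe_load call above, the following sketch feeds the same payload to yaml.safe_load, assuming the PyYAML package is installed. The load_untrusted helper is a hypothetical name introduced for this illustration. The safe loader knows only the standard YAML tags, so the Python-specific tag is rejected rather than executed.

```python
import yaml

PAYLOAD = '!!python/object/apply:os.system ["echo Hello from shell !"]'

def load_untrusted(text):
    """Parse text with the safe loader (hypothetical helper for this sketch)."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as error:
        # The safe loader has no constructor for the Python-specific tag.
        return "rejected: " + type(error).__name__

result = load_untrusted(PAYLOAD)
plain = load_untrusted("an_array: [1, 2, 3]")

print(result)
print(plain)
```

Plain data without application-specific tags still loads as usual, which is why the safe loader is the usual choice for content of unknown origin.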
General object serialization mechanisms such as Java serialization or Python pickling should only be used with trusted content. To handle arbitrary object types, such mechanisms can sometimes execute user code as a part of the serialization process, and such code can sometimes be tricked into executing arbitrary commands.
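The same risk can be sketched with Python pickling using only the standard library. The Payload class is a hypothetical attacker-controlled type, and the harmless expression stands in for a real command. The pickled bytes name a callable and its arguments, and loading the bytes performs the call.

```python
import pickle

class Payload:
    """Hypothetical attacker-controlled type, for illustration only."""

    def __reduce__(self):
        # Tell pickle to rebuild this object by calling eval("6 * 7").
        # A real attack would name os.system or a similar function instead.
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())

# The receiver sees only bytes; loading them runs the callable named inside.
restored = pickle.loads(blob)

print(restored)
```

Note that the receiving side does not need the Payload class at all, because the pickled data refers only to the callable and its arguments.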
See https://github.com/frohoff/ysoserial for multiple examples of arbitrary code execution through Java serialization.
The examples rely on Java serialization invoking readObject on the user type. In one of the examples, this type is AnnotationInvocationHandler, whose readObject method iterates over a collection making up its state. In turn, this collection can be an instance of LazyMap, an Apache Commons Collections class that invokes an item factory when iterated upon. This factory can be another Apache Commons Collections class, an InvokerTransformer, which can invoke arbitrary methods, such as the runtime exec method.
See https://github.com/frohoff/ysoserial/blob/master/src/main/java/ysoserial/payloads/CommonsCollections1.java for this particular payload example.
Colm O'Connor: The Norway Problem. https://hitchdev.com/strictyaml/why/implicit-typing-removed
Chris Frohoff: The ysoserial Project Repository. https://github.com/frohoff/ysoserial
Moritz Bechler: The marshalsec Project Repository. https://github.com/mbechler/marshalsec
The CBOR format stores basic types, arrays of basic types, and maps of basic types. Basic types are null, booleans, integers, floats, byte strings, and text strings. An item can be wrapped in a tag that specifies additional information, for example to identify a date and time, a bignum, or a URI. References are not supported.
The CBOR data stream is a sequence of items. Each item starts with a single-byte header that carries the item type (3 bits) and an additional argument value (5 bits). The rest of the item data depends on the type and the value.
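The header layout can be sketched in a few lines of Python. The split_header function and the MAJOR_TYPES table are names introduced here for illustration; the type names follow RFC 8949, which calls type 0 an unsigned integer.

```python
# Major type names as listed in RFC 8949.
MAJOR_TYPES = {
    0: "unsigned integer",
    1: "negative integer",
    2: "byte string",
    3: "text string",
    4: "array",
    5: "map",
    6: "tag",
    7: "simple value or float",
}

def split_header(byte):
    """Split an item header byte into (item type, additional argument)."""
    item_type = byte >> 5        # top 3 bits
    argument = byte & 0x1F       # low 5 bits
    return item_type, argument

for header in (0x17, 0x38, 0x83):
    item_type, argument = split_header(header)
    print(f"{header:02X}h ~ {MAJOR_TYPES[item_type]}, argument {argument}")
```

For example, the byte 83h splits into type 4 and argument 3, which is the header of an array with three items.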
Integer Data Items.
00h ~ 000-00000b ~ positive integer type (0) value 0
01h ~ 000-00001b ~ positive integer type (0) value 1
17h ~ 000-10111b ~ positive integer type (0) value 23

18h ~ 000-11000b ~ positive integer type (0) value in next byte (24)
18h ~ value 24 (18h)

18h ~ 000-11000b ~ positive integer type (0) value in next byte (24)
19h ~ value 25 (19h)

19h ~ 000-11001b ~ positive integer type (0) value in next two bytes (25)
01h 00h ~ value 256 (network order)

1Ah ~ 000-11010b ~ positive integer type (0) value in next four bytes (26)
00h 01h 00h 00h ~ value 65536 (network order)

20h ~ 001-00000b ~ negative integer type (1) value -1
21h ~ 001-00001b ~ negative integer type (1) value -2

38h ~ 001-11000b ~ negative integer type (1) value in next byte (24)
FFh ~ value -256
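The integer encodings listed above can be reproduced with a short encoder. This is a sketch rather than a complete CBOR implementation; the encode_int name is introduced here for illustration, and values that do not fit the eight-byte argument are simply refused.

```python
import struct

# (additional argument, struct format, exclusive upper limit of the value)
WIDTHS = (
    (24, ">B", 1 << 8),
    (25, ">H", 1 << 16),
    (26, ">I", 1 << 32),
    (27, ">Q", 1 << 64),
)

def encode_int(value):
    """Encode an integer as a CBOR item of type 0 or type 1."""
    if value >= 0:
        item_type, argument = 0, value
    else:
        # Negative integers store -1 - value, so -1 is stored as 0.
        item_type, argument = 1, -1 - value
    if argument < 24:
        # Small arguments fit directly into the header byte.
        return bytes([(item_type << 5) | argument])
    for additional, fmt, limit in WIDTHS:
        if argument < limit:
            header = bytes([(item_type << 5) | additional])
            return header + struct.pack(fmt, argument)  # network order
    raise ValueError("value does not fit, the bignum tag would be needed")

for value in (0, 23, 24, 256, 65536, -1, -256):
    print(value, encode_int(value).hex(" "))
```

The encoder always picks the shortest form that fits, which matches the encodings shown for the values 24, 256, and 65536.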
See https://cbor.me for a CBOR playground that can convert between textual and binary representations. Apart from experimenting with basic types of various sizes, these are some other values with interesting serialization:
0("2020-01-01 00:00:00") for a string that contains date and time
18446744073709551616 for the first integer big enough to use the bignum encoding
4([-1, 1]) for value 0.1 encoded as decimal fraction
5([-1, 1]) for value 1/2 encoded as binary fraction
[_ [1, 2], [3, 4, 5]] for an indefinite length array
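The arithmetic behind the fraction and bignum examples can be checked with a short sketch. The function names are introduced here for illustration, and the bignum encoder handles only short non-negative values.

```python
from fractions import Fraction

def decimal_fraction(exponent, mantissa):
    """Value of tag 4 content [exponent, mantissa]: mantissa * 10**exponent."""
    return mantissa * Fraction(10) ** exponent

def binary_fraction(exponent, mantissa):
    """Value of tag 5 content [exponent, mantissa]: mantissa * 2**exponent."""
    return mantissa * Fraction(2) ** exponent

def encode_bignum(value):
    """Encode a non-negative integer as tag 2 wrapping a short byte string."""
    payload = value.to_bytes((value.bit_length() + 7) // 8, "big")
    if len(payload) >= 24:
        raise ValueError("sketch handles only payloads shorter than 24 bytes")
    # C2h is the header of tag 2, 40h + length is a byte string header.
    return bytes([0xC2, 0x40 | len(payload)]) + payload

# The largest value that fits the eight-byte integer argument.
largest_plain = 2 ** 64 - 1

print(decimal_fraction(-1, 1), binary_fraction(-1, 1))
print(largest_plain + 1, encode_bignum(largest_plain + 1).hex())
```

The value 18446744073709551616 is 2 to the power of 64, one more than the largest value that fits into an eight-byte argument, which is why it is the first integer to need the bignum tag.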
Carsten Bormann: Concise Binary Object Representation Website. https://cbor.io
IETF: Concise Binary Object Representation (CBOR) (RFC 8949). https://tools.ietf.org/html/rfc8949
IETF: Concise Data Definition Language (CDDL) (RFC 8610). https://tools.ietf.org/html/rfc8610