I have previously written about the serialization mechanism of Protocol Buffers. Apache Avro is a similar, and in my view better, serialization framework.
It provides features like:
- Independent Schema - use different schemas for serialization and de-serialization
- Binary serialization - compact data encoding, and faster data processing
- Dynamic typing - serialization and deserialization without code generation
Avro can encode data in one of two formats when serializing: binary or JSON. In a binary file, the schema is included at the beginning of the file. In JSON, the type is defined along with the data. Switching from the JSON encoding to the binary format to get better performance is pretty straightforward with Avro. Because the data is stored together with its schema, less type information needs to be sent with each value, and any program can deserialize the encoded data, which makes Avro a good candidate for RPC.
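To give a feel for why the binary format is compact: the Avro specification writes int and long values using variable-length zig-zag coding. The sketch below re-implements that idea in plain Java purely for illustration. The class and method names are made up for this post, and real code should rely on Avro's own BinaryEncoder rather than on this.

```java
import java.io.ByteArrayOutputStream;

public class ZigZagVarint {
    // Illustrative re-implementation, not Avro's code.
    // Zig-zag maps signed values to unsigned ones so that small magnitudes
    // get small codes: 0, -1, 1, -2 become 0, 1, 2, 3.
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits per byte; the high bit marks "more bytes follow".
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    static long decodeLong(byte[] bytes) {
        long z = 0;
        int shift = 0;
        for (byte b : bytes) {
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
        }
        // Undo the zig-zag mapping.
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        for (long v : new long[] {0, -1, 1, 64, 1_000_000}) {
            System.out.println(v + " -> " + encodeLong(v).length + " byte(s)");
        }
    }
}
```

Small magnitudes such as 0, -1 and 1 take one byte each, while 64 already needs two, so fields that usually hold small numbers cost far less than a fixed 8-byte long.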
In Avro 1.5 we have to obtain encoders from a factory (this is different from previous versions, which had no factory for encoders):
- org.apache.avro.io.EncoderFactory.binaryEncoder(OutputStream out, BinaryEncoder reuse) for binary
- org.apache.avro.io.EncoderFactory.jsonEncoder(Schema schema, OutputStream out) for JSON
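As a minimal sketch of how these two factory methods can be used, the example below writes the same generic record once with the binary encoder and once with the JSON encoder. It assumes Avro 1.5 or later on the classpath and Java 8 or later; the LogEvent schema, its field names and the class name AvroEncoderSketch are hypothetical and exist only for this illustration. Note that both methods are instance methods, called on the factory returned by EncoderFactory.get().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class AvroEncoderSketch {
    // Hypothetical schema, made up for this example.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"LogEvent\",\"fields\":["
        + "{\"name\":\"level\",\"type\":\"string\"},"
        + "{\"name\":\"timestamp\",\"type\":\"long\"}]}");

    // Builds a small sample record that matches SCHEMA.
    static GenericRecord sample() {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("level", "INFO");
        record.put("timestamp", 1L);
        return record;
    }

    // Serializes the record with either the JSON or the binary encoder.
    static byte[] encode(GenericRecord record, boolean json) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Encoder encoder;
            if (json) {
                encoder = EncoderFactory.get().jsonEncoder(SCHEMA, out);
            } else {
                // Passing null means "do not reuse an existing encoder".
                encoder = EncoderFactory.get().binaryEncoder(out, null);
            }
            new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
            encoder.flush(); // encoders buffer, so flush before reading the bytes
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        GenericRecord record = sample();
        System.out.println("binary: " + encode(record, false).length + " bytes");
        System.out.println("json:   " + new String(encode(record, true)));
    }
}
```

Only the encoder changes between the two formats; the schema, the record and the writer stay the same, which is why moving from JSON to binary is a small change.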
The values (of Avro-supported value types) are stored with the schema field name as the key, in a set of name-value pairs called a GenericData.Record.
The value types supported by Avro are:
- Primitive Types - null, boolean, int, long, float, double, bytes, string
- Complex Types - Records, Enums, Arrays, Maps, Unions, Fixed
You can read more about them here
A schema definition must be provided for the record instance. To write or read field values, just use the put/get methods.
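A minimal sketch of that put/get usage, assuming Avro 1.5 or later on the classpath. The LogLine schema, its fields and the class name GenericRecordSketch are hypothetical names invented for this example.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class GenericRecordSketch {
    // Hypothetical schema, parsed from its JSON definition.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
        + "{\"name\":\"message\",\"type\":\"string\"},"
        + "{\"name\":\"count\",\"type\":\"int\"}]}");

    static GenericRecord build() {
        // The record is created against the schema; keys are schema field names.
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("message", "disk full");
        record.put("count", 3);
        return record;
    }

    public static void main(String[] args) {
        GenericRecord record = build();
        // get returns Object, so callers cast or convert as needed.
        System.out.println(record.get("message") + " x" + record.get("count"));
    }
}
```

No generated classes are involved here, which is the dynamic typing mentioned in the feature list: the schema alone tells Avro how to lay out the fields.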
I have used this serialization mechanism to provide a layout for log4j. The logs are serialized in the Avro format.
The GitHub project is here - https://github.com/harisgx/avro-log4j
Add the libraries to your project and add the new properties to log4j.properties:
log4j.appender.logger_name.layout=com.avrolog.log4j.layout.AvroLogLayout
log4j.appender.logger_name.layout.Type=json
log4j.appender.logger_name.layout.MDCKeys=mdcKey
Provide the MDC keys as comma-separated values.
This is the schema