Using Avro to serialize logs in log4j

I have previously written about the serialization mechanism of Protocol Buffers. Apache Avro provides a similar serialization framework, with some advantages of its own.

It provides features like:

 - Independent schemas - use different schemas for serialization and deserialization
 - Binary serialization - compact data encoding and faster data processing
 - Dynamic typing - serialization and deserialization without code generation

 With Avro, data can be encoded in one of two formats when serializing: binary or JSON. In a binary data file, the schema is included once at the beginning of the file; with JSON, the field names are written along with the data. Switching from the JSON encoding to the binary format to get better performance is straightforward with Avro. Because the data is stored with its schema, less type information needs to be sent with the data, and any program that has the schema can deserialize it, which makes Avro a good candidate for RPC.
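To illustrate the schema travelling with the data, here is a minimal sketch (the schema, field names, and file name are my own, not from the original post) that writes records to an Avro data file; the schema is stored once in the file header and a reader needs nothing else to deserialize the records.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class DataFileExample {

    // Hypothetical schema, used only for this illustration
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"LogEntry\",\"fields\":["
      + "{\"name\":\"message\",\"type\":\"string\"},"
      + "{\"name\":\"timestamp\",\"type\":\"long\"}]}");

    public static void main(String[] args) throws IOException {
        File file = new File("logs.avro");

        // The schema is written once into the file header
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(SCHEMA));
        writer.create(SCHEMA, file);

        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("message", "application started");
        record.put("timestamp", System.currentTimeMillis());
        writer.append(record);
        writer.close();

        // The reader picks the schema up from the file itself
        DataFileReader<GenericRecord> reader =
            new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>());
        for (GenericRecord r : reader) {
            System.out.println(r);
        }
        reader.close();
    }
}
```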

 In Avro 1.5 we have to obtain encoders from a factory (this is different from previous versions, which had no factory for encoders):
 - EncoderFactory.get().binaryEncoder(OutputStream out, BinaryEncoder reuse) for binary
 - EncoderFactory.get().jsonEncoder(Schema schema, OutputStream out) for JSON
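For instance (the schema and field names below are illustrative, not from the original post), serializing the same record with both encoders looks roughly like this:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class EncoderExample {

    public static void main(String[] args) throws IOException {
        // Illustrative schema, not the one from the original project
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"LogEntry\",\"fields\":["
          + "{\"name\":\"message\",\"type\":\"string\"},"
          + "{\"name\":\"timestamp\",\"type\":\"long\"}]}");

        GenericRecord record = new GenericData.Record(schema);
        record.put("message", "application started");
        record.put("timestamp", System.currentTimeMillis());

        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);

        // Binary encoding: compact, no field names in the output
        ByteArrayOutputStream binaryOut = new ByteArrayOutputStream();
        Encoder binaryEncoder = EncoderFactory.get().binaryEncoder(binaryOut, null);
        writer.write(record, binaryEncoder);
        binaryEncoder.flush();

        // JSON encoding: human readable, field names written with the data
        ByteArrayOutputStream jsonOut = new ByteArrayOutputStream();
        Encoder jsonEncoder = EncoderFactory.get().jsonEncoder(schema, jsonOut);
        writer.write(record, jsonEncoder);
        jsonEncoder.flush();

        System.out.println("binary size: " + binaryOut.size() + " bytes");
        System.out.println("json output: " + jsonOut.toString("UTF-8"));
    }
}
```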

 The values (of any Avro-supported type) are put into a GenericData.Record, a set of name-value pairs in which the schema field name is the key.

 Avro supported value types are:
  - Primitive types - null, boolean, int, long, float, double, bytes, string
  - Complex types - records, enums, arrays, maps, unions, fixed
 You can read more about them in the Avro specification.

  A schema definition has to be provided for the record instance; to read and write field values, just use the put and get methods.
   I have used this serialization mechanism to provide a layout for log4j, so log messages are serialized with Avro.
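The project's actual layout class isn't reproduced in this post; the following is only a rough sketch of the idea against the log4j 1.x Layout API, with a made-up class name, schema, and MDC handling, to show how the GenericData.Record pieces fit together.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.log4j.Layout;
import org.apache.log4j.spi.LoggingEvent;

// Hypothetical layout class - not the one from the linked project
public class AvroJsonLayout extends Layout {

    // Illustrative schema for a log event
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"LogEvent\",\"fields\":["
      + "{\"name\":\"timestamp\",\"type\":\"long\"},"
      + "{\"name\":\"level\",\"type\":\"string\"},"
      + "{\"name\":\"logger\",\"type\":\"string\"},"
      + "{\"name\":\"message\",\"type\":\"string\"},"
      + "{\"name\":\"mdc\",\"type\":{\"type\":\"map\",\"values\":\"string\"}}]}");

    private final GenericDatumWriter<GenericRecord> writer =
        new GenericDatumWriter<GenericRecord>(SCHEMA);

    // MDC keys to copy into each record, set from log4j.properties
    private String mdcKeys = "";

    public void setMdcKeys(String mdcKeys) {
        this.mdcKeys = mdcKeys;
    }

    public String getMdcKeys() {
        return mdcKeys;
    }

    public String format(LoggingEvent event) {
        // Schema field names are the keys; values come from the logging event
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("timestamp", event.getTimeStamp());
        record.put("level", event.getLevel().toString());
        record.put("logger", event.getLoggerName());
        record.put("message", event.getRenderedMessage());

        Map<String, String> mdc = new HashMap<String, String>();
        for (String key : mdcKeys.split(",")) {
            Object value = event.getMDC(key.trim());
            if (value != null) {
                mdc.put(key.trim(), value.toString());
            }
        }
        record.put("mdc", mdc);

        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // JSON keeps the log human readable; binaryEncoder could be swapped in for compactness
            Encoder encoder = EncoderFactory.get().jsonEncoder(SCHEMA, out);
            writer.write(record, encoder);
            encoder.flush();
            return out.toString("UTF-8") + Layout.LINE_SEP;
        } catch (IOException e) {
            return "could not serialize log event: " + e.getMessage() + Layout.LINE_SEP;
        }
    }

    public boolean ignoresThrowable() {
        return true;
    }

    public void activateOptions() {
        // nothing to do for this sketch
    }
}
```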

The github project is here.
   Add the libraries to your project and add the new properties to your log4j configuration (log4j.properties).

 Provide the MDC keys as comma separated values.
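As an example, a log4j.properties fragment along these lines would wire the layout in; the class name and the MdcKeys property below belong to the hypothetical sketch above, so substitute the names documented in the project's README.

```properties
# Hypothetical configuration matching the AvroJsonLayout sketch above
log4j.rootLogger=INFO, FILE

log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=application.avro.log
log4j.appender.FILE.layout=AvroJsonLayout
# MDC keys to include in each serialized record, as comma separated values
log4j.appender.FILE.layout.MdcKeys=userId,sessionId,requestId
```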
   This is the general shape of the schema for the log events (the exact definition ships with the project):
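An illustrative version, matching the hypothetical layout sketch above rather than the project's exact definition:

```json
{
  "type": "record",
  "name": "LogEvent",
  "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "level",     "type": "string"},
    {"name": "logger",    "type": "string"},
    {"name": "message",   "type": "string"},
    {"name": "mdc",       "type": {"type": "map", "values": "string"}}
  ]
}
```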

