#bytescrolls: Using Avro to serialize logs in log4j

I have previously written about the serialization mechanism of Protocol Buffers. Apache Avro is a similar serialization framework that offers some additional advantages.

It provides features like:

- Independent Schema - use different schemas for serialization and de-serialization
- Binary serialization - compact data encoding, and faster data processing
- Dynamic typing - serialization and deserialization without code generation

Avro can encode data in two ways when serializing: binary or JSON. In a binary Avro data file, the schema is included at the beginning of the file. In JSON, the type is defined along with the data. Switching from the JSON encoding to the binary format to get better performance is pretty straightforward with Avro. Because the data is stored together with its schema, less type information needs to be sent with each value, and any program can deserialize the encoded data, which makes Avro a good candidate for RPC.
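To give a feel for why the binary encoding is compact: Avro's binary format writes int and long values using variable-length zig-zag coding, so small numbers take a single byte. The following is a minimal standalone sketch of that idea, not Avro's own implementation; the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only: hypothetical class, not part of Avro.
public class ZigZagVarintSketch {

    // Encodes a long as a zig-zag varint.
    static byte[] encodeLong(long n) {
        // Zig-zag maps small negative and positive values to small unsigned values:
        // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits per byte, lowest bits first; the high bit marks "more bytes follow".
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long[] samples = {0L, -1L, 1L, 64L, 1300000000000L};
        for (long sample : samples) {
            System.out.println(sample + " takes " + encodeLong(sample).length + " byte(s)");
        }
    }
}
```

A fixed-width long always takes 8 bytes, while here the size grows with the magnitude of the value.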

In Avro 1.5, encoders are created through a factory (this differs from previous versions, which had no factory for encoders). Obtain the factory with org.apache.avro.io.EncoderFactory.get() and call:
- binaryEncoder(OutputStream out, BinaryEncoder reuse) for binary
- jsonEncoder(Schema schema, OutputStream out) for JSON

Values (of any Avro-supported type) are stored in a GenericData.Record, a set of name-value pairs in which the schema field name is the key.

The value types Avro supports are:

- Primitive types - null, boolean, int, long, float, double, bytes, string
- Complex types - records, enums, arrays, maps, unions, fixed

You can read more about them in the Avro specification.

A schema definition must be provided when creating the record instance. To write or read field values, just use the put and get methods.
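Putting the pieces above together, here is a minimal sketch that builds a GenericData.Record and writes it with either encoder. It assumes Avro 1.5 or later on the classpath. The LogEntry schema and its field names are hypothetical and chosen only for illustration; they are not the schema used by avro-log4j.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class AvroEncodingSketch {

    // Hypothetical schema for illustration only.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"LogEntry\",\"fields\":["
      + "{\"name\":\"level\",\"type\":\"string\"},"
      + "{\"name\":\"message\",\"type\":\"string\"},"
      + "{\"name\":\"timestamp\",\"type\":\"long\"}]}";

    static final Schema SCHEMA = new Schema.Parser().parse(SCHEMA_JSON);

    // Builds a record by putting values under the schema's field names.
    static GenericRecord sampleRecord() {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("level", "INFO");
        record.put("message", "service started");
        record.put("timestamp", 1300000000000L);
        return record;
    }

    // Serializes the record with the JSON encoder or the binary encoder.
    static byte[] encode(GenericRecord record, boolean json) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            Encoder encoder;
            if (json) {
                encoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out);
            } else {
                // null means "do not reuse an existing BinaryEncoder instance"
                encoder = EncoderFactory.get().binaryEncoder(out, null);
            }
            new GenericDatumWriter<GenericRecord>(record.getSchema()).write(record, encoder);
            encoder.flush(); // encoders buffer, so flush before reading the bytes
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        GenericRecord record = sampleRecord();
        System.out.println("level = " + record.get("level"));
        System.out.println("binary size = " + encode(record, false).length + " bytes");
        System.out.println("json = " + new String(encode(record, true), StandardCharsets.UTF_8));
    }
}
```

Note that a raw encoder like this writes only the data; the reader must already know the schema. Only Avro data files embed the schema at the start of the file.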

I have used this serialization mechanism to provide a layout for log4j, so that log events are serialized in Avro format.

The GitHub project is here: https://github.com/harisgx/avro-log4j

Add the libraries to your project and add these new properties to log4j.properties:

log4j.appender.logger_name.layout=com.avrolog.log4j.layout.AvroLogLayout
log4j.appender.logger_name.layout.Type=json
log4j.appender.logger_name.layout.MDCKeys=mdcKey

Provide the MDC keys as comma-separated values.
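On the application side, values for those keys are set through log4j's MDC. The sketch below uses the log4j 1.x API and the mdcKey name from the configuration above. It assumes the layout picks up MDC values for the configured keys, which is what the MDCKeys property suggests; the class name and the request-42 value are made up for illustration.

```java
import org.apache.log4j.Logger;
import org.apache.log4j.MDC;

// Illustrative sketch only: hypothetical class name and MDC value.
public class MdcLoggingSketch {

    private static final Logger LOG = Logger.getLogger(MdcLoggingSketch.class);

    public static void main(String[] args) {
        // Attach a value to the current thread's diagnostic context.
        MDC.put("mdcKey", "request-42");
        try {
            // Assumption: with AvroLogLayout configured, this event is written
            // in Avro form together with the MDC value.
            LOG.info("processing request");
        } finally {
            // Clean up so the value does not leak into later log events on this thread.
            MDC.remove("mdcKey");
        }
    }
}
```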

This is the schema
