ETags - Roles in Web Application to Cloud Computing

A web server returns a value in the response headers known as the ETag (entity tag). It helps the client know whether the content at the requested URL has changed. When a page is loaded in the browser, it is cached together with its ETag. On the next request for that page, the browser sends the ETag value in the "If-None-Match" request header. The server reads this header value and compares it with the current ETag of the page. If the values are the same, i.e. the content has not changed, status code 304 (Not Modified) is returned. This HTTP metadata can be used to predict page downloads, thereby optimizing the bandwidth used. A combination of a checksum (MD5) of the data as the ETag value and an accurate modification timestamp could give good results in predicting re-downloads. An analysis of the effectiveness of the choice of ETag value is described in this paper.
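The server-side decision described above fits in a few lines. The sketch below is framework-free and its class and method names are hypothetical. It assumes the caller already knows the resource's current ETag, and it treats If-None-Match as a comma-separated list (or "*") compared weakly, i.e. ignoring any W/ prefix.

```java
// Minimal sketch of the If-None-Match check: 304 if the client's cached copy
// is current, otherwise 200 with the full body. Names are hypothetical.
// Assumes the request is a GET or HEAD and that tags contain no commas.
final class ConditionalGet {

    private ConditionalGet() {}

    static int statusFor(String ifNoneMatch, String currentETag) {
        if (ifNoneMatch == null || currentETag == null) {
            return 200; // nothing to compare: send the full response
        }
        String current = stripWeak(currentETag);
        for (String tag : ifNoneMatch.split(",")) {
            String t = tag.trim();
            // "*" matches any current representation
            if (t.equals("*") || stripWeak(t).equals(current)) {
                return 304; // Not Modified: the body is not sent
            }
        }
        return 200;
    }

    // If-None-Match uses weak comparison, so a W/ prefix is ignored.
    private static String stripWeak(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }
}
```

For example, statusFor("\"abc\"", "\"abc\"") yields 304, while a stale tag yields 200.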

According to

A resource is eligible for caching if:

  • There is caching information in the HTTP response headers
  • The response is non-secure (HTTPS responses won't be cached)
  • An ETag or Last-Modified header is present
  • The cached representation is fresh
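The four rules above can be expressed as a single predicate. This is a sketch under stated assumptions, not a browser's real algorithm: "caching information" is taken to mean a Cache-Control or Expires header, and freshness is judged only from Cache-Control max-age against the response's current age (Expires dates are not evaluated). The class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check of the four eligibility rules listed above.
final class CacheEligibility {

    private static final Pattern MAX_AGE = Pattern.compile("max-age=(\\d+)");

    private CacheEligibility() {}

    static boolean isEligible(Map<String, String> responseHeaders,
                              boolean secure, long ageSeconds) {
        // HTTP header names are case-insensitive.
        Map<String, String> h = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        h.putAll(responseHeaders);

        boolean hasCachingInfo = h.containsKey("Cache-Control") || h.containsKey("Expires");
        boolean hasValidator = h.containsKey("ETag") || h.containsKey("Last-Modified");
        return hasCachingInfo && !secure && hasValidator && isFresh(h, ageSeconds);
    }

    // Assumption: only Cache-Control max-age is used to judge freshness.
    private static boolean isFresh(Map<String, String> h, long ageSeconds) {
        String cc = h.get("Cache-Control");
        if (cc == null) {
            return false; // this sketch does not evaluate Expires dates
        }
        Matcher m = MAX_AGE.matcher(cc);
        return m.find() && ageSeconds < Long.parseLong(m.group(1));
    }
}
```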

Entity tags can be strong or weak validators. A strong validator guarantees that the representation is unique: if we use MD5 or SHA-1, the tag value changes when even one bit of the data changes. A weak validator changes only when the meaning of the entity changes (a weak tag can be shared by a set of semantically equivalent representations). Weak ETags are written with a W/ prefix, as in W/"v1".
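The difference shows up when two tags are compared. The sketch below follows the strong and weak comparison functions defined in RFC 7232 (section 2.3.2); the class name is hypothetical and tags are assumed to be well-formed.

```java
// Sketch of the RFC 7232 ETag comparison functions. Class name is hypothetical.
final class ETagCompare {

    private ETagCompare() {}

    static boolean isWeak(String tag) {
        return tag.startsWith("W/");
    }

    // The quoted opaque part of the tag, without any W/ prefix.
    static String opaque(String tag) {
        return isWeak(tag) ? tag.substring(2) : tag;
    }

    // Strong comparison: both tags must be strong and identical.
    static boolean strongMatch(String a, String b) {
        return !isWeak(a) && !isWeak(b) && a.equals(b);
    }

    // Weak comparison: the opaque parts match, whether or not either is weak.
    static boolean weakMatch(String a, String b) {
        return opaque(a).equals(opaque(b));
    }
}
```

In HTTP, If-None-Match uses the weak comparison, while If-Match requires the strong one, which is why byte-range resumption relies on strong ETags.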

More information on conditional requests, including strong and weak ETags, is available here.

In Spring MVC, support for ETags is provided by the servlet filter ShallowEtagHeaderFilter. The relevant parts of its source are:

String responseETag = generateETagHeaderValue(body);
// ...

protected String generateETagHeaderValue(byte[] bytes) {
    StringBuilder builder = new StringBuilder("\"0");
    Md5HashUtils.appendHashString(bytes, builder);
    return builder.toString();
}

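The default implementation computes an MD5 hash of the response body (for example, the output generated by a JSP). Whenever the same page is requested again, the filter compares this hash with the request's If-None-Match header and, if they are equal, sends back a 304.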

String requestETag = request.getHeader(HEADER_IF_NONE_MATCH);
if (responseETag.equals(requestETag)) {
    if (logger.isTraceEnabled()) {
        logger.trace("ETag [" + responseETag + "] equal to If-None-Match, sending 304");
    }
    // ... the filter then responds with 304 Not Modified instead of the body
}
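The two excerpts above can be combined into a small library-free sketch of the same "shallow" idea: hash the fully rendered body with MD5 and compare the quoted hex digest with If-None-Match. This is not Spring's code; the class and method names are hypothetical, and unlike the excerpt it does not prefix the digest with "0".

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical shallow-ETag helper: the body must already be rendered.
final class ShallowETag {

    private ShallowETag() {}

    static String etagFor(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(body);
            StringBuilder sb = new StringBuilder("\"");
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // 16 bytes -> 32 hex characters
            }
            return sb.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // True when the client's cached copy is current (send 304 with no body).
    static boolean notModified(String ifNoneMatch, byte[] renderedBody) {
        return etagFor(renderedBody).equals(ifNoneMatch);
    }
}
```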

ShallowEtagHeaderFilter therefore reduces bandwidth usage, though not server-side processing: the full response is still rendered on every request so that its hash can be computed. Since it is a plain Servlet Filter, it can be used in combination with any web framework. An MD5 hash keeps the ETag short (the digest is only 32 hex characters) while making collisions highly unlikely.

A deeper level of ETag implementation, reaching down to the model layer, is also possible: the ETag could be tied to the revisions of the row data. Matching on such ETags gives higher predictability and fewer downloads of data, which makes it an effective solution.
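A model-layer ETag might look like the sketch below. It assumes each row carries an id and a revision number that is incremented on every update (for example an optimistic-locking version column); the Row type and method names are hypothetical. Because the check needs only the id and revision, a matching request can be answered with 304 before any view is rendered.

```java
// Hypothetical "deep" ETag derived from a row's id and revision.
final class DeepETag {

    record Row(long id, long revision) {}

    private DeepETag() {}

    static String etagFor(Row row) {
        return "\"" + row.id() + "-" + row.revision() + "\"";
    }

    // True when rendering can be skipped entirely and 304 returned.
    static boolean canSkipRendering(String ifNoneMatch, Row current) {
        return etagFor(current).equals(ifNoneMatch);
    }
}
```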

As per the JSR 286 portlet specification, a portlet should set an ETag property (validation token) and an expiration-time when rendering. New render or resource requests will only be sent to the portlet after the expiration-time is reached, and each new request carries the ETag. The portlet should examine it and determine whether the cached content is still good; if so, it sets a new expiration-time and does not render. This specification is implemented in Spring MVC's portlet support (see JIRA).
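The validation-token flow can be simulated without a portal container. The sketch below does not use the javax.portlet API (where the corresponding settings live on CacheControl, e.g. setETag, setExpirationTime and setUseCachedContent); it is a hypothetical model in which time is passed in explicitly, in seconds, and the token is simply the content version.

```java
// Hypothetical simulation of the JSR 286 validation-token behaviour.
final class PortletCacheSim {

    private String etag;                 // token set with the last real render
    private long expiresAt;              // time at which cached content expires
    private final int expirationSeconds;
    int renderCount = 0;                 // counts real renders, for illustration

    PortletCacheSim(int expirationSeconds) {
        this.expirationSeconds = expirationSeconds;
    }

    // Returns true only if the portlet actually rendered new content.
    boolean handle(long now, String currentVersion) {
        if (etag != null && now < expiresAt) {
            return false;                // still fresh: the portlet is not called
        }
        if (currentVersion.equals(etag)) {
            expiresAt = now + expirationSeconds; // token still valid: extend, don't render
            return false;
        }
        etag = currentVersion;           // render, then set a new token and expiration-time
        expiresAt = now + expirationSeconds;
        renderCount++;
        return true;
    }
}
```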

A hypothetical model for REST responses using deeper ETags could be effective when an API is exposed or when two applications are integrated. I have seen such an implementation in Python here.
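On the consuming side of such an integration, the client has to remember ETags. The sketch below keeps the ETag and body per URL, supplies If-None-Match for the next request, and reuses the cached body on a 304. The transport is deliberately left out, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side ETag cache for a REST integration.
final class ETagClientCache {

    private record Entry(String etag, String body) {}

    private final Map<String, Entry> entries = new HashMap<>();

    // Headers to add to the next GET for this URL.
    Map<String, String> requestHeaders(String url) {
        Entry e = entries.get(url);
        return e == null ? Map.of() : Map.of("If-None-Match", e.etag());
    }

    // Processes a response and returns the body the caller should use.
    String onResponse(String url, int status, String etag, String body) {
        if (status == 304) {
            Entry e = entries.get(url);
            if (e == null) {
                throw new IllegalStateException("304 without a cached entry for " + url);
            }
            return e.body(); // unchanged: no payload was downloaded
        }
        if (status == 200 && etag != null) {
            entries.put(url, new Entry(etag, body));
        }
        return body;
    }
}
```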

In cloud computing, when Amazon S3 receives a PUT request with the Content-MD5 header, it computes the MD5 of the object received and returns a 400 error if it doesn't match the MD5 sent in the header. Amazon and Azure both use Content-MD5, whose value is the base64 encoding of the 128-bit (16-byte) MD5 digest, i.e. a 24-character string.
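Computing and checking the header is straightforward. In the sketch below, the verification method only mirrors the behaviour described above (400 on mismatch); it is an illustration, not Amazon's code, and the class name is hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical Content-MD5 helper: base64 of the raw 16-byte MD5 digest.
final class ContentMd5 {

    private ContentMd5() {}

    static String of(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(body);
            return Base64.getEncoder().encodeToString(digest); // always 24 characters
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Status a server could return for an upload: 400 on mismatch, else 200.
    static int verifyUpload(byte[] body, String contentMd5Header) {
        if (contentMd5Header == null) {
            return 200; // the header is optional
        }
        return of(body).equals(contentMd5Header) ? 200 : 400;
    }
}
```

For an empty body the header value is 1B2M2Y8AsgTpgAmY7PhCfg==.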

According to the article here, if an S3 entity is for some reason updated with exactly the same bits it previously had, the ETag will not change, but that's probably OK anyway.

According to the S3 REST API documentation:

Amazon S3 returns the first ten megabytes of the file, the Etag of the file, and the total size of the file (20232760 bytes) in the Content-Length field.

To ensure the file did not change since the previous portion was downloaded, specify the if-match request header. Although the if-match request header is not required, it is recommended for content that is likely to change.
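On the server side, the S3 example above amounts to honouring If-Match on a ranged GET. The sketch below is a hypothetical model: if the stored ETag no longer matches, it answers 412 (Precondition Failed) instead of returning a slice of a different file, and an unsatisfiable range gets 416. Since If-Match calls for strong comparison, plain string equality is used.

```java
// Hypothetical ranged GET guarded by If-Match.
final class RangedGet {

    record Result(int status, String contentRange) {}

    private RangedGet() {}

    static Result get(String ifMatch, String currentETag,
                      long first, long last, long totalSize) {
        if (ifMatch != null && !ifMatch.equals(currentETag)) {
            return new Result(412, null); // file changed since the earlier portion
        }
        if (first >= totalSize || first > last) {
            return new Result(416, "bytes */" + totalSize); // range not satisfiable
        }
        long end = Math.min(last, totalSize - 1); // clamp to the last byte
        return new Result(206, "bytes " + first + "-" + end + "/" + totalSize);
    }
}
```

With the 20232760-byte file from the quote, the first ten megabytes are bytes 0-10485759, and a second request carrying the same ETag in If-Match resumes at byte 10485760.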

The ETag header in the HTTP specification gives developers a way to implement caching, which can be very effective at the transport level for REST services as well as for web applications. The trade-off is that there may be security implications to having data reside at the transport level.

But for static files that have a far-future "Expires" value, and for files served from a cluster, ETags are not effective: the checksum-based ETag is still transported to the client with every GET response, even though such files rarely need validating. By removing the ETag header, you prevent caches and browsers from validating files by tag, so they are forced to rely on your Cache-Control and Expires headers. Removing it also reduces the response header size, since the header carrying the checksum value is no longer sent.
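Serving such a file without an ETag means freshness comes only from Cache-Control and Expires. The sketch below builds those headers; the one-year lifetime in the usage note is an assumption, and the class name is hypothetical. Expires is formatted as an HTTP-date with a two-digit day, which is why a custom pattern is used rather than Java's RFC_1123_DATE_TIME (that formatter does not zero-pad the day).

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical header builder for far-future static files without an ETag.
final class StaticFileHeaders {

    // HTTP-date, always in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    private StaticFileHeaders() {}

    static Map<String, String> forStaticFile(Instant now, Duration lifetime) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "public, max-age=" + lifetime.getSeconds());
        headers.put("Expires", HTTP_DATE.format(now.plus(lifetime)));
        // Deliberately no "ETag" header, so clients never revalidate by tag.
        return headers;
    }
}
```

For example, forStaticFile(Instant.EPOCH, Duration.ofDays(365)) yields max-age=31536000 and an Expires of Fri, 01 Jan 1971 00:00:00 GMT.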
