Consistent Hashing
A consistent hash function is one whose output changes minimally as the range of the function (the set of buckets) changes.
What's the advantage of such functions?
This is ideal when the set of buckets changes over time. Two users with inconsistent but overlapping sets of buckets will map an item to the same bucket with high probability, so it eliminates the need to maintain a consistent "state" among all nodes in a network. The algorithm can be used to make consistent assignments or relationships between different sets of data in such a way that, if items are added or removed, the mapping can be recalculated on any machine and still produce the same results.
Theory
A view V is a set of buckets of which a user is aware. A client uses a consistent hash function f(V, i) to map an object i to one of the buckets in the view. Conceptually, each hash bucket is assigned to a random point on a circle of size 2^n (arithmetic is done mod 2^n), where n is the hash key size in bits. An object is hashed to the closest bucket in the clockwise direction, so only a small set of buckets lies near each object, and all buckets receive roughly the same number of items. When the kth bucket is added, only about a 1/k fraction of the items move, so adding a new node requires only a minimal reshuffle; this is the advantage of having a view. The key lookup can be backed by a hash structure such as a balanced tree that stores the hashes of all nodes in the view; when a new node is added, its hash value is inserted into that structure.
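As a rough sketch of this idea (the class and method names below are made up for illustration, not taken from any particular system), the ring and its balanced-tree lookup could look like this in Java, with a TreeMap holding the node positions:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: nodes and objects are hashed onto the same
// circular space; an object belongs to the closest node in clockwise direction.
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // balanced tree of node positions

    public void addNode(String node) {
        ring.put(hash(node), node);
    }

    public void removeNode(String node) {
        ring.remove(hash(node));
    }

    // Walk the ring clockwise: the first node with position >= the key's position,
    // wrapping around to the smallest position if none is found.
    public String getNodeFor(String key) {
        if (ring.isEmpty()) return null;
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long pos = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(pos);
    }

    // Hash a string to a point on the circle (first 8 bytes of MD5, one possible choice).
    private long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}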
Suppose there are two nodes, A and B, and three objects 1–3 mapped onto the hash function's result range. Objects 3 and 1 are mapped to node A and object 2 to node B. When a node leaves the system, its data gets mapped to the adjacent node in the clockwise direction, and when a node joins the system it is hashed onto the ring and takes over the objects that now fall to it.
As an example (refer link1, link2), the circle denotes a range of key values; say the points on the circle represent 64-bit numbers. Hash the data to get a 64-bit number, which is a point on the circle. Take the IPs of the nodes and hash them into 64-bit numbers, which are also points on the circle. Associate each data item with the closest node in the clockwise direction (which can be retrieved from the hash structure). When a new node is inserted into that structure, data is still always assigned to the closest node: everything between this node's number and the next number on the ring previously picked by a different node now belongs to this node.
The basic idea of a consistent hash function is to hash both objects and buckets using the same function. It is one of the best ways to implement APIs that can dynamically scale out and rebalance: client applications can calculate which node to contact in order to read or write data, with no metadata server required.
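Continuing the hypothetical ConsistentHashRing sketch above, a client can compute the target node entirely locally, and adding a node only remaps the keys that fall on the new node's arc:

public class RingDemo {
    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        ring.addNode("10.0.0.1");
        ring.addNode("10.0.0.2");

        // Decided purely on the client side, no metadata server involved.
        String before = ring.getNodeFor("user:42");

        ring.addNode("10.0.0.3"); // a new node joins
        String after = ring.getNodeFor("user:42"); // unchanged unless the key falls on the new node's arc
        System.out.println(before + " -> " + after);
    }
}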
Used by
memcached cluster.
Typically, multiple memcached daemons are started, on different hosts. The clients are passed a list of memcached addresses (IP address and port) and pick one daemon for a given key. This is done via consistent hashing, which always maps the same key K to the same memcached server S. When a server crashes, or a new server is added, consistent hashing makes sure that the ensuing rehashing is minimal. Which means that most keys still map to the same servers, but keys hashing to a removed server are rehashed to a new server. - from A memcached implementation in JGroups
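For illustration only (this assumes the spymemcached Java client, not the JGroups implementation quoted above), Ketama-style consistent hashing can be enabled when the client is created:

import java.io.IOException;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.KetamaConnectionFactory;
import net.spy.memcached.MemcachedClient;

public class KetamaExample {
    public static void main(String[] args) throws IOException {
        // Client-side consistent (Ketama) hashing over a list of memcached daemons.
        MemcachedClient client = new MemcachedClient(
                new KetamaConnectionFactory(),
                AddrUtil.getAddresses("10.0.0.1:11211 10.0.0.2:11211"));

        client.set("K", 3600, "value"); // key K always maps to the same server S
        Object value = client.get("K");
        System.out.println(value);
        client.shutdown();
    }
}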
Amazon's Dynamo uses consistent hashing along with replication as a partitioning scheme.
Data is partitioned and replicated using consistent hashing [10], and consistency is facilitated by object versioning [12]. The consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol. - from Dynamo: Amazon's Highly Available Key-value Store
Data in a Cassandra table is partitioned and distributed among the nodes by a consistent hash function.
Cassandra partitions data across the cluster using consistent hashing [11] but uses an order preserving hash function to do so. In consistent hashing the output range of a hash function is treated as a circular space or "ring" (i.e. the largest hash value wraps around to the smallest hash value). Each node in the system is assigned a random value within this space which represents its position on the ring. Each data item identified by a key is assigned to a node by hashing the data item's key to yield its position on the ring, and then walking the ring clockwise to find the first node with a position larger than the item's position. This node is deemed the coordinator for this key. The application specifies this key and Cassandra uses it to route requests. Thus, each node becomes responsible for the region in the ring between it and its predecessor node on the ring. The principal advantage of consistent hashing is that departure or arrival of a node only affects its immediate neighbors and other nodes remain unaffected. - from Cassandra - A Decentralized Structured Storage System
Voldemort uses consistent hashing for automatic sharding of data. Nodes can be added or removed from a database cluster, and the system adapts automatically. Voldemort automatically detects and recovers failed nodes. [refer]
References:
http://www.akamai.com/dl/technical_publications/ConsistenHashingandRandomTreesDistributedCachingprotocolsforrelievingHotSpotsontheworldwideweb.pdf
http://sharplearningcurve.com/blog/2010/09/27/consistent-hashing/
http://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html
Bloom Filters
This data structure can be used to answer membership queries, i.e. checkIfDataPresentInStore(). If an element has already been inserted into the filter, the query will return true; there are no false negatives. However, there is a chance that an element which was never inserted also returns true. In that case the check can be repeated against the original store, i.e. the overhead is associated with the rate of false positives. This is different from a dictionary, in which a hit/miss is deterministic.
For a set of n elements, a Bloom filter is a bit vector of size m. Initially, all bits are set to 0. For each element e, k hash functions each return an index, and the corresponding k bits in the vector are set to 1. When a membership query is executed, the same k bit positions are checked: if all of them are set, the queried element is possibly present in the store; otherwise it is definitely not present. The whole filter takes m bits for the n keys (roughly m/n bits per key), and using several independent hash functions reduces collisions and hence the false-positive rate.
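A minimal sketch of such a filter in Java (the names and the double-hashing scheme are illustrative; real implementations like the ones linked below derive m and k from the desired false-positive rate):

import java.util.BitSet;

// m-bit vector, k hash functions: add() sets k bits, mightContain() checks them.
// No false negatives; false positives are possible.
public class BloomFilter {
    private final BitSet bits;
    private final int m; // size of the bit vector
    private final int k; // number of hash functions

    public BloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    public void add(String element) {
        for (int i = 0; i < k; i++) {
            bits.set(indexFor(element, i));
        }
    }

    public boolean mightContain(String element) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(indexFor(element, i))) {
                return false; // definitely not present
            }
        }
        return true; // possibly present; confirm against the original store
    }

    // Derive the i-th hash from two base hashes (double hashing).
    private int indexFor(String element, int i) {
        int h1 = element.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.floorMod(h1 + i * h2, m);
    }
}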
Uses
- Design a spell checker.
- Database join implementation (Oracle)
- Peer to peer (P2P) communication and routing
- In HBase, the Bloom filter is stored as a meta block in the HFile. When an HFile is opened, the Bloom filter is loaded into memory and used to determine whether a given key is in that store file, which avoids scanning the file for keys that are not present.
- and more
I found a Java implementation here
Cassandra's Java implementation is here
References:
http://en.wikipedia.org/wiki/Bloom_filter
https://issues.apache.org/jira/browse/HBASE-1200
http://wiki.squid-cache.org/SquidFaq/CacheDigests
http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf
Accessing data from a storage system using the XAM API
XAM stands for "eXtensible Access Method". The XAM API will allow application developers to store content on a class of storage systems known as “fixed-content” storage systems.
Traditionally this has been vendor specific: each vendor provides its own SDK/API to access the data. So the industry came up with a standard in which the XAM API provides the capability to access these systems independently of the vendor.

// connect
String xri = "snia-xam://";
XSystem sys = s_xam.connect(xri);
// open the XSet
XSet xset = sys.openXSet(xuid, XSet.MODE_READ_ONLY);
// get the metadata value for the key
String value = xset.getString("com.example.key");
// open a stream for the data
XStream stream = xset.openStream("com.example.data", XStream.MODE_READ_ONLY);
stream.close();
xset.close();
sys.close();
e.g. "application/vnd.snia.xam.xuid" for the 80-element byte array, xam_xuid.
The XAM URI (XRI) is the XAM resource identifier, associated with a vendor, that identifies the storage system.
An account of Open Source Summit 2008 Hyderabad
The first session I attended was on BeleniX, an OpenSolaris LiveCD project, by Moinak Ghosh.
The next session, which was more interesting, was by Mahesh Patil from
On to the stage was a geek, Arun Raghavan from Nvidia. He is a developer in the Gentoo Linux community. I hadn't tried this Linux variant before. It's a Linux for developers!! Any application can be customized for performance and maintainability by creating ebuilds, which makes it very flexible for developers. I think it will have a good learning curve, as most of the installation and customization can be done by the user. He demonstrated creating ebuilds for gtwitter, a Twitter client, and showed the ease of using Portage, the package management system used by Gentoo Linux. Visit Gentoo.org to know more about this Linux. I really liked the article written by Daniel Robbins (architect of Gentoo Linux) about the birth of it; read here.
I attended another session on Hadoop by Venkatesh from the Yahoo research team. Hadoop is an open source project for large data centers. I was looking forward to this presentation as it is about web 2.0 (cloud computing) and large scale computing (blogged before). It is a framework written in Java that supports data intensive distributed applications. To know more about large scale data processing using Hadoop you can read this paper. It has a file system called HDFS (a pure-Java file system!!) which stores replicated data as chunks across unRAIDed SATA disks. There is a name node and a cluster of data nodes, like a master-slave system. The name node stores the metadata that maps files to the blocks stored across the file system. The concept is similar to the Google File System and its cluster features. More about the concept here (Nutch), and here. This framework can be used for processing high volume data, and integrated with Lucene it can help to create a quality search engine of our own. It is used by Facebook; one of the Engineering @ Facebook Notes explains why Hadoop was integrated, read here. It is also used by IBM (IBM MapReduce Tools for Eclipse), Amazon, Powerset (which was acquired by Microsoft recently), Last.fm ... Read more. More on Hadoop in data intensive scalable computing. Related projects: Mahout (machine learning libraries), Tashi (cluster management system).
So it was worth it, as I was able to attend these sessions .... Thanks to twnicnling.org