#bigdata
Came across an interesting presentation on Using Data to Understand Brain.
Is it possible to read your brain? hmmm
I am a little two-faced with these riddles....
Unicode | Javascript | ᴘʜᴘ | Go | Ruby | Python | ☕ Java | Perl |
---|---|---|---|---|---|---|---|
Internally | UCS‐2 or UTF‐16 | UTF‐8⁻ | UTF‐8 | varies | UCS‐2 or UCS‐4 | UTF‐16 | UTF‐8⁺ |
Identifiers | ─ | ✔ | ✔ | ✔ | ✅∓ | ✔ | ✔ |
Casefolding | none | simple | simple | full | none | simple | full |
Casemapping | simple | simple | simple∓ | full | simple | full | full |
Graphemes | ─ | ✅ | ─ | ─ | ─ | ─ | ✔ |
Normalization | ─ | ✔ | ─⁺ | ─ | ✔ | ✔ | ✔ |
UCA Collation | ─ | ─ | ─ | ─ | ─ | ─ | ✔⁺ |
Named Characters | ─ | ─ | ─ | ─ | ✅ | ─ | ✔⁺ |
Properties | ─ | two | (non‐regex)⁻ | three | (non‐regex)⁻ | two⁺ | every⁺ |
Table courtesy: link
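The casefolding and normalization rows in the table are easy to see in practice. A small Python sketch (assuming Python 3.3+ for `str.casefold`):

```python
import unicodedata

# Full casefolding expands characters like the German sharp s,
# which a simple lowercase mapping does not.
assert "Straße".casefold() == "strasse"   # full casefold
assert "Straße".lower() == "straße"       # simple-style lowercase mapping

# Normalization: "é" as one codepoint vs. "e" + combining accent.
composed = "\u00e9"      # é, precomposed (NFC form)
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT (NFD form)
assert composed != decomposed
assert unicodedata.normalize("NFC", decomposed) == composed
```

This is why bytewise string comparison is not enough: two strings can render identically yet differ until normalized.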
To make indexing algorithms pluggable, the statement takes the name of the class that handles indexing, e.g. org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler:

```sql
CREATE INDEX idx ON TABLE tbl (col_name)
AS 'Index_Handler_QClass_Name'
IN TABLE tbl_idx;
```

The general syntax:
```sql
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
  [ ROW FORMAT ...] STORED AS ...
  | STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]
```
Prometheus film, going viral... like a fire that danced at the end of the match.
Aha! cybernetic life-forms...
The only "purpose" (in the biological sense) of this identity is to preserve its own existence in time, that is, to survive in the current, specific environmental conditions, and to produce as many copies of itself as possible. The entire network of negative feedback mechanisms is ultimately directed at the latter task. Within the cybernetic paradigm, however, reproduction is nothing but positive feedback.
Collusion!
A secret agreement between two or more parties for a fraudulent, illegal, or deceitful purpose.
In this battleground of privacy wars and illusory consumer willpower, along comes another wizard to show you the goblins who steal your data: Collusion, from Mozilla.
Collusion is an experimental add-on for Firefox and allows you to see all the third parties that are tracking your movements across the Web. It will show, in real time, how that data creates a spider-web of interaction between companies and other trackers.
Oh yeah, thanks Mozilla, for helping us find the hooligans stealing our cookies! Now we can haplessly stare at the red devils and haloed thieves.
What the heck! We don't have time for tracking everything in our life. Anyway, the stuff looks cool... collusion, interesting word.
As the semantic web and big data integration gain their fus-ro-dah, enterprises are finding ways to harness any available form of information swarming the web and the world.
I came across some interesting articles which give a concise idea of harvesting metadata from unstructured data....
Lee Dallas says
In some respects it is analogous to hieroglyphics where pictographs carry abstract meaning. The data may not be easily interpretable by machines but document recognition and capture technologies improve daily. The fact that an error rate still exists in recognition does not mean that the content lacks structure. Simply that the form it takes is too complex for simple processes to understand.
more here : http://bigmenoncontent.com/2010/09/21/the-myth-of-unstructured-data/
A lot of data growth is happening around these so-called unstructured data types. Enterprises that manage to automate the collection, organization and analysis of these data types will derive competitive advantage.
Every data element does mean something, though what it means may not always be relevant for you.
more here : http://bigdataintegration.blogspot.in/2012/02/unstructured-data-is-myth.html
Typically, multiple memcached daemons are started, on different hosts. The clients are passed a list of memcached addresses (IP address and port) and pick one daemon for a given key. This is done via consistent hashing, which always maps the same key K to the same memcached server S. When a server crashes, or a new server is added, consistent hashing makes sure that the ensuing rehashing is minimal. Which means that most keys still map to the same servers, but keys hashing to a removed server are rehashed to a new server. - from A memcached implementation in JGroups
Data is partitioned and replicated using consistent hashing [10], and consistency is facilitated by object versioning [12]. The consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol. - from Dynamo: Amazon's Highly Available Key-value Store
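The "quorum-like technique" can be made concrete: with N replicas, if writes wait for W acknowledgements and reads consult R replicas, then R + W > N guarantees every read quorum overlaps every write quorum, so some replica in any read has seen the latest write. A brute-force check of that pigeonhole argument (the N, W, R values here are illustrative, not Dynamo's actual configuration):

```python
from itertools import combinations

N, W, R = 3, 2, 2  # example values: 3 replicas, write quorum 2, read quorum 2

replicas = set(range(N))
assert R + W > N  # the quorum condition

# Every possible write quorum must intersect every possible read quorum.
for write_q in combinations(replicas, W):
    for read_q in combinations(replicas, R):
        assert set(write_q) & set(read_q), "quorums must overlap"
```

If R + W <= N (say W = 1, R = 1 with N = 3), disjoint quorums exist and a read can miss the latest write entirely.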
Cassandra partitions data across the cluster using consistent hashing [11] but uses an order preserving hash function to do so. In consistent hashing the output range of a hash function is treated as a circular space or "ring" (i.e. the largest hash value wraps around to the smallest hash value). Each node in the system is assigned a random value within this space which represents its position on the ring. Each data item identified by a key is assigned to a node by hashing the data item's key to yield its position on the ring, and then walking the ring clockwise to find the first node with a position larger than the item's position. This node is deemed the coordinator for this key. The application specifies this key and Cassandra uses it to route requests. Thus, each node becomes responsible for the region in the ring between it and its predecessor node on the ring. The principal advantage of consistent hashing is that departure or arrival of a node only affects its immediate neighbors and other nodes remain unaffected. - from Cassandra - A Decentralized Structured Storage System
Voldemort supports automatic sharding of data. Nodes can be added or removed from a database cluster, and the system adapts automatically. Voldemort automatically detects and recovers failed nodes. [refer]
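All three excerpts lean on the same mechanism. A minimal Python sketch of a consistent-hash ring (the class and method names here are hypothetical, not from memcached, Dynamo, or Cassandra) shows why removing a node only remaps the keys that node held:

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Toy consistent-hash ring: nodes and keys are hashed onto a circular
    space; a key is served by the first node clockwise from its position."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per server smooth the distribution
        self.ring = []        # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, node):
        # place the node at many pseudo-random positions on the ring
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def remove(self, node):
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def lookup(self, key):
        # walk clockwise: first ring position >= the key's hash (wrap to start)
        pos = bisect(self.ring, (self._hash(key),))
        return self.ring[pos % len(self.ring)][1]
```

After `remove("s2")`, any key that was mapped to `s1` or `s3` still looks up to the same server, because only `s2`'s ring positions disappeared; only `s2`'s keys are rehashed to their next clockwise neighbor. That is the "minimal rehashing" property the memcached excerpt describes.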