#orientdb #graph #etl #java #database
Anyone familiar with graph data structures will want to know how to map real-world models to a graph and process them. If you build the graph programmatically and work with it through hand-written traversal algorithms, you are going to have a hard time. If your application stores the data for these models in a relational database, linking them becomes more complex as relationships are added. How do you design the relationships between domains in a more semantic way? How do you query them with a SQL-like language or a DSL? A graph database is a good candidate. Here I am trying out OrientDB.
In relational databases, primary-key and foreign-key columns support joins that are computed at query time, which is memory- and compute-intensive. Many-to-many relationships also need junction tables between highly normalized tables, which increases query execution time and complexity. Graph databases are like relational databases, but with first-class support for "relationships": each relationship is an edge, stored in a list on the nodes (vertices/entities) it connects. Where a relational database would run a join, a graph database just follows this materialized list and has direct access to the connected nodes, eliminating the need for an expensive search/match computation.
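The difference can be sketched in a few lines of plain Java. This is only an illustration of the idea, not OrientDB code: the class `JoinVsAdjacency`, the `BookRow` type, and both methods are hypothetical names, and a real relational engine would normally use an index rather than the full scan shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: plain Java collections, not the OrientDB API.
// Every class and method name here is hypothetical.
class JoinVsAdjacency {

    // Relational style: a book row carries a foreign key to its author.
    static final class BookRow {
        final int id;
        final int authorId;
        final String title;

        BookRow(int id, int authorId, String title) {
            this.id = id;
            this.authorId = authorId;
            this.title = title;
        }
    }

    // Join computed at query time: scan the book rows and match the foreign key.
    static List<String> titlesByJoin(List<BookRow> books, int authorId) {
        List<String> titles = new ArrayList<>();
        for (BookRow book : books) {
            if (book.authorId == authorId) {
                titles.add(book.title);
            }
        }
        return titles;
    }

    // Graph style: the author already holds the list of connected books,
    // so answering the question is a direct lookup with no matching step.
    static List<String> titlesByAdjacency(Map<Integer, List<String>> wroteEdges, int authorId) {
        return wroteEdges.getOrDefault(authorId, List.of());
    }

    public static void main(String[] args) {
        List<BookRow> books = List.of(
                new BookRow(1, 1, "Carrie"),
                new BookRow(2, 1, "The Shining"));
        Map<Integer, List<String>> wroteEdges = Map.of(1, List.of("Carrie", "The Shining"));

        System.out.println(titlesByJoin(books, 1));
        System.out.println(titlesByAdjacency(wroteEdges, 1));
    }
}
```

Both calls return the same titles; the point is that the second one does no comparison work at query time because the links were stored when the data was written.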
Consider the following tables:
Author Table

| id | name |
|---|---|
| 1 | Stephen King |
| 2 | George R. R. Martin |
| 3 | John Grisham |

Book Table

| id | author_id | title |
|---|---|---|
| 1 | 1 | Carrie |
| 2 | 1 | The Shining |

Buyer Table

| id | name | knows_id | book_id |
|---|---|---|---|
| 1 | Hary | 2 | 2 |
| 2 | Mary | 1 | 2 |
In a graph database like OrientDB, we can define the relationships in a more semantic way. Graph databases operate on three structures: Vertex (sometimes called Node), Edge (or Arc), and Property (sometimes called Attribute).
- Vertex: the data itself, such as Author or Book.
- Edge: the physical relation between vertices. Each edge connects exactly two vertices, no more, no less. An edge also has a label and a direction, so if you label an edge Bought, you know that Hary bought the book The Shining. The direction of a relationship can be either Out or In.
- Property: a value attached to a vertex or an edge.
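As a rough mental model, the three structures can be written down as plain Java types. This is a minimal sketch, not how OrientDB represents records internally: `PropertyGraphSketch`, `Vertex`, `Edge`, and their fields are hypothetical names chosen for illustration. The `out`/`in` naming follows the Out and In directions mentioned above, with an edge leaving its `out` vertex and entering its `in` vertex.

```java
import java.util.Map;

// Illustration only: a hand-rolled property graph, not the OrientDB API.
// All type and field names are hypothetical.
class PropertyGraphSketch {

    // A vertex is a piece of data (Author, Book, Buyer) plus its properties.
    static final class Vertex {
        final String type;
        final Map<String, Object> properties;

        Vertex(String type, Map<String, Object> properties) {
            this.type = type;
            this.properties = properties;
        }
    }

    // An edge has a label, a direction (from 'out' to 'in'),
    // and may carry properties of its own.
    static final class Edge {
        final String label;
        final Vertex out;
        final Vertex in;
        final Map<String, Object> properties;

        Edge(String label, Vertex out, Vertex in, Map<String, Object> properties) {
            this.label = label;
            this.out = out;
            this.in = in;
            this.properties = properties;
        }
    }

    // Builds the single relationship "Hary bought The Shining".
    static Edge sampleEdge() {
        Vertex hary = new Vertex("Buyer", Map.of("name", "Hary"));
        Vertex shining = new Vertex("Book", Map.of("title", "The Shining"));
        return new Edge("Bought", hary, shining, Map.of());
    }

    // Renders a Buyer-to-Book edge as readable text.
    static String describe(Edge edge) {
        return edge.out.properties.get("name") + " -" + edge.label + "-> "
                + edge.in.properties.get("title");
    }

    public static void main(String[] args) {
        System.out.println(describe(sampleEdge())); // prints "Hary -Bought-> The Shining"
    }
}
```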
To load data into the graph store, you have to define ETL configuration files.
Each configuration defines:
- "source": { "file": { "path": "csv file location" } }: the input file for a model/entity
- In the transformer:
  - vertex: the model or table that the records map to
  - edge: the edges in and out of that table
- In the loader: all the entities and constraints
Import the CSV files and configuration from the GitHub repo. Change the file and configuration locations to match your environment.
Then run the oetl.sh tool from $ORIENTDB_HOME: `sh oetl.sh <location of conf file>`
Run it once for each configuration file to load all the data.
After loading all the data, you can query and visualize it in OrientDB's web-based console.
Here you can see the links between the entities.
How do you find the books bought by your friends?
select expand( both('Knows').out('Bought')) from Buyer where name = 'Hary'
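To see what this query walks through, here is a small in-memory Java sketch of the same two hops: follow Knows edges in both directions from the buyer, then follow outgoing Bought edges from each friend. It is not OrientDB code. The class `FriendsBooksSketch` and its helper methods are hypothetical, and the edge list assumes the Buyer table is loaded as Knows edges between buyers (from knows_id) and Bought edges from buyer to book (from book_id), which is what the query implies. The sketch also removes duplicate titles, which the query itself may not do.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only: mimics both('Knows').out('Bought') on an in-memory edge list.
// All names are hypothetical; this is not the OrientDB API.
class FriendsBooksSketch {

    static final class Edge {
        final String label;
        final String from;
        final String to;

        Edge(String label, String from, String to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    // Assumed edges derived from the Buyer table (knows_id and book_id columns).
    static final List<Edge> EDGES = List.of(
            new Edge("Knows", "Hary", "Mary"),
            new Edge("Knows", "Mary", "Hary"),
            new Edge("Bought", "Hary", "The Shining"),
            new Edge("Bought", "Mary", "The Shining"));

    // out(label): targets of edges that leave the vertex.
    static List<String> out(String vertex, String label) {
        List<String> result = new ArrayList<>();
        for (Edge edge : EDGES) {
            if (edge.label.equals(label) && edge.from.equals(vertex)) {
                result.add(edge.to);
            }
        }
        return result;
    }

    // in(label): sources of edges that enter the vertex.
    static List<String> in(String vertex, String label) {
        List<String> result = new ArrayList<>();
        for (Edge edge : EDGES) {
            if (edge.label.equals(label) && edge.to.equals(vertex)) {
                result.add(edge.from);
            }
        }
        return result;
    }

    // both(label): neighbours reached in either direction.
    static List<String> both(String vertex, String label) {
        List<String> result = new ArrayList<>(out(vertex, label));
        result.addAll(in(vertex, label));
        return result;
    }

    // Two hops: friends via Knows, then their books via Bought (deduplicated).
    static Set<String> booksBoughtByFriends(String buyer) {
        Set<String> books = new LinkedHashSet<>();
        for (String friend : both(buyer, "Knows")) {
            books.addAll(out(friend, "Bought"));
        }
        return books;
    }

    public static void main(String[] args) {
        System.out.println(booksBoughtByFriends("Hary"));
    }
}
```

With the sample data, Mary is reached twice from Hary (once per direction), which is why the sketch collects the titles in a set.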