LPG modeling (1)
Graph modeling diverges from traditional relational modeling
Once the domain has been captured
Relational: definition of schemas
Graph: enrich the domain with information needed by applications (entities as nodes, roles as labels, connections to neighboring entities as relationships, attributes as properties)
Two main goals
data-centric design → correct labels and properties assigned to nodes
semantic context → named and directed relationships between nodes
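The mapping above (entities as nodes, roles as labels, attributes as properties, connections as named and directed relationships) can be sketched in a few lines of Python. This is only an illustrative in-memory structure, not how a graph DBMS stores data; the class names, the reader/book domain, and all property values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity: labels capture its roles, properties its attributes."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A named, directed connection from a start node to an end node."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical domain: a reader who likes a book.
alice = Node({"Person", "Reader"}, {"name": "Alice"})
dune = Node({"Book"}, {"title": "Dune"})
likes = Relationship(alice, "LIKES", dune, {"since": 2020})

print(likes.start.properties["name"], likes.rel_type, likes.end.properties["title"])
```

The relationship carries its own name, direction, and properties, which is where the semantic context of the two nodes lives.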
LPG modeling (2)
Graphs are naturally additive
New kinds of relationships, nodes, labels, and subgraphs can be added to an existing structure without impacting existing queries and application functionality
Structures and schemas can emerge incrementally with the understanding of the problem space
Data are connected as the domain dictates
LPG modeling (3)
Always keep application needs in mind
AS A reader who likes a book, I WANT to know which books other readers who like the same book have liked, SO THAT I can find other books to read.
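A user story like this one translates directly into a traversal: from a book, go to the readers who like it, then to the other books those readers like. The sketch below does this over a plain dictionary; the readers, the book titles, and the function name `also_liked` are hypothetical.

```python
from collections import Counter

# Hypothetical LIKES relationships: reader -> set of liked books.
likes = {
    "alice": {"Dune", "Neuromancer"},
    "bob": {"Dune", "Foundation", "Hyperion"},
    "carol": {"Dune", "Foundation"},
    "dave": {"Emma"},
}


def also_liked(book, likes):
    """Count the books liked by readers who also like `book`, excluding `book` itself."""
    counts = Counter()
    for reader, books in likes.items():
        if book in books:
            counts.update(books - {book})
    return counts


recommendations = also_liked("Dune", likes)
print(recommendations.most_common())
```

Books liked by more of the co-readers rank higher, which is one simple way to order the recommendations.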
LPG modeling (4)
There is no unique set of rules, but generally
Entities of interest: nodes (can be labeled and grouped)
Connections between entities and semantic context of each entity: relationships
They give structure to the domain
Clarification of the semantics of a relationship: direction and properties of the relationship
For bidirectional relationships, a relationship exists for each direction
Entity attributes and metadata: node properties
Metadata such as timestamps and version numbers
Strength, weight, or quality of a relationship and metadata: relationship properties
Granularity of relationships
Relationships can be defined at different granularity levels
Example: the relationships between a user, her delivery address, and her billing address can be represented via
Two distinct relationships only (one per kind of address)
One generic address relationship, with the kind of address specified by properties only
Both solutions
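The trade-off between the two granularities can be seen in how each one is queried. The sketch below models relationships as tuples; the relationship names (`DELIVERY_ADDRESS`, `BILLING_ADDRESS`, `ADDRESS`) and the `kind` property are assumptions chosen for illustration.

```python
# Relationships as (start, type, properties, end) tuples; all names are hypothetical.

# Option 1: fine-grained relationship types.
fine = [
    ("user", "DELIVERY_ADDRESS", {}, "addr1"),
    ("user", "BILLING_ADDRESS", {}, "addr2"),
]

# Option 2: one generic type, qualified by a property.
generic = [
    ("user", "ADDRESS", {"kind": "delivery"}, "addr1"),
    ("user", "ADDRESS", {"kind": "billing"}, "addr2"),
]


def delivery_fine(rels):
    """Select delivery addresses by relationship type alone."""
    return [end for _, rtype, _, end in rels if rtype == "DELIVERY_ADDRESS"]


def delivery_generic(rels):
    """Select delivery addresses by inspecting a relationship property."""
    return [end for _, rtype, props, end in rels
            if rtype == "ADDRESS" and props.get("kind") == "delivery"]


def all_addresses_generic(rels):
    """With the generic type, every address is found without listing each type."""
    return [end for _, rtype, _, end in rels if rtype == "ADDRESS"]
```

Fine-grained types make the specific lookup cheap, while the generic type makes "all addresses" easy; keeping both relationships in the graph serves both kinds of query.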
Emerging facts
When two or more domain entities interact for a period of time, a fact can emerge
It is possible to represent it as a separate node with connections to each of the entities engaged in that fact
The additional node can represent the result of the interaction among the entities
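As a rough illustration, an employment can be treated as a fact that emerges when a person, a company, and a role interact over a period. In the sketch below the fact becomes its own node, connected to each participant; the node ids, labels, relationship names, and dates are all hypothetical.

```python
# Nodes keyed by id; relationships as (start, type, end). All names are hypothetical.
nodes = {
    "ian": {"labels": {"Person"}, "name": "Ian"},
    "acme": {"labels": {"Company"}, "name": "Acme"},
    "engineer": {"labels": {"Role"}, "name": "Engineer"},
    # The emerging fact gets its own node, carrying the period of the interaction.
    "job1": {"labels": {"Job"}, "start": 2011, "end": 2014},
}
relationships = [
    ("job1", "EMPLOYEE", "ian"),
    ("job1", "EMPLOYER", "acme"),
    ("job1", "ROLE", "engineer"),
]


def participants(fact_id):
    """Return {relationship type: entity name} for every entity engaged in the fact."""
    return {rtype: nodes[end]["name"]
            for start, rtype, end in relationships if start == fact_id}


print(participants("job1"))
```

Because the fact is a node, it can hold its own properties (here the period) and can later be connected to further entities without touching the existing ones.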
Analyzing LPGs
Graph analytics: use of any graph-based approach to analyze connected data
Graph algorithms use the relationships between nodes to infer the organization and dynamics of complex systems
Pattern-based querying is used for local data analysis
Relationships in a graph naturally form paths
Querying or traversing the graph involves following paths
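Following paths can be made concrete with a breadth-first traversal, which returns a path with the fewest relationships between two nodes. This is a minimal sketch over an adjacency list; the names in the `knows` graph are hypothetical and the function is not taken from any graph library.

```python
from collections import deque

# Hypothetical directed KNOWS relationships as an adjacency list.
knows = {
    "jim": ["ian", "emil"],
    "ian": ["emil"],
    "emil": ["ada"],
    "ada": [],
}


def shortest_path(graph, source, target):
    """Breadth-first search: return a path with the fewest relationships, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(shortest_path(knows, "jim", "ada"))  # ['jim', 'emil', 'ada']
```

Relationships are directed here, so no path exists from ada back to jim and the function returns None in that case.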
Pattern-based queries: Cypher (1)
Declarative query language for graphs
Idea: query for patterns representing what should be returned
A pattern describes in ASCII a path that connects nodes
Basic pattern idea: (node)-[relationship]->(node)
emil, jim, and ian are used as identifiers
Pattern-based queries: Cypher (2)
Declarative query language for graphs
Idea: query for patterns representing what should be returned
(emil:Person {name:'Emil'})
<-[:KNOWS]-(jim:Person {name:'Jim'})-[:KNOWS]->(ian:Person {name:'Ian'})-[:KNOWS]->(emil)
Identifiers are now bound to graph data through their name property
Pattern-based queries: Cypher (3)
A Cypher query
Anchors one or more parts of a pattern to specific locations in a graph
Flexes the unanchored parts around to find local matches
The anchor points in the real graph are determined based on the labels and property predicates in the query
Metainformation about existing indexes, constraints, and predicates can be used
A Cypher query is composed of clauses
Pattern-based queries: Cypher (4)
The MATCH clause describes the nodes and relationships we are interested in
Specification by example
MATCH (a:Person {name:'Jim'})-[:KNOWS]->(b)-[:KNOWS]->(c), (a)-[:KNOWS]->(c)
Nodes are represented with parentheses
Node labels are prefixed by a colon
Relationships are represented with arrows
Relationship names are prefixed by a colon and enclosed in square brackets
Key-value pair properties are specified within curly brackets
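To make the matching behavior concrete, the triangle pattern above can be evaluated by hand: anchor a to the node named Jim, then flex b and c over the KNOWS relationships. The sketch below is only an analogy in plain Python, not how Cypher is executed; the graph contents and the function name `match_triangle` are assumptions.

```python
# Hypothetical graph: directed KNOWS relationships between people, keyed by name.
knows = {
    "Jim": {"Ian", "Emil"},
    "Ian": {"Emil"},
    "Emil": set(),
}


def match_triangle(graph, anchor_name):
    """Find (b, c) such that a-[:KNOWS]->b-[:KNOWS]->c and a-[:KNOWS]->c,
    with `a` anchored to the node named `anchor_name`."""
    results = []
    for b in sorted(graph.get(anchor_name, ())):
        for c in sorted(graph.get(b, ())):
            if c in graph[anchor_name]:
                results.append((b, c))
    return results


print(match_triangle(knows, "Jim"))  # [('Ian', 'Emil')]
```

Only the anchored node is fixed in advance; every other node in the result is found by following relationships from it.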
Pattern-based queries: Cypher (5)
WHERE: specifies filters for pattern matching results
CREATE [UNIQUE]: creates nodes and relationships
MERGE: matches an existing pattern or adds it to the graph if it does not exist
DELETE: removes nodes and relationships (properties are removed with REMOVE)
SET: sets property values
FOREACH: executes an update operation for each element in a list
UNION: merges results from two or more queries
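The match-or-create behavior of MERGE is the least obvious of these clauses. The sketch below imitates it for a single node held in a plain Python list; it is a simplified analogue under that assumption, and `merge_node` is a hypothetical helper, not part of any Neo4j driver.

```python
def merge_node(nodes, label, **properties):
    """Return the node matching label and properties, creating it if absent.

    A simplified analogue of MERGE on a single node; `nodes` is a plain list.
    """
    for node in nodes:
        if node["label"] == label and all(
            node["properties"].get(k) == v for k, v in properties.items()
        ):
            return node  # the pattern already exists: return it unchanged
    node = {"label": label, "properties": dict(properties)}
    nodes.append(node)  # the pattern is missing: add it to the graph
    return node


nodes = []
first = merge_node(nodes, "Person", name="Jim")
second = merge_node(nodes, "Person", name="Jim")
print(len(nodes), first is second)  # 1 True
```

Calling it twice with the same arguments leaves a single node in the list, which is the property that distinguishes MERGE from CREATE.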
Pattern-based queries: Cypher (examples 1)
(shakespeare:Author {firstname:'William', lastname:'Shakespeare'}),
(juliusCaesar:Play {title: }),
(shakespeare)-[:WROTE_PLAY {year:1599}]->(juliusCaesar),
Pattern-based queries: Cypher (examples 2)
MATCH (p:Play) RETURN p.title
MATCH (a:Author)-[w:WROTE_PLAY]->(p:Play {title: })
RETURN a,w,p
MATCH (a:Author)-[w:WROTE_PLAY]->(p:Play) WHERE p.title =
RETURN a,w,p
MATCH (theater:Venue {name:'Theatre Royal'}),
(newcastle:City {name:'Newcastle'}),
(bard:Author {lastname:'Shakespeare'}),
(newcastle)<-[:STREET|CITY*1..2]-(theater)<-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]->(play)<-[:WROTE_PLAY]-(bard)
RETURN DISTINCT play.title AS play
Graph algorithms
Besides pattern-based local queries, graphs can be subjected to more general analyses
Examples: pathfinding, community detection, centrality
To this end, graph DBMSs can be equipped with libraries that support advanced analytics, typically based on graph theory
BFS, DFS, single-source/all-sources shortest path, minimum-cost spanning tree, random walks
Neo4j: Transactions
Transactions are supported by Neo4j
Writes occur within a transaction context, with write locks taken on any nodes and relationships involved in the transaction
On successful completion of a transaction, changes are flushed to disk and the write locks are released
On failure, writes are discarded and the write locks are released → consistent state of the graph
If two transactions attempt to write the same graph elements concurrently, Neo4j detects a potential deadlock and serializes the transactions
Writes within a single transactional context are not visible to other transactions
Neo4j is ACID-compliant
Graph databases: use cases
Suitable for:
Routing, dispatch, and location-based services
Financial services, fraud detection
Social domains
Recommendation engines
Contact tracing
Not suitable for:
Cases where the graph model does not represent the data well
Resource Description Framework
Goal: provide a framework for describing resources
Main elements:
Resources to be described → anything that can have a unique identifier (URI)
Properties associated with resources
Classes used to bucket resources
RDF data are composed as triples
RDF (example)
Alice is a person
Alice's userID is 123
[Figure: example graph with nodes Alice and Bob and edges labeled likes]
RDF can be serialized in different ways
Examples: RDF/XML, RDF/JSON, Turtle, N-Triples
[Legend: subject, predicate, object]
2014 European Commission
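As a rough sketch of what serialization means, the first two statements of the example can be written as (subject, predicate, object) triples and emitted in N-Triples form, one statement per line. The `http://example.org/` URIs are hypothetical placeholders, and the serializer is deliberately simplified: it does not escape special characters in literals.

```python
# Hypothetical URIs under example.org; only the triple structure comes from the slides.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    (EX + "alice", RDF_TYPE, ("uri", EX + "Person")),   # Alice is a person
    (EX + "alice", EX + "userID", ("literal", "123")),  # Alice's userID is 123
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one N-Triples statement per line."""
    lines = []
    for subject, predicate, (kind, value) in triples:
        obj = f"<{value}>" if kind == "uri" else f'"{value}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Each line stands alone as a complete statement, so the same data could equally be emitted in another syntax such as Turtle without changing the triples themselves.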
RDF characteristics
Nodes do not have a structure
They are either literals or URIs
Data are broken into atomic triples
Cannot model the same rela
