A ValuePath is an implementation of the Path interface that is used to
represent a walk through a graph within the context of remoting. It is
particularly important to ensure a compact (though obviously lossless)
on-wire representation of a Path, as a naive implementation may contain
duplicate information, thereby increasing the number of bytes required
to be transmitted.
The representation used by Neo4jPack and expected by the ValuePath
constructor here consists of three parts: a unique set of nodes within
the path, a similar unique set of unbound relationships (i.e. without
start and end information) and a list of path sequence information that
refers to the entities themselves. In PackStream structural representation,
this is as follows:
Structure Path 'P' {
    List<Node> nodes
    List<UnboundRelationship> relationships
    List<Integer> sequence
}
Structure UnboundRelationship 'r' {
    Identity identity
    String type
    Map<String,Object> properties
}
Within the Path structure, several caveats apply. Firstly, while the nodes
may generally be listed in any order, a valid Path must contain at least
one node and the first such node must correspond to the path's start node.
The relationship list, however, may be empty (for zero-length paths), and
relationships may appear in any order.
The sequence information contains alternating pointers to nodes and
relationships in the order in which they appear in the path. Each pointer
takes the form of an integer, which in the case of nodes simply references
the zero-based index of that node.
Relationship pointers are less straightforward, using a one-based indexing
system (i.e. relationship #1 refers to item 0 in the relationship
list). Additionally, the sign of the index is relevant: a positive index
denotes that the relationship was traversed with its direction, while a
negative index denotes that it was traversed against its direction (a
zero-based index could not carry this sign, since +0 and -0 are the same
value). Note that relationship items do not contain endpoint information;
the endpoints are instead implied by this sequential context.
Finally, the first sequence value will always be 0 by definition. Therefore,
this value is never actually transmitted and is instead reconstituted by the
receiving agent.
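The pointer rules above can be sketched as a small decoder. This is a minimal
illustration only, not the ValuePath implementation: the names
PathSequenceDecoder, Step and decode are hypothetical, the entity lists are
reduced to their sizes, and Java 16 or later is assumed for records. The input
is the sequence as transmitted, i.e. without the implied leading 0.

```java
import java.util.ArrayList;
import java.util.List;

final class PathSequenceDecoder {
    /** One hop: zero-based relationship index, traversal direction, zero-based index of the node reached. */
    record Step(int relationshipIndex, boolean forward, int nodeIndex) {}

    /**
     * Decodes a transmitted sequence of alternating (relationship, node) pointers.
     * The walk implicitly starts at node index 0, which is never transmitted.
     */
    static List<Step> decode(int nodeCount, int relationshipCount, List<Integer> sequence) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("a path must contain at least one node");
        }
        if (sequence.size() % 2 != 0) {
            throw new IllegalArgumentException("sequence must hold (relationship, node) pairs");
        }
        List<Step> steps = new ArrayList<>();
        for (int i = 0; i < sequence.size(); i += 2) {
            int relPointer = sequence.get(i);       // one-based, sign encodes direction
            int nodePointer = sequence.get(i + 1);  // zero-based
            if (relPointer == 0 || Math.abs(relPointer) > relationshipCount) {
                throw new IllegalArgumentException("bad relationship pointer: " + relPointer);
            }
            if (nodePointer < 0 || nodePointer >= nodeCount) {
                throw new IllegalArgumentException("bad node pointer: " + nodePointer);
            }
            steps.add(new Step(Math.abs(relPointer) - 1, relPointer > 0, nodePointer));
        }
        return steps;
    }
}
```

An empty sequence decodes to no steps, which corresponds to a zero-length path
consisting of the start node alone.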
Take, as an example, the following path:
(a)-->(b)<--(a)-->(c)-->(d)
Here, the nodes would be uniquely listed as [(a), (b), (c), (d)] and the
relationships as [(a:b), (a:c), (c:d)]. Applying zero-based and one-based
indexing to these lists respectively, we get:
  0    1    2    3       1      2      3
[(a), (b), (c), (d)]  [(a:b), (a:c), (c:d)]
Applying these indexes to the original path then gives us:
          (a)-->(b)<--(a)-->(c)-->(d)
Node ID :  0     1     0     2     3
Rel ID  :     1     1     2     3
And negating relationships that are traversed backwards, we get:
          (a)-->(b)<--(a)-->(c)-->(d)
Node ID :  0     1     0     2     3
Rel ID  :    +1    -1    +2    +3
Finally, we can drop the leading node index 0, interleave the remaining
relationship and node pointers in path order, and combine the resulting
sequence with the other path data. This results in an overall Path
representation as below:
Structure Path 'P' {
    nodes = [(a), (b), (c), (d)]
    relationships = [(a:b), (a:c), (c:d)]
    sequence = [+1, 1, -1, 0, +2, 2, +3, 3]
}
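The construction walked through above can also be sketched in code. This is an
illustrative sketch under stated assumptions rather than the Neo4jPack
implementation: PathEncoder, Rel, Encoded and encode are hypothetical names,
nodes and relationships are identified by plain strings, the walk is supplied
as a start node plus the relationships in traversal order, and Java 16 or
later is assumed for records.

```java
import java.util.ArrayList;
import java.util.List;

final class PathEncoder {
    /** Hypothetical bound relationship; endpoints are used only to work out direction. */
    record Rel(String id, String start, String end) {}

    /** Hypothetical holder mirroring the three parts of the Path structure. */
    record Encoded(List<String> nodes, List<String> relationships, List<Integer> sequence) {}

    static Encoded encode(String startNode, List<Rel> walk) {
        List<String> nodes = new ArrayList<>();
        List<String> relationships = new ArrayList<>();
        List<Integer> sequence = new ArrayList<>();
        nodes.add(startNode); // start node takes index 0; its pointer is never written
        String current = startNode;
        for (Rel rel : walk) {
            boolean forward;
            if (rel.start().equals(current)) {
                forward = true; // self-loops are treated as forward traversals
                current = rel.end();
            } else if (rel.end().equals(current)) {
                forward = false;
                current = rel.start();
            } else {
                throw new IllegalArgumentException(rel.id() + " does not touch " + current);
            }
            int oneBased = indexOrAdd(relationships, rel.id()) + 1;
            sequence.add(forward ? oneBased : -oneBased); // sign encodes direction
            sequence.add(indexOrAdd(nodes, current));     // zero-based node pointer
        }
        return new Encoded(nodes, relationships, sequence);
    }

    /** Returns the index of value in the unique list, appending it first if absent. */
    private static int indexOrAdd(List<String> unique, String value) {
        int index = unique.indexOf(value);
        if (index < 0) {
            unique.add(value);
            index = unique.size() - 1;
        }
        return index;
    }
}
```

Given the start node a and the relationships (a:b), (a:b), (a:c), (c:d) in
traversal order, this sketch yields the node list, relationship list and
sequence shown in the structure above.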
Therefore, for the original path containing five nodes and four relationships,
we end up transmitting only four nodes, three unbound relationships and eight
bytes of sequence data (small integers require only a single byte in PackStream).
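The single-byte claim rests on PackStream's TINY_INT encoding, which covers
the integers from -16 to 127 inclusive. A minimal sketch of that check is
given below; SequenceSize and its methods are hypothetical names, and the
check concerns the integer values only, not the marker byte of the enclosing
list.

```java
import java.util.List;

final class SequenceSize {
    /** True if the value lies in the PackStream TINY_INT range (-16..127). */
    static boolean fitsInSingleByte(long value) {
        return value >= -16 && value <= 127;
    }

    /** True if every pointer in the sequence can be packed as a single byte. */
    static boolean allSingleByte(List<Integer> sequence) {
        for (int pointer : sequence) {
            if (!fitsInSingleByte(pointer)) {
                return false;
            }
        }
        return true;
    }
}
```

Paths that reference more than 128 distinct nodes, or more than 16 distinct
relationships traversed backwards, would therefore need wider integers for
some pointers.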