The VLDB Journal (2024) 33:101–129 https://doi.org/10.1007/s00778-023-00799-9 REGULAR PAPER BatchHL+: batch dynamic labelling for distance queries on large-scale networks Muhammad Farhan1 · Henning Koehler2 ·Qing Wang1 Received: 18 June 2022 / Accepted: 15 May 2023 / Published online: 7 June 2023 © The Author(s) 2023 Abstract Many real-world applications operate on dynamic graphs to perform important tasks. In this article, we study batch-dynamic algorithms that are capable of updating distance labelling efficiently in order to reflect the effects of rapid changes on such graphs. To explore the full pruning potentials, we first characterize the minimal set of vertices being affected by batch updates. Then, we reveal patterns of interactions among different updates (edge insertions and edge deletions) and leverage them to design pruning rules for reducing update search space. These interesting findings lead us to developing a new batch-dynamic method, called BatchHL+, which can dynamize labelling for distance queries much more efficiently than existing work. We provide formal proofs for the correctness and minimality of BatchHL+ which are non-trivial and require a delicate analysis of patterns of interactions. Empirically, we have evaluated the performance of BatchHL+ on 15 real-world networks. The results show that BatchHL+ significantly outperforms the state-of-the-art methods with up to 3 orders of magnitude faster in reflecting updates of rapidly changing graphs for distance queries. Keywords Shortest-path distance · Batch-dynamic graphs · 2-Hop cover · High-way cover · Distance labelling maintenance · Graph algorithms 1 Introduction Batch-dynamic algorithms on graphs have attracted consid- erable interest in recent years, for both fundamental research and practical applications [2, 3, 14, 15, 18, 29, 41]. Dif- ferent from traditional dynamic algorithms which handle single changes sequentially—one at a time, batch-dynamic algorithms focus on dynamically maintaining graph proper- ties, albeit graphs may undergo rapid changes through large batches of updates. Nowadays, many real-world applications operate on rapidly changing graphs, such as communica- tion networks [34], context-aware search in web graphs [37], B Muhammad Farhan muhammad.farhan@anu.edu.au Henning Koehler h.koehler@massey.ac.nz Qing Wang qing.wang@anu.edu.au 1 School of Computing, Australian National University, Canberra, Australia 2 School of Mathematical and Computational Sciences, Massey University, Palmerston North, New Zealand social network analysis [7, 38], route-planning in road net- works [1, 13], and management of resources in computer networks [8]. Given a graph, a distance query computes the shortest path distance between two vertices in the graph, which is a fundamental problem both theoretically and in practice. A classical approach to answer distance queries is to run the Dijkstra’s algorithm for non-negative weighted graphs or breadth-first search algorithm for unweighted graphs [36]. However, these algorithms are inefficient for large graphs. A well-established technique for speeding up query response time is to pre-compute and store distance labelling such as 2-hop labelling [11] and in particular pruned landmark labelling [4]. Although these labelling techniques have been shown to be effective in improving query response time on static graphs, the use of distance labelling poses an additional challenge for dynamic graphs—“How to efficiently update distance labelling in order to reflect changes on graphs, par- ticularly when changes occur rapidly?” Several studies have attempted to address this challenge by developing dynamic 2-hop labellings (in particular pruned landmark labellings) in the sequential setting which process one single update (edge insertion or edge deletion) at a time 123 http://crossmark.crossref.org/dialog/?doi=10.1007/s00778-023-00799-9&domain=pdf http://orcid.org/0000-0002-1239-2107 102 M. Farhan et al. Fig. 1 Ahigh-level overview of our batch-dynamicmethodBatchHL+: (left) A graph G with a landmark 5 on which a batch update (a set of edge insertions and deletions) is performed; (middle) A compari- son of affected vertices being identified by the state-of-the-art method BatchHL and our proposed method BatchHL+; (right) Three different interaction patterns in which v can be pruned by BatchHL+ but cannot be pruned by BatchHL [5, 12, 23, 35]. However, the size of such distance labellings can grow quadratically with the size of a graph. This not only limits the applicability of these distance labellings to graphs with up to millions of nodes and edges, but also increases the difficulty and complexity of updating labellings. That is to say, even if such distance labellings can be constructed on graphs, the computational cost of updating them to reflect rapid changes is still unbearably high. Recently, a batch-dynamic method to answer distance queries, namely BatchHL, has been proposed [18]. The key idea of BatchHL is to combine pre-computed labelling with online searches so as to exploit the merits of both sides— accelerating query response time through a partial distance labelling that is of limited size but can bound online search space. Further, BatchHL can efficiently dynamize a partial distance labelling to reflect large batches of updates on a graph. It has been shown [18] that BatchHL offers significant performance gains in comparison with other state-of-the-art methods for answering distance queries on dynamic graphs and can scale to billion-scale graphs without compromising query and update performance. In this work, we further explore possible ways to advance the design of batch-dynamic algorithms for distance queries. We first analyse how different types of updates (edge inser- tion and edge deletion) interact with each other. Then, based on that, we unearth new patterns of update interactions that can be leveraged to design pruning rules for reducing update search space. These interesting findings lead us to devel- oping a new batch-dynamic algorithm, called BatchHL+, which can dynamize labelling for distance queries much more efficiently than BatchHL on graphs that undergo rapid changes. Figure 1 presents a high-level overview of com- paring BatchHL and BatchHL+ in terms of several typical update interaction patterns. We can see that the search space of BatchHL, the state-of-the-art method, is more exhaus- tive than BatchHL+ when updating labelling to reflect batch updates on a graph. This is because BatchHL+ further prunes search space by exploiting interactions between different types of updates. Concretely, in the lower right-hand corner of Fig. 1, three update interaction patterns are presented, and vertex v cannot be pruned by BatchHL but can be pruned by BatchHL+ successfully. This enables BatchHL+ to perform much more efficiently than BatchHL. Novelty. This article is an extended version of the previous work [18]. The following contributions are novel. – We explore update interaction patterns and characterize the minimal set of vertices affected by batch updates, for which a “perfect” batch-dynamic algorithm may wish to identify. We show that algorithms for only edge inser- tions or only edge deletions can precisely identify such vertices in each separate case. However, combining these two algorithms fails to compute the exact set of vertices affected by batch updates in the general case (Sect. 5). – We propose an improved batch-dynamicmethod, namely BatchHL+, which is equipped with optimized batch search and batch repair algorithms. In the original work of BatchHL, a vertex is flagged as affected if any short- est path is eliminated. In contrast, BatchHL+ proposed in this work only flags a vertex as affected if every short- est path is eliminated. As a result, edge deletions can be processed almost as quickly as edge insertions (Sect. 6). – We present a comprehensive discussion on distinctive characteristics of different batch-dynamic algorithms and their relationships in terms of their capacity of identi- 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 103 fying vertices that are “affected” by batch updates on dynamic graphs. This leads to several different notions of “affected” vertices and the connections between these notions manifest the source of efficiency differences among different batch-dynamic algorithms. For example, BatchHL+ handles a much smaller subset of “affected” vertices than BatchHL (Sect. 7). – We provide detailed proofs for the correctness and min- imality of BatchHL+. We note that the proof of correct- ness for the batch search algorithm (i.e. Algorithm 6) of BatchHL+ is non-trivial and requires a delicate analysis of interaction patterns of batch updates. We also con- duct the time and space complexity analysis ofBatchHL+ (Sect. 8). Empirically, we have evaluated our method on 15 real-world networks to verify efficiency and scalability. The results show that ourmethod significantly improves update time compared to the state-of-the-art methods. It can maintain labelling of a very small size, while still answering distance queries in the order ofmilliseconds, even on large-scale graphswith several billions of edges that undergo large batches of updates. The average update times on all these real-world networks are less than one millisecond, and in general, our method is up to 3 orders ofmagnitude faster than the recent state-of-the-art method BatchHL. 2 Related work Traditionally, distance queries can be answered using Dijk- stra’s search, breadth-first search (BFS), or a bidirectional scheme combining two such searches: one from the source vertex and the other from the destination vertex [33, 36]. However, these algorithms may traverse an entire network when two query vertices are far apart from each other and become too slow for large networks. To accelerate response time in answering distance queries, labelling-based methods have emerged as an attractive way, which precompute a data structure, called distance labelling [1, 4, 6, 10, 11, 13, 19, 22, 24, 28, 39, 40]. For example, Akiba et al. [4] proposed the pruned landmark labelling (PLL) to pre-compute a 2-hop distance labelling [11] by performing a pruned breadth- first search from every vertex, and Li et al. [27] developed a parallel algorithm for constructing PLL which achieved the state-of-the-art results for answering distance queries on static graphs. Previously, several attempts have been made to study dis- tance queries over dynamic graphs [5, 12, 16, 20, 23, 31, 35, 42] which only considered the unit-update setting, i.e. to perform updates one at a time. Akiba et al. [5] studied the problem of updating PLL for incremental updates (i.e. edge additions). This work, however, does not remove out- dated entries because the authors considered it too costly. Qin et al. [35] and D’angelo et al. [12] studied the problem of updating PLL for decremental updates (i.e. edge dele- tions). Note that, in the decremental case, outdated distance entries have to be removed; otherwise, distance queries can- not be correctly answered. Their methods suffer from high time complexities and cannot scale to large graphs, e.g. the average update time of an edge deletion on a network with 19M edges is 135s in [35] and on a network with 16M edges is 19 s in [12]. D’angelo et al. [12] combined the algo- rithm for incremental updates proposed in [5] with their method for decremental updates to form a fully dynamic algorithm, which however does not scale beyond networks with around 20M of edges. Hayashi et al. [23] proposed a fully dynamic method which combines a distance labelling with online search to answer distance queries. Their method pre-computes bit-parallel shortest-path trees (SPTs) rooted at each r ∈ R for a small subset of vertices R and dynami- cally maintain the correctness of these bit-parallel SPTs for every edge insertion and deletion. Then, an online search is performed under an upper distance bound computed via the bit-parallel SPTs on a sparsified graph. Very recently, several methods have attempted to perform updates in the batch-update setting (i.e. to perform multiple updates in a batch) [17, 18]. For instance, BatchHL [18] has shown to be much faster in performing batch updates as compared to the other state-of-the-art methods. In our present work, we further advance the design of BatchHL by studying patterns that were left unexplored in the design of BatchHL. Another line of research studied streaming graph algo- rithms. In the streaming setting, a rapidly changing graph is often modelled using certain compressed data structures due to space constraints. Updates are received as a stream, but may be accumulated into batches through a sliding win- dow and applied to the underlying graph. In this setting, a number of methods [21, 30, 32] have been proposed to address distance queries. However, these methods operate under certain constraints, e.g. limited amount of memory and accuracy of graph structure. Different from these streaming graph methods, our work considers applications which oper- ate on batch-dynamic graphs that are explicitly stored and can be processed in the main memory of a single machine. Nev- ertheless, the ideas of our algorithm can be easily extended to deal with batch updates in the streaming setting. 3 Preliminaries Without loss of generality, we focus our discussion on unweighted, undirected graphs in this paper and discuss the extension to directed graphs in Sect. 9. Table 1 summarises the frequently used notations. 123 104 M. Farhan et al. Table 1 Summary of notations Notation Description G = (V , E) A graph G with the set of vertices V and the set of edges E R A subset of vertices, called landmarks L(v) Label of a vertex v � = (H , L) A highway cover labelling, consisting of a highway H and a distance labelling L N (v) Set of vertices adjacent to v in a graph G dG(u, v) Shortest path distance between u and v in a graph G PG(u, v) Set of all shortest paths between u and v in a graph G δH (u, v) Highway distance between u and v δL (u, v) Labelling distance between u and v G[V \R] A sparsified graph after removing R from G d� uv An upper distance bound between u and v Q(u, v) An exact query between u and v dL G(r , v) Landmark distance between r and v in G Vaff/Vaff+ Set of affected vertices, for some notion of “affected” B A batch update G ′ = (V ′, E ′) Graph after batch update B �′ = (H ′, L ′) An updated highway cover labelling N/B Set of all natural numbers / Boolean values ⊕ Append operator to update the landmark length of a path dL G(r , v) Landmark distance between r and v in G dc(r , v) Composite distance between r and v |S| Number of elements in a set S dbou(v, S) Distance bound of a vertex v w.r.t. a set S dL bou(v, S) Landmark distance bound of a vertex v w.r.t. a set S Let G = (V , E) be a graph where V is a set of vertices and E ⊆ V × V is a set of edges. The distance between two vertices s and t in G, denoted as dG(s, t), is the length of a shortest path between s and t . If there does not exist any path between s and t , dG(s, t) = ∞. We use PG(s, t) to denote the set of all shortest paths between s and t in G, and N (v) the set of neighbours of a vertex v, i.e. N (v) = {v′ ∈ V |(v, v′) ∈ E}. There are two types of edge updates on graphs: edge inser- tion and edge deletion. A batch update is a set of edge insertions and deletions. Node insertion or deletion can be treated as a batch update containing only edge insertions or only edge deletions, respectively. In the case that the same edge is being inserted and deleted within one batch update, we simply eliminate both of them. An update is valid if it makes a change on a graph, i.e. inserting an edge (a, b) into G when (a, b) /∈ E , and deleting an edge (a, b) fromG when (a, b) ∈ E . Without loss of generality, we ignore invalid updates. Let R ⊆ V be a subset of special vertices inG, called land- marks. A label L(v) for a vertex v is a set of distance entries {(ri , δL(ri , v))}ni=1 where ri ∈ R, δL(ri , v) = dG(ri , v) and n ≤ |R|. We call (ri , δL(ri , v)) the ri -label of vertex v. A distance labelling overG is the set of labels for all vertices in V . The size of a distance labelling is defined as ∑ v∈V |L(v)|. In the literature, a distance labelling is often constructed fol- lowing the 2-hop cover property [11] which requires at least one common vertex in L(u) and L(v) to be on a shortest path between any two vertices u and v. Definition 1 (2-hop cover labelling) A distance labelling L over G = (V , E) is a 2-hop cover labelling if for any s, t ∈ V , dG(s, t) = min{δL(ri , s) + δL(ri , t) | (ri , δL(ri , s)) ∈ L(s), (ri , δL(ri , t)) ∈ L(t)}. 3.1 Highway cover labelling Our work considers the highway cover labelling [19]. Definition 2 (Highway) A highway H = (R, δH ) consists of a set R of landmarks and a distance decoding function δH : R × R → N + s.t. δH (r1, r2) = dG(r1, r2) for any two landmarks r1, r2 ∈ R. 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 105 Definition 3 (Highway cover labelling) A highway cover labelling� = (H , L) consists of a highway H and a distance labelling L satisfying that, for any v ∈ V \R and r ∈ R, dG(r , v) = min{δL(ri , v) + δH (r , ri ) | (ri , δL(ri , v)) ∈ L(v)}. Ahighway cover labelling requires that every label L(v)must contain a distance entry to each landmark r ∈ R unless there is another landmark on a shortest path between r and v. We shall refer to this distance entry (or its absence) as the r -label of v. Unlike a 2-hop cover labelling that can answer distance queries for any two vertices in a graph, i.e. a full distance labelling, a highway cover labelling can only answer distance queries between any landmark and any vertex in a graph, i.e. a partial distance labelling. As discussed in the original paper [19], highway cover labelling enjoys several nice theoretical properties, such as minimality and order independence. A highway cover labelling � = (H , L) over G is minimal if, for any highway cover labelling �′ = (H , L ′) over G, si ze(L ′) ≥ si ze(L) holds. A highway cover labelling � = (H , L) over G is order-independent if � remains the same, regardless of the order of applying landmarks in R. Given any fixed set of landmarks, there exists a unique minimal highway cover labelling, which is contained in every highway cover labelling [19]. 3.2 Query answering A distance query can be answered via bounding online searches on a sparsified search space based on the highway cover labelling. Specifically, given a highway cover labelling � = (H , L), an upper bound on the distance between any pair of vertices s, t ∈ V in a graph G is computed as d� st = min{δL(ri , s) + δH (ri , r j ) + δL(r j , t) | (ri , δL(ri , s)) ∈ L(s), (r j , δL(r j , t)) ∈ L(t)}. Here, d� st is the minimal length amongst all paths between s and t that pass through the highway. Since there may exist a shorter path not passing through the highway, we conduct a distance-bounded shortest-path search over a spar- sified graph G[V \R] (i.e. removing all landmarks in R from G) under the upper bound d� st to answer the distance query Q(s, t) such that Q(s, t) = min(dG[V \R](s, t), d� st ). In the implementation, dG[V \R](s, t) can be computed by conducting a bidirectional BFS search from both s and t [19] which terminates either after d� st − 1 steps or when the searches from both directions meet. 4 BatchHL: basic algorithm In this section, we start by presenting the basic algorithm of BatchHL, a batch-dynamic method that can efficiently maintain a highway cover labelling for dynamic graphs [18]. Generally, BatchHL involves two phases: Batch Search and Batch Repair, as described in Algorithm 1. Batch Search identifies vertices for which labels may need to be updated, while Batch Repair updates these vertices. These two phases are done independently for each landmark. Algorithm 1: BatchHL (Basic Algorithm) 1 Function BatchHL(G ′, B, R, �) 2 �′ ← � 3 foreach r ∈ R do 4 Vaff ← BatchSearch(G ′, B, r , �) 5 BatchRepair(G ′, Vaff, r , �, �′) 6 return �′ 4.1 Batch search LetG = (V , E) be a graph, R ⊆ V a set of landmarks and B a batch update resulting in the updated graph G ′ = (V ′, E ′). We denote the unique minimal highway labellings on G and G ′ by � and �′, respectively. Definition 4 (P-affected) A vertex v ∈ V is path-affected (P-affected) by a batch update B w.r.t. a landmark r ∈ R iff PG(r , v) �= PG ′(r , v). We use Vaff(r , B) = {v ∈ V |PG(v, r) �= PG ′(v, r)} to denote the set of all P-affected vertices by a batch update B w.r.t. a landmark r . An edge insertion or deletion (a, b) can create or eliminate shortest paths starting from r and passing through (a, b). Any update on an edge (a, b) with dG(r , a) = dG(r , b) is trivial w.r.t. a landmark r , since such an update does not affect any vertices w.r.t. the landmark r . For a non-trivial update (a, b), we call the vertex u ∈ {a, b} further away from r in G the anchor of (a, b), and the other one its pre-anchor. We denote the anchor distance of (a, b) by dG(r , u) = dG(r , u′)+1,where u′ is the pre-anchor of (a, b). As every newly created or eliminated path must pass through an updated edge, we can use anchors to identify P-affected vertices. Lemma 1 For every P-affected vertex v we have dG(r , v) ≥ (dG(r , u′) + 1) + dG ′(u, v). (1) 123 106 M. Farhan et al. Algorithm 2: Batch Search 1 Function BatchSearch(G ′, B, r , �) 2 foreach (a, b) ∈ B do 3 if dG(r , a) < dG(r , b) then 4 add (dG(r , a) + 1, b) to Q 5 else if dG(r , a) > dG(r , b) then 6 add (dG(r , b) + 1, a) to Q 7 while Q is not empty do 8 remove minimal (d, v) from Q 9 if v /∈ Vaff+ then 10 add v to Vaff+ 11 foreach w ∈ NG′ (v) do 12 if d + 1 ≤ dG(r , w) then 13 add (d + 1, w) to Q 14 return Vaff+ for some non-trivial update (a, b) ∈ B with anchor u and pre-anchor u′. We make use of this characterization in Algorithm 2. Specifically, we perform a search with anchors as starting points, compute the minimal value of the right hand side of Eq.1, and collect all vertices for which Eq.1 holds. Example 1 Consider the graph below, where edges marked by + are inserted and edges marked by − are deleted in one batch update. r a b c d e f v x y z + + - - + - Insertion of (a, d) triggers a search starting from the anchor d with the anchor distance dG(r , a) + 1 computed from �. The search then follows edges in G ′, including the newly inserted edge (d, f ), andfinds thatd, c, f andv satisfy (1). Insertion of (d, f ) is trivial since d and f are equidistant from r in G, and does not lead to any search. Deletion of (b, e) triggers a search starting from the anchor e with the anchor distance dG(r , b) + 1 computed from �. Combining the searches for (a, d) and (b, e) into one search means that the edge ( f , v) is traversed only once. Algorithm 2 eliminates unnecessary search paths by not following vertices violating Eq.1 and also avoids traversing vertices affected by multiple updates more than once. How- ever, Algorithm 2 does not precisely compute the set of all P-affected vertices, but the set of all vertices satisfying Eq.1, which is a superset. The following example illustrates why the characterization in Lemma 1 is not precise, and why pre- cise discovery of P-affected vertices is difficult. Example 2 Consider the graph below, where the dotted edge between r and u indicates a long path between them, and the dotted edge between r and v indicates another long path. r u v - + When both edge deletion (r , u) and edge insertion (u, v) occur, the distance between r and u in G is used to compute the anchor distance of v for the update (u, v), ignoring that the distance between r and u has changed. It is difficult to identify whether v is P-affected—it hinges on whether the long path between r and v is longer (or equal) than the long path between r and u plus 1, which cannot be ascertained by �. The set of vertices returned by Algorithm 2 can be pre- cisely characterized as follows. Definition 5 (Composite path) A path from r to v in G ∪ G ′ is a composite path iff it consists of two parts: a part that lies in G followed by a part in G ′. A composite path is significant iff it passes through at least one deleted and at least one inserted edge. Consider Example 1 again. r − x is an insignificant composite path, r − x − y is a significant composite path, and r − x − y − z is not a composite path. To see the latter, consider that any inserted edge must belong to theG ′ part, and thus later edges would need to lie in G ′ as well, i.e. cannot be deleted. Definition 6 (CP-affected) A vertex v is composite-path- affected (CP-affected) w.r.t. r ∈ R iff (i) v is P-affected w.r.t. r , or (ii) there exists a significant composite path from r to v of length dG(r , v) or less. Algorithm 2 returns the set of all CP-affected vertices which includes all P-affected vertices. Additional vertices due to condition (ii) are undesirable but hard to avoid, as illustrated in Example 2. This happens because the starting distance is calculated w.r.t. G, and we are thus effectively considering paths forwhich the first part (from r to an anchor) lies in G, and the rest in G ′. For example, the vertex y in Example 1 is not P-affected but CP-affected, andAlgorithm2 returns it. 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 107 Lemma 2 A vertex v is CP-affected w.r.t. r iff there exists a composite path from r to v of length at most dG(r , v) that passes through an updated edge. Proof We show each direction separately. – (if) Let p be a composite path from r to v of length at most dG(r , v) that passes through at least one edge in B. If p lies in G then it lies in PG(r , v) but not in PG ′(r , v), so v is P-affected. If p lies in G ′ then either it lies in PG ′(r , v), or there exists a strictly shorter path p′ in PG ′(r , v). Neither p not p′ lies in PG(r , v), so v is P-affected. If p lies neither in G nor in G ′ then it must be significant. – (only if) Let v be CP-affected. If PG(r , v) � PG ′(r , v) then there exists a path p in G of length dG(r , v) that passes through a deleted edge. If PG(r , v) � PG ′(r , v), then there exists a path p inG ′ of length at most dG(r , v) that passes through an inserted edge. Otherwise, there exists a significant composite path of length at most dG(r , v), which passes through edges in B by definition. �� Lemma 3 Algorithm 2 returns the set of all CP-affected ver- tices. Proof We show that a vertex v is added to Vaff+ iff there exists a composite path p of length at most dG(r , v) that passes through an updated edge. – (if) Let v ∈ Vaff+. Vertices are only added to Q and thus to Vaff+ if they pass one of three distance checks. In each case, this compares the length of a composite path passing through an updated edge to the distance from r in G. – (only if) Let (a, b) be either the last deleted edge that p passes through, or the first inserted edge, with dG(r , a) < dG(r , b). Then, p can be split into pra from r to a, (a, b) and pbv from b to v such that pra lies in G and pbv in G ′. The search in Algorithm 2 starting at b will use |prb| = dG(r , a)+1 as distance bound for b, and proceed along pbv . Thus, for every vertex w ∈ pbv , including v, it will obtain |prw| ≤ dG(r , w) as distance bound for w, and add w to Vaff+. �� In particular, it follows that (1) characterizes CP-affected vertices, rather than P-affected ones. 4.2 Batch repair Algorithm 3 describes the approach for repairing the labels of affected vertices returned by Algorithm 2. At its core, the algorithm starts with boundary vertices that lie on the bound- ary of affected and unaffected vertices, and whose distances to r are computed from neighbouring vertices whose dis- tance did not change. Importantly, even though a vertex may be affected by multiple edge updates in a batch, its r -label only needs to be updated once. Definition 7 (Boundary vertex) A vertex v is a boundary ver- tex w.r.t. a landmark r if v is an affected vertex (for some version of “affected”, e.g. CP-affected) but has an unaffected neighbour w. Denote by Vaff+ the set of affected vertices, and let v ∈ Vaff+. For every neighbour w of v in G ′, dG ′(r , v) must be upper-bounded by dG ′(r , w) + 1. If such a neighbour lies outside of Vaff+, the value of dG ′(r , w) = dG(r , w) can easily be obtained – while we employ different variants of “affected” throughout the paper, “unaffected” vertices will always retain their distance to r . By taking the minimum of such known upper bounds, we get a readily available distance bound for v. Definition 8 (Distance bound) Let S ⊂ V \ {r} be a set of vertices. The distance bound of v w.r.t. S is: dbou(v, S) := min{dG ′(r , w) + 1 | w ∈ NG ′(v) \ S}. The following lemma allows us to compute the distance of vertices in Vaff+ from r in G ′ using their distance bounds. Lemma 4 Let S ⊂ V \{r} and v ∈ S with minimal distance bound. Then dG ′(r , v) = dbou(v, S). Proof For dG ′(r , v) = ∞ this is trivial. Otherwise, let p be a shortest-path from r to v in G ′, v′ be the first vertex in p that lies in S, and w its predecessor in p. Since w /∈ S we have dbou(v ′, S) ≤ dG ′(r , w) + 1 = dG ′(r , v′). If v′ �= v then dG ′(r , v′) < dG ′(r , v) ≤ dbou(v, S), and therefore dbou(v ′, S) < dbou(v, S). This contradicts the minimality of dbou(v, S), so v′ = v. It follows that dbou(v, S) = dG ′(r , v). �� Note that dG ′(r , v) = dbou(r , S) does not generally hold for every boundary vertex v. Based on Lemma 4, we repair labels by starting with boundary vertices that have the small- est distance bound. After each iteration, we treat affected vertices with repaired labels as being unaffected and find boundary vertices that have the smallest distance bound again. This process terminates only when the labels of all affected vertices are repaired. Algorithm 3 shows the pseudo-code of the batch repair algorithm used by BatchHL. Given a graphG ′ and a set of all affected vertices Vaff+, we first compute the distance bounds of vertices in Vaff+ using their unaffected neighbours. We 123 108 M. Farhan et al. Algorithm 3: Batch Repair 1 Function BatchRepair(G ′, Vaff+, ri , �, �′) 2 foreach v ∈ Vaff+ do 3 Dbou[v] ← dbou(v, Vaff+) // use � to compute 4 while Vaff+ is not empty do 5 Vmin ← {v ∈ Vaff+ | Dbou[v] is minimal} 6 remove Vmin from Vaff+ 7 foreach v ∈ Vmin do 8 set r -label of �′(v) to (ri , Dbou[v]) 9 foreach w ∈ NG′ (v) ∩ Vaff+ do 10 Dbou[w] ← min(Dbou[w], Dbou[v] + 1) then find vertices in Vaff+ with the minimal distance bounds and remove them from Vaff+. By Lemma 4, their distances to r in G ′ equal their distance bounds. For each v ∈ Vmin, we set the r -label of v to (ri , Dbou[v]) (Line 8) and assign new distances to the affected children of v in Vaff+ (Lines 9-10). We continue this process until Vaff is empty. Note that Algorithm 3 does not eliminate redundant r - labels. We will address this issue in Sect. 6.2. Remark 1 We note that the work in [18] has proposed sev- eral optimization techniques to reduce the set of CP-affected vertices returned by Algorithm 2 and can thus efficiently repair the affected vertex set. These optimization techniques have been shown to significantly improve the performance of updating labelling. Thus, in our experiments (Sect. 10), we will compare the performance of our proposed method BatchHL+ against the method of BatchHL that has incorpo- rated these optimization techniques. 5 Separate algorithms—insertion and deletion We observe that changes to shortest paths between r and v do not always cause a change in distance. Based on this observa- tion,we shall differentiate betweennewand eliminated paths, and strengthen the pruning condition d + 1 ≤ dG(r , w) in Line 12 of Algorithm 2 to d + 1 < dG(r , w) for new paths. For eliminated paths, we examine all neighbouring vertices to ensure that all shortest paths have been eliminated. However, things get a little trickier as we may need to eliminate redundant labels, or restore previously eliminated labels when they become non-redundant. Even if the distance between r and v does not change, the highway labelling may need to be updated. Example 3 Consider the followinggraphs andupdates,where the landmarks are circled. In all cases, vertex v is affected, but the distance between r and v does not change. For case (a) adding the edge (b, v) does not cause a label change for v. It does however for case (b) where b is a landmark, causing the r -label of v to be deleted. Deletion of (b, v) does not cause a change on the label of v in case (c), but causes a change in case (d) where an r -label needs to be inserted. r a b v + (a) no change r a b v + (b) change r a b v - (c) no change r a b v - (d) change A core difficulty in identifying whether affected vertices have changes on their labels is that label changes can happen far away from updates, and computing the changed labels of such verticesmay require the consideration of verticeswhose labels do not change due to being redundant, as illustrated by the example below. Example 4 Consider the graph below, where r and b are land- marks and the edge (r , b) is deleted. a b c v r d e - r -label changes The distance between r and c changes, but the label of c does not change. That is because the shortest path between r and c goes through the landmark b without changes. At the same time the label of v does change, as the edge (r , b) eliminates a shortest path between r and v that passes through the landmark b, similar to case (d) in Example 3. Although the label of c does not change, the changed distance between r and c is needed for computing the changed label of v. Therefore, c needs to be captured as well. We thuswant to return (at least) all those vertices forwhich either label or distance changes. Definition 9 (LD-affected) A vertex v is landmark-distance- affected (LD-affected) by a batch update B w.r.t. a landmark r ∈ R iff there is a (i) label-change: the r -label of v changes, or (ii) distance-change: distance between r and v changes. As seen in Example 3, changes to r -label without changes to distance happen whenever a new shortest path passing through another landmark is created where none existed pre- viously, or when the last such path is deleted. To identify such cases, we track whether a shortest path to r passes through another landmark. Definition 10 (Landmark length) The landmark length of a path p starting from r ∈ R is a tuple (d, l) ∈ N × B where 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 109 – d is the length of p (number of edges), and – l is the landmark flag, with l = True iff p passes through a landmark other than r . We denoted this landmark length as |p|l. The landmark dis- tance between r and v in G is the minimal landmark length of paths between them, denoted as dL G(r , v) := min {|p|l | pis a path betweenrandvinG } The ordering used to compare landmark length tuples is the lexicographical one, with True < False. The latter ensures that the landmark flag of dL G(r , v) is set iff any of the shortest paths between r and v passes through another landmark. Lemma 5 Let dL G ′(r , v) = (d, l). If d = ∞ or l = True, then v has no r-label in �′. Otherwise, v has the r-label (r , d). Proof If v has any r -label in �′ it must be (r , d). As �′ is minimal, this r -label exists iff it is not redundant. For d = ∞ redundancy of (∞, r) is obvious. Otherwise (d, r) is redun- dant iff the correct distance could also be computed using the highway. This happens iff a shortest path between r and v passes through another landmark, which is indicated by the landmark flag. �� Lemma 6 A vertex v is LD-affected iff it satisfies: dL G(r , v) �= dL G ′(r , v). Proof Let lG and lG ′ denote the landmark flags of dL G(r , v) and dL G ′(r , v), respectively. Condition (ii) of Definition 9 states dG(r , v) �= dG ′(r , v). It suffices to show that for dG(r , v) = dG ′(r , v) condition (i) holds iff lG �= lG ′ . This is trivial for dG(r , v) = dG ′(r , v) = ∞. For finite distances, it follows from Lemma 5. �� 5.1 Insertion-only batch search To identify LD-affected vertices in the case of edge inser- tions, for an inserted edge (a, b), we need to compute the minimal landmark length of paths from r to b passing through a as an upper bound for the landmark distance of b. Since this may lead to frequently updating the landmark length of a path when appending another vertex, we define an operator for this: (d, l) ⊕ w := { (d + 1,True) ifw is a landmark, (d + 1, l) otherwise. Batch search for insertions is described in Algorithm 4. Its correctness is straightforward. Lemma 7 Algorithm4 returns the set of LD-affected vertices. Algorithm 4: Insertion-Only Batch Search 1 Function BatchSearchInsert(G ′, B, r , �) 2 foreach (a, b) ∈ B do 3 if dL G(r , a) ⊕ b < dL G(r , b) then 4 add ( dL G(r , a) ⊕ b, b ) to Q 5 else if dL G(r , b) ⊕ a < dL G(r , a) then 6 add ( dL G(r , b) ⊕ a, a ) to Q 7 while Q is not empty do 8 remove minimal (d, l, v) from Q 9 if v /∈ Vld- aff then 10 add v to Vld- aff 11 foreach w ∈ NG′ (v) do 12 if (d, l) ⊕ w < dL G(r , w) then 13 add ((d, l) ⊕ w,w) to Q 14 return Vld- aff Proof Any vertex added to Vld- aff must be LD-affected, as its landmark distance in G ′ is lower than in G. Conversely, if v is LD-affected, there must exist a strictly shorter path p in G ′. Let p be the shortest such path, b be the first LD-affected vertex in p and a its predecessor. Then the edge (a, b)must be inserted, and all vertices from b onwards must be LD-affected. As vertices are processed in order of distance, the computed distance boundwill beminimal, caus- ing b and all later vertices in p to be added to Vld- aff. �� 5.2 Deletion-only batch search The key idea for identifying an LD-affected vertex v in the case of edge deletions is to examine its neighbourhood. While this will not always give us its landmark distance in G ′ as neighbours may not have been updated yet, it does provide a lower bound that is sufficient for determining LD- affectedness. In case v turns out to be LD-affected, we will need to examine its neighbours anyways to identify other LD- affected vertices, and some of the required computations can be reused. To identify a vertex v as LD-affected, we use two criteria: 1. There must exist a path between r and v in G of length dL G(r , v) that passes through a deleted edge. 2. There must not exist a path between r and v in G ′ of length dL G(r , v). The first criterion provides us with candidates to examine further, while the second criterion is used to confirm or reject candidates. However, simply storing whether a vertex is LD- affected is not enough for this purpose. Example 5 Consider the graph below, where r , b and v are landmarks and edge (b, c) is getting deleted. 123 110 M. Farhan et al. r a b c v- Here, c is LD-affected as its landmark distance reduces from (2, L) to (2,−). However, v is not LD-affected as its landmark distance remains as (3, L). Deciding the latter is impossible if all we know is that its only neighbour is LD- affected. While the issue could be solved by storing the new land- mark distance for LD-affected vertices, we do not have this information available during batch search (e.g. for vertex a in Example 6). However, we can decide whether a path of equal length—but not passing through another landmark— still exists in G ′, and storing this information is sufficient for deciding LD-affectedness of other landmarks. Another key observation concerns the neighbours of v through which a path violating the second criterion (path in G ′ of equal length) might pass. By processing vertices in order of their distance from r in G, we ensure that these will already have been processed. Example 6 Consider the graph below, where edges (r , a) and (b, v) are getting deleted. r a b v c - - To confirm that v is LD-affected, we examine its neighbours a and c inG ′. Here a has already been processed and marked as LD-affected, as it is closer to r than v. Vertex c may not have been processed yet, and thus may wrongly suggest a path between r and v of length 3 in G ′. However, this is irrelevant for deciding whether a path of length dG(r , v) or dL G(r , v) exists in G ′. The resulting algorithm for batch search in case of deletion is given as Algorithm 5. Here Vd- aff contains distance- affected vertices. We show its correctness next. Lemma 8 Algorithm 5 returns the set of LD-affected vertices. Proof Let v be added to Vld- aff. This only happens if (dv, lv, v) was previously added to Q, at which point it was checked that (dv, lv) equals the landmark distance dL G(r , v). Additionally, line 16 must not have been reached, meaning the checks for a path in G ′ of landmark length that equal to (dv, lv) must have failed. For the correctness of this check, consider that any such path would need to pass through some neighbour of v in G ′, and this neighbour would need to be closer to r and not distance-affected. It follows that G ′ contains no path from r to v of landmark length dL G(r , v), meaning that v is LD-affected. Algorithm 5: Deletion-Only Batch Search 1 Function BatchSearchDelete(G ′, B, r , �) 2 foreach (a, b) ∈ B do 3 if dL G(r , a) ⊕ b = dL G(r , b) then 4 add ( dL G(r , a) ⊕ b, b ) to Q 5 else if dL G(r , b) ⊕ a = dL G(r , a) then 6 add ( dL G(r , b) ⊕ a, a ) to Q 7 while Q is not empty do 8 remove minimal (dv, lv, v) from Q 9 if v /∈ Vld- aff then 10 N ← ∅, v_daff ← True 11 foreach w ∈ NG′ (v) \ Vd- aff do // check for path of equal (landmark) length 12 if dv = dG(r , w) + 1 then 13 v_daff ← False 14 if (dv, lv) = dL G(r , w) ⊕ v then 15 if w /∈ Vld- aff ∨ v ∈ R then 16 continue from line 7 // check if w might be LD-affected 17 else if (dv, lv) ⊕ w = dL G(r , w) then 18 add ((dv, lv) ⊕ w,w) to N 19 add v to Vld- aff 20 if v_daff then 21 add v to Vd- aff 22 add tuples in N to Q 23 return Vld- aff Conversely, let v be LD-affected and p be some shortest path from r to v in G w.r.t. landmark length. As p does not lie in G ′, it must pass through one or more deleted edges. Let (a, b) be the deleted edge in p closest to v, with b closer to v than a. Then, all vertices in p between b and v must be LD- affected. By inductionwemay assume that all of them except v will be added to Vld- aff. As all neighbours of vertices in Vld- aff are checked for being LD-affected, v will be checked as well and added to Vld- aff. �� Remark 2 Both insertion-only anddeletion-onlybatch search algorithms can compute LD-affected vertices when they handle only edge insertions or only edge deletions. How- ever, when applying these two algorithms together on batch updates with mixed inserted and deleted edges, e.g. apply- ing insertion-only algorithm on edge insertions first and then deletion-only algorithm on edge deletions, or the reversed order, they fail to compute the exact set of LD-affected ver- tices. The next section will discuss the reasons in detail, which motivates our development of an improved algorithm BatchHL+. 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 111 6 BatchHL+: improved algorithm In principle, one could handle any batch update consisting of both insertions and deletions by processing insertions first and deletions after, or vice versa. However, the issue with such an approach is that vertices may be affected by both insertions and deletion, causing them to be processed twice. It is even possible that insertions and deletions cancel each other out so that vertices are processed twice even though they would not need to be processed at all. Example 7 Consider the following graphs and updates.When considering insertions and deletions together, v is not LD- affected in either of the three cases. However, for cases (a) and (b), processing insertions first and deletions after will cause v to be LD-affected at both times. Similarly, for case (c), processing deletions first and insertions after will cause v to be LD-affected at both times. r a v - + (a) Ins + Del r a v + - (b) Ins + Del r a b v - + (c) Del + Ins While it is challenging to identify all cases where inser- tions and deletions cancel each other out, we will be able to avoid some of them. For example, we can avoid returning v for cases (b) and (c) in Example 7, but not for case (a). For case (a), the difficulty lies in not knowing the distance between r and a in the changed graph G ′. As illustrated in Example 6, computing this during batch search is tricky. Even though we will present an improved algorithm for processingmixed batches in the next section, it may be worth pointing out a small trick when splitting a batch B into inser- tions B+ and deletions B−. Instead of running Algorithm 4 on the graph obtained by applying B+ to G, we can run it on the graph obtained by apply the whole batch B. This prevents us from following paths that pass through a deleted edge after first passing through an inserted edge, as depicted in case (b) of Example 7. Clearly, there are still cases where this trick cannot stop an unaffected vertex from getting flagged as affected. Case (a) of Example 6 was one such case. There are other cases as shown in the example below. Example 8 Consider the following graph and updates. r a b v - + Here, v is not LD-affected but still returned by Algorithm 4, even with all of B used in constructing G ′. The improved algorithm presented next will avoid return- ing v, as the insertion of (a, v) will only be processed after the deletion of (r , a) has already been considered. Here the existence of the alternate path r − b− v, which has the same length as r − a − v, is crucial in avoiding the difficulties illustrated in Example 6. By the time v is being processed, we can be sure that b is unaffected and can thus conclude that v is unaffected. 6.1 Improved batch search To combine Algorithms 4 and 5, consider their search pro- cess. Both algorithms track the lengths of paths r . . . a − b . . . v passing through an updated edge (a, b). As the dis- tance between r and a is computed using the highway labelling �, and edges used to extend the paths lie in G ′, we are effectively considering paths that lie partially in G and partially in G ′. Therefore, we define the notion of composite distance. Definition 11 (Composite distance) The composite distance between r and v over (G,G ′) is the length of the shortest composite paths between them. We denote it as dc(r , v) := min {|p| | p is composite path from r to v } Example 9 Consider again the cases from Example 7. Here r − a − v is a composite path for case (a) but not for case (b). The composite distance between r and v is 2 for case (a) and ∞ for case (b). Our improved algorithm is presented as Algorithm 6. A detailed discussion is deferred until Sect. 8, where we prove that the vertex set identified by it can be characterized as follows. Definition 12 (LCD-affected) A vertex v is landmark- composite-distance-affected (LCD-affected) w.r.t. a land- mark r iff one of the following conditions is satisfied: – Landmark distance changes: dL G ′(r , v) �= dL G(r , v); – Composite distance changes: dc(r , v) �= dG(r , v). Clearly, this includes all LD-affected vertices, but may also include others, e.g. case (a) in Example 7, which is unde- sirable. The reason for those additional vertices is that we cannot easily compute dL G ′(r , v) during batch search. How- ever, we can compute dc(r , v) and decide whether a path of length dc(r , v) exists in G ′, essentially the same way as this was done in Algorithm 5. We call vertices where such a path exists stable, and use this to resolve situations similar to Example 8. We note that whenever B contains only inserted or only deleted edges, vertices are LCD-affected iff they are LD- affected. That is because edge deletions do not change the 123 112 M. Farhan et al. composite distance, while the absence of edge deletions means that a change in composite distance implies a change in landmark distance. 6.2 Improved batch repair Now, we introduce improved batch repair algorithm to repair LCD-affected vertices Vaff+ returned by Algorithm 6. To eliminate redundant r -labels, we need to track landmark distance, which requires the following notion of landmark distance bound. Definition 13 (Landmark distance bound) Let S ⊂ V \ {r} be a set of vertices. The landmark distance bound of vertex Algorithm 6: Improved Batch Search 1 Function BatchSearch(G ′, B, r , �) 2 Q,Q+ ← InitQueues(B, r , �) 3 while Q ∪ Q+ is not empty do 4 if minimal tuple in Q ∪ Q+ lies in Q then 5 remove minimal (dv, lv, stable, v) from Q 6 else 7 remove minimal (dv, lv, stable, v, a) from Q+ 8 if a ∈ Vaff+ then 9 continue 10 if v /∈ Vaff+ then 11 N ← ∅ 12 if ¬lv ∧ dv = dG(r , v) then // check for path of equal (landmark) length 13 foreach w ∈ NG′ (v) \ Vunstable do 14 (dw, lw) ← GetLD(w) 15 if dv = dw + 1 then 16 stable ← True 17 if dL G(r , v) = (dw, lw) ⊕ v then 18 continue from line 3 19 foreach w ∈ NG′ (v) \ Vaff+ do // check if w might be LCD-affected 20 if dv + 1 ≤ dG(r , w) then 21 add w to N 22 add v to Vaff+ 23 if stable then 24 LD[v] ← (dv, lv) 25 foreach w ∈ N do 26 if dL G(r , v) ⊕ w = dL G(r , w) < (dv, lv) ⊕ w or (dv, lv) ⊕ w < dL G(r , w) then 27 add ( (dv, lv) ⊕ w, True, w ) to Q 28 else 29 add v to Vunstable 30 foreach w ∈ N do 31 if dv + 1 < dG(r , w) or dL G(r , v) ⊕ w = dL G(r , w) then 32 add (dv + 1, False, False, w) to Q 33 return Vaff+ Algorithm 6: (Continued) 1 Function InitQueues(B, r , �) 2 foreach inserted (a, b) ∈ B do 3 if dL G(r , a) ⊕ b < dL G(r , b) then 4 add ( dL G(r , a) ⊕ b, True, b, a ) to Q+ 5 else if dL G(r , b) ⊕ a < dL G(r , a) then 6 add ( dL G(r , b) ⊕ a, True, a, b ) to Q+ 7 foreach deleted (a, b) ∈ B do 8 if dL G(r , a) ⊕ b = dL G(r , b) then 9 add ( dG(r , b), False, False, b ) to Q 10 else if dL G(r , b) ⊕ a = dL G(r , a) then 11 add ( dG(r , a), False, False, a ) to Q 12 return (Q,Q+) 13 Function GetLD(w) 14 if w ∈ Vaff+ then 15 return LD[w] 16 else 17 return dL G(r , w) v w.r.t. S is: dL bou(v, S) := min{dL G ′(r , w) ⊕ v | w ∈ NG ′(v) \ S}. The following lemma allows us to compute the land- mark distances of vertices in Vaff+ from a landmark r in the changed graph G ′ using their landmark distance bounds. Lemma 9 Let S ⊂ V \{r} and v ∈ S with minimal landmark distance bound. Then dL G ′(r , v) = dL bou(v, S). Proof For dG ′(r , v) = ∞ this is trivial. Otherwise, let p be a shortest-path from r to v in G ′ w.r.t. landmark length, v′ the first vertex in p that lies in S, andw its predecessor in p. Since w /∈ S, we have dL bou(v ′, S) ≤ dL G ′(r , w) ⊕ v = dL G ′(r , v′). If v′ �= v then dG ′(r , v′) < dG ′(r , v) ≤ dbou(v, S), and thus dbou(v ′, S) < dbou(v, S). This contradicts the minimality of dbou(v, S), so v′ = v. It follows that dL bou(v, S) = dL G ′(r , v). �� Note that again dL G ′(r , v) = dL bou(r , S) does not generally hold for every boundary vertex v. Algorithm 7 shows the pseudo-code of our improved batch repair algorithm. For a graphG ′ and a set of all affected vertices Vaff+, we first com- pute the landmark distance bounds of vertices in Vaff+ using their unaffected neighbours. We then find vertices in Vaff+ with minimal distance bounds and remove them from Vaff+. By Lemma 9 their landmark distances to r in G ′ equal their landmark distance bounds. We use these landmark distances to update their r -labels, as well as their highway distances in the case of landmarks. Finally, we update the landmark dis- tance bounds of neighbouring vertices in Vaff+. We continue this process until Vaff+ is empty. 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 113 Algorithm 7: Improved Batch Repair 1 Function BatchRepair(G ′, Vaff, ri , �, �′) 2 foreach v ∈ Vaff do 3 Dbou[v] ← dL bou(v, Vaff) // use � to compute 4 while Vaff is not empty do 5 Vmin ← {v ∈ Vaff | Dbou[v].d is minimal} 6 remove Vmin from Vaff 7 foreach v ∈ Vmin do 8 if Dbou[v].d = ∞ ∨ Dbou[v].l then 9 remove r -label from �′(v) 10 else 11 set r -label of �′(v) to (ri , Dbou[v].d) 12 if v is a landmark then 13 δ′ H (ri , v) ← Dbou[v].d 14 foreach w ∈ NG′ (v) ∩ Vaff do 15 Dbou[w] ← min(Dbou[w], Dbou[v] ⊕ w) 6.3 An illustrative example The following example illustrates the individual steps which our improved algorithm BatchHL+ runs through. Example 10 Consider the following graph and updates: a b r1 c r2 d e f g h i - + + The initial highway labelling � = (H , L) looks like this: H = {δH (r1, r2) = 2}, L = a b c d e f g h i (r1,1) (r1,1) (r1,1) (r1,2) (r1,1) (r1,2) (r1,3) (r2,1) (r2,1) (r2,2) (r2,1) (r2,2) (r2,2) BatchHL+ initializes �′ as � and then runs BatchSearch (Algorithm 6) and BatchRepair (Algorithm 7) for both land- marks r1 and r2. For landmark r1, Algorithm 6 returns the following set of affected vertices: Vaff+ = { f , g, h} Due to the inserted edge (a, r2), the new paths from a to r2, d and i have the same landmark length as existing ones and are thus pruned. The eliminated path r1 − f − g− h − i has strictly greater landmark length than the existing path through r2 and is also pruned. For vertex e, the stable path r1 − b − e ensures us that e is not LCD-affected. For comparison, the BatchSearch of BatchHL described by Algorithm 2 would return the following larger set of ver- tices: Vaff+ = {r2, d, e, f , g, h, i} Note that, here, vertex e is not P-affected, but still returned due to the composite path r1 − f − e. Now,we runAlgorithm7on Vaff+ = { f , g, h}. The initial landmark distance bounds for this set are dL bou(r1, . . .) = f g h (3,False) (3,True) (5,True) In the first iteration f and g have minimal distance bounds. Thus, the r1-label in L ′( f ) is updated to (r1, 3) and the r1- label in L ′(g) is removed. After that, dL bou(r1, h) is updated to (4,True) and the r1-label in L ′(h) is removed. This leaves L ′ as L ′ = a b c d e f g h i (r1,1) (r1,1) (r1,1) (r1,2) (r1,3) (r2,1) (r2,1) (r2,2) (r2,1) (r2,2) (r2,2) For landmark r2, Algorithm 6 and Algorithm 2 return Vaff+ = {a, e} and Vaff+ = {r1, a, b, e}, respectively. Then, running Algorithm 7 on Vaff+ = {a, e} inserts (r2, 1) into L ′(a) and (r2, 2) into L ′(e) for the updated high- way labelling as L ′ = a b c d e f g h i (r1,1) (r1,1) (r1,1) (r1,2) (r1,3) (r2,1) (r2,1) (r2,1) (r2,3) (r2,2) (r2,1) (r2,2) (r2,2) 7 Comparison of batch search algorithms In this section, we provide a brief theoretical comparison between different batch search algorithms, including the algorithms used in BatchHL (Sect. 4), the algorithms of han- dling insertions and deletions separately (Sect. 5), and the improved algorithm used in BatchHL+ (Sect. 6). Recall that, ideally, we would only want to return vertices that are LD- affected w.r.t. an entire batch update, but all these algorithms fail at that and return additional vertices. Figure 2 summarizes the relationships between different concepts defined andvertex sets returnedby these algorithms. Here, BatchHL was originally introduced in [18], BHLI+D and BHLD+I denote the sequential applications of insertions before deletions or deletions before insertions as described by Algorithms 4 and 5, and BHLI+D trick sequential application with the trick described in Sect. 5, and BatchHL+ refers to 123 114 M. Farhan et al. LD-affected BHLD+I P-affected LCD-affected (BatchHL+) BHLI+D trick BHLI+D BatchHL CP-affected Fig. 2 Relationships between “affected” vertex sets defined and/or returned by different batch search algorithms. Each arrow indicates a subset containment r a v - + (a) r a v + - (b) r a b v + (c) r a b v - (d) r a b v+ - (e) r a b v - + (f) r a b c v+ - (g) Fig. 3 Graphs illustrating differences between “affected” vertex sets defined and/or returned by batch search algorithms Table 2 Containment of vertex v in Fig. 3 in the vertex sets of different batch search algorithms (a) (b) (c) (d) (e) (f) (g) P-affected No No Yes Yes Yes No Yes LD-affected No No No No No No No CP-affected Yes No Yes Yes Yes Yes Yes LCD-affected Yes No No No No No No (BatchHL+) BatchHL Yes No No Yes Yes Yes No BHLD+I No No No No Yes No Yes BHLI+D Yes Yes No No No Yes No BHLI+D trick Yes No No No No Yes No our improved algorithm as described by Algorithms 6 and 7 in Sect. 6. Figure 3 illustrates cases that differentiate the vertex set definitions and algorithm outputs. For each of these cases, Table 2 lists whether vertex v can be identified by different “affected” notions and algorithms. We note that case (g) in Fig. 3 is designed to trigger the corner case of Algorithm BatchHL, described in the proof of Lemma 5.18 in [18], where the search follows a path whose extended landmark length is not minimal. This corner case preventsBHLD+I fromalways returning a subset ofBatchHL. In the following, we denote by VI+D the set of vertices returned by eitherAlgorithm4 or 5 if insertions are processed first, and byVD+I if deletions are processedfirst.Wenote that VD+I does not need to contain all LCD-affected vertices, as evidenced by case (a) of Example 7, where v is LCD-affected but not returned. However, it may also contain vertices that are not LCD-affected, as seen in case (c) of Example 7, where v is returned but it is not LCD-affected. Thus we can only compare it against Algorithm 6 experimentally. However, we can easily establish a relationship between VD+I and the set of P-affected vertices as in Definition 4. Lemma 10 VD+I contains only P-affected vertices. Proof If v is returned by Algorithm 5, then by Lemma 8 v is LD-affected w.r.t. deletions. Thus, a shortest path in G is eliminated, and v is P-affected. If v is not returned by Algorithm 5 but returned by Algo- rithm 4, then by Lemma 7 v is LD-affected w.r.t. insertions into the intermediate graph G−, the result of applying edge deletions to G. Thus the landmark distance between r and v is strictly smaller in G ′ than it is in G−. Since v was not returned by Algorithm 5, the landmark distance between r and v in G is the same as in G−. Thus, v is LD-affected and P-affected w.r.t. the combined batch update. �� For VI+D , the situation is nonetheless different. As seen in case (b) of Example 7, VI+D may contain vertices that are not LCD-affected. However, as we will show below, it will never return fewer. Lemma 11 VI+D contains all LCD-affected vertices. Proof Let v be LCD-affected. Definition 12 requires dc(r , v) < dG(r , v) or dL G ′(r , v) �= dL G(r , v) If dc(r , v) < dG(r , v) then v is LD-affected w.r.t. inser- tions only, and thus returned by Algorithm 4. If dL G ′(r , v) �= dL G(r , v) then v is LD-affected w.r.t. the combined batch update, and thus returned. �� We note that Lemma 11 still holds if the final graph G ′ is passed to Algorithm 4 (the “trick” mentioned in Sect. 5). This can be seen by considering that for dc(r , v) < dG(r , v) the algorithm will use the last inserted edge in a shortest composite path as a starting point, and by Definition 5 all subsequent edges must lie in G ′. Example 8 shows that non- LCD-affected vertices may still be returned. 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 115 8 Theoretical discussion 8.1 Proof of correctness Now we prove that Algorithm 6 returns precisely the set of LCD-affected vertices. As illustrated in Example 8, we are particularly interested in vertices for which the distance to r in G ′ is precisely the composite distance. In these cases we can compute the landmark distance to r in G ′, which enables us to determine with certainty whether or not its landmark distance changes. Definition 14 (Stable path and stable vertex) We call a path from r to v stable iff it lies in G ′ and has length dc(r , v). We call v stable w.r.t. r iff such a stable path exists, i.e. iff dG ′(r , v) = dc(r , v). As mentioned before, we can determine the updated land- mark distance of stable vertices during batch search. For others we can only determine the composite distance. This motivates the following definition. Definition 15 (Composite landmark distance) Given a com- posite path p, the composite landmark length of p is its landmark length if it lies in G ′, and (|p|,False) otherwise. Accordingly, the composite landmark distance of a vertex is defined as dL c (r , v) := { dL G ′(r , v) if v is stable; (dc(r , v),False) otherwise. Definition 16 (Stable landmark distance) The stable land- mark distance additionally stores whether v is stable: dS c (r , v) := { (dL G ′(r , v),True) if v is stable; (dc(r , v),False,False) otherwise. The distance component of dL c (r , v) is dc(r , v) in both cases, while the landmark flag indicates the existence of a stable path that passes through another landmark.When com- paring stable landmark length values, we use lexicographical ordering with True < False, as for landmark length. This is needed to ensure that stable paths are processed first, so that the stable variable is set correctly if the check in line 12 fails in Algorithm 6. We first prove the following lemma. Lemma 12 A vertex v is LCD-affected iff (i) v is unstable, or (ii) dL c (r , v) �= dL G(r , v) Proof We prove the “if” and “only if” below. – (if ) Let v be unstable, meaning dc(r , v) �= dG ′(r , v). If dc(r , v) �= dG(r , v) then its composite distance changes. If dc(r , v) = dG(r , v) then dG ′(r , v) �= dG(r , v) and its landmark distance changes. Otherwise v must be stable and dL c (r , v) �= dL G(r , v). This means dL G ′(r , v) �= dL G(r , v) and its landmark distance changes. – (only if ) Let v be LCD-affected but stable so that dL c (r , v) = dL G ′(r , v). If dL G ′(r , v) �= dL G(r , v) then dL c (r , v) �= dL G(r , v) follows. Otherwise dc(r , v) �= dG(r , v) and dL c (r , v) �= dL G(r , v) follows. �� Note that together with the landmark distance, the stable landmark distance provides precisely the information needed to check the conditions of Lemma 12. The following terminology will be useful for identifying paths through which vertices are visited by Algorithm 6. In particular, this is restricted by the checks in lines 26 and 31 of Algorithm 6. Definition 17 (LCD-parent) We call v an LCD-parent of w, and w an LCD-child of v, iff (a) v and w are LCD-affected, and (b) w is a neighbour of v in G ′, and (c) dL c (r , w) = { dL c (r , v) ⊕ w if v is stable, (dc(r , v) + 1,False) otherwise, and (d) dc(r , v) + 1 < dG(r , w), or dL G ′(r , v) ⊕ w < dL G(r , w), or dL G(r , v) ⊕ w = dL G(r , w) < dL G ′(r , w). We use the terms LCD-ancestor and LCD-descendant to denote the reflexive and transitive closures of the LCD-parent and LCD-child relationships. Note that conditions (2) and (3) mean that v is the pre- decessor of w on a path from r to w of minimal composite landmark length. The conditions of (4) correspond precisely to those of Definition 12: either a path through v reduces the composite distance to w, or it reduces the landmark dis- tance to w, or the landmark distance of w increases due to elimination of a shortest path passing through v. Example 11 Consider the following graphs and updates. r a b c v - - r a b c v + + For the graph on the left, the LCD-ancestors of v are a, c and v itself. Here, b is not an LCD-parent of c as the conditions of (4) are violated: dL G(r , c) < dL G ′(r , c) but dL G(r , b) ⊕ c �= dL G(r , c). For the graph on the right, the LCD-ancestors of v 123 116 M. Farhan et al. are again a, c and v itself. b is not an LCD-parent of c as the conditions of (3) are violated. The following lemma shows that all LCD-affected ver- tices can be traced back to a vertex initially enqueued in Algorithm 6. Lemma 13 Let b beLCD-affectedwith noLCD-parent. Then, there exists a vertex a such that either (a) edge (a, b) is inserted, dL G(r , a) ⊕ b = dL c (r , b) < dL G(r , b), and a is not LCD-affected, or (b) edge (a, b) is deleted, dL G(r , a) ⊕ b = dL G(r , b), and (dG(r , b),False) = dL c (r , b). Proof Since b is LCD-affected, Definition 12 requires – dc(r , b) < dG(r , b) or – dL G ′(r , b) < dL G(r , b) or – dL G ′(r , b) > dL G(r , b). As b has no LCD-parents, every vertex a must violate one of the conditions (1)-(4) of Definition 17. (1) Assume dc(r , b) < dG(r , b). Let p be a composite path from r to b of minimal composite landmark length, and a the predecessor of b in p. Then, dc(r , a) + 1 = dc(r , b), so condition (4) holds. If a is stable, then dL c (r , a) ⊕ b = dL c (r , b); otherwise, the landmark flag of dL c (r , b) is False by Definition 15. Thus, condition (3) holds in both cases. Condition (2) must hold as well—otherwise the last edge (a, b) in the composite path p is a deleted edge,which implies that p lies inG, contradicting dc(r , b) < dG(r , b). This only leaves condition (1) to be violated, so a cannot be LCD- affected. Thus, dL G(r , a) ⊕ b = dL c (r , a) ⊕ b = dL c (r , b), and therefore dG(r , a) + 1 < dG(r , b). It follows that (a, b) must be inserted and dL G(r , a) ⊕ b = dL c (r , b) < dL G(r , b) holds. (2) Assume dc(r , b) = dG(r , b) and dL G ′(r , b) < dL G(r , b). Then, dc(r , b) ≤ dG ′(r , b) ≤ dG(r , b) = dc(r , b) so b is stable. Let p be a stable path from r to b of mini- mal landmark length, and a the predecessor of b in p. Then, conditions (2) and (3) clearly hold. Since dL G ′(r , a) ⊕ b = dL G ′(r , b) < dL G(r , b) condition (4) holds as well, so con- dition (1) must be violated and a cannot be LCD-affected. Thus, dL G(r , a) ⊕ b = dL c (r , a) ⊕ b = dL c (r , b) < dL G(r , b), implying that (a, b) is inserted. (3) Assume dc(r , b) = dG(r , b) and dL G ′(r , b) > dL G(r , b). Ifb is unstablewehavedL c (r , b) = (dc(r , b),False) = (dG(r , b),False) by Definition 15. Otherwise dG(r , b) = dc(r , b) = dG ′(r , b), so dL G ′(r , b) > dL G(r , b) is only pos- sible if dL G ′(r , b) = (dG(r , b),False). Thus dL c (r , b) = dL G ′(r , b) = (dG(r , b),False), so the last condition in case (b) of Lemma 13 holds in both cases. Let p be a path from r to b in G of minimal land- mark length, and a the predecessor of b in p. This implies dL G(r , a) ⊕ b = dL G(r , b), so it only remains to show that (a, b) is deleted. Assume to the contrary that (a, b) lies in G ′, so condition (2) holds immediately. Since p has length dc(r , b) we have dc(r , b) = dc(r , a) + 1. Together with earlier results this gives us dL c (r , b) = (dc(r , b),False) = (dc(r , a)+1,False). Thus condition (3) holds if a is unstable. If a is stable, then dL c (r , b) ≤ dL c (r , a)⊕b. This is only pos- sible if the landmark flag of dL c (r , a)⊕b is False, so condition (3) holds again. Finally dL G(r , a)⊕b = dL G(r , b) < dL G ′(r , b) so condition (4) holds as well. Thus, condition (1) must be violated, so a cannot be LCD-affected. It follows that dL G ′(r , a) = dL G(r , a), and thus dL G ′(r , b) ≤ dL G ′(r , a) ⊕ b = dL G(r , a) ⊕ b = dL G(r , b) violating dL G ′(r , b) > dL G(r , b), a contradiction. �� The next lemma shows that every LCD-child meets the pruning conditions in lines 20, 26 and 31 of Algorithm 6. Lemma 14 Let w be an LCD-child of v. Then, (i) dc(r , v) + 1 ≤ dG(r , w) (ii) if v is stable, then dL c (r , v) ⊕ w < dL G(r , w) or dL G(r , v) ⊕ w = dL G(r , w) < dL c (r , v) ⊕ w (iii) if v is unstable, then dc(r , v) + 1 < dG(r , w) or dL G(r , v) ⊕ w = dL G(r , w) Proof Consider the cases in condition (4) of Definition 17. If dc(r , v) + 1 < dG(r , w) then conditions (1), (2) and (3) follow immediately. If dL G ′(r , v) ⊕ w < dL G(r , w) then dG ′(r , v) + 1 ≤ dG(r , w), and condition (1) follows from dc(r , v) ≤ dG ′(r , v). If v is stable, then dL c (r , v) = dL G ′(r , v) and con- dition (2) follows. If v is unstable, then dc(r , v) < dG ′(r , v) and thus dc(r , v) + 1 < dG ′(r , v) + 1 ≤ dG(r , w), showing condition (3). If dL G(r , v) ⊕ w = dL G(r , w) < dL G ′(r , w) then condition (3) follows immediately, and condition (1) from dc(r , v) + 1 ≤ dG(r , v) + 1 = dG(r , w). For condition (2) stability of v gives us dL G(r , v) ⊕ w = dL G(r , w) < dL G ′(r , w) ≤ dL G ′(r , v) ⊕ w = dL c (r , v) ⊕ w showing condition (2). �� We are now ready to prove our main result. Theorem 1 Algorithm 6 returns the set of all LCD-affected vertices. 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 117 Proof We prove it by showing the following: (a) For each (d, l, s, v) ∈ Q, we have dL c (r , v) ≤ (d, l) and d ≤ dG(r , v). If l = True, then s = True and (d, l) < dL G(r , v). (b) From line 10 onwards, we have dL c (r , v) ≤ (dv, lv) and dv ≤ dG(r , v). If lv = True, stable = True and (dv, lv) < dL G(r , v). (c) From line 11, we have dv = dc(r , v). (d) In line 14 either dw ≥ dc(r , w) ≥ dv , or (dw, lw) = dL c (r , w) and w is stable. (e) If line 18 is reached, then v is not LCD-affected. (f) If line 19 is reached, then v is LCD-affected. (g) From line 19, we have (dv, lv, stable) = dS c (r , v). (h) The while loop has the following invariant: If v is LCD-affected with v /∈ Vaff+, then Q ∪ Q+ con- tains a tuple (d, l, . . .) with (d, l) ≤ dL c (r , v). Once shown, (h) guarantees that every LCD-affected vertex is included in Vaff+, while (f) ensures that these are the only ones. By considering only the first time any condition would be violated, we may assume they all hold for earlier steps. Case (a): For insertions prior to thewhile loop, the conditions clearly hold. Let (d, l, s, w) be the first tuple inserted during the while loop that violates the condition, and consider the iteration duringwhich this insertion occurred. By (g)we have (dv, lv) = dL c (r , v), and the check in line 23 succeeds iff v is stable. If v is stable, then insertion occurs in line 27. Thus, (d, l) = (dv, lv) ⊕ w = dL c (r , v) ⊕ w = dG ′(r , v) ⊕ w ≥ dG ′(r , w) ≥ dL c (r , w). Otherwise, insertion occurs in line 32, and we have (d, l) = (dv + 1,False) = (dc(r , v) + 1,False) ≥ (dc(r , w),False) ≥ dL c (r , w). This shows the first inequality. The check in line 20 ensures the second inequality d ≤ dG(r , w). Finally, let l = True. Then, (d, l, s, w) must have been inserted in line 27. Thus, we have (d, l) = (dv, lv) ⊕ w, s = True and either (dv, lv) ⊕ w < dL G(r , w) or dL G(r , w) < (dv, lv) ⊕ w. In the first case (d, l) < dL G(r , w) follows immediately. Otherwise, we have dL G(r , w) < (d, l) and dG(r , w) ≤ d, which together with the second inequality implies d = dG(r , w). Thus, dL G(r , w) < (d, l) is only pos- sible for l = False, a contradiction. Case (b): If (dv, lv, . . .) comes from Q, the claim follows from (a). Note that variable stable will never be updated to False.Otherwise, (dv, lv, stable, v, a) comes fromQ+. Then, stable is True, (a, v) lies in G ′, and (dv, lv) = dL G(r , a) ⊕ v < dL G(r , v). Thus, dv ≤ dG(r , v) holds. Since the check in line 8 succeeded a /∈ Vaff+. By (h) and minimality of dv > dG(r , a) ≥ dc(r , a), it follows that a is not LCD- affected. Thus dL c (r , v) ≤ dL c (r , a) ⊕ v = (dv, lv). Case (c): The check in line 10 must have succeeded, so v /∈ Vaff+. If v is LCD-affected, then (dv, lv) ≤ dL c (r , v) by (h) and the claim follows from (b). If v is not LCD-affected, then dL c (r , v) = dL G(r , v) by Lemma 12 and the claim follows from (b). Case (d): If w ∈ Vaff+, then (dw, lw) = LD[w] = dL c (r , w) by (g), and w is stable by (g); thus, (d) holds. Otherwise, (dw, lw) = dL G(r , w); thus, dw = dG(r , w) ≥ dc(r , w). If dc(r , w) ≥ dv , then (d) holds. Let dc(r , w) < dv . Since dv was minimal in Q ∪ Q+ and w /∈ Vaff+, w cannot be LCD- affected by (h). By Lemma 12 w is stable and dc(r , w) = dL G(r , w) = (dw, lw), so (d) holds. Case (e): Assume to the contrary that v is LCD-affected. Then, (h) implies (dv, lv) ≤ dL c (r , v) as (dv, lv) is minimal inQ∪Q+, and thus (dv, lv) = dL c (r , v) by (b). Denote byw the neighbour of v for which line 18 is reached. The checks in lines 12 and 15 ensure that dG(r , v) = dw + 1. Thus, w is stable and (dw, lw) = dL c (r , w) by (d). This implies dc(r , v) = dc(r , w) + 1, so v is stable. The check in line 17 ensures that dL G(r , v) = dL c (r , w) ⊕ v ≥ dL c (r , v). Hence, either dL c (r , v) < dL G(r , v) or v is not LCD-affected by Lemma 12. Assume the former. Since dG(r , v) = dv = dc(r , v), this requires dc(r , v) = (dv,True). But that means lv = True and the check in line 12 would have failed. Case (f): Assume to the contrary that v is not LCD-affected. Wewill show that execution would have reached line 18 first. If lv = True, we would have dL c (r , v) ≤ (dv, lv) < dL G(r , v) by (b) which contradicts Lemma 12, and thus lv = False. By (c) and Lemma 12, we have dv = dc(r , v) = dG(r , v), and the check in line 12 succeeds. By Lemma 12, v is stable. Hence, there must exist a stable path p from r to v of landmark length dL c (r , v). Let w be the predecessor of v in p so that w is stable and dL c (r , w) ⊕ v = dL c (r , v). From (g) it follows that Vunstable contains precisely the unstable vertices in Vaff+. Hence, w lies in NG ′(v) \ Vunstable. By (d) and dc(r , w) < dc(r , v) = dv , we have (dw, lw) = dL c (r , w). It follows that dv = dc(r , v) = dc(r , w) + 1 = dw + 1 and dL G(r , v) = dL c (r , v) = dL c (r , w) ⊕ v = (dw, lw) ⊕ v. Thus, w passes all checks required to reach line 18, and line 19 is not reached. 123 118 M. Farhan et al. Case (g): By (f), v is LCD-affected. Hence, (h) implies (dv, lv) ≤ dL c (r , v), and (dv, lv) = dL c (r , v) by (b). It remains to show that variable stable is True iff v is stable. If tv came fromQ+ with anchor a, then stable is True and dc(r , a) ≤ dG(r , a) < dG(r , a) + 1 = dv . Since the check in line 8 must have succeeded, we have a /∈ Vaff+. By (h) it follows that a is not LCD-affected. Since dc(r , v) = dv = dG(r , a)+1 = dc(r , a)+1wemay conclude that v is stable. Let tv have come fromQ. If stable was initially True, then the tuple must have been inserted in line 27 while examining a parent vertex u. As (g) holds for all earlier iterations, umust have been stable with dc(r , u)+ 1 = dv = dc(r , v), and v is stable. So let stable have been initially False. lv = False by (a). We distinguish two cases. (i) If dv = dG(r , v) then the check in line 12 succeeds. By Definition 15 v is stable iff v has a stable neighbourw in G ′ with dc(r , w)+1 = dc(r , v). By (d) line 16 is reached iff such a neighbour exists, so the claim holds. (ii) If dv �= dG(r , v), then the check in line 12 fails. Thus, stable will still be false by the time when line 19 is reached. Assume to the contrary that v is stable. Then, it must have a stable neighbour a in G ′ with dc(r , a) + 1 = dc(r , v) = dv < dG(r , v). If a is not LCD-affected, then (a, v) must have been inserted. Hence, a tuple ta = (dc(r , v), l,True, v, a) must have been inserted into Q+. Minimality of tv in Q ∪ Q+ means that ta must have been removed from Q+ earlier. At this point the checks in lines 8 and 12 would have failed, causing v to be added to Vaff+, a contradiction. If a is LCD-affected, then a ∈ Vaff+ by (h) and min- imality of dv in Q ∪ Q+. Since a is stable, a tuple ta = (dc(r , v), l,True, v) must have been inserted inQ in line 27 when processing a. Minimality of (dv,False,False) in Q ∪ Q+ means that ta must have been removed from Q earlier. At this point the check in line 12 would have failed, causing v to be added to Vaff+, a contradiction. Case (h): We will show a stronger invariant, namely that if v is LCD-affected with v /∈ Vaff+ then either – (Q)Q contains a tuple (d, l, s, b) with (d, l) ≤ dL c (r , v) where b /∈ Vaff+ is an LCD-ancestor of v, or – (Q+) Q+ contains a tuple (d, l, s, b, a) with (d, l) ≤ dL c (r , v) where b /∈ Vaff+ is an LCD-ancestor of v and a is not LCD-affected. For every LCD-affected vertex v, there exists an LCD- ancestor of v that has no LCD-parent, possibly v itself. Thus InitQueues will add a tuple satisfying either (Q) or (Q+) by Lemma 13, so the invariant holds initially. Consider now an iteration of the while loop where v is LCD-affected with v /∈ Vaff+. We need to show that the invariant is maintained for its LCD-descendants. Denote by tv the tuple removed from Q ∪ Q+. If tv ∈ Q+, then by (f) a /∈ Vaff+, and the check in line 8 fails. Thus, execution reaches line 10 in both cases, and the check here passes by assumption. By (e) line 18 is never reached, and execution reaches line 22where v is added toVaff+. Hence, the invariant is maintained for v. Finally, letw be an LCD-child of v. By Lemma 14 and (g), the checks in lines 20 and 26 (if v is stable) or 31 (otherwise) succeed. Thus, a tuple (dw, lw, s, w) is added toQ in line 27 or 32. By condition (3) of Definition 17, we have (dw, lw) = dL c (r , w), and the invariant is maintained for w and its LCD- descendants. �� 8.2 Proof of minimality Theorem 2 The highway labelling�′ that BatchHL+ returns is the minimal highway labelling for G ′. Proof By Lemma 6 and Theorem 1, the vertex set Vaff+ returned by Algorithm 6 contains all LD-affected vertices. By Lemma 6, for vertices outside of Vaff+ the landmark dis- tance to ri does not change. In line 3 of Algorithm 7 the value of dL bou(v, Vaff+) can be computed from �. From Lemma 9, it follows that Dbou[v] = dL G ′(ri , v) whenever vertex v lies in Vmin. For each landmark r and each LD-affected vertex v w.r.t. r , we update the r -label of v in �′ based on its landmark distance to r in G ′. By Lemma 5, these updates are correct. As the r -labels of vertices outside of Vaff+ do not change, and we initialized�′ using�, this leave all verticeswith correct r - labels, for all r ∈ R, so the distance labelling of �′ is correct and minimal. Highway is updated for vertices in Vaff+ for all r ∈ R, and do not change for others by Definition 9. �� 8.3 Complexity analysis We analyse the time and space complexity of BatchHL+. Let a be the total number of affected vertices, l be the maximum label size, and d be the maximum degree. We perform |R| BFSs to construct highway labelling in O(|R| · |V |) time and space. The time complexity of Algorithm 6 is O(a · d · l), where it visits O(a) vertices and for each affected vertex per- forms d queries to check its affected neighbours in O(d · l) time. Note that a in Algorithm 6 refers to the total num- ber of LCD-affected vertices which is much smaller than the total number of CP-affected vertices being referred to as a by Algorithm 2 of BatchHL. In practice, l and d are closer to the average values, and a is usually orders of magnitudes smaller than the total number of vertices in a graph. Further, in con- trast to Algorithm 3 of BatchHL which repairs CP-affected vertices returned by Algorithm 2, Algorithm 7 of BatchHL+ repairs LCD-affected vertices returned byAlgorithm6. In the worse case, both Algorithm 3 and Algorithm 7 could repair the labels of all affected vertices. When repairing the label of an affected vertex, we check its neighbours in O(d). Thus, 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 119 the time complexity of Algorithm 7 is (a · d), and the over- all time complexity of BatchHL+ is O(|R| · a · d · l) using O(V ) space. We omit l from the time complexity of Algo- rithms 3 and 7 because we store distances for all unaffected neighbours of affected vertices in Algorithms 2 and 6. 9 Implementation optimizations When implementingAlgorithm6, landmark length and stable landmark length values can be encoded as single integers to save storage space and speedup computation. Further, storing computed landmark distances in G for later reuse can also speed up processing. For undirected graphs, the for loop in line 19 can be inte- grated into the for loop in line 13 if the latter is reached. For directed graphs, it can be eliminated altogether, in which case the for loops in lines 25 and 30 iterate over the outgoing neighbours. Case (g) in Theorem 1 ensures that LD[v] = dL G ′(r , v) for all stable LCD-affected vertices. Hence the highway labels for these can be updated directly, and batch repair can be restricted to unstable vertices. Eliminating duplicate values inQ or checking whether a value has already been processed in line 10 stops vertices fromgetting processedmultiple times (whereas checking containment in Vaff+ only prevents this for LCD-affected vertices). Case (e) shows that this will not cause any LCD-affected vertices to be missed. However, the overhead of tracking visited vertices may not be worth it in situations where repeated visits are rare. 10 Experiments We have implemented our proposed method BatchHL+ to answer the following questions: (1) How efficiently can BatchHL+ perform in comparison with the state-of-the-art dynamic algorithms for answer- ing distance queries? (2) How does the number of landmarks affect the perfor- mance of BatchHL+? (3) How does the number of affected vertices effect the performance of BatchHL+ in comparison with the state- of-the-art dynamic algorithms? (4) What is the effect of the size of batch updates on the performance of BatchHL+? Experimental setup. In our experiments, all algorithms are implemented in C++11 and compiled using g++ 5.5.0 with the -O3option.All the experiments are performedusing a sin- gle thread on a Linux server (Intel Xeon W-2175 (2.50GHz CPU) with 28 cores and 512GB of main memory). Baseline methods. We compare our proposed method BatchHL+ with the following state-of-the-art methods. – BatchHL [18]: A batch-dynamicmethodwhich performs graph changes in the batch-update setting to efficiently maintain a highway cover distance labelling. Then, it combines the updated labelling with graph traversal algorithm to answer distance queries. We compare our proposed method with the optimised version of BatchHL in our experiments. – FulHL [20]:A fully dynamicmethodwhich presents two algorithms IncHL and DecHL to maintain a highway cover distance labelling as a result of graph changes in the single-update setting. Then, it combines the updated labelling with graph traversal algorithm to answer dis- tance queries. – FulFD [23]: A fully dynamic method that incorporates two algorithms IncFD and DecFD to maintain a dis- tance labelling in the single-update setting as a result of graph changes. Then, it combines the updated distance labelling with a graph traversal algorithm to answer dis- tance queries. – PSL* [27]: A parallel algorithmwhich constructs pruned landmark labelling for static graphs to answer distance queries. – BiBFS [23]: An online bidirectional BFS algorithm which answers distance queries using an optimized strat- egy to expand searches from the direction with fewer vertices. The code for FulFD, FulPLL and PSL* was provided by their authors and implemented in C++. We use the same parameter settings as suggested by their authors unless other- wise stated. For a fair comparison, we also select high degree landmarks and set them to 20 in the same way as FulFD for our methods. We set the number of threads to 20 for PSL*. Datasets.Weuse 15 large real-world networks from a variety of domains to verify the efficiency, scalability and robustness of our algorithm. Among them, Italianwiki and Frenchwiki are two real dynamic networks whose topology evolves over time. We treat these networks as undirected and unweighted graphs, and their statistics are summarized in Table 3. These datasets are accessible at Stanford Network Analysis Project [26], Laboratory for Web Algorithmics [9], and the Koblenz Network Collection [25]. Test data generation. For batch-dynamic algorithms, we generate 10 batches of updates for all the 13 datasets, where each batch update contains 1000 edges that are randomly selected from E . We consider three batch update settings for testing: (1) Decremental—delete these batches of updates and mea- sure the average deletion time; 123 120 M. Farhan et al. Table 3 Datasets, where si ze(G) denotes the size of a graph G with each edge appearing in the forward and reverse adjacency lists and being represented by 8 bytes Dataset Network n m m/n Avg. deg Max. deg Avg. dist si ze(G) Youtube social (u) 1.1M 3M 2.63 5.265 28754 5.3 23 MB Skitter comp (u) 1.7M 11M 6.54 13.08 35455 5.0 85 MB Flickr social (u) 1.7M 16M 9.07 18.13 27224 5.3 119 MB Wikitalk comm (d) 2.4M 5M 1.95 3.890 100029 3.9 36 MB Hollywood social (u) 1.1M 114M 49.5 98.91 11467 3.9 430 MB Orkut social (u) 3.1M 117M 38.1 76.28 33313 4.2 894 MB Enwiki social (d) 4.2M 101M 21.9 43.75 432260 3.4 701 MB Livejournal social (d) 4.8M 69M 8.84 17.68 20333 5.6 327 MB Indochina web (d) 7.4M 194M 20.4 40.73 256425 7.7 1.1 GB IT web (d) 41M 1.2B 24.9 49.77 1326744 7.0 7.7 GB Twitter social (d) 42M 1.5B 28.9 57.74 2997487 3.6 9.0 GB Friendster social (u) 66M 1.8B 27.4 55.06 5214 5.0 13 GB UK web (d) 106M 3.7B 31.4 62.77 979738 6.9 25 GB Italianwiki web (d) 1.2M 35M 16.6 33.25 81090 3.5 153 MB Frenchwiki web (d) 2.2M 59M 13.2 26.36 137021 3.9 223 MB Fig. 4 Distance distribution of batch updates (2) Incremental—add these batches of updates after decre- mental updates and measure the average insertion time; (3) Fully dynamic—randomly select 50% updates from each of 10 batches of decremental updates as insertions and the remaining 50% as deletion. We report the average update time. For the two datasets that are real-world dynamic networks, we select 10 batches in the order of their timestamps, each containing 1000 real-world inserted/deleted edges and measure the average update time after applying them in a streaming fashion. For themethods FulHL and FulFD, we randomly sample 1000 edges and follow the same update settings as above to measure the update time of performing updates one by one. These settings enable us to explore the impacts of edge insertions and edge deletions, respectively, in addition to their combined effect. Figure 4 shows the distance distribution of edges in these batches after deleting. The distances in all datasets are small ranging from 1 to 6. This shows that the updates are mostly fromdensely connected components of these networkswhich may cause fewer vertices to be affected in the incremental setting. Only a small number of updates are disconnected in most of these datasets. For queries, we randomly sample 100,000 pairs of ver- tices in each dataset to evaluate the average querying time on graphs being changed as a result of fully dynamic batch updates. We also report the average labelling size produced by our batch-dynamicmethodBHL++ after performing fully dynamic batch updates. 11 Experimental results 11.1 Performance comparison We compare our methods against the baseline methods in terms of update time, labelling size and query time. 11.1.1 Update time Table 4 presents the average update time in the fully dynamic, incremental, and decremental settings of our pro- posed method and the baseline methods. 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 121 Ta bl e 4 C om pa ri ng up da te tim e of ou r m et ho ds B at ch H L + an d B H L D w ith th e st at e- of -t he -a rt dy na m ic m et ho ds D at as et Fu lly D yn am ic B at ch U pd at e T im e [s ] In cr em en ta lB at ch U pd at e T im e [s ] D ec re m en ta lB at ch U pd at e T im e [s ] B at ch H L + B at ch H L Fu lH L Fu lF D B at ch H L + B at ch H L In cH L In cF D B at ch H L + B H L D B at ch H L D ec H L D ec FD Y ou tu be 0. 01 0 0. 07 0 0. 07 8 1. 25 0 0. 00 9 0. 00 8 0. 11 3 0. 15 5 0. 01 0 0. 00 9 0. 16 9 0. 03 2 3. 18 1 Sk itt er 0. 00 9 0. 60 1 0. 91 4 5. 96 8 0. 00 7 0. 00 6 1. 34 7 0. 11 7 0. 01 2 0. 01 0 0. 75 1 0. 55 1 14 .1 5 Fl ic kr 0. 01 1 0. 02 6 0. 05 9 2. 15 3 0. 01 0 0. 00 8 0. 05 2 0. 05 4 0. 01 2 0. 01 0 0. 04 1 0. 06 6 3. 36 5 W ik ita lk 0. 00 6 0. 02 5 0. 04 9 2. 92 7 0. 00 6 0. 00 5 0. 05 2 0. 03 0 0. 00 5 0. 00 5 0. 04 4 0. 05 5 5. 67 4 H ol ly w oo d 0. 00 5 0. 01 4 0. 05 8 4. 42 4 0. 00 3 0. 00 2 0. 05 5 0. 09 1 0. 00 7 0. 00 6 0. 03 1 0. 06 2 8. 40 1 O rk ut 0. 02 4 1. 77 5 2. 90 0 13 .3 0 0. 01 2 0. 01 4 4. 08 7 0. 36 8 0. 03 4 0. 03 1 0. 03 5 1. 27 4 23 .9 5 E nw ik i 0. 02 1 1. 68 1 8. 40 1 12 1. 8 0. 01 0 0. 01 2 10 .2 8 0. 31 7 0. 02 6 0. 02 5 3. 07 9 6. 27 5 25 1. 3 L iv ej ou rn al 0. 01 5 0. 30 6 0. 38 3 4. 73 6 0. 01 1 0. 01 0 0. 50 5 0. 24 4 0. 02 0 0. 01 9 0. 57 0 0. 33 1 6. 81 6 In do ch in a 0. 01 6 1. 18 1 2. 08 5 20 .6 3 0. 01 1 0. 01 1 3. 62 8 0. 14 2 0. 01 8 0. 01 6 1. 34 6 0. 81 0 44 .9 3 IT 0. 02 6 4. 50 2 6. 47 1 12 9. 6 0. 02 0 0. 03 3 10 .2 3 0. 14 7 0. 03 5 0. 02 7 4. 69 9 3. 23 8 28 5. 6 Tw itt er 0. 01 4 49 .6 2 20 8. 1 51 03 0. 00 5 0. 02 4 20 8. 4 0. 26 4 0. 02 1 0. 02 0 68 .8 5 20 7. 4 94 60 Fr ie nd st er 0. 01 4 0. 41 0 0. 46 1 23 .2 8 0. 00 7 0. 03 5 0. 42 2 0. 25 5 0. 02 4 0. 02 2 0. 73 8 0. 48 2 30 .3 8 U K 0. 02 7 41 .4 6 13 .8 8 11 0. 1 0. 01 7 0. 05 5 26 .7 6 0. 25 8 0. 03 3 0. 03 0 42 .2 9 1. 68 6 25 7. 4 It al ia nw ik i 0. 01 8 0. 16 3 0. 62 3 6. 62 3 – – – – – – – – – Fr en ch w ik i 0. 00 6 0. 18 9 0. 96 5 5. 28 9 – – – – – – – – – T he ba tc h si ze is 10 00 ,a nd th us ,t he up da te tim e re po rt ed fo r ev er y m et ho d is fo r 10 00 up da te s. [s ] is an ab br ev ia tio n of se co nd 123 122 M. Farhan et al. Fully dynamic setting. Table 4 shows that BatchHL+ sig- nificantly outperforms BatchHL.We observe that BatchHL+ is up to 3 orders of magnitude faster than BatchHL on large datasets. BatchHL+ also significantly outperforms FulHL and FulFD on all datasets. In particular, BatchHL+ is over 15 times faster than FulHL on most of the datasets and several orders of magnitude faster than FulFD. Further, the perfor- mancedifferenceofBatchHL+ andBatchHL is due to the fact that our improved batch search in BatchHL+ further prunes affected vertices that do not need a repair in their labels, and in practice they are significant in amount as can be seen in Table 6. Our method BatchHL+ is also much faster than BatchHL, FuLHL and FuLFD on the real-world dynamic networks: Italianwiki and Frenchwiki. We can observe that the aver- age update time of BatchHL+ is always by far smaller than recomputing labelling from scratch, i.e. construction time of BatchHL+ in Table 5. Further, the construction time of BatchHL+ is significantly smaller than the construction of the baseline methods on all datasets. We can also see that the parallel variant of PLL (PSL*) still failed to construct labelling for the largest three datasets. Incremental setting. Table 4 also shows that BatchHL+ is comparablewith BatchHLbecause of no technical difference in the batch search approach of both methods. On the other hand, it significantly outperforms IncHL and IncFD. This is because IncHL and IncFD involve in repeated and unnec- essary computations while performing updates and require extra usage of resource for each single update. Decremental setting. In Table 4, we can also see that BatchHL+ is much faster than BatchHL in this setting. This is because it further categorises and prunes out affected ver- tices whose labels do not need to be changed. In contrary, BatchHL failed to recognise such vertices. The difference in the amount of affected vertices repaired by BatchHL+ and BatchHL is shown in Table 6. It is clear that BatchHL+ has to repair much smaller fraction of affected vertices in the decremental setting. Furthermore, BatchHL+ significantly outperforms DecHL and DecFD on all the datasets. Espe- cially, BatchHL+ can achieve outstanding performance on networks which have a high average degree such as Twit- ter, Flickr and Hollywood. Due to inherent complexity of edge deletion on graphs (i.e. increasing distances), DecHL and DecFD take very long in identifying and updating labels of affected vertices. BatchHL+ outperforms these methods because it leverages the benefit of handling updates in a batch and significantly reduce repeated computations during iden- tifying and repairing the labels of affected vertices. 11.1.2 Labelling size Table 5 shows that BatchHL+ yields significantly smaller labelling sizes than FulFD and PSL* on all the datasets.Note that BatchHL+, BatchHL and FulHL produce labelling of equal sizes after performing update operations because they are designed using highway cover property. When an update occurs, the labelling size of FulFD remains unchanged because they store complete shortest-path trees at all times. In contrast, BatchHL+ similar to state-of-the-art BatchHL and FulHL stores pruned shortest-path trees based on highway cover property. Nonetheless, the labelling size of BatchHL+ remains stable in practice because the average label size is bounded by a constant, i.e. the number of landmarks. The parallel variant of PLL (PSL*) which exploit PLL properties to reduce labelling size still produces labelling of very large size as compared to BatchHL+. 11.1.3 Query time Table 5 shows that the average query time of BatchHL+ is comparablewithFulFD. Itwas reported [12] that the average query time is largely dependent on labelling size. Since the dynamic operations do not considerably affect the labelling size for BatchHL+ and FulFD, their query times remain sta- ble. On Twitter, the query time of BatchHL+ underperforms FulFD. This is because FulFD also maintains the shortest- path information for the neighbourhood of landmarks and the maximum degree of Twitter is very high which might cause many pairs to be covered by landmarks. Although the query time of PSL* in Table 5 is better than BatchHL+ on some datasets, it only handles static graphs. For dynamic graphs, it has several limitations including: (1) the cost of re-constructing labelling from scratch after each batch update is too high to afford, particularly when batch updates are frequent or when underlying dynamic graph is large which is evident from Table 5, (2) the labelling size is much larger than BatchHL+. As we can see in Table 5, PSL* produces the labelling of size almost 99% larger than the labelling of BatchHL+ for Orkut thus possess a high query cost as well. Considering the overall performance w.r.t. three main factors, i.e. query time, labelling size and construction time, BatchHL+ stands out in claiming the best trade-off between these factors. 11.2 Performance under varying landmarks Figure 8 and Fig. 9 show how the update time of our methodBatchHL+ in the fully dynamic setting behaveswhen increasing the number of landmarks. We can see that the update time in Fig. 8 for almost all datasets grows to 30 land- marks and then either decreases or grows linearly. This is because selecting a larger number of landmarks can better leverage the pruning power of our method. Figure 9 shows that the query time decreases or remains the same for almost all datasets with the increased number of landmarks. Partic- ularly, the query time of Twitter decreases because of a very high average degree and selecting a larger number of high 123 BatchHL+: batch dynamic labelling for distance queries on large-scale networks 123 Ta bl e 5 C om pa ri ng pe rf or m an ce of ou r m et ho d B at ch H L + w ith th e ba se lin e m et ho ds Fu lF D ,F u lP L L an d PS L ∗ in te rm s of co ns tr uc tio n tim e (C T ), qu er y tim e (Q T ) an d la be lli ng si ze (L S) D at as et C on st ru ct io n T im e (C T ) [s ] Q ue ry T im e (Q T ) [m s] L ab el lin g Si ze (L S) B at ch H L + Fu lF D Fu lP L L PS L * B at ch H L + Fu lF D Fu lP L L PS L * B at ch H L + Fu lF D Fu lP L L PS L * Y ou tu be 2 4 84 4 0. 00 5 0. 01 0 0. 04 5 0. 00 2 20 M B 83 M B 3. 14 G B 31 8 M B Sk itt er 3 8 51 1 21 0. 02 9 0. 02 0 0. 08 2 0. 00 7 42 M B 15 3 M B 11 .9 G B 1. 01 G B Fl ic kr 3 10 54 6 23 0. 00 7 0. 01 3 0. 10 2 0. 00 5 34 M B 15 2 M B 13 .1 G B 0. 98 G B W ik ita lk 2 5 92 4 0. 00 6 0. 00 8 0. 03 1 0. 00 1 41 M B 74 M B 5. 22 G B 16 0 M B H ol ly w oo d 6 24 97 82 37 7 0. 02 6 0. 03 6 – 0. 14 3 27 M B 26 3 M B – 4. 15 G B O rk ut 24 88 – 26 ,3 10 0. 10 2 0. 15 6 – 0. 20 3 70 M B 71 1 M B – 12 1 G B E nw ik i 25 88 73 82 38 9 0. 05 3 0. 05 1 – 0. 02 1 82 M B 60 8 M B – 7. 04 G B L iv ej ou rn al 20 46 – 44 41 0. 04 3 0. 05 1 – 0. 04 7 12 2 M B 66 3 M B – 50 .5 G B In do ch in a 9 30 32 05 86 0. 78 8 0. 76 7 – 0. 00 7 85 M B 83 8 M B – 3. 39 G B IT 9 30 32 05 86 0. 78 8 0. 76 7 – 0. 00 7 85 M B 83 8 M B – 3. 39 G B Tw itt er 54 9 19 28 – – 0. 86 8 0. 17 4 – – 1. 14 G B 3. 83 G B – – Fr ie nd st er 11 81 33 65 – – 0. 81 5 0. 90 2 – – 2. 43 G B 9. 14 G B – – U K 17 8 62 1 – – 1. 17 4 5. 23 3 – – 1. 78 G B 11 .8 G B – – It al ia nw ik i 6 15 – 21 5 0. 00 8 0. 01 4 – 0. 00 6 23 M B 15 9 M B – 0. 81 G B Fr en ch w ik i 11 25 – 43 3 0. 00 9 0. 01 6 – 0. 00 6 46 M B 27 2 M B – 1. 54 G B W e om it re su lts fo r B at ch H L be ca us e th er e is no pe rf or m an ce di ff er en ce be tw ee n B at ch H L + an d B at ch H L as th ey bo th ex te nd th e sa m e hi gh w ay la be lli ng ap pr oa ch to an sw er di st an ce qu er ie s on dy na m ic ne tw or ks .N ot e th at w he n a m et ho d di d no tfi ni sh th e la be lli ng co ns tr uc tio n in 24 h, w e de no te it as “– ” 123 124 M. Farhan et al. Ta bl e 6 A ve ra ge nu m be r of ve rt ic es af fe ct ed by B at ch H L an d B at ch H L + af te r pe rf or m in g ba tc h up da te s on al lt he da ta se ts M et ho d Ty pe Y ou tu be Sk itt er Fl ic kr W ik ita lk H ol ly w oo d O rk ut E nw ik i L iv ej ou rn al In do ch in a IT Tw itt er Fr ie nd st er U K It al ia nw ik i Fr en ch w ik i B at ch H L 36 6 K 97 1 K 55 K 12 7 K 14 K 50 3 K 12 20 K 27 6 K 20 79 K 58 78 K 10 ,6 22 K 66 K 54 ,5 15 K – – D el et e B at ch H L + 23 K 11 K 22 K 17 K 2 K 3 K 5 K 13 K 16 K 28 K 2 K 6 K 12 K – – B at ch H L 22 K 10 K 21 K 16 K 2 K 3 K 4 K 12 K 15 K 26 K 2 K 6 K 12 K – – A dd B at ch H L + 22 K 10 K 21 K 16 K 2 K 3 K 4 K 12 K 15 K 26 K 2 K 6 K 12 K – – B at ch H L 16 6 K 83 4 K 42 K 81 K 7 K 29 3 K 71 2 K 15 6 K 20 0 K 56 81 K 83 41 K 36 K 54 ,0 26 K 45 K 48 K M ix B at ch H L + 22 K 10 K 21 K 16 K 2 K 3 K 4 K 12 K 15 K 26 K 2 K 6 K 12 K 4 K 48 0 degree landmarks contributes greatly towards shortest-path coverage and makes querying process faster. 11.3 Performance under varying batch sizes We compare the total time of querying and updating on dynamic graphs. We randomly select 10,000 updates from E and process them in batches of varying size. For each batch, we first delete 50% updates and then perform a batch in the fully dynamic setting. To make a fair comparison, the total time of our method BatchHL+, and the baseline meth- ods BatchHL, FulHL and FulFD is the total time to perform a batch update plus the query time to perform 1000 queries after performing a batch update and then averaged over 1000 quer