TAN Tunzi, GAO Suixiang, YANG Wenguo
(School of Mathematical Sciences,University of Chinese Academy of Sciences,Beijing100049,China;Key Laboratory of Big Data Mining and Knowledge Management of Chinese Academy of Sciences,Beijing100190,China) (Received 18 February 2017; Revised 15 November 2017)
Abstract Determining the connectedness of an undirected graph is a frequent issue in practical graph mining and regarded as a key subproblem of the graph partitioning problem. Apart from graph partitioning, graph connectedness also plays an imperative role in tracking the spread of disease, VLSI design, social network analysis, and theoretical studies in graph theory such as “Cayley graph”. This work reviews several important methods for determining the connectedness of an undirected graph, such as breadth-first search, depth-first search, and the eigenvalues of a graph Laplacian matrix. In addition, we propose several new methods, such as power sum and logical sum of adjacency matrix. We compare all the relevant methods empirically on random graphs with up to 10 000 vertices, and show that the breadth-first search and logical sum methods deliver good performances on large graphs with more than 100 vertices and the logical sum method is the fastest.
Keywords connectedness; breadth-first search; depth-first search; power sum; Laplacian matrix; logical sum
An undirected graphG=(V,E) consists of a pair of sets: a set of verticesV={v1,v2,…,vN} and a set of edgesE={v1,v2,…,vM}, where all the edges are bidirectional. If there are no self-loops, parallel edges, and weights in the undirected graph, it is called a simple graph[1-2], so the “graph” occurred in this paper stands for a simple graph. The graph G is connected if there is a path in G from every vertex to every other vertex[3-4]. When partitioning a graph into several connected components, determining the connectedness of all the components is regarded as a key sub problem. In some previous work, such as Refs.[5-7], connectedness has been ignored, because the objective function aiming at minimizing the total cut makes the connectedness constraint unnecessary. But if we change the objective function, how to determine the connectedness of a graph or part of the graph becomes very important in computing. Apart from graph partitioning, graph connectedness also plays an imperative role in tracking the spread of disease[8], VLSI design[9], social network analysis[10-11], and some theoretical results in graph theory like “Cayley graph”[12].
“How to determine whether a graph is connected or not?” is a frequently asked question on the Internet. Different methods mentioned by people in their blogs or some websites are rarely found or organized in research papers. This paper reviews some main methods for determining the connectedness of a graph, proposes some new ones, and does experiments to compare all the methods.
There are several different ways to represent a graphG=(V,E),amongwhichadjacencymatrixisthemostwidelyusedandiftheadjacencymatrixisgiven,thegraphisfixedcorrespondingly.AdjacencymatrixA={ai,j}N×Nshowstheconnectionbetweenvertexandvertex[2],wherevi,vj∈V,Nisthenumberofverticesand
Thedegreeofvertexv:deg(v)inagraphisthenumberofedgesincidenttothevertex,sothedegreematrixD(G)ofgraphGisadiagonalmatrixwiththevertexdegreesonthediagonal.
AnotherimportantconceptusedinthispaperisLaplacianmatrix,whichisalsoapopularmatrixrepresentationofagraph.GivenagraphGonNvertices,itsLaplacianmatrixisthen-by-nmatrixL(G)=D(G)-A(G),whereA(G)istheadjacencymatrix,andD(G)isthedegreematrix.
Forexample,givenagraphG=(V,E)withV={u,v,w,x,y}andE={uv,uw,vw,vx,wx,xy}.TheadjacancymatrixAofthegraphGis
A=0110010110110100110100010éëêêêêêêêùûúúúúúúú
The Laplacian matrixL(G) of the graphGis
whereDis the degree matrix,which contains the degree of each vertex.
Last but not least, the “distance” between two vertices in a graph is the number of edges in a shortest path connecting them.
There are three different types of methods for determining the connectedness of a graph:
1)Search methods, including breadth-first search, depth-first search, random walks methods.
2)Algebraic computation on adjacency matrix or Laplacian matrix of the graph, such as power, eigenvalue, and logical sum.
3)Path finding: shortest path algorithms like Dijkstra, Warshall, Floyd algorithm.
In order to determine the connectedness of a graph by path finding methods, the paths between one vertex and all the other vertices should be found, which is very time consuming and complicated. So we combine the second with third methods and apply the path finding methods on the computation of adjacency matrix.
2.1.1 Breadth-first search
Breadth-first search was first proposed by Ref.[13] in 1950s to find the shortest path out of a maze, and invented independently by Ref.[14] for solving efficiently path-connection problems.
Making use of breadth-first search to determine the connectedness of a graph can be interpreted as three steps:
1)Put one chosen vertex into the waiting list (shorten as “WL”) and mark it;
2)Move the first vertex WL1in WL to SL (short for “search list”) and put all the unmarked vertices within the distance of 1 with WL1into WL from the back and mark them.
3)Repeat second step until there is no vertex in WL. If the elements in SL are all the vertices of the graph, the graph is connected.
Figure 1 shows an example of the procedure to do breadth-first search on a graph.
Fig.1 Breadth-first search
The time complexity of breadth-first search algorithm isO(|V|+|E|), whereVis the number of vertices andEisthenumberofedges,becausetheworstcaseisthatalltheedgesandverticesarevisited[15].
2.1.2Depth-firstsearch
Depth-firstsearchwasinvestigatedinthe19thcenturyfortheuseofsolvingmazes[16].
Determiningtheconnectednessofagraphbytheuseofdepth-firstsearchalgorithmworksinthefollowingthreesteps:
1)PickupastartingvertextoSLandmarkit;FL(formallist)=[0];
2)PutoneunmarkedvertexdirectlyconnectedwiththelastvertexinSLintoSLandmarkit.RecordtheformalchosenvertexofthenewlychosenvertexinFL.
3)Repeatsecondstepuntilthereisnounmarkedvertexconnectedtothechosenvertex,androllbacktotheformalchosenvertexbytheuseofFL.Stopuntilsearchbackto0intheFL.IftheelementsinSLareallthevertices,thegraphisconnected.
Figure2showsanexampleoftheproceduretododepth-firstsearchonagraph.ThetimecomplexityofthisalgorithmisΘ(|V|+|E|)[15]whichissimilarwiththebreadth-firstsearch,sothechoiceofthesetwoalgorithmsdependsondifferentsituationsandtheirdifferentpropertiesinsteadoftheirtimecomplexities.
2.1.3Randomwalksalgorithm
Randomwalks,firstproposein1905[17],havebeenwidelyusedinmanyfields:physics,computerscience,chemistry,andsoon[18-19].Arandomwalkisapaththatconsistsofaseriesofrandombehaviors
Fig.2 Depth-first search
orsteps.Thealgorithmcannotstopuntilalltheverticesarevisitedorthenumberofstepsexceedsagiveninteger.ThealgorithmisshowninAlgorithm1.
2.2.1 Power sum of adjacency matrix
ThekthpowerofagraphGisagraphwiththesamesetofverticesasGandanedgebetweentwoverticesiffthereisapathoflengthatmostkbetweenthem[20].TheshortestpathbetweeneverytwoverticesinagraphcontainsatmostNvertices,sowecomputethepowersumofadjacencymatrixA:
P=I+A+A2+A3+…+AN-1,
(1)
whereIis the identity matrix.
If all the elements in the first row of P are larger than 0, the graphGisconnected.TheideaofpowersummethodissimilarwithWarshallalgorithm,aimingatfindingalltheshortestpathsbetweenacertainvertexandalltheothervertices.ThecomputationcomplexityofpowersummethodisO(N3).
Algorithm 1 Random walks algorithmRequire: rand: Random number; A: Adjacency matrix; K: An integer, SL: visiting listEnsure:SL1:Initial next←v,i←02:Generate rand, and i←i+1. The degree of vertex next is deg(next).3:ifi-1deg(next)
Algorithm 2 Logical sum algorithm (version 1)Require A: Adjacency matrix, V={V1,V2,…,VN}, V1 is marked.1:Pick up one vertex Vj, where minjarg{a1,j=1} and mark the vertex Vj. Note: arg{a1,j=1} means all the js which meet a1,j=1.2:Renew the value of the first row by the logical sum of the first row and the jth row; replace all the elements in column j and row j with 0.3:Repeat step 1 and 2 until there is no “1” left in the first row. If all the vertices are marked, the graph is connected.
Algorithm 3 Logical sum algorithm (version 2)Require A: Adjacency matrix, V={V1,V2,…,VN},V1 is marked.1:Pick up all the vertices M={Vj}, where arg{a1,j=1} and mark them. Note: arg{a1,j=1} means all the js which meet a1,j=1.2:Renew the value of the first row by the logical sum of the first row and all the rows in M; replace all the elements in columns M and rows M with 0.3:Repeat step 1 and 2 until there is no “1” left in the first row. If all the vertices are marked, the graph is connected.
2.2.2Laplacianmatrix
Laplacianmatrixisanotherimportantmatrixrepresentationofagraphandhavebeenusedtocalculatealotofimportantpropertiesofthegraph[21].TherearemanycharacteristicsoftheeigenvaluesofgraphLaplacianmatrix,amongwhichthetheory:thenumberofconnectedcomponentsofGisequaltothemultiplicityof0asaneigenvalue[22]canbeusedtodeterminetheconnectednessofagraph.Ingeneral,thetimecomplexityofthismethodisO(N2)[23].
2.2.3Logicalsumofadjacencymatrix
Thismethodisthecomputationalversionofsearchmethods.Logicalsumisabasiccomputationincomputerscienceandalsosimilarwiththedefinitionof“union”and“or”.Asweknow,ifavertexiisconnectedwithj,jisconnectedwithm,andtheniisconnectedwithm.Itisthesameideawiththelogicalsum:
0+0=0,0+1=1+0=1+1=1.
ApplyinglogicalsumtographcomputationwasfirstlyproposedinaSpanishbookin1992.Weproposeonenewmethodwithtwodifferentversionsbasedonlogicalsum.TheimplementationoflogicalsumondeterminingtheconnectednessofagraphissummarizedinAlgorithm2.
Logicalsumalgorithm(version2)showsaslightlymodifiedalgorithmbasedonthefirstone.Thedifferencebetweenversion1andversion2isthecomputationorder:onlyonerowisaddedtothefirstrowduringeverystepinversion1,butinversion2,alltheverticesmeetingtheconstraintarefoundandaddedtothefirstrowduringeverystep.Thetimecomplexityofthismethodisthesamewithbreadth-firstsearch:O(|V|+|E|).
Since the time complexity of random walk method is unknown and we want to show that logical sum method shows better than other methods in real experiments because of its computational idea. We conducted numerical experiments to compare all the methods mentioned in the third section. (DS: depth-first sesarch; BS: breadth-first search; RW: random walk; PS: power sum; LM: Laplacian matrix; LS (1): logical sum (1); LS (2): logical sum (2))
Table 1 to Table 5 and Fig.3 present the results.
Table 1 Numerical experiments on graphswith 10 vertices ms
In order to show their performance on different scale of graphs, four sets of experiments on random graphs with up to 10 000 vertices have been done. After that, well performed methods: breadth-first search and logical sum are applied to random graphs with 20 000 and 30 000 vertices.
In the four sets of experiments, each method was run third times on random graphs with 10, 100, 1 000, and 10 000 vertices, respectively.
Table 2 Numerical experiments ongraphs with 100 vertices ms
Table 3 Numerical experiments on graphswith 1 000 vertices ms
Table 4 Numerical experiments on graphswith 10 000 vertices s
Table 5 Numerical experiments on graphswith 20 000 and 30 000 vertices s
In the fifth set of experiments, breadth-first search and logical sum are applied to random graphs with 20 000 and 30 000 vertices.
Fig.3 Performances of different methods
In our first to fourth set of experiments, each method is applied three times on random graphs and the random graph in every round is the same. Power sum takes the least time when there is only 10 vertices in a graph and breadth-first search follows. While in the second set of experiments, breadth-first search method is the fastest algorithm, and power sum follows. When there is more than 1 000 vertices in the graph, the advantage of logical sum algorithm becomes more and more obvious. Besides, random walk method is not stable since the way of producing next chosen vertex is not stable. Figure 3 shows the running time of different methods changing over the number of vertices in the graph.
As shown in Table 5, breadth-first search and logical sum are very fast when dealing with large graphs and logical sum (version 2) is the best which saves almost half of the running time of breadth-first search.
Depth-first search and breadth-first search are well-known and widely used methods. They are very easy and fast, but breadth-first search are much faster in determining the connectedness of a graph than depth-first search method and breadth-first search method is also very competitive in comparison with algebraic methods. Another search method: random walks method is easily developed, but not very stable in application.
Power sum method is always faster than the Laplacian matrix method and it is also faster than logical sum methods when the graph is small. However, when the number of vertices of a graph grows, logical sum methods are more efficient.
Since there are a lot of packages and tools for those old methods, if the graph is not so complicated and large, the breadth-first search is the best choice. But if the graph is very large, loogical sum (version 2) method is very easy to develop and extremely fast, which should be a very good choice.