Several topological indices of random binary trees

2013-01-17 02:18LUOChongliangFENGQunqiangZHANGShuguang

中国科学技术大学学报 2013年12期

LUO Chongliang,FENG Qunqiang,ZHANG Shuguang

(Department of Statistics and Finance,School of Management,University of Science and Technology of China,Hefei 230026,China)

0 Introduction

A topological index is a map from the set of chemical compounds to the set of real numbers.Most topological indices are closely correlated with some physico-chemical characteristics of the underlying compounds.Many chemists who derive and use molecular topological indices employ concepts and terminology of graph theory in their work (see,for example,Refs.[1-2]).Here,graphs are generated from molecules by replacing atoms with vertices and bonds with edges.

Many chemical compounds have a tree-like structure.For example,The entire family of alkanes (saturated hydrocarbons) have tree graphs.For a treeTof sizen,Ref.[13] studies the extreme values of the first Zagreb index ofT:If and only ifTis a path onnnodes it reaches the minimum value 4n-6,and just whenTis a star onnnodes it reaches the maximum valuen(n-1).Ref.[14] gets similar results on the second Zagreb index of trees.

The sequel is organized as follows.In Section 1,the Zagreb index and a modified version of it are introduced.Regardless of the probability model,a basic property of binary trees is also mentioned.Section 2 is dedicated to binary search trees.We first set a relation between the Zagreb index and the number of leavesLn,and a stochastic recurrence equation forLn.The recurrence is a transparent tool for the computation of exact moments,which is shown in Section 2.1.And it is also directly amenable to the univariate contraction method,as is demonstrated in Section 2.2.Section 3 is dedicated to Catalan trees,where a known result of the number of leaves is used.The exact mean of the Zagreb index is also given.In Section 4,briefly random digital search trees are discussed.The Gordon-Scantlebury and Platt indices,two other topological indices,are mentioned in passing in Section 5.Finally,a short comment on such three random tree classes via the Zagreb index is given in the last section.

1 Zagreb indices and binary trees

whered(v) denotes the degree of the vertexvinG.

A binary tree is either empty,or has left and right subtrees that are themselves recursively binary trees.Connections of chemistry to random binary trees have been noted,for example in Ref.[15],where they studied molecules with a binary tree structure,where nodes of valence 3 are saturated and the unsaturated nodes have an affinity that is inversely proportional to their valance.

The following lemma plays an important role in our study.

Lemma1.1[16]If a binary tree of sizenhasl(1≤l

It is easier to work with a modified Zagreb index that does not get affected by non-root nodes,but rather only requires changes based on the degree of the root.Such a modified index is defined in exactly the same way as the standard index,except that the degreed(r) of the rootrof the whole tree is always enhanced tod(r)+1.

(1)

2 Random binary search trees

A binary search tree is constructed from the permutation (π1,π2,…,πn) of the integer set {1,2,…,n} by a standard search algorithm.The first element of the permutation is inserted in an empty tree as its root node.A subsequent elementπj(withj≥2) is guided to the left subtree ifπj<π1,otherwise it is taken into the right subtree.In the recipient subtreeπjis treated recursively by the same insertion algorithm,until it is inserted in an empty subtree,at which point a node is allocated for it and adjoined appropriately as a left (right) child if its label is less than (more than) that of the last node on the search path.

In the probability model induced by random permutations we assume that the tree is built from permutations of {1,2,…,n},where a uniform probability is imposed on the permutations rather than on the trees.When alln! permutations are uniformly random,occurring equally likely,binary search trees are not equally likely.Several permutations give rise to the same tree,favoring shorter balanced trees to tall linear shapes[17].Henceforth the term random binary search tree will refer to a binary search tree built from a random permutation.

LetLnbe the number of leaves in a random binary search tree of sizen.From Lemma 1.1,we have

(2)

We shall use an alternative method to treatLn.Through a distributional equality below,one can compute the explicit expressions for the mean and variance ofLndirectly and get the asymptotic normality for it by the contraction method.

LetRnbe the root of a random binary search tree of sizen.ThenRnis distributed uniformly on {1,2,…,n},and the size of the left subtree isRn-1 and that of the right one isn-Rn.We have,forn≥2,

(3)

The initial values forLnis as follows:

L0=0,L1=L2=1;

2.1 The moments

The relation (3) is useful for the direct computation of moments.We shall illustrate the procedure on the mean and variance.

Take the difference between a version of the recurrence forE[Ln] from that forE[Ln-1].Observe that because the recurrence (3) is valid forn≥2,we must also taken-1≥2,to get

valid forn≥3.Thus,

Toward the variance calculation,minusE[Ln] from both sides of the relation (3),square and take expectation.Employing several symmetries we have,forn≥3,

Note that the last term in the above equation is 0 ifn≥4,and 2/(9n) ifn=3.Then,applying the initial values Var[L0]=Var[L1]=Var[L2]=0,one can get

which gives that Var[L4]=Var[L3]=2/9.Taking the difference between a version of the recurrence forn-1 from that forn,we simplify the recurrence to

This recurrence has the solution

Applying the explicit expressions ofE[Ln] and Var[Ln] to (2),the process of calculation of the first two moment ofZnis straightforward.

Proposition2.1LetZnbe the Zagreb index of a random binary search tree of sizen.Then

E[Z0]=0,E[Z1]=1,

and

ProofIt is easy to check the initial values forE[Zn] and Var[Zn].From the relation (2),ifn≥2,

Let1εbe an indicator of the eventεthat assumes the value 1,ifεoccurs,and assumes the value 0 otherwise.Since the root of a binary search tree has only one child only when it is 1 onn,we have by (1)

(4)

which implies that forn≥2,

For the variance ofZn,the relation (4) also gives that forn≥4,

Thus,forn≥4,

□

We observe that the distribution ofZnis concentrated around its mean,since the variance is smaller than the square of the mean.

Corollary2.1Asn→∞,

As expected,a random binary search tree of sizenis much “closer” to a path of lengthn(with the Zagreb index 4n-6) than a star of sizen(with the Zagreb indexn(n-1)),whennis large.

2.2 Asymptotic normality

In principle,one can compute the higher moments by the methods utilized for the mean and variance,and appeal to the method of recursive moments[21].However,the explosive complexity is forbidding.

For the asymptotic distribution ofLnandZn,the contraction method provides a shortcut.The method was crafted by Rösler[22]to deal with the distribution of Quicksort.Ref.[23] added several extensions.Recently general contraction theorems and multivariate extensions have been added by Refs.[24-26].Ref.[27] provides a lucid survey,and Ref.[20] also provides a survey in Chinese.What makes the contraction method appealing for an application like the one we have at hand is that now we have in our possession several general theorems that greatly reduce the effort in applying it.

The contraction method used here starts with the recursive distribution equality (3).The rates of growth of the mean and variance ofLncomport with the rates required for an application of Corollary 5.2 in Ref.[25].

Takingf(n)=g(n)=nin Corollary 5.2 of Ref.[25],we see that (Rn-1)/g(n) converges inL3toU,a standard Uniform (0,1) random variable,and (n-Rn)/g(n) converges inL3to 1-U; here the toll function is nothing,which,of course,vanishes inL3,as is required for the convergence

(5)

Theorem2.1LetZnbe the Zagreb index of a random binary search tree of sizen.Then

ProofFrom (2) and (5),we have

□

where the toll function

h(n)=4·1{Rn∈{1,n}}+9·1{Rn∉{1,n}}

is clearlyO(1).It is not hard to see that the conditions in Corollary 5.2 of Ref.[25] are also satisfied.

3 Random binary Catalan trees

In the Catalan probability model all binary trees of sizenare considered equally likely.It is so called because when research started on binary trees,it was natural to assume a uniform model,and binary trees are counted by Catalan numbers:The numberBnof binary trees of sizenis given by the Catalan number[28]:

Using the method of singularity analysis of generating functions,the number of leaves in binary Catalan trees is considered by Ref.[29,Example Ⅲ.14 and Ⅸ.25].We also denote byLnthe number of leaves in a random binary Catalan tree of sizen.LetB(z,u) be a bivariate generating function

whereC(n,m) denotes the number of binary Catalan trees of sizen,which contain exactlymleaves.Then

(6)

from which Ref.[29] shows

and

By simple algebra with the relation (2),the first two moments and asymptotic normality forZn,the Zagreb index of a random binary Catalan tree of sizen,can be obtained.

Theorem3.1LetZnbe the Zagreb index of a random binary Catalan tree of sizen.Then

and

The structure of the mean and variance of the Zagreb index in random binary Catalan trees is similar to that in the binary search tree case (only the coefficients of linearity are different).So,one also gets a concentration law here,too (cf.Corollary 2.1),by simply altering the specification of the constant.

From the explicit formula forB(z,u) one can easily obtain an explicit result forE[Zn].We shall illustrate the procedure on the mean for the sake of completeness.Forn≥1,we have

which implies that forn≥1,

The relations (2) and (4) thus give that

4 Random digital search trees

We also letLnandZnbe the numbers of leaves and the Zagreb index of a random digital search tree withnnodes,respectively.From the construction of DST,it is obvious that,forn≥2,

(7)

For the asymptotic normality for the Zagreb index of a random digital search tree,the contraction method is also applied.We first argue heuristically the existence of a limit for

The argument elicits the nature of the limit.

(8)

Theorem4.1LetZnbe the Zagreb index of a random digital search tree withnnodes.Then

and

5 The Gordon-Scantlebury and Platt indices

For a simple graphG,the degree of an edge is the number of its adjacent edges.The Gordon-Scantlebury index is defined as the number of paths of length two inG,and Platt index is the total number of the degrees of all edges inG.LetZG,SGandPGbe the Zagreb index,the Gordon-Scantlebury index and the Platt index ofG,according to Ref.[12],

ZG=2(SG+E(G)),

and

PG=2SG,

whereE(G) denotes the number of edges inG,Since for a tree ofnnodes,E(G)=n-1,we can directly establish parallel results for the Gordon-Scantlebury and Platt indices of the three kinds of random trees in this paper.We omit the details.

6 Conclusion

The Zagreb index induces the topological property of random trees (graphs).We observed that all the Zagreb indices of the three random binary tree classes that we studied here are close and have asymptotic normality.However,as Tab.1 shows that,on the average,a random binary search tree is a bit closer to a star than a random binary Catalan tree,and is a bit closer to a path than a random digital search tree,when their sizes grow.

Tab.1 The first two asymptotic moments of the Zagreb index of a tree with large size n

[1] Harary F.Graph Theory[M].2nd ed.Reading,MA:Addison-Wesley,1972.

[3] Christophi C,Mahmoud H.The oscillatory distribution of distances in random tries[J].The Annals of Applied Probability,2005,15:1 536-1 564.

[4] Devroye L,Neininger R.Distances and finger search in random binary search trees[J].SIAM Journal on Computing,2004,33:647-658.

[5] Janson S.The Wiener index of simply generated random trees[J].Random Structures and Algorithms,2003,22:337-358.

[6] Janson S,Chassaing P.The center of mass of the ISE and the Wiener index of trees[J].Electronic Communications of Probability,2004,9:178-187.

[7] Mahmoud M,Neininger R.Distribution of distances in random binary search trees[J].The Annals of Applied Probability,2002,13:253-276.

[8] Neininger R.The Wiener index of random trees[J].Combinatorics,Probability and Computing,2002,11:587-597.

[13] Li X,Li Z,Wang L.The inverse problems for some topological indices in combinatorial chemistry[J].Journal of Computational Biology,2003,10:47-55.

[14] Lang R,Li X,Zhang S.Inverse problem for Zagreb index of molecular Graph[J].Applied Mathematics A Journal of Chinese Universities,2003,18:487-493.

[15] Quintas L,Szymanski J.Nonuniform random recursive trees with bounded degree[J].Sets,Graphs,and Numbers:Colloquia Mathematica Societas Janos Bolyai,1992,60:611-620.

[16] Siltaneva J,Mäkinen E.A note on the expected distribution of degrees in random binary trees[J].ACM SIGCSE Bulletin,2000,32:32-33.

[17] Mahmoud H.Evolution of random search trees[J].New York:Wiley,1992.

[18] Devroye L.Limit laws for local counters in random binary search trees[J].Random Structures and Algorithms,1991,2:303-315.

[19] Mahmoud H.Pólya urn models[M].New York:Chapman-Hall,2008.

[20] Su C,Feng Q,Liu J.Modern Limit Theory and Its Applications to Random Structures[M].Beijing:Higher Education Press,2010.(In Chinese)

[21] Chern H,Hwang H,Tsai T.An asymptotic theory for Cauchy-Euler differential equations with applications to the analysis of algorithms[J].Journal of Algorithms,2002,44:177-225.

[22] Rösler U.A limit theorem for “Quicksort”[J].RAIRO Informatiques Theoriques et Applications,1991,25:85-100.

[23] Rachev S,Rüschendorf L.Probability metrics and recursive algorithms[J].Advances in Applied Probability,1995,27:770-799.

[24] Neininger R.On a multivariate contraction method for random recursive structures with applications to Quicksort[J].Random Structures Algorithms,2001,19:498-524.

[25] Neininger R,Rüschendorf L.A general limit theorem for recursive algorithms and combinatorial structures[J].The Annals of Applied Probability,2004,14:378-418.

[26] Rösler U.On the analysis of stochastic divide and conquer algorithms[J].Algorithmica,2001,29:238-261.

[27] Rösler U,Rüschendorf L.The contraction method for recursive algorithms[J].Algorithmica,2001,29:3-33.

[28] Knuth D.The Art of Computer Programming,Vol.3:Sorting and Searching[M].2nd ed.Reading,MA:Addison-Wesley,1998.

[29] Flajolet P,Sedgewick R.Analytic Combinatorics[M].Cambridge:Cambridge University Press,2009.

[30] Coffman E G Jr,Eve J.File structures using hashing functions[J].Commun ACM,1970,13:427-432.

[31] Flajolet P,Sedgewick R.Digital search trees revisited[J].SIAM Journal on Computing,1986,15:748-767.

[32] Devillersand J,Balaban A T.Topological Indices and Related Descriptors in QSAR and QSPR[M].Amsterdam:Gordon & Breach,1999.

中国科学技术大学学报2013年12期

中国科学技术大学学报的其它文章: 基于GARCH模型的中国行业外汇风险暴露研究; 配套产品的扩散模型研究
——以美国汽车与配件市场为例; 中国参数化地震巨灾债券的定价分析; 环F2+uF2上线性码关于Rosenbloom-Tsfasman距离的MacWilliams恒等式; 股指期货基差的非线性特征和均值回复机制研究; 基于自正则的K-S方法的均值变点检验
——对我国上证综指的实证分析