Yuchen Zhou ,Hongtao Huo ,Zhiwen Hou ,Lingbin Bu ,Yifan Wang ,Jingyi Mao ,Xiaojun Lv and Fanliang Bu,★
1School of Information Network Security,People’s Public Security University of China,Beijing,100038,China
2Institute of Computing Technology,China Academy of Railway Sciences Corporation Limited,Beijing,100081,China
ABSTRACT Graph Convolutional Neural Networks (GCNs) have been widely used in various fields due to their powerful capabilities in processing graph-structured data.However,GCNs encounter significant challenges when applied to scale-free graphs with power-law distributions,resulting in substantial distortions.Moreover,most of the existing GCN models are shallow structures,which restricts their ability to capture dependencies among distant nodes and more refined high-order node features in scale-free graphs with hierarchical structures.To more broadly and precisely apply GCNs to real-world graphs exhibiting scale-free or hierarchical structures and utilize multi-level aggregation of GCNs for capturing high-level information in local representations,we propose the Hyperbolic Deep Graph Convolutional Neural Network (HDGCNN),an end-to-end deep graph representation learning framework that can map scale-free graphs from Euclidean space to hyperbolic space.In HDGCNN,we define the fundamental operations of deep graph convolutional neural networks in hyperbolic space.Additionally,we introduce a hyperbolic feature transformation method based on identity mapping and a dense connection scheme based on a novel non-local message passing framework.In addition,we present a neighborhood aggregation method that combines initial structural features with hyperbolic attention coefficients.Through the above methods,HDGCNN effectively leverages both the structural features and node features of graph data,enabling enhanced exploration of non-local structural features and more refined node features in scale-free or hierarchical graphs.Experimental results demonstrate that HDGCNN achieves remarkable performance improvements over state-ofthe-art GCNs in node classification and link prediction tasks,even when utilizing low-dimensional embedding representations.Furthermore,when compared to shallow hyperbolic graph convolutional neural network models,HDGCNN exhibits notable advantages and performance enhancements.
KEYWORDS Graph neural networks;hyperbolic graph convolutional neural networks;deep graph convolutional neural networks;message passing framework
Graph Convolutional Neural Networks (GCNs),as a state-of-the-art model for graph representation learning,have gained significant traction in various fields and demonstrated remarkable achievements [1-4].GCNs enable efficient learning of graph representations by embedding the nodes in a Euclidean space and performing message passing and aggregation within this space.However,when applied to scale-free graphs characterized by tree-like structures or hierarchies [5,6],utilizing GCNs to embed such graphs in Euclidean space gives rise to severe distortion issues[7,8].Notably,real-world graphs,including disease transmission networks,social networks,trade networks,and drug trafficking organization networks,consistently exhibit prominent tree structures.In many applications of GCN models,data from various domains exhibit hierarchical structures.For instance,in domains such as transportation,finance,social networks,and recommendation systems[9],clear hierarchical structures are prevalent,such as the main road branches in transportation,the branching structures in financial transactions,social relationships in social networks,and tree-like structures in recommendation systems.To address the challenge of embedding distortion,hyperbolic geometry arises as a promising solution.Embedding nodes in hyperbolic space allows for enhanced preservation of the hierarchical structure and relationships within the graph.Hyperbolic geometry,being non-Euclidean in nature,provides a more effective description of tree-like structures and hierarchical relationships,resulting in more accurate node embedding representations.Nevertheless,current hyperbolic embedding techniques still grapple with challenges associated with issues like oversmoothing and over-squeezing,which tend to emerge when applying multi-layer stacking.
The depth of a neural network plays a pivotal role in the performance of the model across various applications.For instance,Convolutional Neural Networks (CNNs) employed in computer vision typically consist of dozens or even hundreds of layers,whereas most Graph Convolutional Networks(GCNs)exhibit shallow structures.In the domain of computer vision,shallow CNN models effectively capture fundamental features such as edges and object shapes.However,deep models have the ability to further comprehend and grasp high-level semantic information by building upon the primary features extracted from the shallow layers.Theoretically,a deep CNN can extract more abstract and sophisticated information from an image,equipping the model with image understanding capabilities akin to those of a human observer.In various emerging industrial applications,constructing deep neural network architectures to achieve superior model performance has become paramount.For instance,in applications such as bearing fault diagnosis[10,11],it is essential to enhance the network’s depth by establishing residual networks or by increasing the depth of adversarial graph networks to obtain more accurate results in diagnosing rolling element bearing faults.Inspired by CNNs,our objective is to incorporate the depth framework of CNNs into GCNs,aiming to construct a deep GCN model for effective aggregation of distant node information,capturing non-local structural features,and acquiring high-level abstract node features.However,stacking an excessive number of graph convolution layers results in nodes continuously aggregating information from global nodes.Eventually,this leads to uniform node features,making it challenging to distinguish between nodes and causing over-smoothing or over-squeezing[1,2,12].
To extend GCNs in hyperbolic geometry and incorporate a deep network architecture to fully leverage rich node features,we encounter the following challenges:(1)Achieving end-to-end processing necessitates investigating methods for mapping Euclidean input features to the hyperbolic space.This is crucial to enable seamless transmission throughout the network.(2)Performing graph convolution operations in hyperbolic space requires the development of techniques for feature transformation,neighborhood aggregation,and nonlinear activation.These operations must be adapted to the unique properties of hyperbolic geometry.(3)Developing a deep graph convolutional neural network framework and mitigating the challenges associated with model degradation due to its depth are crucial endeavors for investigating distant dependency relationships among nodes and extracting more refined node features.This framework should be capable of capturing intricate dependencies across multiple layers to enhance the expressive power of the model.Addressing these challenges will pave the way for extending GCNs into the hyperbolic space,enabling the integration of deep learning techniques and facilitating the exploitation of rich node features in graph data.
To address the aforementioned challenges,we propose a novel end-to-end graph representation learning framework called Hyperbolic Deep Graph Convolutional Neural Network (HDGCNN).HDGCNN combines deep graph convolutional neural networks with hyperbolic geometry to overcome these problems.Firstly,HDGCNN employs a hyperbolic model to transform input features in Euclidean space into hyperbolic embeddings,leveraging the unique properties of hyperbolic space.Furthermore,HDGCNN introduces an identity mapping mechanism within the hyperbolic space for feature transformation,ensuring effective utilization of hyperbolic representations.In the process of neighborhood aggregation,HDGCNN incorporates an approach based on initial structural features and hyperbolic attention coefficients.This technique harnesses the structural information and node features,enabling comprehensive utilization of available information.To enable a deep network structure,we introduce a novel non-local message passing framework that densely connects the hyperbolic features generated by each layer,fostering feature reuse and enhancing the expressive power of the model.Experimental results demonstrate that HDGCNN outperforms GCN based on Euclidean space in node classification and link prediction tasks on scale-free graphs.Additionally,the deep architecture of HDGCNN captures more refined node features compared to shallow hyperbolic neural network models,leading to significant performance improvements in node classification tasks.The contributions of this paper can be summarized as follows:
(1) We propose an end-to-end hyperbolic deep graph convolution neural network framework.This framework combines deep graph convolution neural network with hyperbolic geometry,defines the core operation of deep graph convolution neural network in hyperbolic space,and realizes the mapping of input graph from Euclidean space to hyperbolic space.
(2) We propose a hyperbolic feature transformation method based on identity mapping.This method can perfectly adapt to the hyperbolic deep graph convolution neural network architecture,addressing the performance degradation issue caused by frequent feature transformations in deep network models.Consequently,the proposed method ensures that the model’s performance remains unaffected by the network’s depth.
(3) We propose a neighborhood aggregation method based on initial structural features and hyperbolic attention coefficients.This method harnesses the structural features of the input graph and the attention coefficients calculated after updating node features during message aggregation at each layer.By fully utilizing the structural information and node features,our method enhances the model’s ability to capture important graph features.
(4) We introduce a dense connection scheme based on a non-local messaging framework.This scheme effectively explores remote dependency relationships between nodes and extracts more refined node features in scale-free graphs or hierarchical graphs with power-law distributions.Furthermore,it facilitates feature reuse and non-local message transmission,addressing the problem of over-smoothing caused by excessively deep models.
The remainder of this paper is organized as follows: Section 2 provides a brief overview of the related work on deep graph convolutional neural networks and hyperbolic neural networks.Section 3 presents the background knowledge on hyperbolic neural networks.Section 4 presents a comprehensive description of our proposed method,HDGCNN,including its key details.Section 5 presents the experimental results,providing an evaluation of the performance of HDGCNN.Finally,Section 6 concludes the paper,summarizing the key findings and contributions.
Graph Neural Networks (GNNs) represent one of the forefront techniques for graph representation learning in contemporary research.By leveraging the graph’s structural information and initial node features,GNNs adeptly acquire embedding representations for nodes [13-18],and have demonstrated outstanding performance across various domains,attaining state-of-the-art results.Notably,recent studies have integrated hyperbolic geometry into GNNs,an advancement that enables the embedding of hierarchical graphs with reduced distortion.This augmentation of hyperbolic geometry enhances the applicability of GNNs to a wider array of fields.
Despite the strong advantages of Graph Neural Networks(GNNs)in processing graph data,most state-of-the-art GNN models are shallow structures.This limitation significantly hampers the ability of GNNs to extract high-order node features.However,deepening GNNs gives rise to several challenges,such as over-smoothing[12,19,20],graph bottleneck[21],and over-squeezing[22,23].
To address these issues,various research efforts have proposed methods for deepening GNNs.For instance,Li et al.[24] drew inspiration from Residual Neural Networks (ResNet) [25],introduced residual connections to deep GNNs,successfully training a model with a depth of 56 layers.Chen et al.[26] incorporated residual connections by adding the output of the graph convolutional layer to the previous layer,thereby reducing the strong correlation between each layer and achieving successful training with a depth of 64 layers.Xu et al.[27] adopted residual connections in the last layer of the network to combine the output of each previous layer and trained a deep GNN.
The aforementioned methods all involve architectural modifications,and a portion of the efforts is dedicated to incorporating normalization and regularization techniques.For instance,PairNorm proposed by Zhao et al.[28],NodeNorm proposed by Zhou et al.[29],MsgNorm proposed by Li et al.[30],Energetic Graph Neural Networks (EGNNs) proposed by Zhou et al.[31],and other methods utilize normalization and regularization techniques to alleviate the aforementioned problems and facilitate deepening GNNs.
Additionally,GNNs can be deepened through the random dropout technique.For instance,Rong et al.[32]proposed the DropEdge model,which addresses the over-smoothing problem caused by deep models by randomly deleting a certain number of edges during training.Huang et al.[33]introduced the DropNode model,optimizing deep models by randomly dropping a certain number of nodes during training.
Hyperbolic spaces have garnered significant attention owing to their constant negative curvature and their capacity to capture hierarchical relationships in data structures[34,35].Recently,hyperbolic spaces have emerged as a promising continuous approach for learning latent hierarchical representations in real-world data[36].In the context of recommender systems,Yang et al.[37]proposed the Hyperbolic Information Collaborative Filtering(HICF)method,which tightly couples the embedding learning process with hyperbolic geometry to enhance personalized recommendations.Chen et al.[38]introduced a knowledge-aware recommendation approach based on the hyperbolic geometric Lorenz model,effectively modeling the scale-free tripartite graph after data unification.
In natural language processing,Liu et al.[39]devised a novel approach based on a multi-relational hyperbolic graph neural network,enabling pre-diagnosis through inquiries about patients’symptoms and medical history.To achieve more accurate product search,Choudhary et al.[40] presented an innovative attention-based product search framework,representing query entities as two vector hyperboloids,learning their intersections,and employing attention mechanisms to predict product matches in the search space.
Furthermore,hyperbolic graph neural networks have found application in various fields,including molecular learning[41,42],word embedding[43],graph embedding[44],multi-label learning[45],and computer vision [46,47].These versatile applications demonstrate the potential of hyperbolic spaces in addressing diverse challenges across different domains.
We aim to develop an end-to-end deep learning framework,wherein the primary learning objective is to acquire the functionf,as defined in Eq.(1),which maps input nodes to embedding vectors.These learned node embeddings can serve as inputs for various downstream tasks.LetG=(V,E)represent a single graph,whereVdenotes the set of nodes with|V|=Nnodes,and E denotes the set of edges.We employ∈Rnto denote then-dimensional input node features,where the superscripts 0 andEcorrespond to the node features of the initial layer and the node features in Euclidean space,respectively.
While Zhang et al.[48] proposed that the input node features can be pre-trained embedding features,we are specifically interested in studying an end-to-end deep learning framework.Consequently,we directly employ the original node features of the dataset as input.In the following,we use superscripts Toand H to denote tangent features located in the tangent space at o and hyperbolic features located in hyperbolic space,respectively.
The Riemannian manifold [49] can be denoted aswhere M represents a smooth and differentiablen-dimensional manifold equipped with a Riemannian metricgM.In this contextis a topological space,extending the concept of a 2-dimensional plane tondimensions,and each point within the manifold can be locally approximated by Euclidean space.Since M is differentiable at every point,the tangent space TxM of a pointxin M can be defined as the first-order linear approximation of the local space aroundx,which is isomorphic to Rn.The Riemannian metricgMon M can be represented asdefining a positive definite inner product 〈·,·〉 : TxM×TxM →R over the tangent space TxM,smoothly varying withx.This enables the definition of geometric properties and local information,such as angle,geodesic distance,and curvature of the tangent vectors in TxM.
The shortest pathγ: [α,β] →M,connecting a curve between any two points on the manifold M with constant velocity,is referred to as a geodesic.The length of the curve on the manifold M can be obtained through integration.For two connected pointsuandvon the manifold M,whereγ(α)=uandγ(β)=v,we consider the segmented curve P comprising all connected points fromutov.Here,inf(·) denotes the minimum value under the limit.Consequently,the induced distance function between pointuand pointvon M is expressed as Eq.(2):
The function that projects the tangent vectorv∈TxM as the initial velocity vector onto M is denoted as an exponential mapexpx: TxM →M.The exponential mapexpxrealizes the projection of the tangent vectorvfrom the tangent space TxM atxonto the manifold M,resulting in the pointexpx(v) ∈M.In general,the mapping of TxM →M corresponds to moving unit length along the geodesic line uniquely defined byγ(0)=x,with the direction of movement given byγ'(0)=v.
The inverse function of the exponential map is recorded as a logarithmic maplogx: M →TxM,which facilitates the mapping of points on the manifold back to the tangent plane.It is important to note that different manifolds have distinct exponential and logarithmic maps.Subsequently,we will present the corresponding mapping functions for specific manifolds.
Furthermore,parallel transportPTx→y: TxM →TyM represents a linear equidistant connection between tangent spaces.This process allows tangent vectors to move along geodesics and defines a canonical method to connect tangent spaces.
A hyperbolic space is characterized as a smooth Riemannian manifold with constant negative curvature,locally resembling a Euclidean space.In hyperbolic space,several isometric models are commonly employed,including the Lorentz(Hyperboloid or Minkowski)model,Poincaré ball model,Poincaré half-space model,Klein model,hemisphere model,and others.Presently,the most frequently used models are the Lorentz model and the Poincaré model.However,Nickel et al.[50]discovered that the Lorentz model exhibits greater numerical stability during optimization.Given that our proposed hyperbolic deep graph convolutional network architecture comprises a deep model structure,the exponential mapping and logarithmic mapping will be executed repeatedly during the training process.As the numerical stability of the Lorentz model is well-suited for scenarios involving deep model architectures,we have chosen the Lorentz model to define our model.
As previously discussed regarding the Riemannian manifold,the Lorentz model Lnof then-dimensional hyperbolic space is a Riemannian manifold embedded in then+1-dimensional Minkowski space.Its representation is denoted as Ln=where Hn,krefers to an-dimensional hyperbolic manifold with a constant negative curvature -1/k(wherek>0).The Hn,kis defined as Eq.(3):
In the above Eq.(3),〈,〉L denotes the Lorentz inner product.For anyx,y∈Rn+1,the Lorentz inner product can be expressed as Eq.(4):
Among these,gLis a diagonal matrix with all elements equal to 1 except for the first element,which is -1.In other words,the metric tensorgL=diag[-1,1,1,...,1].For any pair of pointsx,y∈Hn,kon Hn,k,the following Eq.(5)of distance function can be derived using the metric tensorgL:
The tangent space at the pointxon the hyperbolic manifold Hn,kis denoted as TxHn,k.Thisndimensional Euclidean space,TxHn,k,can be locally approximated to the neighborhood of the pointxon Hn,k.In TxHn,k,the elementvis referred to as a tangent vector,and it is represented as:TxHn,k:=Notably,denotes the norm ofv∈TxHn,k.
Forx∈Hn,kandv∈TxHn,k(wherev≠0),the exponential mapping of the Lorentz model is given as Eq.(6):
Forx,y∈Hn,k,andy≠x,the logarithmic mapping formula of the Lorentz model is presented in Eq.(7):
In this section,we present the specific architecture of HDGCNN.Initially,we map the input graph,which lies in Euclidean space,to a hyperbolic manifold using an inception layer.Subsequently,node embeddings situated in hyperbolic space are further mapped to the tangent space of a reference point through logarithmic mapping.Next,the graph convolution operation is performed on the Euclidean plane within the tangent space,and the resultant node embeddings obtained after the graph convolution operation are projected to the next hyperbolic manifold using an exponential map.This process is iteratively repeated,applying logarithmic mapping and exponential mapping,and eventually incorporating dense connections for the hyperbolic node embeddings produced by each layer.As a result,the architecture effectively captures long-range dependencies and finer node features between nodes in scale-free graphs or hierarchical graphs with power-law distributions.The overall architecture of HDGCNN is depicted in Fig.1,with each component described in detail in the next subsections.
Figure 1:The overall architecture of hyperbolic deep graph convolutional neural network(HDGCNN)
As depicted in Fig.1,we designate the originoas the reference point for the hyperbolic space.We initiate the embedding of the initial node located in the hyperbolic spaceasx0,Hand then project it onto the tangent spacethrough the logarithmic mappinglogo(·).Subsequently,a graph convolution operation is executed into aggregate neighborhood information and yieldx1,E.This newly obtained node embeddingx1,Eis then projected into the subsequent hyperbolic spaceusing the exponential mappingexpo(·),resulting inx1,H.The above operations are repeated iteratively until the final outputxl,His achieved.Furthermore,HDGCNN employs a dense connection architecture to link the output of each layer of the hyperbolic space with the initial node embedding,thereby effectively encoding high-order neighborhood information.
As an end-to-end graph representation learning framework,HDGCNN is designed to directly process raw graph data as input.However,since the original input graph is defined in the Euclidean space,we aim to enhance feature representation and alleviate graph embedding distortion by embedding the scale-free or hierarchical graph into the hyperbolic space.To achieve this,the node features in the Euclidean space are initially mapped into the hyperbolic space through the inception layer of HDGCNN.
LetG=(V,E) denote the input graph,andrepresent the input Euclidean node features.We establish the originas a reference point for the hyperbolic space Hn,k,which possesses a constant negative curvature of-1/k.This choice of origin facilitates the embedding of nodes from the hyperbolic space into the projection into the tangent space ToHn,k.
Figure 2:Projection of input nodes from euclidean space to hyperboloid manifold
Feature transformation plays a pivotal role in the graph convolution operation,serving as a crucial data enhancement technique to elevate low-dimensional node embeddings into a higher-dimensional space.By capturing inter-feature relationships,feature transformation enables the learning of more refined node features,thereby significantly enhancing graph representation learning performance.Moreover,for datasets with excessively high node feature dimensions,feature transformation can effectively reduce the computational complexity and improve overall efficiency.
The essence of feature transformation lies in mapping the node embeddings from the previous layer to the embedding space of the subsequent layer.Letdenote the node feature matrices of thel-th andl-1-th layers of the GCN in the Euclidean space,respectively.Additionally,letW∈be a learnable weight matrix.The feature transformation adopted by GCN can be formulated as Eq.(9):
In our endeavor to conduct feature transformation operations onXH∈Hn,kwithin hyperbolic space,we encounter a challenge due to the absence of a vector space structure in this setting.Consequently,it becomes necessary to execute feature transformations in the tangent space ToHn,k,which is situated at the reference pointowithin the hyperbolic space.
4.2.1FeatureTransformationBasedonIdentityMappinginGCNs
To exploit remote dependencies and non-local structural features among nodes,our objective is to develop a deep graph convolution network model.The iterative aggregation and update operations of the graph convolution layer facilitate the extraction of more refined non-local node features.However,a concern arises from the work of Klicpera et al.[51],who identified that frequent interactions between feature matrices of different dimensions can lead to performance degradation in the model.To address this issue,Chen et al.[26] drew inspiration from the concept of identity mapping in ResNet and propose the incorporation of both unit and weight matrices with varying weights.Specifically,as the number of layers deepens,the weight of the unit matrix increases while the weight value of the weight matrix diminishes.This feature transformation approach using identity mapping is expressed as follows,denoted by Eq.(10):
In the Eq.(10),Indenotes the identity matrix,whileδlrepresents the weight decay parameter,which adapts dynamically with the layer indexl.As the number of layers deepens,the value ofδlgradually decreases.Additionally,λin Eq.(11)is a manually set hyperparameter,controlling the decay rate ofδl.This design guarantees that the deep model can achieve performance at least on par with that of the shallow model.
4.2.2HyperbolicFeatureTransformationBasedonIdentityMapping
To perform feature transformation in the hyperbolic space Hn,k,a two-step process is employed.Firstly,nodes in the hyperbolic space are projected to the tangent space ToHn,knear the hyperbolic reference point using the logarithmic mapping functionfrom Eq.(7).Subsequently,feature transformation based on the identity mapping is carried out in the tangent space,and the resulting node features are then mapped back to the hyperbolic space using the exponential mapfrom Eq.(6).
LetXl,H∈R|V|×nandWl∈Rn×n'respectively represent the node feature matrix and weight matrix in thel-th layer of the hyperbolic space.We define the matrix multiplication in the hyperbolic space based on identity mapping as Eq.(12):
Neighborhood aggregation plays a crucial role in GCNs.It involves updating a node embeddingxiby aggregating information from its first-order neighborsIn deep models,iterative graph convolution layers enable the aggregation of high-order neighborhood information to capture global features.For hyperbolic neighborhood aggregation,the first step is to calculate the neighborhood weight coefficients.These weights can be computed using various methods,such as structural information[13,53],node distance[54],or feature attention[55].Next,we can aggregate neighborhood information using the computed neighborhood weight coefficients.However,these coefficients are calculated based on node features and reflect the importance between different nodes.As the number of layers increases,learned node features gradually stabilize.Continuing to use node features for calculating the weight coefficients may lead to overfitting.Therefore,we aim to primarily employ neighborhood weight coefficients for neighborhood aggregation in the shallow part of HDGCNN,and rely on the graph’s structural features for aggregation in the deeper layers.This way,HDGCNN can effectively utilize both structural and node feature information,adapting to deep model architectures.As for dynamically allocating the weight coefficients for structural and node features for neighborhood information aggregation,we will adaptively learn based on the model’s layer changes.Further details on this approach can be found in the subsequent sections.
4.3.1ComputationofAttentionCoefficientbetweenNodesonHyperbolicManifold
In our approach,we update the inter-node attention coefficients based on the hyperbolic node features learned in each layer of HDGCNN.This enables us to aggregate neighbor information considering both the structural information and the significance of the first-order neighbor nodes with respect to the central node.Our attention coefficient calculation method is inspired by the Graph Attention Network (GAT) proposed by Veliˇckovi´c et al.[56].However,the original GAT calculates attention coefficients based on node features in the Euclidean space.In our work,we extend this method to hyperbolic manifolds.Given the hyperbolic embeddings of two nodes at layerl,denoted aswe utilize the following Eq.(13)to compute attention weights between nodes:
4.3.2NeighborhoodAggregationBasedonStructuralFeaturesandHyperbolicAttentionCoefficient
To fully leverage structural information and node features for neighborhood aggregation,we propose a method based on initial structural features and hyperbolic attention coefficients,and construct an adjacency matrix using skip connections.Specifically,the adjacency matrix used in thel-th layer of HDGCNN is composed of the initial adjacency matrixAadj∈RN×Nand thematrix.
The initial adjacency matrixAadjcontains the structural information of the input graphG=(V,E).Each element ofAadjis either 1 or 0,indicating the presence or absence of an edge at the corresponding position,respectively.It is defined as Eq.(14):
In order to incorporate both structural information and inter-node attention coefficients during the process of aggregating neighborhood information,we propose the following method to calculate the adjacency matrix,as shown in Eq.(15).
The parameterδlin Eq.(15)represents the weight decay parameter that adapts with the number of layersl.As the number of layers deepens,δlgradually decreases.The value ofδlfollows the same formulation as in Eq.(11),whereλis a hyperparameter.We adopt this approach because the node features learned by the shallow model may not be as refined,so we assign more weight to the adjacency matrixAGAT,which contains node feature information,to facilitate its updates during backpropagation.Conversely,as the number of layers increases,the learned node features tend to stabilize.We aim to capture long-range dependencies and non-local structural features between nodes through the iterations of the deep model.Thus,we assign more weight to the adjacency matrixAadj,which contains structural information,while reducing the weight ofAGAT.The adjacency matrix construction method using skip connections is illustrated in Fig.3.
Figure 3:The adjacency matrix A is constructed using Aadj and AGAT with skip connections
4.3.3HyperbolicNeighborhoodAggregation
In order to effectively compute neighborhood aggregation weights by leveraging both structural information and feature attention,we introduce the adjacency matrix construction method with skip connections as described earlier.In the shallow layers,the aggregation of first-order neighbor information by central nodes is primarily influenced by the attention coefficient inAGAT.As the model goes deeper,the aggregation of neighborhood information by central nodes is predominantly influenced by the adjacency matrixAadj,which represents the structural information.AGATandAadjdynamically adjust their respective weight values according to the changes in the number of layers,ultimately forming an integrated adjacency matrixA.Consequently,our proposed neighborhood aggregation method on hyperbolic manifolds can be summarized as Eq.(16):
The nonlinear activation method is performed as Eq.(17).Initially,the node embeddings on the hyperbolic manifoldwith curvature -1/kl-1are mapped to the tangent planeat the originousingThen,a nonlinear activationσ(·) is applied in the Euclidean space to the mapped embeddings.Finally,is used to map the results after nonlinear activation back to the hyperbolic manifold
Since our HDGCNN adopts learnable curvature,each layer of the hyperbolic manifold has a different curvature,and non-linear activation enables the curvature to be updated across layers.Hence hyperbolic nonlinear activations(·)[55]with different curvatures are introduced.
To construct a hyperbolic deep graph convolutional neural network(HDGCNN)model that can effectively capture remote dependencies between nodes in scale-free graphs or hierarchical graphs with power-law distributions,as well as finer node features,and address the over-smoothing issue caused by deep models,we draw inspiration from the design principles of ResNet and DenseNet[57].We adopt the concept of skip connections to propagate the initial node featuresx0,Hembedded in the hyperbolic space from the initial layer of HDGCNN to the output of each subsequent layer,and further pass the output of each layer to the subsequent layers.This non-local message passing framework enables us to effectively build hyperbolic deep graph convolutional neural network models,as shown in Eqs.(18)and(19):
In the above Eq.(18),Mlrepresents the message function of thel-th layer.It aggregates the features of nodeiand its first-order neighbors in the hyperbolic space of thel-1-th layer,obtaining the hyperbolic node featuresafter neighbor information aggregation in hyperbolic space.Ulin Eq.(19)denotes the update function of thel-th layer.Leveraging dense connections,it adaptively accumulates and updates the initial node features,the output results of the previous layer,and the results of the hyperbolic graph convolution of this layer,ultimately obtaining the hyperbolic embedding of nodeiin thel-th layer.The specific update method is defined as shown in Eq.(20):
This section provides a comprehensive summary of all the modules employed in HDGCNN as mentioned earlier.Given an input graphG=(V,E),where the input Euclidean features are denoted asFirstly,the hyperbolic input features,are obtained by mapping the Euclidean features to the hyperbolic space through the initial layer,as explained in Section 4.1.These hyperbolic input features then undergo a hyperbolic transformation based on identity mapping,utilizing Eq.(21),as explained in Section 4.2.
Next,neighborhood aggregation is performed on nodes using structural features and hyperbolic attention coefficients,achieved by Eq.(22),as explained in Section 4.3.Subsequently,a nonlinear activation is applied to the neighborhood aggregation results using a learnable curvature,as presented in Eq.(23),as explained in Section 4.4.Finally,Eq.(24) is employed to create dense connections among the output results of each HDGCNN layer based on the novel non-local message passing framework,facilitating the capture of more refined high-order features of nodes,as elucidated in Section 4.5.
The aforementioned four steps constitute a hyperbolic graph convolutional layer,and HDGCNN is formed by stacking multiple hyperbolic graph convolutional layers.The algorithm description of the HDGCNN is shown in Algorithm 1.
In this section,we present an extensive series of experiments to thoroughly evaluate the performance of HDGCNN on both node classification and link prediction tasks.The main objective is to showcase the effectiveness and capabilities of HDGCNN in addressing these crucial graph-related tasks.To achieve this,we conduct a comparative analysis by benchmarking HDGCNN against shallow methods,neural network-based methods,GNN-based methods,hyperbolic GNN-based methods,and other state-of-the-art approaches.Moreover,we perform ablation experiments to assess the impact and effectiveness of our proposed building blocks.Through these rigorous and diverse experiments,we aim to establish the superiority of HDGCNN over existing methods.The results of these experiments will underscore the potential and significance of HDGCNN as a powerful tool for graph representation learning.
5.1.1Datasets
The HDGCNN proposed in this paper is specifically designed to handle scale-free or hierarchical graphs with tree structures,enabling the embedding of such graphs into hyperbolic spaces with reduced distortion.To assess the effectiveness of HDGCNN in handling tree-structured graphs,we carefully selected four datasets with varying degrees of tree-like characteristics.The details of these datasets are summarized in Table 1 for reference and comparison purposes.By conducting experiments on these datasets,we aim to demonstrate how HDGCNN performs in capturing and preserving the tree structures,and how it outperforms other methods in terms of node classification and link prediction tasks.
Table 1: Dataset statistics
We evaluate the tree-like characteristics of the datasets using Gromov’sδ-Hyperbolicity [58].A smaller value ofδ-Hyperbolicity indicates a higher degree of treeness in the dataset.For our experiments,we carefully selected four datasets with varyingδvalues,namely Disease,Airport,Pubmed,and Cora,to thoroughly assess the effectiveness of our proposed method.The Disease dataset represents a disease propagation tree,simulating the SIR disease transmission model [59],with each node representing either an infection or a non-infection state.The Airport dataset consists of airports and routes,where nodes represent airports,and edges represent routes between them.Pubmed and Cora [60] are citation datasets,where nodes represent scientific papers,edges represent citation relationships,and the node labels correspond to the academic fields of the papers.
5.1.2Baselines
To comprehensively evaluate the performance of HDGCNN,we compare it against a variety of state-of-the-art methods,which can be categorized as follows: shallow methods,neural networkbased methods (NN),graph neural network-based methods (GNN),and hyperbolic graph neural network-based methods.The shallow methods we consider are Euclidean-based embedding (EUC)and Poincare sphere-based embedding(HYP)[35].As for the NN methods,we include feature-based MLP and its hyperbolic variant(HNN)[34].For the GNN category,we select GCN[13],GAT[56],SAGE[14],and SGC[61],as these methods have demonstrated exceptional performance in various applications by fully leveraging node features and structural characteristics.Additionally,we evaluate hyperbolic GNN methods,including HGCN[55],HGNN[53],HGAT[62],and HGCF[63],which represent hyperbolic versions of the GNN approaches mentioned earlier.By comparing HDGCNN with these state-of-the-art methods,we can thoroughly assess its performance and demonstrate its advantages in various scenarios.
5.1.3ParameterSettings
For the sake of fairness,we adopt consistent dataset splits for all baseline methods and models.In the node classification(NC)task,we employ a 30/10/60%split for Disease,and a 70/15/15%split for Airport,which are designated for training,validation,and testing,respectively.As for the Pubmed and Cora datasets,we follow the same split strategy as proposed by Yang et al.[64].In the link prediction task,the edges in all datasets are randomly distributed with a ratio of 85/5/10%.Across all datasets,we set the learning rate to 0.01,dropout to 0,weight decay parameter to 0,bias parameter to 1,andλin Eq.(11)to 0.9.We configure the curvature to self-learn mode and employ the Adam optimizer[65]for model optimization.The maximum number of training epochs is set to 5000,with early stopping based on a window size of 100 epochs,where training stops if the validation loss does not decrease continuously for 100 epochs.The minimum number of training epochs is set to 100.We employ ReLU activation as the linear activation for hidden layers.The LeakyReLU activation function is used as the linear activation function in the hyperbolic attention coefficient module,with its negative slope set to 0.2.For numerical stability in optimization,we choose the Lorentz(Hyperboloid)model as the Riemannian manifold.The parametersrandtfor the Fermi-Dirac decoder used in the link prediction task are set to 2 and 1,respectively.
To ensure a fair comparison between HDGCNN and the hyperbolic GNN baseline method,we represent node features using a 16-dimensional representation.In this paper,we deploy a 32-layer HDGCNN model,while the hyperbolic GNN baseline method adopts a 4-layer model structure.These choices are made to provide a comprehensive assessment of the performance of HDGCNN in comparison with the hyperbolic GNN baseline method.The detailed model configurations and parameter settings can be accessed at the following link: https://github.com/FrancisJUnderwood/hdgcnn/tree/master.
We present the performance comparison of HDGCNN with other baseline methods in Table 2.For the link prediction task,we employ the area under the ROC curve (AUC) on the test set as the evaluation metric.For the node classification task,we use the F1 score to assess the performance of node classification.
Table 2: AUC for link prediction(LP)and F1 score for node classification(NC)tasks
Analyzing the results in Table 2,we observe that HDGCNN outperforms a considerable number of baseline methods on datasets with evident tree structures (lowδvalues).In the Disease dataset withδ=0,HDGCNN achieves an average accuracy approximately 26.5% higher in link prediction(LP)and 24.8%higher in node classification(NC)compared to the Euclidean-based GNN baseline method.Moreover,HDGCNN shows an average of about 10.8% higher accuracy in LP and 17.5%higher accuracy in NC than the hyperbolic GNN baseline method.HDGCNN also demonstrates competitive performance on the Airport and Pubmed datasets.
However,on the Cora dataset where the tree structure is less apparent (highδvalue),the Euclidean-based GNN methods HGCF and GAT attain the best results in link prediction and node classification tasks,respectively.Despite this,HDGCNN still achieves notable performance on the Cora dataset,showcasing its versatility and potential in various graph-related tasks.
5.2.1Analysis
We Analysis reveals the following key insights:
(1)By comparing the experimental results of HDGCNN with those of the Euclidean-based GNN method,we discern thatgraph data exhibiting a tree structure can indeed benefit from hyperbolic geometry.The experimental findings across the Disease,Airport,and Pubmed datasets substantiate this assertion.Notably,HDGCNN shines remarkably on the Disease dataset,which exhibits the highest degree of tree-like structure (δ=0),showcasing a substantial performance advantage over the Euclidean-based GNN baseline method.As the tree-like structure weakens (δvalue increases),HDGCNN maintains its significant performance lead over the GNN baseline method on the Airport and Pubmed datasets,albeit with a diminished margin compared to the Disease dataset.
(2) A comparative analysis between HDGCNN and hyperbolic GNN methods illustrates thathyperbolic GNN methods can indeed benefit from a deep model structure.This deduction is corroborated by results across the four datasets and is further validated in the subsequent node embedding visualization analysis in Section 5.4.3.Empirical findings demonstrate that HDGCNN,with its deep model architecture,surpasses all shallow hyperbolic GNN baseline methods in link prediction and node classification tasks.It is notable that HGCF and GAT achieve optimal outcomes in link prediction tasks and node classification on Cora datasets,respectively.Our analysis posits that this could be attributed to the fact that HGCF employs fewer operations on hyperbolic maps and extensively utilizes graph convolution operations within the Euclidean tangent space.Additionally,given the Cora dataset’s diminished tree-like nature,it tends to benefit more from Euclidean geometry.
(3)Furthermore,it is worth noting that MLP and its hyperbolic variant,HNN,are feature-based methods that do not leverage graph structural information and lack an information aggregation step.The discernible performance disparity observed between MLP,HNN,and HDGCNN,along with other hyperbolic GNN baseline methods,in the context of node classification tasks,demonstrates the significance and efficacy of neighborhood information aggregation in learning node feature representations.This observation reinforces the conclusions drawn by Chami et al.[55].
Collectively,these findings underscore the efficacy and versatility of HDGCNN across various graph-related tasks,reaffirming its position as a promising approach in graph representation learning.
In this section,we conduct a series of ablation experiments aimed at systematically validating the indispensability of each proposed component within HDGCNN.We meticulously assess the contributions of the identity mapping-based feature transformation module,the neighborhood aggregation module,and the non-local message passing framework,individually.To achieve this,we progressively exclude these modules from HDGCNN and evaluate the resulting model performance.The derived variant models,denoted as HDGCNN-Trans,HDGCNN-Agg,and HDGCNN-NoneLocal,correspondingly represent models in which the feature transformation method,neighborhood aggregation method,and non-local message passing framework have been substituted.In more detail,HDGCNN-Transubstitutes the feature transformation method with a conventional linear transformation[53,55,63],while HDGCNN-Aggemploys a conventional weighted aggregation [13] as a replacement for the neighborhood aggregation method.The performance of HDGCNN and the three variant models across node classification tasks on the four datasets is comprehensively summarized in Table 3.
Table 3: F1 score of HDGCNN and its variant models in node classification task
5.3.1Analysis
The analysis of the results presented in Table 3 reveals a consistent trend of decreased model performance upon the removal of each individual component.Notably,on datasets such as Disease and Airport,characterized by a pronounced treeness degree,the non-local message passing framework emerges as a pivotal element.The absence of this module leads to a conspicuous decline in HDGCNN’s performance.Similarly,on the Pubmed dataset,the omission of any module leads to a substantial reduction in performance.In the case of the Cora dataset,the removal of our proposed neighborhood aggregation module results in a significant performance degradation.
These observations provide further substantiation of our findings in Section 5.2.1,indicating that datasets characterized by a lower tree structure degree are more likely to benefit from Euclidean geometry.Moreover,these results underscore the essential role of neighborhood information aggregation in the effective acquisition of node feature representations.Concurrently,they affirm the superior performance of our proposed neighborhood aggregation approach in contrast to conventional methods.
In this section,we will conduct a quantitative analysis of HDGCNN’s capacity to alleviate oversmoothing,while also undertaking a qualitative assessment of its aptitude in capturing distant highorder node features through node embedding visualization.Initially,we introduce a smoothness evaluation metric to quantify the smoothness of each layer’s output in the model—smoothness,in this context,refers to the similarity among nodes within the graph.By comparing the smoothness across various layers of three hyperbolic graph neural network models—namely HDGCNN,HGCN,and HGCF—we substantiate HDGCNN’s suitability for constructing hyperbolic deep graph neural network architectures.These architectures excel in extracting refined node features from graph data characterized by tree-like structures,thereby mitigating issues related to over-smoothing.Subsequently,we proceed to visualize the node embeddings generated by the aforementioned hyperbolic graph neural network models.By discerning clustering patterns among different categories of nodes in the visualization space,we qualitatively underscore HDGCNN’s proficiency in capturing intricate node features.
5.4.1EvaluationMetricofSmoothness
The over-smoothing problem caused by deep graph neural network models originates from the fact that each node in the graph aggregates information from almost the entire graph due to the multi-layer graph convolution operation.This ultimately leads to the convergence of all node features,resulting in closely embedded nodes spatially.To address this,we employ the Mean Average Distance(MAD),as proposed by Chen et al.[66],as a quantitative measure.MAD assesses smoothness by calculating the average distance between nodes.Smoothness is utilized to represent the similarity of node embeddings:a larger MAD value signifies lower embedding similarity,while a smaller MAD value indicates higher similarity.The calculation method for the MAD value is presented in Eqs.(25)to(28).
Eq.(25)delineates the procedure for computing the MAD value of a given pair of target nodes,where 1(x)equals 1 if x>0,and 0 otherwise.Eq.(26)is utilized to determine the average of non-zero elements within each row of matrixDtgt.Within Eq.(27),Mtgtsymbolizes anN×Nmask matrix,while ○denotes the element-wise multiplication operation that filters information from theN×Ndistance matrixDusing the mask matrixMtgt.The calculation of the distance matrixDis presented in Eq.(28).Here,denote the hyperbolic feature vectors of nodesiandj,respectively,and the distance matrixDis computed by evaluating the cosine values between all pairs of nodes.It is important to highlight that the node feature matrixxHrepresents the hyperbolic output from the final layer of HDGCNN.
5.4.2QuantitativeAnalysisofOver-Smoothing
Ordinary GCN models typically exhibit a shallow architecture comprising 3 to 4 layers.To enhance the depth of the graph convolutional network and effectively capture non-local structural features and long-range interdependencies among nodes,we seek to extend the network’s depth.However,deeper networks often encounter the challenge of excessive smoothing.To address this concern,we present HDGCNN,a solution that facilitates the creation of deep network architectures without succumbing to the issue of excessive smoothing,thereby enhancing node feature extraction.
In Table 4,we present the F1 scores of three hyperbolic graph neural network models—namely,HGCN,HGCF,and HDGCNN—across varying layer model structures during node classification tasks on the Disease dataset.Our observations reveal that HGCN fails to yield performance improvements upon increasing the number of network layers.Notably,as the network depth reaches 16 layers,the model encounters a gradient disappearance issue,rendering it untrainable.Similarly,while HGCF’s performance deteriorates significantly with deeper network architectures,HDGCNN’s performance remains stable and even exhibits improvement.These findings underscore that hyperbolic graph neural networks are susceptible to over-smoothing,a challenge effectively addressed by HDGCNN.
Table 4: F1 score variation in node classification on the disease dataset by hyperbolic graph neural networks with varying layer depths
Subsequently,we embark on a quantitative analysis of the smoothness exhibited by distinct hyperbolic neural network models across varying depths.Depicted in Fig.4 are the Mean Average Distance(MAD)values of three models-HGCN,HGCF,and HDGCNN-across different layers while performing node classification tasks on the Disease dataset.Lighter colors indicate smaller MAD values,signifying more pronounced node over-smoothing.The outcomes observed in the analysis indicate a pivotal trend.Specifically,as the depth of HGCN reaches 16 layers,the MAD value converges to 0,indicating that the nodes in the graph have become indistinguishable.Notably,the MAD value of HGCF progressively diminishes with increasing network depth,with a particularly substantial decline by the time the depth reaches 16 layers.Conversely,HDGCNN’s MAD value remains consistently within a stable range as the network depth increases,exhibiting even a propensity to rise.This conspicuous trend reaffirms the efficacy of our proposed HDGCNN in seamlessly amalgamating a deep graph neural network architecture with hyperbolic geometry.More remarkably,it adeptly circumvents the over-smoothing conundrum characteristic of hyperbolic graph neural networks.
Figure 4:MAD values for HGCN,HGCF,and HDGCNN across layers in the disease dataset
5.4.3NodeEmbeddingVisualization
In order to visualize the node classification results,we project the node embeddings generated by the three hyperbolic graph neural network models into a 3-dimensional space.As depicted in Fig.5,each row illustrates the visualization outcomes of node embeddings produced by HGCN,HGCF,and HDGCNN across the four datasets.Correspondingly,each column corresponds to the node embeddings of the Disease,Airport,Pubmed,and Cora datasets from left to right.
Figure 5:Visualization of node embeddings
The visualization in the Fig.5 reveals that HDGCNN consistently attains superior classification performance on all datasets.Within the hyperbolic space,nodes of each class exhibit distinct clustering,with minimal confusion in node embedding representations.It is noteworthy that nodes closer to the hyperboloid’s origin generally occupy higher hierarchy levels within the tree structure.These outcomes underscore HDGCNN’s capability to seamlessly transform Euclidean-based node embeddings into hyperbolic embeddings,effectively preserving the hierarchical structure.
By comparing HDGCNN with HGCN and HGCF,we further confirm that deep model structures confer an advantage to hyperbolic graph neural networks.
5.5.1AttentionDistribution
In this section,we will delve into an analysis of the attention coefficients acquired by three models:HGCN,HGCF,and HDGCNN.The aim is to ascertain whether HDGCNN effectively employs the hyperbolic neighborhood aggregation method,grounded in structural features and hyperbolic attention coefficients,to learn attention weights.To initiate this examination,we introduce a disparity metric on the attention coefficient matrixApertaining to nodei,denoted asΔi=[67],whereUisignifies the uniform distribution score of nodei.The metricΔiserves to gauge the degree of deviation between the acquired attention coefficient and the uninformative uniform distribution.Notably,a largerΔivalue indicates a more meaningful learned attention coefficient.By contrasting the discrepancy metric distributions across the attention coefficient matrix (as given by Eq.(15)) obtained from HDGCNN’s output layer with those yielded by HGCN and HGCF,we present the specific outcomes in Fig.6a.Illustrated in Fig.6a are the disparity metric distributions of attention matrices learned by the aforementioned models on the Airport dataset.The figure distinctly reveals that the attention coefficients learned by HDGCNN exhibit larger variance.Conversely,the attention coefficients acquired by HGCN and HGCF primarily distributed in the range of small values ofΔi.This compellingly signifies HDGCNN’s proficiency in differentiating significant nodes,thus providing further substantiation for the augmented performance capabilities conferred by the hyperbolic neighborhood aggregation method grounded in structural features and hyperbolic attention coefficients.
Figure 6:(a)Attention weight distribution on Airport;(b)The number of model parameters varies with the number of layers of the model;(c)Influence of hyperparameter λ on HDGCNN performance
5.5.2ParameterSensitivityAnalysis
In this section,we will conduct a comprehensive sensitivity analysis to investigate the impact of the number of layers in HDGCNN on the total model parameters,as well as the influence of the adaptive decay parameterδl(as described in Eq.(11)) on the overall model performance.Fig.6b depicts the variation in HDGCNN’s model parameters as the 4-layer model structure doubles to a 32-layer model across four datasets.Notably,the model parameters for the Disease,Pubmed,and Cora datasets do not exhibit a doubling pattern with the increase in the number of layers.Meanwhile,Fig.6c depicts the fluctuations in node classification accuracy for the 32-layer HDGCNN across the four datasets as the hyperparameterλwithinδlis adjusted.By manipulating the value ofλ,the extent of information decay during the feature transformation stage can be controlled.It is evident from Fig.6c that the performance of HDGCNN remains consistently stable within theλrange of 0.1 to 1.5.This resilience indicates the robustness and reliability of the model.
In this study,we introduced an end-to-end hyperbolic deep graph convolutional neural network model named HDGCNN for enhancing graph representation learning.This model was designed to effectively harness the structural insights of scale-free graphs featuring hierarchical arrangements,capturing extensive node dependencies,and extracting nuanced node features.Initially,we facilitated the seamless transition of input features from Euclidean space to hyperbolic space,thereby conserving the graph’s hierarchical structure with minimal distortion.This amalgamation of hyperbolic geometry and GCN yielded a potent framework for capturing both structural and node-specific attributes more efficiently.Subsequently,our proposed methodologies spanning feature transformation,neighborhood aggregation,and message passing stages contribute to the creation of a robust deep network architecture that circumvents over-smoothing issues.By capitalizing on the inherent strengths of this deep network structure,we adeptly grasp long-range dependencies among nodes and extract more intricate node features.Ultimately,our empirical findings underscore the remarkable superiority of HDGCNN over Euclidean GCN approaches and shallow hyperbolic graph neural networks,particularly when applied to graph datasets characterized by pronounced tree-like structures.
In the future,we intend to further investigate the effective integration of multi-dimensional edge feature learning within the framework of HDGCNN.This is driven by the realization that,in real-world graphs,aside from the potential manifestation of scale-free or hierarchical structural characteristics,edges also encompass a plethora of valuable features.Our aim in the forthcoming research is to explore how to concurrently embed scale-free or hierarchical graphs into hyperbolic space while efficiently acquiring multi-dimensional edge features,in order to enhance the precision of node feature learning.
Acknowledgement:We would like to express our sincere gratitude to Fanliang Bu for his valuable advice during the writing of our paper.In particular,we are thankful to Hongtao Huo for his guidance and assistance in preparing the manuscript.Furthermore,we appreciate the reviewers for their helpful suggestions,which have greatly improved the presentation of this paper.
Funding Statement:This work was supported by the National Natural Science Foundation of China-China State Railway Group Co.,Ltd.Railway Basic Research Joint Fund(Grant No.U2268217)and the Scientific Funding for China Academy of Railway Sciences Corporation Limited(No.2021YJ183).
Author Contributions:The authors confirm contribution to the paper as follows: study conception and design:Y.Z.,F.B.;data collection:Z.H.,H.H.,X.L.;analysis and interpretation of results:L.B.,Y.W.,J.M.;draft manuscript preparation:Y.Z.All authors reviewed the results and approved the final version of the manuscript.
Availability of Data and Materials:The data that support the findings of this study are available from the first and corresponding authors upon reasonable request.
Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
Computer Modeling In Engineering&Sciences2024年4期