Tonghe Ying (应通和), Jianbao Zhu (朱健保)†, and Wenguang Zhu (朱文光)
1 Department of Physics, University of Science and Technology of China, and Key Laboratory of Strongly-Coupled Quantum Matter Physics, Chinese Academy of Sciences, Hefei 230026, China
2 International Center for Quantum Design of Functional Materials (ICQD), Hefei National Laboratory for Physical Sciences at the Microscale, and Synergetic Innovation Center of Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei 230026, China
Keywords: machine learning potential, gold cluster, first-principles calculation
For solid materials, such as binary compounds and supported nanoparticles,[1,2] determination of the globally stable structure is the cornerstone for a detailed study of mechanical, electronic, magnetic, or chemical properties. In principle, the structure can be determined computationally by searching the potential energy surface (PES) of the system. To obtain the ground-state structure, unbiased search algorithms such as the genetic algorithm (GA),[3] basin hopping (BH),[4] and particle swarm optimization (PSO)[5] have been developed. However, all of these methods require a large number of energy and force calculations in the framework of density functional theory (DFT). Since the computational cost of DFT grows cubically with the size of the system and the configuration space grows exponentially with the number of atoms,[6] a global structure search for large systems may be impractical. In this context, alternative methods like empirical potentials[7] are often employed, but their transferability and accuracy are not always satisfactory.
During the last decades, machine learning (ML) techniques have proven useful in various aspects of computational materials physics.[8–16] However, most of them are used to assist molecular dynamics (MD) simulations, where the ML potential only needs to cover part of the configuration space.[17] To perform global optimization, we have to obtain an ML potential covering almost the whole configuration space, which can be very challenging. Nevertheless, once such an ML potential[18–20] is constructed, e.g., with neural-network-based approaches[21–23] or Gaussian approximation potentials,[24] it can largely accelerate the global optimization, benefiting from the speed of ML predictions while preserving the accuracy of first-principles calculations. Owing to the nature of atomic systems, they can be naturally regarded as graphs, with vertices representing individual atoms and edges representing bonds. Therefore, high-dimensional neural networks, especially convolutional graph neural networks,[21,25–29] which operate on graph-structured data, are suitable for speeding up the global optimization. These neural network potentials are based on atomic energy contributions and can be applied to systems of different sizes, going beyond the limitations of previous neural networks for fixed-size systems.[30]
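As a minimal illustration of this graph view, the following sketch builds the edge list of a cluster from its Cartesian coordinates, connecting every pair of atoms closer than a cutoff radius; the function name and the cutoff value are our own illustrative choices, not part of any published implementation.

```python
import numpy as np

def build_graph(positions, cutoff=5.0):
    """Return the edge list of a cluster graph: vertices are atoms and
    edges connect pairs separated by less than `cutoff` (angstrom).
    Function name and default cutoff are illustrative choices."""
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]   # pairwise vectors
    dist = np.linalg.norm(diff, axis=-1)                   # pairwise distances
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i, j] < cutoff]

# toy three-atom cluster: every pair is within the cutoff
pos = np.array([[0.0, 0.0, 0.0], [2.7, 0.0, 0.0], [1.35, 2.3, 0.0]])
print(build_graph(pos))  # -> [(0, 1), (0, 2), (1, 2)]
```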
Gold is one of the most inert metals in the periodic table, but at the nanometer scale it exhibits outstanding properties for nanoscience, including nanoelectronics, nanobiology, and nanocatalysis.[31–34] Since many intriguing structures and properties have been found for nanometer-sized gold clusters, they are attracting interest as building blocks of novel nanostructured materials and devices.[35,36] In these applications, cluster structures play an important role in determining the properties, and some clusters are highly stable at magic-number sizes.[37] In this work, we combined PSO[5] with a DFT-trained SchNet potential[38] to accelerate the global optimization of Au clusters of up to 20 atoms. Trained on our newly developed dataset, the SchNet model shows great performance in predicting the energies and forces of Au cluster structures during the optimizations. With this ML-potential-accelerated global optimization, we have successfully reproduced the well-known tetrahedral structure of Au20[35] and revealed the 2D–3D structural transition of Au clusters.
The workflow starts with PSO, a branch of evolutionary methodology for crystal structure prediction. Within the PSO scheme, each particle is influenced by both the best local and the best global particle, i.e., a particle can learn from its past experience to adjust its flying speed and direction. Here, the population size is set to 150 and the other parameters are kept as suggested by Wang et al.[5] To explore the search space of clusters as widely as possible and obtain low-energy structures, we impose no symmetry constraint in the structure generation step. The most expensive part of the global optimization algorithm is the local relaxation performed at the DFT level. In our method, however, these calculations are replaced by the ML potential, which is obtained by training on our developed dataset. Since the best global particle of each generation largely controls the evolution of PSO, it is re-evaluated by recomputing, at the DFT level, the energies of the best local particles obtained from the ML potential. It is worth noting that our method can be classified as an evolution-type algorithm, which emphasizes structure search by generational change, whereas a selection-type algorithm such as Bayesian optimization[39] aims to preferentially select structures from a large number of candidates by ML. Compared to a selection-type algorithm, the initial structures generated by our method trend toward more stable structures as the generations proceed. Moreover, we use a well-trained ML potential to replace the time-consuming local optimization by DFT, further accelerating the structure search.
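For concreteness, one PSO move for a single particle can be sketched as follows; the inertia and acceleration coefficients w, c1, and c2 shown here are generic textbook defaults, not the exact settings of the CALYPSO code used in this work.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.6, c1=2.0, c2=2.0):
    """One particle-swarm move for a single particle (one candidate cluster).
    x, v: (n_atoms, 3) arrays of positions and velocities; pbest and gbest
    are the best structures seen by this particle and by the whole swarm.
    w, c1, c2 are generic textbook coefficients, not the CALYPSO defaults."""
    r1 = np.random.rand(*x.shape)   # stochastic weight for the local term
    r2 = np.random.rand(*x.shape)   # stochastic weight for the global term
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```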
To reduce the total computational cost during the structure exploration, we exploit a graph convolutional neural network called SchNet[38] to describe the interatomic potentials of the Au cluster system; it combines a graph neural network with a convolutional neural network. The graph neural network regards atoms and the pairwise interactions between them as nodes and edges, respectively. These nodes and edges are represented by feature vectors, which can be iteratively updated through convolution layers. The convolution layer is effectively a message-passing function, which gathers information from neighboring atoms within a cutoff radius $R_c$ and updates the feature vector of the central atom. For an intuitive understanding of SchNet, the Behler–Parrinello neural network (BPNN),[40] which has been widely used over the last decades, is often taken for comparison. In BPNN, the chemical environment around an atom is represented by a set of symmetry functions, which are rotationally, translationally, and permutationally invariant, satisfying the requirements for descriptors. To distinguish between chemical elements, each element has its own neural network architecture with fitted parameters. In SchNet, by contrast, descriptors are replaced by iteratively updated feature vectors, whose parameters are learnt through training. Furthermore, all chemical elements in SchNet share the same architecture, their uniqueness guaranteed by the respective randomly initialized feature vectors. Figure 1 shows an overview of the SchNet architecture. At each layer, the cluster is represented atom-wise, analogous to pixels in an image, and interactions between atoms are modeled by interaction blocks, where radial basis functions
$$e_k(\mathbf{r}_i-\mathbf{r}_j)=\exp\!\big(-\gamma\,(\|\mathbf{r}_i-\mathbf{r}_j\|-\mu_k)^2\big)$$
play a key role in expanding the interatomic distances, with $\mathbf{r}_i$ and $\mathbf{r}_j$ the position coordinates of atoms $i$ and $j$, respectively. The centers $\mu_k$ are chosen uniformly, every 0.2 Å, between zero and the distance cutoff of 5 Å, while the scaling parameter is γ = 1 Å. The interaction of atom $i$, formulated as a convolution with the surrounding atoms, reads
$$\mathbf{x}_i^{l+1}=\sum_{j}\mathbf{x}_j^{l}\circ W^{l}(\mathbf{r}_i-\mathbf{r}_j),$$
where ∘ represents element-wise multiplication and $l$ labels the layer. The filter-generating network $W^l:\mathbb{R}^3\rightarrow\mathbb{R}^F$ maps the atom positions to filter values of $F$ dimensions. In the pooling layer, the total energy of the system is described as a summation of atomic contributions,
$$E=\sum_{i=1}^{N}\varepsilon_i,$$
where the atomic energy $\varepsilon_i$ is a function of the chemical environment of atom $i$.
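The two ingredients above, the Gaussian expansion of distances and the continuous-filter convolution, can be condensed into the following PyTorch sketch; the names rbf_expand and CFConv are ours, and the block is a stripped-down illustration rather than the reference schnetpack implementation.

```python
import torch

def rbf_expand(d, cutoff=5.0, spacing=0.2, gamma=1.0):
    """Gaussian expansion of distances on centers mu_k spaced every
    0.2 angstrom between 0 and the 5 angstrom cutoff (26 basis functions)."""
    mu = torch.arange(0.0, cutoff + 1e-8, spacing)
    return torch.exp(-gamma * (d.unsqueeze(-1) - mu) ** 2)

class CFConv(torch.nn.Module):
    """Continuous-filter convolution, stripped to its core for illustration."""
    def __init__(self, n_features=128, n_rbf=26):
        super().__init__()
        # filter-generating network W^l: expanded distances -> F filter values
        self.filter_net = torch.nn.Sequential(
            torch.nn.Linear(n_rbf, n_features), torch.nn.Softplus(),
            torch.nn.Linear(n_features, n_features))

    def forward(self, x, d):
        # x: (n_atoms, F) feature vectors; d: (n_atoms, n_atoms) distances
        W = self.filter_net(rbf_expand(d))          # (n_atoms, n_atoms, F)
        # x_i <- sum_j x_j ∘ W(r_i - r_j): element-wise product, sum over j
        return (x.unsqueeze(0) * W).sum(dim=1)

# after T interaction blocks, an atom-wise network maps each feature vector
# to an atomic energy eps_i, and the total energy is their sum (pooling)
```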
For the training of the model, the forces are included in the training loss function. The interatomic forces are related to the energy, so we can obtain an energy-conserving force model by differentiating the energy model with respect to the atom positions,
$$\hat{\mathbf{F}}_i=-\frac{\partial\hat{E}}{\partial\mathbf{r}_i}.$$
Then a combined loss
$$\ell=\rho\,\big\|E-\hat{E}\big\|^2+\frac{1}{N}\sum_{i=1}^{N}\Big\|\mathbf{F}_i-\Big(-\frac{\partial\hat{E}}{\partial\mathbf{r}_i}\Big)\Big\|^2$$
can be defined, where ρ is the trade-off parameter between the energy and force losses. To minimize the loss function, after testing many different hyperparameter settings, we finally use the ADAM optimizer[41] with mini-batches of 32, and the learning rate is decayed by a factor of 0.96 from $10^{-3}$ to $10^{-5}$ whenever learning stagnates. The selection of hyperparameters, such as the trade-off ρ, the feature dimension F, and the number of interaction blocks T, will be discussed later; the others follow the settings proposed previously.[38]
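The combined loss and the optimizer schedule can be sketched as follows; the per-atom toy network, the patience value, and the training-loop details are our assumptions, kept only to make the autograd-based force term concrete.

```python
import torch

# toy stand-in for SchNet: per-atom energies from raw coordinates, summed to
# a total energy (not rotation-invariant; for demonstrating the loss only)
model = torch.nn.Sequential(torch.nn.Linear(3, 16), torch.nn.Softplus(),
                            torch.nn.Linear(16, 1))

def combined_loss(positions, E_ref, F_ref, rho=0.1):
    """rho*|E - E_hat|^2 + (1/N) * sum_i |F_i - (-dE_hat/dr_i)|^2."""
    positions.requires_grad_(True)
    E_hat = model(positions).sum()              # pooling of atomic energies
    # energy-conserving forces via autograd
    F_hat = -torch.autograd.grad(E_hat, positions, create_graph=True)[0]
    return rho * (E_ref - E_hat) ** 2 + ((F_ref - F_hat) ** 2).sum(-1).mean()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# decay the learning rate by 0.96 when the validation loss stagnates;
# the patience value is our guess, not a published setting
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.96, patience=10)
# per epoch: loss.backward(); optimizer.step(); scheduler.step(val_loss);
# stop once optimizer.param_groups[0]["lr"] < 1e-5
```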
Fig. 1. Overview of the SchNet architecture. The shifted softplus is defined as $\mathrm{ssp}(x)=\ln(0.5\,\mathrm{e}^x+0.5)$ and the radial basis function is abbreviated as rbf. $W^l(\mathbf{r}_i-\mathbf{r}_j)$ is the filter-generating network (orange), and the gray box represents the interaction block, which can be repeated $n$ times.
The dataset used to train the ML model is a key factor. To ensure its diversity, 14 different cluster sizes (Au10–Au19, Au21, Au26, Au33, Au35), each with 100 different configurations, are used as initial structures for simulated annealing (SA). An initial training dataset of $1.4\times10^6$ structures is thus generated by SA from 1000 K to 0 K, with each run lasting 20 ps at a time step of 20 fs. However, these raw data contain some unphysical or near-duplicate structures, which require further refinement. First, structures in which the absolute value of any force component exceeds 10 eV/Å, considered unphysical configurations, are removed from the dataset. Second, if several configurations are judged to be similar, only one of them is kept. These two refinement steps reduce the initially generated dataset from $1.4\times10^6$ to 38276 configurations.
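Schematically, each annealing run can be reproduced with ASE molecular dynamics as below; the Langevin thermostat, the friction value, and the linear temperature ramp are our assumptions about the protocol, and `atoms` must already carry a calculator.

```python
from ase import units
from ase.md.langevin import Langevin

def anneal(atoms, n_steps=1000, t_start=1000.0):
    """Simulated annealing from 1000 K to 0 K over 20 ps (20 fs time step),
    returning one snapshot per MD step; `atoms` needs a calculator attached."""
    dyn = Langevin(atoms, timestep=20 * units.fs,
                   temperature_K=t_start, friction=0.02)  # friction: a guess
    frames = []
    for step in range(n_steps):
        # linear temperature ramp down toward 0 K
        dyn.set_temperature(temperature_K=t_start * (1.0 - step / n_steps))
        dyn.run(1)
        frames.append(atoms.copy())
    return frames
```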
In the present work, we decide whether two structures are different by employing an interatomic-distance and energy comparison criterion.[3] If the energies of two configurations differ by more than 0.1 eV, they are considered different configurations. Otherwise, the structure comparison criterion is employed. A sorted list $D_i$ of all interatomic distances is calculated for every candidate $i$. Two configurations $i$ and $j$ are regarded as equal provided
$$\frac{\sum_k\big|D_i^{(k)}-D_j^{(k)}\big|}{\sum_k D_j^{(k)}}<\delta_{\mathrm{rel}}\quad\text{and}\quad\max_k\big|D_i^{(k)}-D_j^{(k)}\big|<d_{\max},$$
where $k$ indexes each entry in the list $D_i$. The first criterion is the relative accumulated difference between the two candidates and the second is the maximum difference between two corresponding distances. Typical values are $\delta_{\mathrm{rel}}=0.03$ and $d_{\max}=0.7$ Å.
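A direct transcription of this two-step duplicate test, under the assumption that the compared configurations have the same number of atoms, might look as follows.

```python
import numpy as np
from itertools import combinations

def is_duplicate(pos_i, pos_j, e_i, e_j, delta_rel=0.03, d_max=0.7):
    """Decide whether two same-size configurations count as duplicates.
    Energies differing by more than 0.1 eV mean distinct structures;
    otherwise compare the sorted lists of all interatomic distances."""
    if abs(e_i - e_j) > 0.1:
        return False
    D_i = np.sort([np.linalg.norm(a - b) for a, b in combinations(pos_i, 2)])
    D_j = np.sort([np.linalg.norm(a - b) for a, b in combinations(pos_j, 2)])
    diff = np.abs(D_i - D_j)
    # relative accumulated difference and maximum single-distance difference
    return diff.sum() / D_j.sum() < delta_rel and diff.max() < d_max
```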
The first-principles DFT calculations were performed using the Vienna Ab initio Simulation Package (VASP).[42] Valence electrons were described using the projector augmented-wave (PAW) method.[43] The plane-wave expansions were determined by an energy cutoff of 250 eV. Exchange and correlation were treated using the Perdew–Burke–Ernzerhof (PBE)[44] parametrization of the generalized gradient approximation (GGA) for structural relaxations and total-energy calculations. The isolated cluster was placed in a large cubic supercell of edge length 20 Å, so only the Gamma point was used for the k-point sampling. Atomic structures were considered optimized when the forces on all atoms were below 0.03 eV/Å and the energy change was below $1\times10^{-4}$ eV. Because spin-orbit coupling (SOC) grows with atomic number, and Au is a heavy element, we also carried out calculations including SOC, as implemented in the noncollinear version[45] of the PAW method, to obtain more accurate energies for the structures found.
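For orientation, these settings translate into ASE's VASP interface roughly as below; this is a sketch assuming a working VASP installation, with the relaxation step count and the toy Au4 geometry added purely for illustration.

```python
from ase import Atoms
from ase.calculators.vasp import Vasp

# the DFT settings quoted above, expressed through ASE's VASP interface
calc = Vasp(xc="pbe",           # PBE-GGA exchange-correlation
            encut=250,          # plane-wave cutoff (eV)
            ediff=1e-4,         # electronic convergence (eV)
            ediffg=-0.03,       # relax until forces < 0.03 eV/angstrom
            kpts=(1, 1, 1),     # Gamma point only for an isolated cluster
            ibrion=2, nsw=200)  # CG relaxation; step count is illustrative

cluster = Atoms("Au4", positions=[[0, 0, 0], [2.7, 0, 0],
                                  [0, 2.7, 0], [0, 0, 2.7]],
                cell=[20.0, 20.0, 20.0])   # 20 angstrom cubic supercell
cluster.center()
cluster.calc = calc
# cluster.get_potential_energy() would trigger the relaxation
```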
To obtain accurate energy and force predictions, we first perform model selection on the given reference data. Table 1 lists the results for various settings of the number of interaction blocks T, the feature dimension F of the atomic representations, and the energy-force trade-off ρ of the combined loss function. The model's ability to predict the energy and forces clearly improves as T and F increase, and the architecture with T = 6 and F = 128 works best for the energy as well as the forces. For the energy predictions we obtain the best result with ρ = 0.1, as this puts more emphasis on the energy loss, while we achieve the best force predictions with ρ = 0.01. Here, we select the ρ = 0.1 model as it achieves the most balanced performance.
Table 1. Mean absolute errors of the energy and force predictions for SchNet models with varying numbers of interaction blocks T, feature dimensions F, and energy-force trade-off ρ. The selected model is in bold.
We train a series of neural network (NN) potentials by gradually enlarging the training dataset from 3827 to 30620 structures. Figure 2(a) shows the evolution of the mean absolute error (MAE) of the energy and forces for these NN potentials. The MAE of both the energy and the forces decreases significantly as the size of the training set increases, indicating a gradual improvement of the potentials. The NN potential trained on 30620 structures gives the best performance and is used in the following work. Training proceeds iteratively; after every iteration, the trained weights are tested by calculating the MAE of the energy and forces. After the 787th iteration the learning rate falls below $10^{-5}$ and training is stopped. The difference between the training and testing MAEs is 0.043 eV for the energy and 0.012 eV/Å for the forces, as shown in Fig. 2(b). The correlation between the DFT and NN energies (forces) on the testing dataset is shown in Fig. 2(c); it is evident from the plot that the accuracy of our NN potential is comparable to that of DFT.
Since Au20 has a tetrahedral structure confirmed by experiment and DFT calculations,[35] we examine the PSO evolution of its energy distributions, shown in Fig. 3. At the beginning of the search, the structures found lie in a relatively high-energy region, but their energies decrease quickly as the evolution continues. The structure found in the 3rd generation is an intermediate product, the local minimum shown in the inset. This minimum persists for several generations, until the global ground-state structure is found in the 21st generation. Au20 is indeed the famous tetrahedral pyramid, which is known to be highly stable and is a unique molecule with atomic packing similar to that of bulk gold. It is important to note that Au20 was not included in our training set, which makes the recovered ground-state structure all the more convincing.
Fig. 2. (a) The evolution of the MAE of the predicted energy (left panel) and forces (right panel) as a function of the training set size. (b) The MAE of the energy and forces on the training and testing datasets as a function of iterations. (c) Comparison of the energies and forces calculated by the ML model and by DFT. The black dashed line is the result of a least-squares fit.
Fig. 3. Evolution of energy distributions during the PSO structure search for Au20 clusters, where local optimization is conducted with the NN potential. The structures in the inset are a local-minimum configuration and the ground-state one.
Fig. 4. Structures of the ground-state candidates found using the SchNet model; the final binding energies are all reoptimized by DFT. Three items are listed below each structure: the number of atoms, followed by the binding energies calculated with PBE and PBE+SOC, respectively, in units of eV.
In addition, we perform structure searches for Au clusters of between 4 and 20 atoms, with the final configurations reoptimized by DFT, as shown in Fig. 4. For more accurate energies, we calculate the binding energy of the Au clusters with both PBE and PBE+SOC. The binding energies obtained with SOC are higher than those without, for both 2D and 3D clusters. Small Au clusters clearly tend to form planar structures, while larger ones are more likely to adopt three-dimensional structures, with the critical size for the 2D–3D structural transition being Au14, generally consistent with previous research.[46] Furthermore, many structures evolve from the preceding one by the addition of a single atom at a particular position. All 2D Au clusters below 14 atoms can be described by this rule, and the tetrahedral pyramid of Au20 can likewise be regarded as the evolution of Au19 with one atom added at the top. More interestingly, some structures found by PSO even have lower binding energies than those reported in the literature.[46,47] For example, after recalculating with the same PBE+SOC parameters used in this paper, the binding energies of Au13, Au15, Au17, and Au18 are lower than the literature values by 0.214 eV, 0.191 eV, 0.326 eV, and 0.370 eV, respectively. We also plot the energy per atom of the predicted clusters as a function of cluster size, as displayed in Fig. 5, which shows an overall downward trend, owing to the increase of the average number of nearest neighbors with increasing size. Moreover, the binding energy per atom of even-sized clusters is slightly lower than that of their odd-sized neighbors, which can be explained by electron pairing. It is also notable that Au7 is a local maximum in Fig. 5: its lower symmetry compared to its neighbors Au6 and Au8 results in an increase in energy.
Fig. 5. Energy per atom of the ground-state structures obtained with the SchNet model. The energy values are given by DFT.
In this work, we developed an ML atomic potential for the task of structure search of Au clusters, which has two major advantages. First, SchNet treats the structure as a graph, with atoms as nodes and interatomic interactions as edges. This representation bypasses tedious descriptors and makes the approach applicable in fields where the expert knowledge needed to design descriptors is lacking. Second, our ML potential replaces the time-consuming local optimization performed by VASP, allowing more structure-optimization trials for large systems. Through an extensive search in the size range of 4–20 atoms, we presented the low-lying structures and revealed that the 2D–3D structural transition takes place at cluster size N = 14. This successful search for low-lying structures of Au clusters not only proves that the present SchNet model is robust and efficient in seeking low-lying candidates but also demonstrates that ML is indeed powerful for interatomic potential modeling.
The code and supporting data for this article are available from Ref.[48].
Acknowledgements
Computational support was provided by the Supercomputing Center of USTC and the National Supercomputing Center in Tianjin.
Project supported by the National Key Research and Development Program of China (Grant Nos. 2017YFA0204904 and 2019YFA0210004).