General seismic wave and phase detection software driven by deep learning

2021-12-09 00:52
Earthquake Research Advances, 2021, Issue 3

Ming Zhao, Jiahui Ma, Hao Chang, Shi Chen

a Institute of Geophysics, China Earthquake Administration, Beijing, 100081, China

b Beijing Baijiatuan Earth Science National Observation and Research Station, Beijing, 100095, China

c National Space Science Center, The Chinese Academy of Sciences, Beijing, 100190, China

d Institute of Microelectronics, The Chinese Academy of Sciences, Beijing, 100029, China

Keywords: Deep learning neural network; Seismic phase detection; Docker container

ABSTRACT We developed automatic seismic wave and phase detection software based on PhaseNet, an efficient and highly generalizable deep learning neural network for P- and S-wave phase picking. The software integrates multiple modules, including an application interface, a Docker container, data visualization, SSH-based data transmission, and other auxiliary modules. It is designed to be convenient for users of all levels: to obtain P- and S-wave picks, one only needs to prepare three-component seismic data as input and customize a few parameters in the interface. In particular, the software can automatically handle complex waveforms (i.e., continuous or truncated waves) and supports multiple input data types such as SAC, MSEED, and NumPy arrays. A test on the Wenchuan aftershock dataset demonstrates the generalization ability and detection accuracy of the software. The software is expected to increase efficiency and reduce subjectivity in the manual processing of large amounts of seismic data, thereby assisting regional network monitoring staff and researchers studying the Earth's interior.

1. Introduction

Accurately determining seismic phase arrival times is the most fundamental and important task in research on seismicity distribution and seismic tomography. Generally, arrival times are picked manually by experienced analysts, a process that is highly subjective and often time-consuming. At the same time, because the number of seismic stations and the volume of data have grown exponentially, processing the data and producing reliable public earthquake catalogues have become increasingly challenging. Therefore, a fast, accurate, and practical seismic arrival detection system is urgently needed in the earthquake monitoring community (Bergen et al., 2019; Perol et al., 2018; Zhao et al., 2019a, b).

Over the past decades, researchers have developed a variety of earthquake and phase detection algorithms, such as STA/LTA (Allen, 1978; Baer and Kradolfer, 1987; Withers, 1998), AR-AIC (Akaike, 1974; Sleeman and van Eck, 1999; Akazawa, 2004), and template matching (Zhang et al., 2015; Peng et al., 2009). However, few algorithms simultaneously achieve high detection sensitivity, general applicability, and computational efficiency (Yoon et al., 2015).

In recent years, machine learning has developed rapidly and is widely used in fields such as image classification and speech recognition. In seismology, it has shown high sensitivity, robustness, and efficiency in detecting small earthquakes and picking phases (Ross et al., 2018; Wang et al., 2019; Zhu et al., 2019; Zhu and Beroza, 2018; Zhao et al., 2019a, b), and it therefore plays an increasingly important role in seismic data processing and cataloguing.

Nevertheless, machine learning methods are relatively new to many earthquake researchers and to frontline monitoring and forecasting personnel. In addition, training machine learning models often requires extensive labeled data and computing resources, which are often scarce. Finally, installing and using machine learning code is usually complicated and requires a certain level of computing expertise. Therefore, it is necessary to develop efficient, economical, and user-friendly seismic data processing software based on state-of-the-art machine learning algorithms and models.

Fig. 1. Working principle of the software. The PC side sends the raw data and parameters to the server side, where the calculation is done in a Docker container. The container acts as a black box that processes the data and returns the results to the server, which then sends them back to the PC side.

Fig. 2. Software architecture diagram.

In this study, we develop the Seismic Wave and Phase Detection Software (SWPHDS), which consists of a user-friendly custom control panel built with C# and .NET Framework 4.0, an encapsulated Docker image for data processing that is deployed either remotely or locally, and a communication module linking the two. The Docker image is the core functional module. It integrates ObsPy (https://www.obspy.org) for data preprocessing (Krischer et al., 2015), a pretrained deep learning model running on TensorFlow (https://www.tensorflow.org/) for automatically detecting earthquakes and picking P and S arrival times, and a postprocessing module based on Matplotlib (https://matplotlib.org/), pandas (https://pandas.pydata.org/), and NumPy (https://numpy.org/) that outputs the predicted results as figures and tables. The deep learning model, PhaseNet, was trained on more than 700,000 three-component waveforms with P- and S-wave arrival times labeled from the Northern California earthquake catalog during 2000–2017, and generalizes well to local earthquakes with epicentral distances of less than ~100 km (Zhu et al., 2019; Zhu and Beroza, 2018). The Docker container can be easily deployed on mainstream operating systems such as Windows and Linux, as well as on cloud platforms such as Azure. Because applications are developed, deployed, and run inside isolated containers, users are spared a cumbersome installation process. SWPHDS supports common seismic data formats such as SAC, MSEED, NPZ (a NumPy archive format), and even text files. With SWPHDS, the arrival times of seismic P and S waves can be determined accurately and efficiently from the original continuous waveforms and output as tabular data (CSV, JSON, TXT) or in PHA format. PHA phase-picking files can be imported into JOPENS, a widely used data processing system for digital seismic networks in China (Wu and Huang, 2010; Zhao et al., 2020a, b), for further analysis.
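As a rough illustration of how the listed input formats could be funneled into one three-component array, the sketch below dispatches on the file extension. It is not the SWPHDS code: the function name `load_three_component`, the NPZ key `"data"`, and the three-column text layout are assumptions. The SAC/MSEED branch relies on ObsPy's `obspy.read`, which accepts a wildcard pattern, so three single-component SAC files can be read together.

```python
from pathlib import Path

import numpy as np


def load_three_component(path):
    """Hypothetical loader returning an (n_samples, 3) float array."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".npz":
        # Assumption: the archive stores the waveform under the key "data"
        # with shape (n_samples, 3).
        with np.load(path) as archive:
            data = archive["data"]
    elif suffix == ".txt":
        # Assumption: plain text with three whitespace-separated columns.
        data = np.loadtxt(path, ndmin=2)
    elif suffix in {".sac", ".mseed"}:
        # ObsPy is imported lazily so the NumPy-only branches work without it.
        import obspy

        stream = obspy.read(str(path))  # path may be a wildcard pattern
        stream.merge(fill_value=0)      # one trace per channel, gaps zero-filled
        stream.sort(keys=["channel"])   # deterministic channel order
        # Truncate to a common length (start times are not aligned here).
        n = min(len(tr.data) for tr in stream)
        data = np.column_stack([tr.data[:n] for tr in stream])
    else:
        raise ValueError(f"unsupported input format: {suffix}")
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[1] != 3:
        raise ValueError(f"expected three components, got shape {data.shape}")
    return data
```

For example, `load_three_component("XX.STA..BH?.sac")` would read the three SAC components of one station in a single call.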

2. Software design and functional modules

The software design addresses two main considerations. The first is scalability. The software adopts a modular design, so it can be updated periodically according to users' needs and its functions can be expanded. So far, the software has four functional modules (Fig. 1): ① configuration panel; ② communication module; ③ data processing module; ④ data visualization. As shown in Fig. 2, the modules are deployed in different places: modules ①, ②, and ④ make up the frontend user interface and are deployed on the PC side, whereas module ③ is the background computing module and is deployed either on the server or locally using Docker Toolbox. More details about the working principles are provided in Section 3.

The second consideration is to meet the needs of users at different levels, so we designed both a local and a network version. The network version is for beginners who lack experience in deploying Docker containers. It has a user-friendly interface and automatically connects to our remote server preconfigured on the cloud, so users only need to upload their data and specify custom parameters; the data and parameters are then transferred to and processed on the cloud automatically. Professionals have more choices: they can either configure a Linux server with a Docker service themselves or build an encapsulated Docker container image that can be loaded on the local computer. The local version is more stable and faster; Section 3 provides more details on how to install and configure it on a local Windows computer.

1) Front control panel. As shown in Figs. 3 and 4, all operations can be performed by simply clicking on the front control panel. Its usage is described in detail in Section 3.2.1.

Fig. 3. Login page.

2) Communication module. This module communicates with the server or the locally deployed Docker container, uploads the data to the server, and downloads the calculation results to a user-specified location once the calculation is finished. We use the SSH protocol for the connection and the SFTP protocol for uploading and downloading files. SSH is an encrypted network protocol for operating network services securely over an insecure network, and any network service can be protected with it. SFTP is a network protocol that provides file access, file transfer, and file management over any reliable data stream.

3) Data processing module. The Docker image integrates three functions:

a) Data pre-processing. First, we remove the mean and linear trend from the data, filter at 1 Hz, normalize by the standard deviation, resample to 100 Hz, and divide the data into equal slices of 3000 samples (30 s) with a 1500-sample overlap. The pre-processed three-channel waveform slices are then used as the standard input to the PhaseNet model. All of these steps are performed with ObsPy, an open-source Python framework for processing seismic data.

b) Phase picking. P and S arrival times are predicted from the pre-processed data using the pretrained PhaseNet model, which runs on TensorFlow 1.10 or later. GPU acceleration and multicore processing are used to speed up the process. The model outputs two probability curves, one for the P phase and one for the S phase; the sample with the maximum probability marks the P or S arrival time. The default threshold for triggering a phase detection is 0.5, and it can be changed in the configuration panel.

c) Data post-processing. Users can choose to plot the three-channel seismograms together with the P and S picks and the probability curves, to output the P and S picks as tables or text files, or both.

4) Data visualization. The outputs from the remote server (or local Docker container) are transferred automatically to the local machine and displayed under the “Result” and “Chart” menus. Users can inspect the results intuitively in the figures and export the table data for further analysis.
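The upload step of the communication module (item 2) could look roughly like the following Python sketch. It assumes the third-party paramiko library, which the original does not mention (its client is written in C#), and a hypothetical per-user folder layout on the server; only paramiko's `SSHClient`, `AutoAddPolicy`, `connect`, `open_sftp`, and `put` are used.

```python
import posixpath


def remote_layout(base_dir, user):
    """Hypothetical per-user input/output folders on the server."""
    root = posixpath.join(base_dir, user)
    return {
        "input": posixpath.join(root, "input"),
        "output": posixpath.join(root, "output"),
    }


def upload_files(host, username, password, local_files, remote_dir, port=22):
    """Upload local files to remote_dir over SFTP (remote_dir must exist)."""
    import paramiko  # third-party SSH/SFTP library, imported lazily

    client = paramiko.SSHClient()
    # Accepts unknown host keys; a deployed client should verify them instead.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, port=port, username=username, password=password)
    try:
        sftp = client.open_sftp()
        try:
            for local in local_files:
                # Normalize Windows separators before taking the file name.
                name = posixpath.basename(str(local).replace("\\", "/"))
                sftp.put(str(local), posixpath.join(remote_dir, name))
        finally:
            sftp.close()
    finally:
        client.close()
```

Downloading results would mirror this with `sftp.get(remote_path, local_path)`.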
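The detrending, normalization, and slicing steps of item 3a) can be sketched with NumPy alone, as below. This is an illustrative approximation rather than the ObsPy-based code in the Docker image: filtering and resampling are omitted (in ObsPy they correspond to methods such as `Trace.filter` and `Trace.resample`), and the input is assumed to be already sampled at 100 Hz.

```python
import numpy as np

WINDOW = 3000  # samples per slice: 30 s at 100 Hz
STEP = 1500    # hop between slice starts, giving a 1500-sample overlap


def detrend_and_normalize(trace):
    """Remove mean and linear trend, then divide by the standard deviation."""
    x = np.asarray(trace, dtype=float)
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)  # least-squares line; also absorbs the mean
    x = x - (slope * t + intercept)
    std = x.std()
    return x / std if std > 0 else x


def slice_windows(data, window=WINDOW, step=STEP):
    """Cut an (n_samples, 3) array into overlapping (window, 3) slices.

    Trailing samples that do not fill a whole window are dropped here.
    """
    n, channels = data.shape
    if n < window:
        return np.empty((0, window, channels))
    starts = range(0, n - window + 1, step)
    return np.stack([data[s:s + window] for s in starts])
```

For a three-component record `z, n, e`, `slice_windows(np.column_stack([detrend_and_normalize(c) for c in (z, n, e)]))` yields an array of shape (number_of_slices, 3000, 3); a 60 s record (6000 samples) gives three slices starting at samples 0, 1500, and 3000.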
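One simple way to turn a probability curve from item 3b) into picks under the 0.5 threshold is sketched below: each contiguous run of samples above the threshold contributes one pick at its maximum. This run-based rule is an assumption for illustration; PhaseNet's own post-processing may differ in detail, for example in how close peaks are merged.

```python
import numpy as np


def pick_arrivals(prob, threshold=0.5, sampling_rate=100.0):
    """Return (sample_index, time_s, probability) for each peak above threshold.

    Each contiguous run of samples with prob > threshold yields one pick
    at the run's maximum.
    """
    prob = np.asarray(prob, dtype=float)
    above = prob > threshold
    picks = []
    i, n = 0, prob.size
    while i < n:
        if not above[i]:
            i += 1
            continue
        j = i
        while j < n and above[j]:
            j += 1  # extend the run of above-threshold samples
        k = i + int(np.argmax(prob[i:j]))
        picks.append((k, k / sampling_rate, float(prob[k])))
        i = j
    return picks
```

With the default 100 Hz sampling rate, a peak at sample 100 is reported at 1.0 s from the start of the slice.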
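Writing the picks from item 3c) to tabular output could be as simple as the following sketch, which uses Python's standard `csv` and `json` modules. The column names are hypothetical, and the JOPENS/PHA format is not reproduced because its layout is not described in this paper.

```python
import csv
import json

# Hypothetical column names for one pick per row.
FIELDS = ["file_name", "phase", "sample_index", "time_s", "probability"]


def write_picks(rows, stream, fmt="csv"):
    """Write a list of pick dictionaries to an open text stream."""
    if fmt == "csv":
        writer = csv.DictWriter(stream, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    elif fmt == "json":
        json.dump(rows, stream, indent=2)
    else:
        raise ValueError(f"unsupported output format: {fmt}")
```

When writing CSV to a file, open it with `newline=""` so that the `csv` module controls line endings.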

3. Installation and implementation

In this section, we briefly describe the software installation and then give more details about the frontend user interface and the background computing Docker container.

3.1. Software installation

Currently, the user interface of the software must be installed on the Windows operating system (Windows 7 or later) with .NET Framework 4.0 support. The .NET Framework is a computing and communication platform that supports multiple programming languages with high interoperability. It mainly includes the Common Language Runtime (CLR) and the .NET Framework Class Library. The main advantages of .NET include cross-language support, security, and support for open Internet standards and protocols. It is compatible with almost every Windows computer.

There are slight differences between the online and local versions. After installation, users can directly use the online version, which connects to our server deployed on the Azure cloud. For the local version, users need to install Docker Toolbox first (https://github.com/docker/toolbox/releases) and then load the pre-built Docker image designed for seismic data processing. Docker is a set of platform-as-a-service (PaaS) products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and package an application together with its entire runtime environment, that is, all of the files necessary to run it. This makes it easy to move the contained application between environments (development, testing, production, etc.) while retaining full functionality.

3.2. Software implementation

3.2.1. Frontend user interface

(1) Login

As Fig. 3 shows, after launching the software, users log in with their account name and password and choose their preferred version (local or online). For the network version, users need to enter the IP address of the online server and test whether the connection succeeds. To ensure the security of the remote server, the software assigns each user an initial account and password at the login page; to change them, the user's profile in the server database must be updated. For the local version, users can change their account and password in their locally deployed Docker Toolbox.

Once the account check passes, the network version connects to the server and creates a dedicated folder for the current user to temporarily store uploaded files and calculation results. The local version instead calls and runs Docker Toolbox in the background.

For convenience, after a successful login, the program saves the user name, password, and other information in an XML file. When the user opens the login page again, the program automatically reads this XML file.
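A sketch of saving and restoring login settings with Python's standard `xml.etree.ElementTree` follows. The element names are hypothetical, and the actual client is written in C#.

```python
import xml.etree.ElementTree as ET

SETTING_KEYS = ("user_name", "server_ip", "version")  # hypothetical element names


def save_login(path, settings):
    """Write the selected settings as child elements of a <login> root."""
    root = ET.Element("login")
    for key in SETTING_KEYS:
        ET.SubElement(root, key).text = str(settings.get(key, ""))
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def load_login(path):
    """Read the settings back into a dictionary of strings."""
    root = ET.parse(path).getroot()
    return {child.tag: (child.text or "") for child in root}
```

The sketch leaves the password out; if it must be remembered, the operating system's credential store is a safer place for it than a plain XML file.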

(2) Parameter customization

The second part is the parameter customization menu, shown in Fig. 4. Users should follow these steps: ① add data files by clicking the “change” button (for the local version, users must also specify the path of the shared folder); ② specify the necessary parameters, such as the format of the input data (MSEED, SAC, or TXT) and of the output results (CSV, TXT, or JOPENS), the probability threshold for phase picking (usually 0.3–0.7), and the computing acceleration scheme (e.g., GPU and multithreading); ③ click the “start” button to begin the calculation; ④ view the results under the “Result” and “Chart” menus, and optionally export them by clicking “export”.
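The panel settings could be gathered into a small validated structure and serialized as JSON before being sent to the container, as in this sketch. All field names are hypothetical. The allowed output formats combine those listed in this section and in Fig. 4, and the (0, 1) range check on the threshold is an assumption.

```python
import json
from dataclasses import asdict, dataclass

INPUT_FORMATS = {"MSEED", "SAC", "TXT"}
OUTPUT_FORMATS = {"CSV", "JSON", "TXT", "JOPENS"}
DEVICES = {"CPU", "GPU"}


@dataclass
class JobParameters:
    """Hypothetical job settings mirroring the control-panel fields."""

    input_format: str = "MSEED"
    output_format: str = "CSV"
    threshold: float = 0.5  # phase-picking probability threshold
    device: str = "CPU"
    workers: int = 1        # number of containers run in parallel

    def validate(self):
        if self.input_format not in INPUT_FORMATS:
            raise ValueError(f"unknown input format: {self.input_format}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output format: {self.output_format}")
        if not 0.0 < self.threshold < 1.0:
            raise ValueError(f"threshold must lie in (0, 1): {self.threshold}")
        if self.device not in DEVICES:
            raise ValueError(f"unknown device: {self.device}")
        if self.workers < 1:
            raise ValueError("workers must be at least 1")
        return self

    def to_json(self):
        """Validate, then serialize the settings as a JSON string."""
        return json.dumps(asdict(self.validate()))
```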

Fig. 4. Main page. (Probability threshold: only results with a probability greater than the threshold are output; Operation equipment: CPU or GPU; Multithreading: runs more than one Docker image for the calculation; Output type: CSV, JSON, or JOPENS; Print picture: prints either all figures or only those with higher probabilities.)

(3) Data upload

Users can now upload the data. Clicking the “add” button above the toolbar opens a window for selecting data files. The input can be a single file, multiple files, or a whole folder. For the local version, the data files are placed in the shared folder, and the Docker container is then called to process them. For the network version, the data files are sent to a folder on the server, where they wait for the system to issue the calculation instructions. We define the global variables AllUploadFiles and AllRecievePaths to store the selected files and the directories that receive the outputs.
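Expanding a mix of selected files and folders into one upload list, the role played by `AllUploadFiles` in the C# client, might look like this Python sketch. The function name and the extension filter are assumptions.

```python
from pathlib import Path

SUPPORTED = {".sac", ".mseed", ".npz", ".txt"}  # assumed input extensions


def collect_inputs(selections):
    """Expand selected files and/or folders into a sorted, de-duplicated list."""
    files = set()
    for item in map(Path, selections):
        if item.is_dir():
            # Walk the folder recursively and keep supported files only.
            files.update(
                p for p in item.rglob("*")
                if p.is_file() and p.suffix.lower() in SUPPORTED
            )
        elif item.is_file() and item.suffix.lower() in SUPPORTED:
            files.add(item)
    return sorted(files)
```

Using a set means a file selected both directly and through its parent folder is uploaded only once.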

3.2.2. Background computing

After the user clicks the “start” button, the software sends a request to the server or to the locally deployed Docker service, which then starts a container from the Docker image to process the data. We have built a Docker image containing all the required packages, including Python 3.6, ObsPy 1.1.1, TensorFlow 1.10, NumPy 1.18, pandas 1.0, and Matplotlib 3.1. In addition, two shared folders, one for input data and one for output results, need to be created locally and mapped to the corresponding directories in the Docker container.
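The Docker commands involved can be sketched as below: loading the pre-built image once, then running a container with the two shared folders mounted. The image tag, the in-container paths, and any arguments passed to the container's entry point are hypothetical; `docker load -i`, `docker run --rm -v`, and `--gpus all` are standard Docker CLI options (the last requires a GPU-enabled Docker setup). For the network version, `shlex.join` turns the same argument list into a command string that could be sent over SSH.

```python
import shlex
import subprocess

IMAGE = "swphds/phasenet:1.0"   # hypothetical image tag
CONTAINER_IN = "/data/input"     # hypothetical mount points inside the container
CONTAINER_OUT = "/data/output"


def docker_load_command(tar_path):
    """Command that loads a pre-built image from a tar archive."""
    return ["docker", "load", "-i", str(tar_path)]


def docker_run_command(host_in, host_out, image=IMAGE, gpu=False, args=()):
    """Command that runs one container with the shared folders mounted."""
    cmd = ["docker", "run", "--rm",
           "-v", f"{host_in}:{CONTAINER_IN}",
           "-v", f"{host_out}:{CONTAINER_OUT}"]
    if gpu:
        cmd += ["--gpus", "all"]
    return cmd + [image, *args]


def run_locally(cmd):
    """Execute a command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

For example, `shlex.join(docker_run_command("/home/u/in", "/home/u/out"))` gives a single shell-safe string for remote execution.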

For the network version, the whole calculation is done on the server: the system sends instructions to Docker on the server, and the container processes the files in the input directory of the user's folder and saves the results in the output directory of the same folder.

For the local version, the program sends instructions to Docker Toolbox running in the background, which works in the same way as Docker on the cloud server.

3.2.3. Data download

For the network version, the program sets a flag to monitor the calculation status. When the calculation is complete, the flag changes and the program automatically starts downloading the results from the server.

For the local version, the calculation results are copied to the output folder and then automatically deleted from the shared folder. For the network version, the program downloads the results from the server and then clears the data on the server.
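The completion check and clean-up can be approximated with a polling loop and a copy-then-delete step, as in this sketch. The `DONE` marker file is a hypothetical stand-in for the flag described above; the real client tracks the status in its own way.

```python
import shutil
import time
from pathlib import Path

DONE_FLAG = "DONE"  # hypothetical marker written when processing finishes


def wait_for_flag(output_dir, timeout_s=3600.0, poll_s=5.0):
    """Poll until the marker file appears; return False on timeout."""
    flag = Path(output_dir) / DONE_FLAG
    deadline = time.monotonic() + timeout_s
    while not flag.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True


def collect_results(shared_out, dest):
    """Copy result files to dest, then delete them (and the marker) from shared_out."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(Path(shared_out).iterdir()):  # list first, then modify
        if not item.is_file():
            continue
        if item.name != DONE_FLAG:
            shutil.copy2(item, dest / item.name)
            copied.append(item.name)
        item.unlink()  # reached only after a successful copy
    return copied
```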

3.2.4. Display in table and chart

After the results are downloaded, users can view them as tables and charts by clicking “Result” and “Chart” (Figs. 5 and 6). This presentation makes the results more intuitive and facilitates further analysis and research.

For the table display, the output is saved in a Python dictionary. Table display relies mainly on the DataGridView class: after a DataGridView object is defined, all results in the dictionary are added to it and displayed line by line.

The chart display matches each output figure to the corresponding calculation results and, by default, shows the figures in their original order. Figure display relies mainly on the Bitmap class: after a Bitmap object is defined, the system only needs to specify the figure file name, and the figure is rendered in the system window.

4. Application

We test the software on the Wenchuan Earthquake Aftershocks Classification Dataset (Zhao et al., 2020a, b), which was not used in PhaseNet training. The aftershock waveforms were recorded from July to September 2008 by 16 seismic stations near the Longmenshan fault zone in Wenchuan County and neighboring areas (30°N–33.5°N, 102°E–106°E). The data include 9,909 waveform segments from 1,765 aftershocks (ML 1.0–6.6) and 10,939 noise waveforms randomly cut from the continuous records. The signal-to-noise ratio (SNR) of the event waveforms (the positive samples) ranges from 0 to 90 dB, and that of most noise waveforms ranges from -30 to 10 dB.

Fig. 5. Results shown in table.

Fig. 6. Results shown in chart.

Fig. 7. Distribution of residuals of (a) P picks and (b) S picks. (c) Per-station statistics of the automatic P and S picks that match the manual picks. (d) SNR of the waveforms with manual picks (Manual) and of those identified by the software (Auto).

For the local version, processing 20,848 waveforms takes about 2,500 s on a Dell Precision 7520 mobile workstation (Intel Xeon CPU, 3.0 GHz, 32 GB memory). For the network version, the data processing is done on a more powerful server (Dell Precision 7920 Tower), so it takes much less time (<800 s). However, communication with the server (uploading data and downloading results) takes as long as, or even longer than, the calculation itself. The advantage is that the user's laptop need not be powerful.

The probability threshold we choose for phase picking is 0.5. We first check whether the software can effectively distinguish events from noise. Of the 9,909 event waveforms, 8,615 are correctly identified (with at least one pick). Of the 10,939 noise waveforms, 1,034 are identified as events (with at least one pick), and the remaining 9,905 are not triggered, suggesting that the software distinguishes events from noise effectively.
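The counts above can be condensed into standard detection metrics. The short calculation below treats an event waveform with at least one pick as a true positive and a noise waveform with at least one pick as a false positive; the metric names are standard, but these values are our summary of the reported counts rather than figures stated in the original test.

```python
def detection_metrics(tp, fn, fp, tn):
    """Standard binary-detection metrics from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),               # events detected / all events
        "false_positive_rate": fp / (fp + tn),  # noise triggered / all noise
        "precision": tp / (tp + fp),            # true events / all triggers
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts from the Wenchuan test: 9,909 event and 10,939 noise waveforms.
metrics = detection_metrics(tp=8615, fn=9909 - 8615, fp=1034, tn=9905)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

This prints a recall of 0.869, a false-positive rate of 0.095, a precision of 0.893, and an accuracy of 0.888.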

We then compare the automatic P and S picks with their labels (the so-called “ground truth”) to verify pick accuracy. As Fig. 8 shows, the PhaseNet algorithm used in our software produces few false triggers; it identifies 7,638 pairs of P and S picks from the 9,909 waveforms. Fig. 7a shows the time residuals between the automatic and manual P picks, most of which lie within [-0.5 s, 0.3 s]. Fig. 7b shows the time residuals of the S picks, most of which lie within [-1.0 s, 0.3 s].
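A residual analysis of this kind can be sketched as follows: each manual pick is paired with the nearest automatic pick, pairs farther apart than a tolerance are discarded, and the residuals (automatic minus manual) are summarized. The 1 s tolerance and the sign convention are assumptions for illustration; the paper does not state the matching rule it used.

```python
import numpy as np


def match_residuals(manual, auto, tolerance=1.0):
    """Residuals (auto - manual, in s) for manual picks with a close auto pick."""
    auto = np.asarray(auto, dtype=float)
    residuals = []
    if auto.size == 0:
        return np.array(residuals)
    for t in manual:
        k = int(np.argmin(np.abs(auto - t)))  # nearest automatic pick
        if abs(auto[k] - t) <= tolerance:
            residuals.append(auto[k] - t)
    return np.array(residuals)


def fraction_within(residuals, low, high):
    """Share of residuals inside the closed interval [low, high]."""
    r = np.asarray(residuals, dtype=float)
    return float(np.mean((r >= low) & (r <= high))) if r.size else 0.0
```

For P picks, `fraction_within(res, -0.5, 0.3)` would measure how many residuals fall in the interval quoted above.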

Next, we visually examine the statistical results for each station and remove three stations (LUYA, MIAX, WCH) because of their poor data quality and limited numbers of available phases. Fig. 7c shows the results for the remaining 13 stations. Several stations (HSH, JMG, MXI, PWU, QCH, XJI, YZP) clearly achieve higher detection accuracy because they have more samples and better data quality and are closer to the Longmenshan fault zone (less than 50 km). By contrast, stations such as JJS and SPA have reasonable sample numbers, but their distances are relatively large (more than 100 km), slightly beyond the range in which PhaseNet performs well, which yields moderate results. The last group of stations, including WDT, XCO, and YGD, has the smallest sample numbers and poor data quality (noisy and interrupted records), so their results are the worst.

Finally, we show the signal-to-noise ratio (SNR) distribution of the picks (Fig. 7d): for SNR ≥ 40 dB, the vast majority of manual picks are detected successfully by the software; for SNR between 10 and 40 dB, more than 60% are detected; and for SNR ≤ 10 dB, the detection rate drops below 50%.
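The paper does not give its SNR formula. One common definition, assumed in the sketch below, is 10·log10 of the ratio of mean signal power after the pick to mean noise power before it, here over hypothetical 200-sample (2 s at 100 Hz) windows. The detection rate is then computed within the three SNR bands used above.

```python
import numpy as np


def snr_db(trace, pick_index, window=200):
    """SNR in dB: 10*log10(mean power after the pick / mean power before it)."""
    x = np.asarray(trace, dtype=float)
    noise = x[max(pick_index - window, 0):pick_index]
    signal = x[pick_index:pick_index + window]
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


def detection_rate_by_snr(snr_values, detected, low=10.0, high=40.0):
    """Fraction of manual picks detected within each SNR band."""
    snr = np.asarray(snr_values, dtype=float)
    hit = np.asarray(detected, dtype=bool)
    bands = {
        f"<={low:g} dB": snr <= low,
        f"{low:g}-{high:g} dB": (snr > low) & (snr < high),
        f">={high:g} dB": snr >= high,
    }
    return {
        name: float(hit[mask].mean()) if mask.any() else float("nan")
        for name, mask in bands.items()
    }
```

A signal window whose amplitude is ten times that of the preceding noise gives a power ratio of 100, i.e. 20 dB.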

5. Discussion and conclusions

Using a number of computer technologies, including C# for interface development, Python and Docker containers for data processing, and SSH and SFTP for communication, this study develops seismic wave detection and arrival-time picking software that obtains accurate P- and S-phase arrival times directly from continuous waveforms, providing earthquake researchers and analysts with a convenient, integrated analysis tool. By selecting different parameters, users can balance running performance against the comprehensiveness of the output. A test on the Wenchuan aftershock dataset shows that the software can reliably distinguish events from noise and accurately pick P and S arrival times with reasonable computing time and resources. Its ease of use also opens a window for amateur seismologists to get started with earthquake detection and seismic data processing.

Fig. 8. Comparison between the phase picks by the software and the manual picks (“ground truth”) from 16 stations. Blue and red solid lines are the manual P and S picks; blue and red dotted lines are the corresponding software picks.

As this is SWPHDS version 1.0, many aspects still need improvement: (1) more parameter options can be moved from the Docker container to the user interface, including the filtering range, the waveform cutting length, and switches for demeaning, detrending, and normalization; (2) the uploading and downloading processes need to be accelerated; (3) more phase picking algorithms, such as widely used traditional methods or other machine learning methods, can be integrated for users to choose from; (4) more computing devices can be supported. For example, we plan to port the software to the Raspberry Pi and combine it with mobile seismic detection instruments to realize real-time seismic data analysis.

Data and resources

SWPHDS is open-source software and will soon be available at https://github.com/mingzhaochina/aicwps-pspicker. For users who would like to deploy the Docker container themselves, the Docker image for data processing (2 GB) is available upon request.

Author contributions

Hao Chang and Jiahui Ma developed the software user interface.Ming Zhao built the docker image for data processing and conducted the test experiment.

Acknowledgement

We thank the Institute of Geophysics, China Earthquake Administration, for data and technical support, Dr. Weiqiang Zhu for sharing the PhaseNet code, and Dr. Miao Zhang for helpful discussions. This study is jointly sponsored by the Basic Scientific Research Fee of the Institute of Geophysics, China Earthquake Administration (DQJB19A0114) and the National Natural Science Foundation of China (41804047).