De-Jian Liu, Ye Xu, Ying-Jie Li, Ze-Hao Lin, Shuai-Bo Bian and Chao-Jie Hao
1 Purple Mountain Observatory, Chinese Academy of Sciences, Nanjing 210008, China; xuye@pmo.ac.cn
2 School of Astronomy and Space Science, University of Science and Technology of China, Hefei 230026, China
Abstract The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is the most sensitive ground-based, single-dish radio telescope on Earth. However, the original H I spectra produced by FAST are affected by standing waves. To maximize the power of FAST for high-sensitivity observations, we propose an algorithm that combines fast Fourier transforms and extreme envelope curves to automatically correct the baselines of FAST H I spectra and remove standing waves from the baselines. This algorithm can reduce the amplified noise level caused by standing waves to a near-ideal level without losing signals or introducing false signals. The root mean square of the average baseline reaches ∼8 mK, approaching the theoretical sensitivity of an H I spectrum produced by FAST for an integration time of 335 minutes, i.e., ∼6 mK.
Key words: methods: observational – techniques: spectroscopic – telescopes
The Five-hundred-meter Aperture Spherical radio Telescope (FAST), located in Guizhou Province of Southwest China, is the world's largest single-dish radio telescope. It is an important facility for surveying neutral hydrogen up to the edge of the universe, detecting weak space signals, listening for possible signals from other civilizations, etc. (Nan et al. 2011; Qian et al. 2020). Although it has produced significant scientific achievements (Qian et al. 2020), the baselines of the original H I spectra are not flat and contain massive standing waves that might be generated by reflections between the dish and the receiver cabin (Jiang et al. 2020). Efforts have been made to minimize the standing wave effects in FAST data (Jiang et al. 2020), but standing waves still exist in the spectra. Standing waves raise the noise level of a spectrum; e.g., ripple amplitudes of ∼15 mK are commonly seen (Jiang et al. 2020), which cause the noise of high-sensitivity spectra to exceed the theoretically predicted noise. Some studies require extremely high-sensitivity observations, e.g., searches for stellar winds, compact high-velocity clouds and active star-forming dwarf galaxies (Lizano et al. 1988; Giovanardi et al. 1992; Burton et al. 2001; Salzer et al. 2002; Li et al. 2022). A high-precision baseline correction method is hence necessary for studies that require extremely high-sensitivity spectra.
Polynomial fitting and trigonometric function fitting methods are commonly used to make baseline corrections (Gan et al. 2006; Baek et al. 2011). These methods usually need to cut peaks from the original spectrum and estimate the baseline using a polynomial or trigonometric function. However, these methods can be ineffective if the baseline is complex or the functional form is poorly matched to the baseline. The asymmetrically reweighted penalized least squares (arPLS) algorithm, developed from penalized least squares methods (Eilers 2003; Carlos Cobas et al. 2006; Zhang et al. 2010; Baek et al. 2015), is a widely used baseline correction method for FAST data (Wang et al. 2022; Zhang et al. 2022). The baseline is estimated by changing the "weight" parameter iteratively. Similar weights are assigned to baseline regions without peaks, while no weights or small weights are assigned to peaks; once assigned, the weights gradually reduce as the level of the signal increases. However, arPLS is not good at removing standing waves.
To automatically correct an inclined baseline and remove standing waves from the original H I spectra produced by FAST, an algorithm combining fast Fourier transforms (FFTs) and extreme envelope curves (EECs), called FFTEEC, is proposed in this work.
FAST is equipped with a 19 beam receiver with dual linear polarization (i.e., XX and YY). The full bandwidth of the L-band is 500 MHz over the frequency range 1.0–1.5 GHz. The frequency resolution of the high-resolution mode is 476.84 Hz, corresponding to a velocity resolution of ∼0.1 km s⁻¹ at 1.4 GHz. The beam size is 2.9′, and the pointing error is ∼0.2′ (Li et al. 2018; Jiang et al. 2019, 2020). The data used in this paper consist of H I spectra of G176.51+00.20 observed on 2021 August 19 and 20 with the 19 beam tracking observing mode; the total integration time of the data was 335 minutes with a sampling rate of one second. We resampled the data to a velocity resolution of 0.1 km s⁻¹, and only considered data in the velocity range −7000 to 7000 km s⁻¹, i.e., 140,001 channels. The data displayed in this paper are from the YY polarization of Beam M01.
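The quoted velocity resolution and channel count can be checked with a few lines of arithmetic. The sketch below assumes the radio convention Δv = c Δν / ν; the function names are illustrative and do not come from the FAST pipeline.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def velocity_resolution(delta_nu_hz: float, nu_hz: float) -> float:
    """Velocity width (km/s) of one channel, assuming dv = c * dnu / nu."""
    return C_KM_S * delta_nu_hz / nu_hz


def n_channels(v_min: float, v_max: float, dv: float) -> int:
    """Number of channels on a regular grid that includes both end points."""
    return int(round((v_max - v_min) / dv)) + 1


# Values taken from the text: 476.84 Hz channels at 1.4 GHz,
# resampled to 0.1 km/s over -7000 to 7000 km/s.
dv = velocity_resolution(476.84, 1.4e9)
channels = n_channels(-7000.0, 7000.0, 0.1)
print(f"native resolution ~ {dv:.3f} km/s, resampled channels = {channels}")
```

The native channel width comes out slightly above 0.1 km s⁻¹, which is consistent with resampling onto a 0.1 km s⁻¹ grid.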
Figure 1 displays the original spectra after correcting the flux and velocity (Jiang et al. 2020). In the left panel, the non-uniform waterfall image indicates that the baselines of the different original spectra are inconsistent, and the average spectrum with massive standing waves is inclined. The right panel shows fringes in the waterfall image after roughly correcting the inclined baseline with the polynomial fitting method. The standing waves are obvious and unstable (see the partially enlarged view). The arc-shaped standing waves in the 2D waterfall image mean that the phases of the standing waves drift with time and that the periods of the standing waves differ between spectra.
Figure 1. Left: The waterfall image and average spectrum of original FAST spectra. The waterfall image, which contains 156 spectra, is non-uniform because the baselines of the different original spectra are inconsistent. The baseline is inclined and contains massive standing waves with different frequencies. Right: Waterfall image after removing the polynomial-fit baseline, and an enlarged image. The standing waves in the waterfall image are arc-shaped, indicating that they drift with time. The periods of standing waves are not stable in the different spectra.
Because the phases and periods of the standing waves are irregular, and because FAST produces huge amounts of data (e.g., 6.0 TB for our data), we propose a highly precise baseline estimation algorithm, FFTEEC, to correct baselines automatically. First, the original spectrum is preprocessed by a polynomial fitting method. Second, the standing waves are extracted and removed using FFTs. Third, the EEC method is employed to calibrate the unsmooth parts in the baseline. Fourth, the extracted signal is combined with the baseline obtained in the third step as the result. Figure 2 displays the whole pipeline of the algorithm.
An FFT is a fast algorithm that computes discrete Fourier transforms (DFTs, Cooley & Tukey 1965). The DFT of a sequence $x$ with length $N$ can be expressed as
$$X_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i k n / N}, \qquad k = 0, 1, \ldots, N-1.$$
Any frequency with a large amplitude in the frequency domain, $X$, can be considered as a standing wave. A new sequence, $Y$, is obtained by sorting the items of $X$ by amplitude from high to low and removing the first several items.
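The removal of the largest-amplitude frequency components can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a real-valued spectrum whose signal and large-scale slope have already been taken out, it uses NumPy's real FFT, and it keeps the zero-frequency term (an assumption made here so that the mean level is not treated as a standing wave). The function name and the default of 20 removed components are illustrative.

```python
import numpy as np


def remove_top_components(spectrum, n_remove=20):
    """Zero the n_remove largest-amplitude FFT components of a real sequence.

    Returns (cleaned, standing), where standing is the part that was removed,
    so that cleaned + standing reproduces the input.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    coeffs = np.fft.rfft(spectrum)

    amplitudes = np.abs(coeffs)
    amplitudes[0] = 0.0  # assumption: never select the zero-frequency term

    # Indices of the components sorted by amplitude, largest first.
    top = np.argsort(amplitudes)[::-1][:n_remove]

    standing_coeffs = np.zeros_like(coeffs)
    standing_coeffs[top] = coeffs[top]
    cleaned_coeffs = coeffs.copy()
    cleaned_coeffs[top] = 0.0

    n = spectrum.size
    return np.fft.irfft(cleaned_coeffs, n=n), np.fft.irfft(standing_coeffs, n=n)


# Toy example: one ripple plus weak noise; removing a single component
# should recover the ripple almost exactly.
rng = np.random.default_rng(0)
t = np.arange(1024)
ripple = 0.5 * np.sin(2 * np.pi * 8 * t / t.size)
noisy = ripple + 0.01 * rng.standard_normal(t.size)
cleaned, standing = remove_top_components(noisy, n_remove=1)
```

Returning the removed part alongside the cleaned sequence makes it easy to inspect what was classified as a standing wave before trusting the result.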
An EEC, $y_e$, can be obtained with the following steps. First, the smoothed sequence $y$ is obtained from $Y$, and the local maxima and minima are extracted from $y$. Second, the maximum and minimum envelope curves of $y$, $y_{\max}$ and $y_{\min}$, respectively, are obtained by fitting the local extrema with a linear interpolation method. Thus, $y_e$ can be expressed as
To extract a complete signal, we propose an iterative method to cut the signal automatically. The initial signal range is obtained from the EEC, where $y_e$ is greater than 20σ of the smoothed baseline. The signal range is then extended for as long as the difference between the wing of the smoothed signal and the baseline is greater than 1σ, and the extension stops once the difference falls below 1σ. The method is displayed in Figure 2 as a subprocess that is framed by a box.
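The envelope construction described above can be sketched in a few lines. Two points in this sketch are assumptions rather than statements from the text: the smoothing is a simple boxcar moving average, and the EEC is taken as the midpoint of the upper and lower envelopes, $(y_{\max} + y_{\min})/2$. The end points of the sequence are added to both sets of extrema so that the linear interpolation covers every channel. All function names are illustrative.

```python
import numpy as np


def moving_average(y, width):
    """Boxcar smoothing (assumed smoother; edges are affected by zero padding)."""
    kernel = np.ones(width) / width
    return np.convolve(y, kernel, mode="same")


def local_extrema(y):
    """Indices of interior local maxima and minima of a 1D sequence."""
    d = np.diff(y)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope_curves(y):
    """Upper and lower envelopes from linear interpolation of the extrema."""
    x = np.arange(y.size)
    maxima, minima = local_extrema(y)
    # Add both end points so the interpolation spans the whole sequence.
    maxima = np.concatenate(([0], maxima, [y.size - 1]))
    minima = np.concatenate(([0], minima, [y.size - 1]))
    y_max = np.interp(x, maxima, y[maxima])
    y_min = np.interp(x, minima, y[minima])
    return y_max, y_min


def extreme_envelope_curve(y, width=5):
    """EEC sketch: midpoint of the two envelopes (an assumption)."""
    smoothed = moving_average(np.asarray(y, dtype=float), width)
    y_max, y_min = envelope_curves(smoothed)
    return 0.5 * (y_max + y_min)


# Toy example: a slow trend with a ripple on top; away from the edges the
# EEC should follow the trend.
x = np.arange(1000)
trend = 0.001 * x
y_e = extreme_envelope_curve(trend + 0.2 * np.sin(2 * np.pi * x / 50))
```

The initial signal range would then be the channels where the EEC exceeds 20σ of the smoothed baseline, widened iteratively as described in the text.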
Figure 2. Pipeline of FFTEEC. The figure corresponding to each step is the average spectrum after the step is processed.
Figure 3. Left: Results processed by different methods. The blue line shows the FFTEEC result, where the zero-point is 1, and the green line is the arPLS result. The flat baselines indicate that both methods can be used to correct inclined baselines. Right: Simulated standing waves and residuals from different methods. The blue line is the FFTEEC result and the green line is the arPLS result. The orange line is the simulated standing wave. The spectra have been separated by 1 on the y-axis to aid presentation. The profiles of the residuals of arPLS and FFTEEC are consistent with the simulated baseline. Standing waves are obvious in the residuals of FFTEEC, but unclear in the residuals of arPLS since the random noise is very high.
The simulated data considered here contain the pure analytical signal, $p$, a simulated baseline, $b$, and random noise, $j$ (generated by the random number generator numpy.random.randn from the Python language), which can be expressed as
and $b$ contains the residuals extracted from the real data by the method in this paper.
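A minimal sketch of how such simulated data could be assembled is given below. Several choices are assumptions for illustration only: the three components are simply added, the analytical signal is a Gaussian, and the baseline is a synthetic slope plus ripple standing in for the residuals that the paper extracts from real data. Only the use of numpy.random.randn for the noise is taken from the text.

```python
import numpy as np


def simulate_spectrum(velocity, baseline, amp=1.0, center=0.0, width=20.0,
                      noise_sigma=0.1, seed=0):
    """Return (data, p, j) with data = p + b + j (additive model assumed)."""
    velocity = np.asarray(velocity, dtype=float)
    b = np.asarray(baseline, dtype=float)

    # Pure analytical signal; a Gaussian profile is assumed here.
    p = amp * np.exp(-0.5 * ((velocity - center) / width) ** 2)

    # Random noise from numpy.random.randn, as stated in the text.
    np.random.seed(seed)
    j = noise_sigma * np.random.randn(velocity.size)

    return p + b + j, p, j


# Stand-in baseline: in the paper, b is taken from real-data residuals.
velocity = np.linspace(-500.0, 500.0, 2001)
stand_in_b = 0.05 * np.sin(2 * np.pi * velocity / 100.0) + 1e-4 * velocity
data, p, j = simulate_spectrum(velocity, stand_in_b)
```

Keeping $p$ and $j$ as separate return values lets the processed spectrum be compared against the noise-free signal and against the injected noise level.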
Figure 3 displays the results and residuals processed by FFTEEC and arPLS, respectively. (We used the C++ arPLS software package provided by Gabriel Kronberger to speed up the calculations; https://github.com/heal-research/arPLS.) Both methods can be used to correct the inclined baseline, since the baselines in the left panel are sufficiently flat. The baseline obtained by arPLS is better than that obtained by FFTEEC at the two ends of the spectrum. In the right panel, although the profile of the residuals of arPLS is comparable to the simulated baseline, it is hard to judge whether the standing waves have been removed since the noise level of the residuals is very high. In contrast, the residuals from FFTEEC are remarkably consistent with the simulated baseline, in both the overall profile and the standing waves.
Figure 4. Results for simulated data. Top: The orange line is the simulated signal and the blue line is the spectrum processed by FFTEEC. Middle: An enlarged figure of the top panel. Bottom: The residuals between the simulated signal and our result. The profile of the simulated signal is similar to the results produced by FFTEEC in the top and bottom panels. The residuals are flat in the bottom panel, indicating that the standing waves have been removed from the baseline and we did not introduce any false signals or lose signals.
Figure 4 displays a comparison between a simulated signal and spectra processed by FFTEEC, and it can be seen that all profiles are consistent (see details in the top panel). To show the differences more clearly, we have provided a partially enlarged view in the middle panel, and residuals are displayed in the bottom panel. The signals in the spectra processed by FFTEEC are consistent with the simulated signal, and the residuals between them are flat. In the simulated data, the width of the 3σ signal is 60.2, while that obtained by the automatic signal extraction method is 84.8, which is wider than the signal.
The root mean square error (RMSE) can be applied to illustrate the difference between two spectra, which can be expressed as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \tilde{y}_i\right)^2},$$
where $N$ is the length of the sequence, and $y$ and $\tilde{y}$ are the signals to be compared. The RMSE between the simulated signals and the spectra processed by FFTEEC is 0.10, and that between the simulated signals and the simulated signals after random noise injection is also 0.10, indicating that there is only random noise in our result. As highlighted by the visual comparison in Figure 4 and the RMSE values of the simulated data, our method does not lose signals or introduce false signals.
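The RMSE comparison can be reproduced on toy data with a short helper. The sketch below is illustrative only: the Gaussian test signal and the noise level of 0.1 are assumptions chosen to mirror the kind of check described above, not the paper's actual simulated spectra.

```python
import numpy as np


def rmse(y, y_tilde):
    """Root mean square error between two equal-length sequences."""
    y = np.asarray(y, dtype=float)
    y_tilde = np.asarray(y_tilde, dtype=float)
    return float(np.sqrt(np.mean((y - y_tilde) ** 2)))


# A clean signal compared with the same signal after noise injection:
# the RMSE should be close to the injected noise level.
rng = np.random.default_rng(0)
v = np.linspace(-7000.0, 7000.0, 100_000)
clean = np.exp(-0.5 * (v / 30.0) ** 2)
noisy = clean + 0.1 * rng.standard_normal(v.size)
noise_level = rmse(clean, noisy)
```

If a processed spectrum gives the same RMSE against the clean signal as the noise-injected signal does, the processing has left essentially only random noise behind, which is the argument made in the text.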
Figure 5. FAST baseline in the frequency domain. There are three obvious peaks in the blue line, corresponding to the frequencies of standing waves. The peaks have been removed in the orange line, indicating that our method can effectively extract standing waves.
The root mean square (rms) can also be employed to judge the effectiveness of our method, which can be expressed as
$$\mathrm{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} y_i^2},$$
where $N$ is the length of the sequence and $y$ is the baseline after removing the signal. We compared the rms noise levels of the random noise in the simulated data and of our results; the ratio between them is 1.12, indicating that our method achieves a near-ideal rms noise level.
We applied a 10th order polynomial fitting method to correct the inclined baseline preliminarily. After polynomial correction, the baseline is roughly flat but still bumpy, and it contains massive standing waves with different periods.
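A preliminary polynomial correction of this kind can be sketched with NumPy's polynomial module. The example is an assumption-laden illustration: the signal mask, the toy quadratic baseline and the function name are hypothetical, and only the polynomial order of 10 is taken from the text. `Polynomial.fit` rescales the velocity axis internally, which keeps a high-order fit numerically stable over a range of thousands of km s⁻¹.

```python
import numpy as np
from numpy.polynomial import Polynomial


def polynomial_baseline(velocity, spectrum, order=10, signal_mask=None):
    """Fit a polynomial to the signal-free channels and evaluate it everywhere."""
    velocity = np.asarray(velocity, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    if signal_mask is None:
        keep = np.ones(velocity.size, dtype=bool)
    else:
        keep = ~np.asarray(signal_mask, dtype=bool)
    fit = Polynomial.fit(velocity[keep], spectrum[keep], deg=order)
    return fit(velocity)


# Toy example: an inclined, slightly curved baseline plus a Gaussian line.
v = np.linspace(-7000.0, 7000.0, 2001)
true_baseline = 0.5 + 1e-4 * v + 2e-8 * v ** 2
line = np.exp(-0.5 * (v / 50.0) ** 2)
mask = np.abs(v) < 300.0  # hypothetical signal window excluded from the fit
estimated = polynomial_baseline(v, true_baseline + line, order=10, signal_mask=mask)
flattened = true_baseline + line - estimated
```

A smooth polynomial of this order cannot follow short-period ripples, which is why the standing waves survive this step and need separate treatment.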
Figure 5 presents the baseline of FAST in the frequency domain processed by FFT. The blue line is the original data, in which there are three distinct standing waves with different frequencies. The orange line is the result after the standing waves have been removed. After processing the data from different beams, we find that the frequencies of the standing waves in most FAST data are relatively consistent and comprise three different frequencies, as shown in the figure. However, some standing waves appear to have additional frequencies; e.g., the XX polarization of Beams M10, M13 and M15, the YY polarization of Beams M06, M07 and M17, etc. Taking the first 20 orders can remove almost all standing waves in the different data sets, whether the standing waves contain three frequencies or more. Thus the order of the FFT step can usually be set to 20, meaning that the 20 items with the largest amplitudes in the frequency domain are removed. This number was obtained after experimenting with various FAST data sets.
Figure 6 displays spectra corrected by FFTEEC and arPLS, in the left and right panels respectively. In the left panel, the standing waves have been removed from the spectrum, since there is only random noise in the waterfall image. Meanwhile, FFTEEC is stable for the different spectra, since the average spectrum is so flat that it is hard to find any obvious standing waves. The result processed by arPLS is shown in the right panel; standing waves are obvious in both the waterfall image and the average spectrum, indicating that arPLS can only correct the inclined baseline and is not good at removing standing waves in the original H I spectra produced by FAST.
Figure 6. Left: Results of FFTEEC. FFTEEC can remove the standing waves in the original H I spectra produced by FAST, since there is only random noise in the waterfall image, and the average spectrum is flat. Right: Results of arPLS. Standing waves are obvious in both the waterfall image and the average spectrum, indicating that arPLS is not good at removing standing waves.
Table 1 lists the rms noise of the average spectrum of five groups, i.e., the XX polarization of Beams M02 and M03 and the YY polarization of Beams M01, M06 and M13. The first row is for arPLS and the second row is for FFTEEC. The arPLS parameter λ, which controls the balance between fitness and smoothness, was set to $10^{11}$ when real data were processed. Based on the rms estimation method of the average spectrum provided by FAST (see Equation (10) of Jiang et al. 2020), the theoretical rms of the average spectrum, integrated for 335 minutes, is 6–8 mK. The rms noise of arPLS is ∼20 mK, which is about three times greater than the theoretical rms. In comparison, the rms noise of FFTEEC is ∼8 mK and approaches the theoretical rms.
Table 1. rms Noise of Five Groups of Data
In conclusion, although arPLS can be applied to correct the inclined baseline, it struggles to remove standing waves from the baseline. In contrast, FFTEEC can effectively remove standing waves and obtain a near-ideal rms. FFTEEC has some shortcomings; e.g., it can only be used to correct the baseline when the position of the signal is known, and it cannot automatically extract the signal as arPLS can. Additionally, the parameters of FFTEEC sometimes affect the results; e.g., the length of the smoothing box used when automatically extracting a signal.
Acknowledgments
This work was funded by the National Natural Science Foundation of China (NSFC, Grant Nos. 11933011 and 11873019), the Natural Science Foundation of Jiangsu Province (Grant No. BK20210999) and the Key Laboratory for Radio Astronomy. This work used data from the Five-hundred-meter Aperture Spherical radio Telescope (FAST). FAST is a Chinese national mega-science facility, operated by National Astronomical Observatories, Chinese Academy of Sciences (NAOC). L.Y.J. thanks support from the Entrepreneurship and Innovation Program of Jiangsu Province.
Research in Astronomy and Astrophysics, 2022, Issue 8