The Data Processing of the LAMOST Medium-resolution Spectral Survey of Galactic Nebulae (LAMOST MRS-N Pipeline)


Chao-Jian Wu, Hong Wu, Wei Zhang, Yao Li, Juan-Juan Ren, Jian-Jun Chen, Chih-Hao Hsia, Yu-Zhong Wu, Hui Zhu, Bin Li, and Yong-Hui Hou

1 CAS Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China; chjwu@bao.ac.cn

2 National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China

3 School of Astronomy and Space Science, University of Chinese Academy of Sciences, Beijing 100049, China

4 CAS Key Laboratory of Space Astronomy and Technology, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China

5 State Key Laboratory of Lunar and Planetary Sciences, Macau University of Science and Technology, Taipa, Macau, China

6 Purple Mountain Observatory, Chinese Academy of Sciences, Nanjing 210008, China

7 University of Science and Technology of China, Hefei 230026, China

8 Nanjing Institute of Astronomical Optics & Technology, National Astronomical Observatories, Chinese Academy of Sciences, Nanjing 210042, China

Received 2021 October 9; revised 2022 April 27; accepted 2022 May 18; published 2022 June 17

Abstract The Large sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) medium-resolution spectral survey of Galactic Nebulae (MRS-N) has been conducted for more than three years since 2018 September and has observed more than 190 thousand nebular spectra and 20 thousand stellar spectra. However, there is not yet a data processing pipeline for nebular spectra. To significantly improve the accuracy of nebulae classification and their physical parameters, we developed the MRS-N Pipeline. This article presents in detail each data processing step of the MRS-N Pipeline, such as removing cosmic rays, merging single exposures, fitting sky light emission lines, wavelength recalibration, subtracting sky light, measuring nebular parameters, creating catalogs and packing spectra. Finally, a description of the data products, including nebular spectra files and parameter catalogs, is provided.

Key words: surveys – catalogs – methods: data analysis – ISM: general

1. Introduction

The Large sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) medium-resolution spectral survey of Galactic Nebulae (MRS-N; Wu et al. 2020, 2021), a subproject of the Medium-Resolution Spectral Survey (MRS; Liu et al. 2020), relies on LAMOST (also known as the Guo Shou Jing Telescope), which is the first large-scale astronomical scientific facility of China (Wang et al. 1996; Su & Cui 2004; Cui et al. 2012; Zhao et al. 2012) and has achieved great success in many astronomical fields (Wu et al. 2010a, 2010b; Liu et al. 2020; Gao et al. 2014; Liu et al. 2015; Karoff et al. 2016; Luo et al. 2016; Wang et al. 2017b; Ren et al. 2018b; Liu 2019; Liu et al. 2019; Gu et al. 2019; Wu et al. 2016, 2020; Tian et al. 2015; Xiang et al. 2015a; Huang et al. 2016; Liu et al. 2017a, 2017b; Tian et al. 2017; Wang et al. 2017a; Li et al. 2018; Tian et al. 2018; Wang et al. 2018; Xu et al. 2018; Yu & Liu 2018; Zhao et al. 2018; Li et al. 2019; Tian et al. 2019; Wang et al. 2019b, 2019a; Shen et al. 2016; Huang et al. 2019; Wang et al. 2021), to observe nebulae (including H II regions, Herbig-Haro objects, supernova remnants and planetary nebulae) in the northern Galactic Plane (GP). From 2018 October to now, MRS-N has been conducted for three years and has obtained more than 190 thousand medium-resolution spectra of Galactic nebulae. It is one of the largest nebular surveys in the world (Wu et al. 2021).

Several pipelines have been developed to reduce the LAMOST raw data. The LAMOST 2D Pipeline (Luo et al. 2015) is used to extract spectra from CCD images (subtracting dark and bias, correcting the flat field, extracting spectra, calibrating wavelength, subtracting sky light, merging spectra, etc.) and to perform flux calibration, while the 1D Pipeline (Luo et al. 2015), which is used after the 2D Pipeline, works on the classification and parameter measurement of stars, galaxies or quasars (QSOs). Xiang et al. (2015b) introduced the LAMOST stellar parameter pipeline at Peking University (LSP3), which is parallel to the LAMOST 1D Pipeline. The main function of LSP3 is to determine the radial velocities (RVs) and stellar atmospheric parameters, such as Teff, log g and [Fe/H]. However, the LAMOST 1D Pipeline and LSP3 are mainly used to measure stellar parameters, and neither is suitable for the MRS-N data. An MRS-N spectrum contains dozens of strong sky light emission lines, nebular emission lines and a faint continuum. Unlike for stellar spectra, it is difficult to remove the sky light effectively from MRS-N spectra. Especially in a large-scale nebular region, it is very difficult to find a spectrum of pure sky light. That is to say, the sky light in MRS-N data cannot be removed with the regular method of the LAMOST 2D Pipeline. Moreover, the measurement of nebular parameters differs from the measurement of stellar spectra, so specific algorithms are required. We therefore developed a new pipeline, named the MRS-N Pipeline, which is used in combination with the LAMOST 2D Pipeline and only for the MRS-N data.

Figure 1. MRS-N coverage of the past three years’ observations. The x-axis represents the Galactic longitude and the y-axis represents the Galactic latitude. The blue circles indicate observations of the first year; the green and red circles indicate observations of the second and the third year, respectively. The yellow circles show the four specific areas.

In this paper, we first introduce the MRS-N observations of the past three years in Section 2. Section 3 describes merging single exposures, wavelength recalibration, subtracting sky light and measuring nebular parameters in detail. The data products are presented in Section 4. Finally, Section 5 gives a brief summary.

2. Observation

From 2018 October to now, we have completed more than three years of MRS-N observations. The observations in the three years ran from 2018 October to 2019 January, 2019 November to 2020 March, and 2020 October to 2021 March, respectively. As described in Wu et al. (2021), MRS-N observations should be carried out during moonless time. Due to the weather and some unavoidable reasons, only 9 days were suitable for observing and 12 plates were finished in the first year. We optimized the observation strategy relative to the first year, and 31 plates were observed in the second year. Among the 31 plates, 20 cover a united area that includes the Rosette Nebula and NGC 2264 (Ros area; Wu et al. 2021), two cover the Westerhout 5 area (West) and the remaining nine cover the GP area. In the second year, we thus finished one complete specific area, the Ros area. In the third year, mainly due to the weather, the number of observed areas decreased. Only 11 days were suitable for observing and 17 plates were finished in the third year.

Figure 1 shows the three years’ coverage of MRS-N. The blue circles indicate observations of the first year, the green circles indicate the second year’s observations and the red circles indicate the third year’s observations. The yellow circles show the four specific areas with names marked. All the observed MRS-N data will be processed by using the pipeline described below.

3. Methodology

As described in Wu et al. (2021), the data processing of MRS-N is different from stellar spectra reduction. The LAMOST 1D pipeline (Luo et al. 2012, 2015) and LSP3 (Xiang et al. 2015b) are not suitable for MRS-N data, so we developed the MRS-N Pipeline. The MRS-N Pipeline includes removing cosmic rays, merging single exposures, fitting sky light emission lines (Ren et al. 2021), subtracting sky light (Zhang et al. 2021), wavelength recalibration, parameter measurement of nebular emission lines, creating catalogs and packing spectra. Figure 2 shows a flowchart of the MRS-N Pipeline.

In Figure 2, the two gray boxes above the dashed line, which do not belong to the MRS-N Pipeline, are the steps from the official LAMOST 2D pipeline. The LAMOST 1D MRS-N spectra, reduced by the LAMOST 2D pipeline without subtracting sky light and correcting flat fields, are called the MRS-N raw data in this work (yellow box in the flowchart). After removing cosmic rays (or false sharp emission lines) from the MRS-N raw data and merging them, the MRS-N combined spectra are obtained. For those MRS-N combined spectra without equatorial coordinates, the coordinates can be calculated from the pixel coordinates in the 2D images. The MRS-N Pipeline derives a linear relationship from the spectra that have both known equatorial coordinates and known pixel positions. By using this relationship, the coordinates of the spectra with only pixel positions can be estimated. The purpose of this step is to improve the utilization of fibers. After this step, we get the MRS-N prepared spectra. By measuring the sky lines of the MRS-N prepared spectra, the wavelength can be recalibrated with the method of Ren et al. (2021). After subtracting the sky background from the wavelength-recalibrated spectra, the MRS-N final spectra are obtained. Some information generated during data reduction is then added to the FITS (Flexible Image Transport System) header. The last step of the MRS-N Pipeline is the measurement of nebular parameters.
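The coordinate-estimation step can be illustrated with a small least-squares fit. This is only a sketch under stated assumptions: the text says the relationship is linear but does not give its form, so an affine map from pixel position (x, y) to (RA, Dec) is assumed here, and the function names `fit_pixel_to_sky` and `predict_sky` are hypothetical. It also ignores RA wrap-around and the cos(Dec) factor, which is only reasonable for a small field.

```python
import numpy as np

def fit_pixel_to_sky(pix_xy, radec):
    """Fit an affine map (x, y) -> (RA, Dec) from fibers with known coordinates.

    pix_xy: (n, 2) pixel positions; radec: (n, 2) equatorial coordinates in degrees.
    Returns a (3, 2) coefficient matrix. The affine form is an assumption.
    """
    pix_xy = np.asarray(pix_xy, dtype=float)
    radec = np.asarray(radec, dtype=float)
    # Design matrix with columns [x, y, 1] so the fit includes an offset term.
    design = np.column_stack([pix_xy, np.ones(len(pix_xy))])
    coeffs, *_ = np.linalg.lstsq(design, radec, rcond=None)
    return coeffs

def predict_sky(coeffs, pix_xy):
    """Estimate (RA, Dec) for fibers that only have pixel positions."""
    pix_xy = np.atleast_2d(np.asarray(pix_xy, dtype=float))
    design = np.column_stack([pix_xy, np.ones(len(pix_xy))])
    return design @ coeffs
```

At least three non-collinear fibers with known coordinates are needed to constrain the six coefficients; in practice many more would be used so that the fit averages over positional noise.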

3.1. Merging Single Exposure

Figure 2. The MRS-N Pipeline flowchart.

Figure 3. An example of merging single exposures. From the two lower left panels, it is clear that the S/N of the merged Hα (left) is improved significantly. The red dotted lines in each panel represent nebular emission lines. The false sharp emission lines in Single exposure 1 (red ellipse) have been effectively eliminated.

An MRS-N plate contains three single exposure spectra, each of which is exposed for 900 s. The first step of the MRS-N Pipeline is to coadd the three single exposure spectra. The main purpose of merging spectra is to improve the signal-to-noise (S/N) ratio. The common merging methods are median merging and mean (sum) merging. The advantage of the median is that it effectively removes cosmic rays, but the disadvantage is that its S/N is lower than that of the mean. Mean merging gives a narrower (less noisy) distribution than median merging, though both substantially reduce the width of the distribution. Combining by averaging is therefore mildly preferable to combining by median. Computationally, the mean is also faster to compute than the median.

In the MRS-N Pipeline, we use the sum merging method. The wavelengths of the three single exposure spectra are first aligned. The probability of cosmic rays falling at the same position in different exposures is almost zero. Based on this condition, for the three points (e.g., x1, x2, x3) of the three single exposure spectra at the same wavelength, we first select the maximum value (suppose x1 = max(x1, x2, x3)). Then the variance of the remaining two values (x2 and x3) is calculated. When x1 is greater than 3 × variance(x2, x3), x1 is considered to be mainly affected by a cosmic ray or a false sharp signal and is replaced by mean(x2, x3). Otherwise, there is no cosmic ray or false sharp signal at this position, and the three points can be summed directly. Our tests show that cosmic rays and false sharp emission lines are removed effectively with this method. An example is shown in Figure 3: the false sharp emission lines are removed and the S/N ratio is significantly improved.
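The rejection-and-sum rule above can be sketched in a few lines. This is a minimal illustration, not the pipeline's code: it follows the criterion as literally worded (maximum compared with 3 × the variance of the other two values), assumes the population variance, and assumes the exposures are already aligned in wavelength. The function name `merge_three_exposures` and the parameter `k` are hypothetical.

```python
from statistics import mean, pvariance

def merge_three_exposures(exp1, exp2, exp3, k=3.0):
    """Sum three wavelength-aligned exposures pixel by pixel.

    At each pixel the maximum of the three values is tested against
    k * variance of the other two; if it exceeds that threshold it is
    replaced by their mean before summing. Returns (merged, flagged).
    """
    merged, flagged = [], []
    for values in zip(exp1, exp2, exp3):
        ordered = sorted(values)
        rest, x_max = ordered[:2], ordered[2]
        # Literal reading of the text; population variance is an assumption.
        is_hit = x_max > k * pvariance(rest)
        if is_hit:
            x_max = mean(rest)  # replace the suspected cosmic-ray value
        flagged.append(is_hit)
        merged.append(rest[0] + rest[1] + x_max)
    return merged, flagged
```

Read this way, a pixel is also flagged whenever the two lower values nearly coincide, because their variance is then close to zero; the statistic actually used by the pipeline may differ in detail from this literal reading.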

3.2. Wavelength Recalibration

The sky light emission lines, which are mostly from the Meinel rotation-vibration bands of OH (Meinel 1950), are ubiquitous in spectra of stars, galaxies or nebulae (Osterbrock & Martel 1992; Osterbrock et al. 1996); the sky light usually exhibits rich emission lines and a weak continuum in optical spectra. The fainter the observed objects, the more seriously the sky light lines contaminate their spectra. Therefore, in nebular spectra, the sky light emission lines appear particularly prominent. Because of their narrow line widths and fixed central wavelengths, the sky light emission lines can serve as comparison lines to correct the zero-point of the spectral wavelength.

Figure 4. The topmost panel shows an example spectrum (red band) of MRS-N with many sky light lines (red dotted lines) and nebular emission lines (blue dotted lines). Seven of the sky lines are fitted with a Gaussian function and the fitting results are presented in the middle upper panel. The middle lower panel shows an example of the fitted RV calibration function. The seven black dots represent the fitted mean RVs of the seven single sky emission lines and the gray dot corresponds to the extrapolated value at 6731 Å. The red curve represents the fitted RV calibration function. The lowest panel gives the mean value of the residuals (gray dotted horizontal line) and the 1σ standard deviations (gray dashed lines).

As described in Section 3, the MRS-N raw data are wavelength calibrated with the calibration lamp but without subtracting sky light and correcting flat fields. However, the lamp spectra and scientific spectra are not observed at the same time, which means that the instrument status may have changed between the two observations. Strictly speaking, calibration with lamp spectra therefore introduces instrumental uncertainties. This problem can be addressed by calibrating the scientific spectra with sky light emission lines. Taking the red-band spectrum of MRS-N as an example, the topmost panel of Figure 4 shows a spectrum with sky light emission lines and nebular emission lines. These sky light emission lines and nebular emission lines are observed at the same time under the same instrument status. Seven of the sky lines, λ6287 Å, λ6300 Å, λ6363 Å, λ6498 Å, λ6533 Å, λ6544 Å and λ6553 Å, are fitted with a Gaussian function and the fitting results are presented in the middle upper panel of Figure 4. The central wavelengths of the sky light emission lines in scientific spectra should be the same as the theoretical values. So, by fitting the sky light emission lines, Ren et al. (2021), whose method is an important part of the MRS-N Pipeline, provided a calibration function ($f(\lambda)=a\lambda^{2}+b\lambda+c$, where λ is the wavelength in units of μm and a, b and c are the three coefficients fitted with the sky light lines) to correct the RVs of scientific spectra in real time. The middle lower panel of Figure 4 shows an example of the fitted RV calibration function. The seven black dots represent the fitted mean RVs of the seven single sky emission lines and the gray dot corresponds to the extrapolated value at 6731 Å. The red curve represents the fitted RV calibration function. The lowest panel gives the mean value of the residuals (gray dotted horizontal line) and the 1σ standard deviations (gray dashed lines). Ren et al. (2021) concluded that the systematic deviation can be effectively corrected with this method, especially for Hα and [N II] λλ6548, 6584. More details can be found in that reference.
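As a rough illustration of the calibration function, the sketch below fits $f(\lambda)=a\lambda^{2}+b\lambda+c$ to the apparent RVs of the seven sky lines and evaluates it at another wavelength. It assumes the Gaussian centroids of the sky lines have already been measured; the reference wavelengths are the nominal line labels quoted in the text, used here as placeholders rather than precise laboratory values, and the function names are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

# Nominal sky-line labels from the text (Angstrom); placeholders, not lab values.
SKY_LINES_A = np.array([6287.0, 6300.0, 6363.0, 6498.0, 6533.0, 6544.0, 6553.0])

def fit_rv_calibration(measured_a, reference_a=SKY_LINES_A):
    """Fit f(lambda) = a*lambda^2 + b*lambda + c to the sky-line RV offsets.

    measured_a: fitted Gaussian centroids of the sky lines (Angstrom).
    Returns the coefficients (a, b, c) with lambda expressed in micrometres.
    """
    measured_a = np.asarray(measured_a, dtype=float)
    reference_a = np.asarray(reference_a, dtype=float)
    # Apparent velocity of each sky line; ideally zero for a perfect calibration.
    rv = C_KMS * (measured_a - reference_a) / reference_a
    # 1 Angstrom = 1e-4 micrometre.
    return np.polyfit(reference_a * 1e-4, rv, deg=2)

def rv_offset(coeffs, wavelength_a):
    """Evaluate the calibration function (km/s) at a wavelength in Angstrom."""
    return np.polyval(coeffs, np.asarray(wavelength_a, dtype=float) * 1e-4)
```

A measured nebular RV would then be corrected by subtracting `rv_offset(coeffs, wavelength)` at the line's wavelength; that sign convention is an assumption of this sketch. Evaluating the function at 6731 Å is an extrapolation beyond the reddest sky line, as the gray dot in Figure 4 indicates.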

3.3. Subtracting Sky Light

Figure 5. An example of Hα before and after subtracting sky light. Left panel: the red curve represents the Gaussian fit of Hα before subtracting sky light and the blue curve represents the Gaussian fit of the sky light. Right panel: the red line represents the Gaussian fit of Hα after subtracting sky light. The red dotted lines in the two panels show the centroids of the nebular Hα emission line.

Sky light subtraction is necessary for ground-based spectrographs. The traditional method (Soto et al. 2016) of removing sky light is not suitable for MRS-N data. Zhang et al. (2021) introduced a method of subtracting sky light from MRS-N spectra based on the relation between I(Hαsky)/I(λ6554) and the solar altitude, which is the angle of the Sun relative to the Earth’s horizon. I(λ6554) is the flux of the OH line at 6554 Å, which originates in the Earth’s atmosphere. Zhang et al. (2021) concluded that I(Hαsky)/I(λ6554) and solar altitude have the following relationship:

3.4. Measurements of Nebular Parameters

All the physical parameters, such as RVs, full width at half maximum (FWHM), line intensity, etc., are measured by the Gaussian fitting method, which is widely used for spectra of the Sloan Digital Sky Survey (SDSS) (Rebassa-Mansergas et al. 2016) and LAMOST (Ren et al. 2018a). We first fit all the science spectra with a second-order polynomial plus a single-Gaussian line profile. Spectra with a large $\chi^{2}$ value and fitting error are then fitted again with a second-order polynomial plus a double- or triple-Gaussian line profile (Ren et al. 2021). Figure 6 shows three examples of single-Gaussian fitting results for the Hα, [N II] and [S II] emission lines from one MRS-N nebular spectrum.
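To make the link between the fitted profile and the catalog quantities explicit, the sketch below writes out the single-Gaussian-plus-quadratic model and the standard conversions from its parameters to RV, FWHM and integrated line intensity. It does not perform the fit itself. The function names are hypothetical, and the rest wavelength is passed in by the caller because the text does not state whether the pipeline uses air or vacuum wavelengths.

```python
import math

C_KMS = 299792.458  # speed of light in km/s

def gaussian_plus_poly(wl, amp, mu, sigma, c0, c1, c2):
    """Single Gaussian line on a second-order polynomial continuum."""
    line = amp * math.exp(-0.5 * ((wl - mu) / sigma) ** 2)
    return line + c0 + c1 * wl + c2 * wl * wl

def line_parameters(amp, mu, sigma, rest_wl):
    """Convert fitted Gaussian parameters into RV, FWHM and line intensity.

    amp: peak height above the continuum; mu, sigma: centroid and width in
    Angstrom; rest_wl: rest wavelength supplied by the caller (Angstrom).
    """
    rv = C_KMS * (mu - rest_wl) / rest_wl              # km/s, non-relativistic
    fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma  # Angstrom
    intensity = amp * sigma * math.sqrt(2.0 * math.pi)   # area under the Gaussian
    return rv, fwhm, intensity
```

For example, `line_parameters(100.0, 6563.0, 0.4, 6562.8)` gives an RV of about 9.1 km/s and an FWHM of about 0.94 Å; the input numbers are illustrative only and are not taken from the survey.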

4. Data Products

Until now, more than 190 thousand nebular spectra have been observed in MRS-N. The data processing is in progress. The data products generated by the MRS-N Pipeline contain the following:

Nebular Spectra: The single exposure spectra and coadded spectra are all stored in FITS format. They are all wavelength-recalibrated and sky-subtracted spectra. These spectra do not include spectra of stars. The stellar spectra observed in MRS-N will be processed by the LAMOST 2D and 1D pipelines or LSP3. Because there are fewer nebular emission lines in the blue band of MRS (Wu et al. 2021), these spectral data are mainly for the red band, which covers the wavelength range 6300 Å–6800 Å with R ∼ 7500. The FITS files of single exposure spectra are named in the form of spec-XXr-PID-YYYYMMDD-N.fits, where XX represents the spectrograph number (between 01 and 16), PID is the name of the observed plate, YYYYMMDD is the date of observation and N indicates the number (between 01 and 03) of the single exposure. The structure of a single exposure spectrum is the same as the structure of the LAMOST raw data (see http://dr1.lamost.org/doc/data-production-description#toc_3).

Figure 6. The Gaussian fitting of three nebular emission lines. All the nebular parameters are from the fitting results.

Figure 7. The structure of a coadded spectrum.

The names of coadded spectra are in the form of sumspec-XXr-PID-YYYYMMDD.fits. The strings XX, PID and YYYYMMDD have the same meanings as above. Figure 7 shows the structure of a coadded spectrum. From Figure 7, we can see that each FITS file has four extensions (EXTEN0, EXTEN1, EXTEN2 and EXTEN3). EXTEN0, EXTEN1 and EXTEN3 represent the relative flux, the recalibrated wavelength and the inverse variance (Luo et al. 2015) of EXTEN0, respectively. EXTEN2 shows the information of the observed targets. Each extension includes 250 rows, corresponding to the spectral data of the 250 fibers mounted on every spectrograph.
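The two naming schemes can be parsed with a pair of regular expressions. This is a small sketch based only on the patterns given in the text (spec-XXr-PID-YYYYMMDD-N.fits and sumspec-XXr-PID-YYYYMMDD.fits); the function name `parse_mrsn_filename` is hypothetical, and the plate name is matched permissively because its allowed characters are not specified.

```python
import re
from datetime import datetime

# Patterns follow the naming schemes described in the text.
SINGLE = re.compile(
    r"^spec-(?P<spectrograph>\d{2})r-(?P<plate>.+)-(?P<date>\d{8})-(?P<exposure>\d{2})\.fits$"
)
COADD = re.compile(
    r"^sumspec-(?P<spectrograph>\d{2})r-(?P<plate>.+)-(?P<date>\d{8})\.fits$"
)

def parse_mrsn_filename(filename):
    """Split an MRS-N spectrum file name into its documented components."""
    for kind, pattern in (("single", SINGLE), ("coadd", COADD)):
        match = pattern.match(filename)
        if match is None:
            continue
        info = match.groupdict()
        info["kind"] = kind
        info["spectrograph"] = int(info["spectrograph"])  # 1 to 16 per the text
        info["date"] = datetime.strptime(info["date"], "%Y%m%d").date()
        if kind == "single":
            info["exposure"] = int(info["exposure"])  # 1 to 3 per the text
        return info
    raise ValueError(f"not an MRS-N spectrum file name: {filename!r}")
```

For instance, a made-up name such as `spec-05r-PLATE01-20191120-02.fits` would be reported as a single exposure from spectrograph 5, plate `PLATE01`, observed on 2019 November 20, exposure 2.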

Nebular Parameters Catalog: The catalogs of nebular parameters are also stored in FITS format. Each catalog contains 41 columns. Table 1 shows the detailed description of each column.

At present, there is no comparable large-scale, multi-target nebular spectral survey with which MRS-N can be compared. Therefore, we cannot make an external comparison for all the MRS-N data. It can be assumed that the parameters differ little within a small region (e.g., < 30″) of a nebula. Based on this condition, we selected the data in the S147 region and compared them with the results of Ren et al. (2018b). Moreover, we also made an internal comparison with the data in the same region.

External comparison: In the S147 region, about 480 MRS-N spectra were obtained by cross-matching with the data of Ren et al. (2018b). We compared the parameters of these ∼480 matched targets with those of Ren et al. (2018b). For this comparison, the RVs and line intensity ratios of Ren et al. (2018b) are from the LAMOST low-resolution spectra. Similar to Figures 9 and 10 of Ren et al. (2018b), we also give the histogram distributions of RVs and line intensity ratios (see Figure 8). The RVs of Hα, [N II] and [S II] peak at ∼8.73 with σ = 3.91, ∼8.80 with σ = 5.18 and ∼9.62 with σ = 5.68, respectively. Our results are consistent with the measurements of Ren et al. (2018b) but with lower dispersion. This is expected because our spectral resolution is higher. As for the line intensity ratios, the peak (∼1.41) of [S II] λ6717/[S II] λ6731 in this work is consistent with that (∼1.35) of Ren et al. (2018b). However, the peaks of Hα/[N II] λ6584 and Hα/[S II] λλ6717, 6731 in our work are ∼2.27 and ∼1.60, which are larger than the results given by Ren et al. (2018b) and have larger dispersion. They are still consistent within the 1σ range. One reason for the larger dispersion may be that the two methods of subtracting sky light (which mainly affects the Hα emission line) are different. Compared with the method of Ren et al. (2018b), who selected the dark area within S147 as the sky light background, our method of subtracting sky light is more reasonable. Our preliminary judgment is therefore that the wider dispersion may be caused by a physical factor that is not observed in the low-resolution spectra.

Table 1 The Description of MRS-N Parameters Catalog

Figure 8. The results of external comparison with Ren et al. (2018b).

Internal comparison: We randomly divided the MRS-N spectra within the S147 region into two groups (both evenly distributed over the S147 region) for comparison. Unlike for stellar spectra, the comparison here is between spectra obtained at nearby but not identical coordinates. Figure 9 shows the results of the comparison. In Figure 9, the top three panels show the distributions of RVs for Hα, [N II] and [S II]; the middle three panels show the distributions of FWHMs for Hα, [N II] and [S II]; the bottom three panels give the distributions of the line intensity ratios Hα/[N II] λ6584, Hα/[S II] λλ6717, 6731 and [S II] λ6717/[S II] λ6731. The distributions of the two groups are marked with red and blue lines, respectively. The red and blue dashed lines represent the Gaussian fitting results of the two groups. This comparison shows qualitatively that the MRS-N results are internally consistent to a certain extent.

5. Summary

As one of the largest nebular spectral surveys of the northern GP, LAMOST MRS-N has been conducted for three years (since 2018 October) and has accumulated more than 190 thousand medium-resolution nebular spectra. In order to make it easier for users to understand and use the data, we developed the MRS-N Pipeline for the reduction of MRS-N data. The MRS-N Pipeline should be used in combination with the LAMOST 2D Pipeline. It mainly includes removing cosmic rays, merging single exposures, fitting sky light emission lines, wavelength recalibration, subtracting sky light, measuring nebular parameters, creating catalogs and packing spectra. In addition, to improve the utilization of fibers, the coordinates of fibers without coordinates can also be estimated. By using the MRS-N Pipeline, the accuracy of nebulae classification and of the measurements of nebular physical parameters can be significantly improved.

Figure 9. The results of the internal comparison within the S147 region. The spectra in the S147 region were divided evenly into two groups. The distributions of the two groups are marked with red and blue lines, respectively. The red and blue dashed lines represent the fitting results of the two groups.

Acknowledgments

This project is supported by the National Natural Science Foundation of China (Grant Nos. 12073051, 12090041, 12090040, 11733006, 11403061, 11903048, U1631131, 11973060, 12090044, 12073039, 11633009 and U1531118), the Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, and the Key Research Program of Frontier Sciences, CAS (Grant No. QYZDY-SSW-SLH007).

C.-H. Hsia acknowledges support from the Science and Technology Development Fund, Macau SAR (file No. 0007/2019/A) and Faculty Research Grants of the Macau University of Science and Technology (No. FRG-19-004-SSI).

Guo Shou Jing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope, LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.

ORCID iDs

Chao-Jian Wu https://orcid.org/0000-0003-3514-6619

Wei Zhang https://orcid.org/0000-0002-1783-957X

Juan-Juan Ren https://orcid.org/0000-0003-3243-464X

Chih-Hao Hsia https://orcid.org/0000-0003-2549-3326