A Catalog of Quasar Candidates Identified by Astrometric and Mid-infrared Methods in Gaia EDR3

2023-03-25 07:36

Qiqi Wu, Shilong Liao, Zhaoxiang Qi, Hao Luo, Zhenghong Tang, and Zihuang Cao

1 Shanghai Astronomical Observatory, Chinese Academy of Sciences, Shanghai 200030, China; wuqiqi@shao.ac.cn, shilongliao@shao.ac.cn

2 University of Chinese Academy of Sciences, Beijing 100049, China

3 National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China

Abstract Quasars are very important in materializing the reference frame. The excess emission of active galactic nuclei (AGNs) in the mid-infrared band can be used to identify quasar candidates. As extremely distant and point-like objects, quasars can be further selected by an astrometric method. Increasing the number of reliable quasar candidates is necessary for characterizing the properties of the Gaia astrometric solution and evaluating the reliability of Gaia's own quasar classification. We identify quasars by using an appropriate AllWISE [W1-W2] color and different combinations of astrometric criteria. Together with the contamination and completeness, the magnitude, astrometric properties, density distribution, and morphological indexes of these selected quasars are evaluated. We obtain a quasar candidate catalog of 1,503,373 sources, which contains 1,186,690 candidates (78.9%) in common with the Gaia EDR3_AGN catalog and 316,683 newly identified quasar candidates. The completeness of this catalog is around 80% compared to LQAC5, and the purity of the overall catalog is about 90%. We also find that the purity of quasar candidates selected by this method decreases in crowded sky areas and in regions with fewer WISE observations.

Key words: (galaxies:) quasars: general – parallaxes – catalogs – astrometry

1. Introduction

Quasars, one type of active galactic nuclei (AGNs), are extremely distant, point-like objects, which makes them very important in materializing the reference frame. The Third International Celestial Reference Frame (ICRF3), the realization of the International Celestial Reference System (ICRS) at radio wavelengths, contains 4588 radio sources observed with Very Long Baseline Interferometry (VLBI) (Charlot et al. 2020). The European Space Agency's Gaia mission (Prusti et al. 2016), which aims to provide accurate proper motions and parallaxes for more than one billion stars, has already provided more than one million quasar candidates at visible wavelengths. With an accuracy comparable to that of VLBI, Gaia is dedicated to establishing a new kinematically non-rotating reference frame at visible wavelengths from its own astrometric measurements of quasars, named the Gaia Celestial Reference Frame (Gaia-CRF) (Mignard et al. 2018; Gaia Collaboration et al. 2022a). Furthermore, quasars are generally considered to have almost zero parallax and proper motion, so they are vital for characterizing astrometric properties such as those of the Gaia astrometric solution (Liao et al. 2021a, 2021b). With these concepts in mind, it is crucial to maximize the number of quasars known at optical wavelengths.

Many efforts have been made to enlarge the number of known quasars. Since the first quasar was identified (Schmidt 1963), surveys such as the Large Bright Quasar Survey (Hewett et al. 1995), the Hamburg Quasar Survey (Hagen et al. 1995), the INT Wide Angle Survey (Sharp et al. 2001), the 2dF Quasar Redshift Survey (2QZ) (Croom et al. 2004) and the Sloan Digital Sky Survey (SDSS) (Pâris et al. 2018; Lyke et al. 2020) have contributed the majority of the quasars identified at optical wavelengths. Together with the new data released by the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) (Cui et al. 2012; Zhao et al. 2012; Ai et al. 2016; Dong et al. 2018; Yao et al. 2019), the number of quasars discovered has increased rapidly in recent years. These quasars have been compiled into various catalogs, such as the Véron-Cetty & Véron catalog (V&V) (Véron-Cetty & Véron 2010), the Large Quasar Astrometric Catalog (LQAC) (Souchay et al. 2009, 2012, 2015, 2017, 2019), the Known Quasars Catalog (Liao et al. 2019) and the Million Quasars (Milliquas) catalog (Flesch 2015, 2017, 2021).

Despite the large number of spectroscopically confirmed quasars, the number is still far from sufficient for establishing a high-precision celestial reference frame. Selections based on mid-infrared color criteria, which exploit the excess emission of AGNs in the mid-infrared band (Lacy et al. 2004; Richards et al. 2006), have proven to be very effective (Mateos et al. 2012; Stern et al. 2012; Secrest et al. 2015; Assef et al. 2018). With the mid-infrared data released by the Wide-field Infrared Survey Explorer (WISE, Wright et al. 2010), Secrest et al. (2015) selected about 1.4 million AGN candidates (MirAGN), which contribute the majority of the quasars used to define the Gaia Celestial Reference Frame in Gaia Data Release 2 (DR2) (Lindegren et al. 2018; Mignard et al. 2018). Using only photometric and astrometric data, Bailer-Jones et al. (2019) constructed a supervised classifier based on Gaussian Mixture Models to probabilistically classify extragalactic objects in Gaia DR2, identifying 690,000 quasar candidates and 110,000 galaxy candidates.

Gaia Early Data Release 3 (EDR3) (Lindegren et al. 2021b) provides provisional astrometric and photometric data for more than 1.8 billion sources based on the observations made by Gaia. To enlarge the number of quasars, together with the MirAGN catalog, 17 external quasar or AGN catalogs were cross-matched with Gaia EDR3. These catalogs include both spectroscopically confirmed quasars and quasar candidates, such as 2QZ (Croom et al. 2004), Roma-BZCAT release 5 (Massaro et al. 2015), R90 (Assef et al. 2018), the WISE color-selected AGN catalog (Secrest et al. 2015) and SDSS DR14Q (Pâris et al. 2018). Machine learning methods have also identified many quasar candidates, such as those in Gaia-unwise (Shu et al. 2019). These quasar-like objects, 1,614,173 in total, are available in the Gaia Archive as the table agn_cross_id (hereafter the EDR3_AGN catalog) (Lindegren et al. 2021b).

As shown in Table 1, from these 17 external quasar catalogs we have selected six with a relatively large number of quasars to investigate the composition of the quasar-like objects in the EDR3_AGN catalog. The Gaia-unwise catalog contributes most of the quasar-like objects identified in the EDR3_AGN catalog, and a considerable fraction of these targets overlap with catalogs such as R90, the WISE AGN catalog, and the SDSS DR14Q catalog. As very distant objects, quasars are characterized by non-detectable parallax and proper motion. However, quasar candidates selected by color criteria alone or by machine learning methods may include a large number of false identifications. For example, 2,610,583 objects in the Gaia-unwise catalog are matched to Gaia EDR3, but only 60.1% of them meet the astrometric criteria to be identified as quasar-like objects in the EDR3_AGN catalog (see Table 1). There are therefore many stars and galaxies in the Gaia-unwise catalog, which is probably also the case for the quasar-like objects in the EDR3_AGN catalog, especially for the six-parameter solution sample (Liao et al. 2021a, 2021b). For this reason, among the 1.6 million quasar-like objects identified in the EDR3_AGN catalog, only 429,249 objects (Frame Rotator Sources, FRS hereafter) are selected to compute the Gaia-CRF3. Compared to Gaia DR2, the systematic astrometric residuals have been greatly reduced in EDR3. Many studies show that the mean proper motion of the confirmed quasar sample is consistent with zero, and no significant global systematic residuals are found (Liao et al. 2021a, 2021b; Fabricius et al. 2021). The astrometric solution in EDR3 is therefore reliable enough to use in quasar selection. The non-detectable parallax and proper motion can be used to purify quasar candidates selected by the mid-infrared method, an approach that proved quite effective in our previous quasar selections with APOP (Qi et al. 2015) and AllWISE data (Guo et al. 2018).

Table 1 Reliability and Contribution of Gaia EDR3 AGN

Table 2 Description of QCC

Furthermore, using the particular variability and characteristic spectral energy distributions (SEDs) of quasars, Gaia can identify its own quasar list. In Gaia Data Release 3 (DR3), Gaia has performed its own quasar classification with the low-resolution spectral data, G-band magnitude and astrometric parameters (Gaia Collaboration et al. 2022b). However, for quasars with low-resolution spectroscopy, the photometric signatures are not sufficient to distinguish them from stars. As indicated by Claeskens et al. (2006), at redshifts z<0.5, z∼2.5 and z>3, white dwarfs, F stars and red dwarfs have colors similar to those of quasars. Convincingly testing the degree of stellar contamination, i.e., the quality of quasars from the Gaia-data-only classifications, is therefore essential. Combining different approaches with the near-zero proper motion and parallax makes it more feasible to distinguish quasars from stellar objects (Mignard & Klioner 2012). The quasar candidates jointly selected by Gaia's astrometric data and mid-infrared color data might be a crucial indicator for testing that quality.

As the first attempt to use Gaia EDR3 astrometric data in combination with mid-infrared data to select quasar candidates, this paper aims to provide a reliable quasar candidate list based on Gaia's own astrometric data and the mid-infrared method. These quasar candidates will play an important role in characterizing the properties of the Gaia astrometric solution (see, e.g., Liao et al. 2021a, 2021b; Fabricius et al. 2021), establishing the reference frame at optical wavelengths (Mignard et al. 2018) and verifying the Gaia quasar catalog identified from its spectroscopic data.

In Section 2, we describe the selection process of quasar candidates. The contamination, completeness, morphological indexes and astrometric properties of our catalog are evaluated in Section 3. In Section 4, we discuss the quasar candidates in the Galactic plane and the study of anomalous quasars. Finally, in Section 5, we summarize our results and give our conclusions.

2. Data and Selection Criteria

2.1. Data Used

The Wide-field Infrared Survey Explorer (WISE, Wright et al. 2010) is a satellite with a 40 cm aperture launched by NASA in 2009. It observes in four bands at 3.4, 4.6, 12 and 22 μm (hereafter referred to as W1, W2, W3, and W4, respectively), with angular resolutions of 6.1″, 6.4″, 6.5″ and 12.0″, respectively. The AllWISE catalog (Cutri et al. 2013) is built by combining data from the WISE cryogenic and NEOWISE (Mainzer et al. 2011) post-cryogenic surveys, and contains the positions, apparent motions, magnitudes and point-spread function (PSF) profile-fit information for about 748 million objects.

Gaia EDR3 (Lindegren et al. 2021b), released at the end of 2020, is the early installment of Gaia Data Release 3 (DR3). Gaia EDR3 contains the five-parameter (positions, parallaxes, and proper motions) astrometric solution for around 585 million sources and the six-parameter (positions, parallaxes, proper motions and pseudo-colors⁴) astrometric solution for 882 million sources. The magnitude limit is about G≈21 mag at the faint end and about G≈3 mag at the bright end. In addition, it provides the two-parameter (positions) astrometric solution for around 344 million additional sources. For five-parameter and six-parameter sources, both position and parallax uncertainties are less than 0.5 mas at G≤20 mag and about 1.0–1.3 mas at G=21 mag, while the proper motion uncertainties are generally less than 0.5 mas yr⁻¹ at G≤20 mag and 1.4 mas yr⁻¹ at G=21 mag.

⁴ The pseudo-color is the astrometrically estimated effective wavenumber of the photon flux distribution in the astrometric (G) band, measured in μm⁻¹ (https://gea.esac.esa.int/archive/documentation/GEDR3/Gaia_archive/chap_datamodel/sec_dm_main_tables/ssec_dm_gaia_source.html).

2.2. Quasar Candidate Selection Criteria

With their restricted locus in mid-infrared color space, AGNs can be separated from stars and normal galaxies (Lacy et al. 2004; Stern et al. 2005; Richards et al. 2006). With the WISE data, Stern et al. (2012) proposed using the single color criterion [W1-W2]≥0.8 to select AGNs; Mateos et al. (2012) developed a method that uses [W1-W2] and [W2-W3] to define the boundaries of the AGN region in mid-infrared color space. In general, these two methods agree with each other. We cross-match the Gaia EDR3 AGN catalog with the AllWISE catalog and obtain 1,335,906 common sources. Among them, 1,186,690 sources (88.8%) have W1-W2 values greater than or equal to 0.8 mag. The cumulative histogram of W1-W2 values is shown in Figure 1, which supports W1-W2≥0.8 as a reliable criterion for selecting quasar candidates in Gaia EDR3. With the LAMOST DR5 data, our previous study also shows that [W1-W2]≥0.8 strikes a fine balance between low stellar contamination (11.1%) and high completeness (91.4%) (Guo et al. 2018). To be consistent with these studies, we adopt [W1-W2]≥0.8 as our mid-infrared color selection criterion. Sources with S/N<5 in W1 or W2 are rejected to ensure the reliability of our selection results.
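As a minimal sketch of this mid-infrared cut, the snippet below builds a boolean mask from AllWISE magnitudes and signal-to-noise ratios and recomputes the 88.8% fraction from the quoted counts. The function and argument names are illustrative and are not catalog column names; only the two thresholds come from the text.

```python
import numpy as np

def mid_infrared_cut(w1, w2, w1_snr, w2_snr, color_min=0.8, snr_min=5.0):
    """Return True where W1-W2 >= color_min and both bands have S/N >= snr_min."""
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    # A source is rejected if either band falls below the S/N threshold.
    good_snr = (np.asarray(w1_snr, dtype=float) >= snr_min) & \
               (np.asarray(w2_snr, dtype=float) >= snr_min)
    return ((w1 - w2) >= color_min) & good_snr

# Fraction of the Gaia EDR3 AGN / AllWISE common sources passing the color cut,
# using the counts quoted in the text.
fraction_passing = 1_186_690 / 1_335_906
print(f"{100 * fraction_passing:.1f}% of common sources have W1-W2 >= 0.8")
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive a sample is to the exact color boundary.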

Figure 1. The cumulative histogram of W1-W2 (mag) values for sources in the Gaia EDR3 AGN catalog. The red vertical line marks W1-W2=0.8 mag.

For the astrometric criterion, Lindegren et al. (2018) proposed a series of criteria to improve the reliability of the AGN catalog in Gaia DR2. The matched objects were further required to have parallaxes and proper motions compatible with zero within five times the respective uncertainty. Similar criteria are used in the Gaia-CRF3 quasar selection (Gaia Collaboration et al. 2022a).

Therefore, based on these studies, we propose the criteria for the selection of quasar candidates in Gaia EDR3 as follows:

where b is the Galactic latitude and ρ is the radius for the positional matching between Gaia EDR3 and AllWISE. Criterion (ii) selects the objects that have five-parameter or six-parameter solutions; Criterion (iii) takes the global parallax zero-point of EDR3 (−17 μas) (Lindegren et al. 2021a) into consideration; Criterion (iv) adopts the proper motion criterion from Gaia Collaboration et al. (2022a), where Cov(μ)⁻¹ is the inverse of the covariance matrix of the proper motion; Criterion (v) excludes the dense stellar fields near the Galactic Equator, where selecting quasars with the mid-infrared method is unreliable; Criterion (vi) sets the maximum radius for cross-matching, and the combination of (v) and (vi) effectively improves the accuracy of cross-matching (Lindegren et al. 2018).
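The criteria described in this section can be sketched as a single boolean mask. This is an illustration under stated assumptions, not the pipeline used for the catalog. The text supplies the −17 μas zero-point, the five-sigma compatibility test and the exclusion of |sin b|≤0.1. The χ²<25 form of the proper motion test and the (2 arcsec)×|sin b| matching radius are assumed here to follow Lindegren et al. (2018) and Gaia Collaboration et al. (2022a). The codes 31 and 95 are the Gaia astrometric_params_solved values for five- and six-parameter solutions. The mid-infrared color cut is passed in as a precomputed mask, and all argument names are illustrative.

```python
import numpy as np

def quasar_candidate_mask(color_ok, params_solved, parallax, parallax_error,
                          pmra, pmdec, pmra_error, pmdec_error, pm_corr,
                          b_deg, match_radius_arcsec,
                          zero_point_mas=-0.017, n_sigma=5.0,
                          min_abs_sin_b=0.1, radius_scale_arcsec=2.0):
    """Combine the color cut with the astrometric criteria (mas and mas/yr)."""
    color_ok = np.asarray(color_ok, dtype=bool)
    plx = np.asarray(parallax, dtype=float)
    plx_err = np.asarray(parallax_error, dtype=float)
    rho = np.asarray(pm_corr, dtype=float)

    # Five-parameter (31) or six-parameter (95) astrometric solution.
    has_solution = np.isin(np.asarray(params_solved), (31, 95))

    # Parallax compatible with zero after removing the global zero-point.
    plx_ok = np.abs((plx - zero_point_mas) / plx_err) < n_sigma

    # chi^2 = mu^T Cov(mu)^-1 mu, written out for a 2x2 covariance matrix.
    x = np.asarray(pmra, dtype=float) / np.asarray(pmra_error, dtype=float)
    y = np.asarray(pmdec, dtype=float) / np.asarray(pmdec_error, dtype=float)
    chi2 = (x**2 - 2.0 * rho * x * y + y**2) / (1.0 - rho**2)
    pm_ok = chi2 < n_sigma**2  # ASSUMED threshold (five sigma, i.e. 25)

    # Avoid the Galactic plane and tighten the match radius toward it.
    abs_sin_b = np.abs(np.sin(np.radians(np.asarray(b_deg, dtype=float))))
    lat_ok = abs_sin_b > min_abs_sin_b
    radius = np.asarray(match_radius_arcsec, dtype=float)
    match_ok = radius < radius_scale_arcsec * abs_sin_b  # ASSUMED form

    return color_ok & has_solution & plx_ok & pm_ok & lat_ok & match_ok
```

Writing the quadratic form explicitly avoids building one 2×2 matrix per source, which matters when the mask is evaluated over millions of rows.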

With these criteria, we obtain a catalog of 1,503,373 quasar candidates (hereafter QCC), of which 78.9% are in the AGN catalog of Gaia EDR3 and the other 316,683 sources are newly identified quasar candidates. Table 2 describes the contents of our catalog; the astrometric parameters are taken from Gaia EDR3, and the W1 and W2 magnitudes are from AllWISE. Figure 2 shows the density distribution on the sky.

3. Characteristics of QCC

In this section, we investigate the completeness of QCC and its contamination by stars and galaxies. The morphological indexes of the objects in QCC are calculated to further check the reliability. In addition, we analyze the parallax, magnitude and proper motion distributions of the sources in QCC, and compare them with the AGN catalog in Gaia EDR3.

3.1. Reliability and Completeness

We used the fifth release of the Large Quasar Astrometric Catalog (LQAC-5) (Souchay et al. 2019) as a reference to investigate the completeness of QCC. We find 341,987 sources common to LQAC5, Gaia EDR3 and AllWISE. To obtain a robust sample, we remove sources with poor AllWISE measurements (S/N<5 in W1 or W2), leaving a test catalog of 297,527 reliable sources (see Figure 3). Among them, 238,807 (80.3%) sources are found in QCC. Table 3 shows the completeness of QCC compared with LQAC5. Because we apply strict astrometric criteria when identifying quasar candidates, QCC does not recover every source in the test catalog, and its completeness is about 80%.

Table 3 Completeness Compared with LQAC5

To evaluate the contamination caused by stars and galaxies, we randomly select a 10°×10° region centered at R.A.=255° and Dec.=35°. In this test region, 4171 sources are common to QCC and SDSS DR16 (Ahumada et al. 2020), of which 3230 are identified as quasars in SDSS DR16Q (Lyke et al. 2020). We checked the SDSS spectra of the remaining 941 sources: 1, 9 and 7 of them are classified as star, galaxy and quasar, respectively, and no SDSS spectra are available for the other 924 objects. Therefore, among all sources that have spectral classifications, quasars account for 99.7% (3237/3247), while stars and galaxies account for 0.3% (10/3247), which shows that our quasar candidate catalog identified by astrometric and mid-infrared methods has a very low level of contamination. To establish an accurate celestial reference frame, the sources in our final catalog should be point-like; Figure 4 shows the SDSS thumbnails of six candidates in our catalog. In the same test region, among the 4171 common sources, 3920 (93.98%) are found to be point-like objects⁵ and 251 (6.02%) are identified as extended sources.

⁵ The extended and point-like sources are classified using SDSS morphological data; more details can be found at https://www.sdss.org/dr12/algorithms/classify/.
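The percentages for this test region follow directly from the quoted counts. The short check below recomputes them; the variable names are illustrative and the numbers are those given in the text.

```python
# Counts quoted for the 10 deg x 10 deg test region.
matched = 4171                 # QCC sources with an SDSS DR16 counterpart
dr16q_quasars = 3230           # already listed as quasars in SDSS DR16Q
n_star, n_galaxy, n_quasar = 1, 9, 7   # spectral classes among the remainder

remaining = matched - dr16q_quasars
no_spectrum = remaining - (n_star + n_galaxy + n_quasar)
classified = dr16q_quasars + n_star + n_galaxy + n_quasar

# Fractions among sources that have a spectral classification.
quasar_fraction = (dr16q_quasars + n_quasar) / classified
contaminant_fraction = (n_star + n_galaxy) / classified

# Morphology: point-like sources among all matched sources.
point_like_fraction = 3920 / matched

print(f"quasars: {100 * quasar_fraction:.1f}%, "
      f"stars and galaxies: {100 * contaminant_fraction:.1f}%, "
      f"point-like: {100 * point_like_fraction:.2f}%")
```

Note that the 99.7% figure is conditional on a spectrum being available, so it does not constrain the 924 sources without SDSS spectra.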

Figure 2. The sky distribution of QCC, using the Hammer-Aitoff projection in Galactic coordinates. Each cell of this map is approximately 0.84 deg², and the color scale shows the number of sources in each cell.

Figure 3. The sky distribution of the LQAC5 test catalog, using the Hammer-Aitoff projection in Galactic coordinates.

The purity of quasar candidates in a random sky region illustrates the effectiveness of the method. However, the purity may vary widely between sky regions, as mentioned in Gaia Collaboration et al. (2022a). It is therefore necessary to investigate how the purity of QCC varies over the whole sky. To carry out this investigation, we need a pure quasar catalog with high completeness as our reference sample. Gaia Collaboration et al. (2022b) provide a sample of 1.9 million quasar candidates in their Section 8, with a purity of approximately 95%. We assume that this sample has 100% reliability and 100% completeness, and explore the purity variation of QCC by comparing QCC with it. This assumption is admittedly inexact, but it offers a convenient way to assess how the purity of QCC varies with sky density, Galactic latitude and magnitude. We find that 1,241,033 (82.5%) quasar candidates of QCC are in this pure sample provided by Gaia DR3. Figure 5 shows how the purity changes with Galactic latitude and G magnitude. The purity of QCC decreases for sources located near the Galactic Equator and at fainter magnitudes. To explore the correlation between purity and sky density, we introduce a purity index, defined in each HEALPix (Gorski et al. 1999) sky pixel as the number of sources common to QCC and the reference catalog divided by the number of QCC sources. As shown in Figure 6, QCC has relatively low purity near the Galactic Equator and in the areas of the Large and Small Magellanic Clouds (LMC and SMC). Although this index does not accurately represent the purity distribution of QCC on the sky, it helps us better understand quasar selection by the mid-infrared and astrometric method.
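The purity index can be sketched as a per-pixel ratio of counts. The snippet assumes that each QCC source already carries a HEALPix pixel number (obtained, for instance, from a HEALPix library) and a flag telling whether it also appears in the reference sample. Both input arrays and the function name are hypothetical. A cell area of about 0.84 deg², as quoted in the figure captions, would be consistent with NSIDE=64 (49,152 pixels), but the resolution actually used is not stated.

```python
import numpy as np

N_PIXELS_NSIDE_64 = 12 * 64**2  # 49,152 pixels of about 0.84 deg^2 each

def purity_index(pixel_of_source, in_reference, n_pixels):
    """Per-pixel ratio: (QCC sources also in the reference) / (all QCC sources)."""
    pix = np.asarray(pixel_of_source, dtype=np.int64)
    ref = np.asarray(in_reference, dtype=bool)
    total = np.bincount(pix, minlength=n_pixels)
    common = np.bincount(pix, weights=ref.astype(float), minlength=n_pixels)
    # Pixels without any QCC source have no defined purity.
    purity = np.full(n_pixels, np.nan)
    filled = total > 0
    purity[filled] = common[filled] / total[filled]
    return purity

# Toy example: four sources in a four-pixel map.
toy = purity_index([0, 0, 1, 3], [True, False, True, True], n_pixels=4)
print(toy)
```

Empty pixels are left as NaN rather than zero so that unobserved cells are not mistaken for fully contaminated ones when the map is plotted.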

Figure 4. Six sources of our catalog matched with SDSS DR16. Panels A, B, C: point-like sources; panels D, E, F: extended sources.

The lower purity of QCC near the LMC, the SMC and the Galactic Center is to be expected, as these regions are very crowded. Selecting quasar candidates in these regions may require more stringent criteria. In addition to these areas, we found that the purity of QCC is also relatively low in some striped areas and toward the Galactic anti-center (see Figure 6). Fewer AllWISE observations may be the main reason for the low purity in these regions: the areas with fewer detections in the WISE W2 band in Figure 7⁶ coincide well with the low-purity areas in Figure 6. In summary, when quasar candidates are selected using the method described in this paper, the purity of the sample decreases in crowded sky regions and in regions with fewer WISE observations.

⁶ The detection count represents the number of individual 7.7 s exposures on which the source was detected with S/N>3 in the WISE profile-fit measurement.

Figure 5. The purity variation of QCC. The blue and red lines represent the variation of reliability with G magnitude and Galactic latitude, respectively.

3.2. The Morphological Indexes

Inspired by LQAC2 (Souchay et al. 2012), we analyzed the morphological indexes of QCC. We apply the photometry function of IRAF to the PSF of each source and compare it with the PSFs of stars near that source. The optical images used in the calculation come from SDSS in four bands from blue to infrared, namely g, r, i and z, with central wavelengths of 477.0, 623.1, 762.5 and 913.4 nm, respectively. The parameters SHARP, SROUND and GROUND determined by IRAF's DAOFIND provide comprehensive morphological data for each source. The morphological indexes of each source are calculated following Andrei et al. (2012):

where M_PC is the morphological index of each source for the parameter P in the color C, P_Q is the parameter P of quasar Q, P_s is the mean value of parameter P for the stars in the same SDSS field as quasar Q, and σ_s is the standard deviation of these stars' parameters.
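From these symbol definitions, the index has the form of a normalized deviation of the quasar's parameter from the mean of the stars in the same field. The sketch below assumes M_PC = |P_Q − P_s| / σ_s. Taking the absolute value and using the sample standard deviation are assumptions made here, and the function name is illustrative.

```python
import numpy as np

def morphological_index(p_quasar, p_stars):
    """Deviation of a quasar's parameter from the field stars, in units of sigma_s.

    p_quasar : value of SHARP, SROUND or GROUND for the quasar in one band
    p_stars  : the same parameter for stars in the same SDSS field
    """
    stars = np.asarray(p_stars, dtype=float)
    mean_s = stars.mean()
    sigma_s = stars.std(ddof=1)  # ASSUMED: sample standard deviation
    return abs(p_quasar - mean_s) / sigma_s  # ASSUMED: absolute deviation

# Toy example: three field stars and one source that stands out.
example_index = morphological_index(0.8, [0.4, 0.5, 0.6])
print(example_index)
```

Under this reading, an index above 2 marks a source whose shape parameter lies more than two standard deviations from the stellar locus of its own field.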

After matching QCC with SDSS DR16, we obtain 663,087 common sources, of which 39,785 are identified as extended sources by SDSS. We randomly selected 18,400 extended sources and 18,400 point-like sources. Figure 8 shows the percentage histograms of the SHARP, SROUND and GROUND⁷ morphological indexes of these sources. As expected, the median values of the morphological indexes of the extended sources are slightly larger than those of the point-like sources in all four bands. In addition, both point-like and extended sources have small morphological indexes in all bands: only about 8% of the sources have a morphological index greater than 2, which means most of them are stellar-like sources.

⁷ SHARP is the ratio of the difference between the height of the central pixel of the PSF and the mean of the surrounding non-bad pixels to the height of the best-fit Gaussian function at that point. SROUND is the ratio of a measure of the bilateral symmetry of the object to a measure of its four-fold symmetry. GROUND is the ratio of the difference between the heights of the best-fitting Gaussian functions in x and in y to the average of the best-fitting Gaussian functions in x and y (https://photutils.readthedocs.io/en/stable/api/photutils.detection.DAOStarFinder.html).

3.3. Parallax, Magnitude and Proper Motion

As mentioned above, 1,186,690 sources in QCC are also present in the Gaia EDR3 AGN catalog, and their astrometric properties can be found in Liao et al. (2021b). The astrometric properties of the remaining 316,683 newly identified quasar candidates are therefore worth investigating. To this end, we divided QCC into two subsamples: the QCC-A subset, consisting of the 1,186,690 quasar candidates already identified by EDR3, and the QCC-B subset, consisting of the 316,683 newly identified quasar candidates. The QCC-B subset contains 113,186 (36%) five-parameter sources and 203,497 (64%) six-parameter sources. After cross-matching QCC-B with other AGN catalogs, we found that 106,928 sources have already been identified as quasars (or quasar candidates), and the remaining 209,755 sources are newly identified.

The parallax, G magnitude and proper motion distributions can be found in Figures 9, 10 and 11, respectively. The sources in QCC-B populate the fainter end: the median magnitude of the QCC-B sample is 20.49 mag, while that of the QCC-A sample is 20.00 mag. The average proper motion and parallax are shown in Table 4. The mean parallax and μ_δ of QCC-B are significantly different from those of the other quasar candidates, and the standard deviations are all very large. One possible reason is that these sources are fainter and less observed (see Figure 12). As we proposed in Liao et al. (2021b), the number of good along-scan (AL) CCD observations greatly affects the astrometric solution of quasar candidates and causes a bias in the proper motion, especially for the six-parameter sources. Figure 13 shows the generalized moving mean⁸ (GMM hereafter) parallaxes of sources in QCC. The parallax distribution of sources in QCC-A is relatively uniform, but in QCC-B the sources located within ±30° of the ecliptic plane show a significant parallax bias. We find that most of these sources have fewer than 200 good AL observations (see Figure 14).

⁸ The generalized moving mean uses the neighboring points on the celestial sphere to smooth each point with a generalized weighting function (Bucciarelli et al. 1993). To allow comparison with the smoothed maps and median parallax plot in Gaia EDR3 (Lindegren et al. 2021b; Fabricius et al. 2021), the generalized moving mean is calculated for each source within a region of 5° radius containing more than 50 objects.
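The idea behind the smoothed parallax maps can be sketched with a plain moving mean on the sphere. This is a deliberate simplification: the weighting function of Bucciarelli et al. (1993) is not reproduced, and uniform weights are assumed instead. Only the 5° radius and the requirement of more than 50 objects come from the text. The brute-force pairwise comparison below is meant for small test samples, and the function name is illustrative.

```python
import numpy as np

def moving_mean(lon_deg, lat_deg, values, radius_deg=5.0, min_count=50):
    """Uniform-weight moving mean of `values` within `radius_deg` of each source.

    Sources with `min_count` or fewer neighbors (self included) are set to NaN.
    Memory scales as N^2, so this is only suitable for small samples.
    """
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    vals = np.asarray(values, dtype=float)

    # Unit vectors on the sphere; their dot product is cos(angular separation).
    xyz = np.column_stack([np.cos(lat) * np.cos(lon),
                           np.cos(lat) * np.sin(lon),
                           np.sin(lat)])
    cos_sep = np.clip(xyz @ xyz.T, -1.0, 1.0)
    neighbors = cos_sep >= np.cos(np.radians(radius_deg))

    counts = neighbors.sum(axis=1)
    sums = neighbors.astype(float) @ vals
    return np.where(counts > min_count, sums / counts, np.nan)
```

For catalog-sized inputs, a spatial index such as a k-d tree on the unit vectors would replace the dense N×N matrix.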

Another cause of the parallax and proper motion bias might be stellar contamination. Bailer-Jones et al. (2019) constructed a supervised classifier based on Gaussian Mixture Models to probabilistically classify extragalactic objects in Gaia DR2. In Figure 15, we plot the color-color diagram for our selection to compare with the training set, colored according to the true classes, used in Bailer-Jones et al. (2019). We find that some sources in the QCC-B subset lie in the quasar-star overlap area, which indicates the presence of stellar contamination. A rough estimate shows that the purity of QCC-B is about 42.2%–58.5%.⁹

⁹ Two methods are used for the purity test. (A) We cross-matched QCC-B with SDSS DR16 and found that the purity of QCC-B was 58.5%. (B) Gaia Collaboration et al. (2022b) provided a low-purity (50%–70%) QSO candidate table with 6.6 million sources (QCT hereafter) and a QSO subsample with 95% purity (QSS hereafter). Of the 316,683 sources in QCC-B, 54,343 are found in QSS and 38,266 are found in QCT but not in QSS. We then matched the remaining 224,074 sources with SDSS DR16, and the purity of these sources is about 28% according to the spectroscopic classifications of 79,154 common sources. Based on these results, the purity of QCC-B is estimated to be about 42.2%–44.6%. Using the same methods, the purity of the overall QCC is about 87.8%–91.3%, which is consistent with our conclusion in Section 3.1.
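The range quoted for method B can be recovered by weighting each subgroup of QCC-B by its assumed purity. The calculation below is one reading that matches the quoted numbers, not a procedure spelled out in the text: it takes 95% for the QSS sources, 50%–70% for the sources only in QCT, and 28% for the remainder. The variable and function names are illustrative.

```python
n_total = 316_683        # sources in QCC-B
n_qss = 54_343           # also in the 95%-purity subsample (QSS)
n_qct_only = 38_266      # in the low-purity table (QCT) but not in QSS
n_rest = n_total - n_qss - n_qct_only   # matched against SDSS DR16 instead

def weighted_purity(purity_qct, purity_qss=0.95, purity_rest=0.28):
    """Expected fraction of true quasars in QCC-B for a given QCT purity."""
    expected_quasars = (n_qss * purity_qss
                        + n_qct_only * purity_qct
                        + n_rest * purity_rest)
    return expected_quasars / n_total

purity_low = weighted_purity(0.50)
purity_high = weighted_purity(0.70)
print(f"QCC-B purity (method B): {100 * purity_low:.1f}% to {100 * purity_high:.1f}%")
```

The width of the range comes entirely from the uncertain purity of the QCT-only sources, which make up about 12% of QCC-B.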

Figure 6. The sky distribution of the purity of QCC. The map shows the sky density with each cell of approximately 0.84 deg², using the Hammer-Aitoff projection in Galactic coordinates with zero longitude at the center.

Figure 7. The sky distribution of the integer frame detection count in the W2 band for QCC sources. The map shows the sky density with each cell of approximately 0.84 deg², using the Hammer-Aitoff projection in Galactic coordinates with zero longitude at the center.

According to the above investigation, there is obvious stellar contamination in QCC-B, which could lead to confusion when using QCC. It is therefore necessary to further reduce the stellar contamination and select a purer sample. Heintz et al. (2018) proposed that S/N_μ=μ/σ_μ<2 is a stricter and more effective criterion for identifying quasars. We investigated the μ/σ_μ distribution of the quasars in the Gaia EDR3 AGN catalog and found that 90% of them have μ/σ_μ less than 2. After applying this criterion to QCC-B, we find that the mean parallax and proper motion biases are significantly reduced (see Table 4). Moreover, the standard deviations of the best-fit Gaussian distributions in Figure 9 are 1.055, 1.106 and 1.074 for the QCC-A, QCC-B and QCC-B (μ/σ_μ<2) subsets, respectively. In Figure 11, the corresponding standard deviations of the best-fit Gaussian distributions are 1.062, 1.840 and 0.990 for μ_α*, and 1.072, 1.870 and 1.087 for μ_δ, respectively. The stricter proper motion criterion significantly reduces the standard deviations of the best-fit Gaussian distributions of parallax and proper motion, and the distribution of the sources with μ/σ_μ<2 is closer to the quasar region in Figure 15, which indicates a more reliable set of quasar candidates. To further reduce stellar contamination in the sample, we select 99,673 sources with G_BP−G≤0.8 and G−G_RP≤0.8 in the QCC-B (μ/σ_μ<2) subset. Their average parallax, μ_α* and μ_δ are 1.6±48.2 μas, 0.2±38.7 μas yr⁻¹ and −15.3±41.0 μas yr⁻¹, respectively, which indicates smaller biases and residuals in these astrometric parameters. We have marked these sources as reliable quasar candidates (RQC) in our catalog, and we estimate the purity of RQC to be about 73.1%–85.2%.¹⁰

¹⁰ Following the approach used to estimate the purity of QCC-B in this section, the purity of RQC estimated by methods A and B is 85.2% and 73.1%–78.9%, respectively.
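The stricter proper motion cut and the two color cuts can be sketched as follows. The text does not say how μ and σ_μ are computed, so the snippet assumes the total proper motion μ = sqrt(μ_α*² + μ_δ²) with its uncertainty from first-order error propagation, neglecting the correlation between the two components. Function and argument names are illustrative.

```python
import numpy as np

def proper_motion_snr(pmra, pmdec, pmra_error, pmdec_error):
    """Return mu / sigma_mu for the total proper motion (same units in and out).

    ASSUMED: first-order error propagation without the pmra-pmdec correlation.
    A source with exactly zero proper motion yields NaN.
    """
    pmra = np.asarray(pmra, dtype=float)
    pmdec = np.asarray(pmdec, dtype=float)
    mu = np.hypot(pmra, pmdec)
    sigma_mu = np.sqrt((pmra * np.asarray(pmra_error, dtype=float))**2
                       + (pmdec * np.asarray(pmdec_error, dtype=float))**2) / mu
    return mu / sigma_mu

def reliable_candidate_mask(pm_snr, bp_minus_g, g_minus_rp,
                            snr_max=2.0, color_max=0.8):
    """Apply mu/sigma_mu < 2 together with G_BP-G <= 0.8 and G-G_RP <= 0.8."""
    return ((np.asarray(pm_snr, dtype=float) < snr_max)
            & (np.asarray(bp_minus_g, dtype=float) <= color_max)
            & (np.asarray(g_minus_rp, dtype=float) <= color_max))
```

Because the two cuts act on independent measurements (astrometry and photometry), they can be applied in either order with the same result.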

Figure 8. Histograms of the morphological indexes derived from SDSS images. The left panels show the morphological indexes of the point sources, and the right panels those of the extended sources. From top to bottom, the panels correspond to the g, r, i and z filters. The average morphological indexes are labeled in the upper right corner of each panel.

Figure 9. The normalized parallax distribution for the sources in QCC. The y-axis is logarithmic, and each bin of the x-axis is 0.01 mas. The red lines represent the best-fit Gaussian distributions.

Figure 10. G magnitude distribution for the sources in QCC. The y-axis is logarithmic, and each bin of the x-axis is 0.02 mag.

Figure 11. The normalized proper motion distributions for the sources in QCC. The red lines represent the best-fit Gaussian distributions.

Figure 12. Distribution of the number of good AL observations for the sources in QCC. The y-axis represents the proportion of sources in each bin relative to the total sample, and each bin of the x-axis is 10.

Figure 13. The generalized moving mean parallaxes of sources in QCC-A (A) and QCC-B (B). The maps use the Hammer-Aitoff projection in ecliptic coordinates.

Figure 14. The number of good AL observations of sources in QCC-B. The map uses the Hammer-Aitoff projection in ecliptic coordinates.

Figure 15. The color-color diagram for the sources in QCC-A (A) and QCC-B (B). Panels (C) and (D) show the color distributions of sources in QCC-B (μ/σ_μ≤2) and QCC-B (μ/σ_μ>2), respectively. The red dots represent 10,000 randomly selected sources in each sample, and the contours in each panel show the variation in source density of the whole sample on a linear scale.

4. Discussion

4.1. Quasar Candidates in the Galactic Plane

As seen in Equation (1), to lower the possibility of stellar contamination, the quasar-like objects identified in Gaia EDR3 and in QCC exclude objects within the Galactic plane (|sin b|≤0.1). Near the Galactic plane, the coolest brown dwarfs and the most heavily dust-reddened stars exhibit WISE colors similar to those of quasars (Stern et al. 2012). Kirkpatrick et al. (2011) showed that dwarfs of spectral class later than T1 have W1-W2≥0.8 mag, which means that the color selection criterion from the WISE data does not work effectively near the Galactic plane. The most reliable way to identify a quasar near the Galactic plane is spectroscopy, as in the LAMOST Spectroscopic Survey of the Galactic Anti-center (LSSGAC) (Liu et al. 2013). With the spectral data, Huo et al. (2017) presented a sample of 151 quasars discovered in an area near the Galactic Anti-center. Machine learning is another important method. Fu et al. (2021) synthesized quasars and galaxies behind the Galactic plane and applied the XGBoost algorithm to Pan-STARRS1 (PS1) (Flewelling et al. 2020) and AllWISE photometry for quasar classification, obtaining the Quasars behind the Galactic Plane (GPQ) candidate catalog with 160,946 sources located at |b|≤20°. Given the importance of quasars and the lack of spectral data, we check the reliability of these quasar candidates by analyzing their astrometric solutions.

Cross-matching near the Galactic Equator is heavily affected by source confusion, which may lead to problematic matches (Gaia Collaboration et al. 2022a). Therefore, we carefully matched the PS1 source_id provided by GPQ against the gaiadr3.panstarrs1_best_neighbour table provided by the Gaia Archive. Note that although this improves the accuracy of the match, problematic matches may still exist. We obtained 133,798 sources common to GPQ and Gaia EDR3. Among them, there are 76,225 five-parameter sources and 37,684 six-parameter sources in the magnitude range from 6 to 21 G mag. For comparison, we selected the quasar candidates near the Galactic plane (|b|<10°) in FRS, all candidates from the EDR3_AGN catalog, and spectroscopically confirmed quasars from LAMOST DR7 (LAMOST DR7Q hereafter) (http://dr7.lamost.org/v2.0/). See Tables 5 and 6 for the mean parallax and proper motion of each quasar sample. Except for GPQ, the mean parallaxes of the five-parameter sources in each sample are consistent with each other. Additionally, for the six-parameter sources, the mean parallaxes of GPQ and LAMOST DR7Q are obviously different from that of the EDR3_AGN catalog. The mean proper motion of GPQ is evidently different from those of the other catalogs for both five-parameter and six-parameter sources. Assuming that quasars at different Galactic latitudes have similar astrometric systematic errors, the significant positive mean parallax and negative mean proper motion of the GPQ six-parameter sources might be caused by stellar contamination. We discuss the systematic errors in different sky regions in detail in the Appendix.
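The matching and grouping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are toy data, the dictionary layout and the function name are hypothetical, and a plain arithmetic mean is used, which may differ from the averaging behind Tables 5 and 6. The only Gaia convention relied on is that astrometric_params_solved equals 31 for five-parameter and 95 for six-parameter solutions.

```python
from statistics import mean

# Toy data (hypothetical): PS1 identifiers from GPQ, a best-neighbour
# mapping PS1 id -> Gaia source_id, and a few Gaia rows (parallax in mas).
gpq_ps1_ids = [101, 102, 103, 104]
best_neighbour = {101: 9001, 102: 9002, 104: 9004}
gaia_rows = {
    9001: {"astrometric_params_solved": 31, "parallax": 0.02},
    9002: {"astrometric_params_solved": 95, "parallax": 0.30},
    9004: {"astrometric_params_solved": 31, "parallax": -0.04},
}

def split_by_solution(ps1_ids, neighbour, rows):
    """Join PS1 ids to Gaia rows through the best-neighbour mapping and
    group the matched rows into five-parameter (31) and six-parameter (95)
    solutions. Unmatched ids and other solution types are skipped."""
    groups = {31: [], 95: []}
    for ps1_id in ps1_ids:
        row = rows.get(neighbour.get(ps1_id))
        if row is None:
            continue  # no best neighbour, or no Gaia row for it
        solved = row["astrometric_params_solved"]
        if solved in groups:
            groups[solved].append(row)
    return groups

groups = split_by_solution(gpq_ps1_ids, best_neighbour, gaia_rows)
mean_parallax_5p = mean(r["parallax"] for r in groups[31])
mean_parallax_6p = mean(r["parallax"] for r in groups[95])
```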

Table 4 Comparison of Parameters between Gaia EDR3 AGN and QCC. All Averages are Derived from Data after GMM

The Gaia team provided a parallax bias correction model, in which the parallax bias depends at least on the G-band magnitude G, the ecliptic latitude β, and the photometric parameter νeff. For faint sources, this model is derived from the quasar candidates of EDR3 AGN (Lindegren et al. 2021a). To investigate the effectiveness of this model in the Galactic plane, we apply it to the GPQ and LAMOST DR7Q quasar samples. Again, we use the FRS and EDR3_AGN catalogs for comparison. As shown in Table 5, for the five-parameter sources, the parallax biases of the FRS and LAMOST DR7Q samples are corrected to about 5 μas, while those of the GPQ and EDR3 AGN catalogs are 26.5 μas and 0.5 μas, respectively. For the six-parameter sources, the parallax biases of the GPQ and LAMOST DR7Q samples deviate significantly from zero, suggesting that the correction model does not work effectively. These results indicate that (i) the crowded sky near the Galactic Equator may have caused significant GPQ matching errors; (ii) there might be stellar contamination in the GPQ sample, especially among the six-parameter sources; and (iii) compared to high Galactic latitude regions, the photometric data obtained in the Galactic plane follow a different probability distribution. Therefore, the parallax bias correction model provided by the Gaia team does not work effectively in the Galactic plane.
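The test applied here amounts to subtracting the modeled bias Z(G, β, νeff) from each catalog parallax and checking whether the sample mean moves toward zero, as expected for quasars. A minimal Python sketch follows; the zero-point function is a hypothetical constant placeholder, not the Lindegren et al. (2021a) model, and the rows are toy data.

```python
def mean_corrected_parallax(rows, zero_point):
    """Mean of (parallax - Z) over a sample, in mas.

    rows       : list of dicts with a 'parallax' entry in mas
    zero_point : callable returning the modeled bias Z (mas) for a row
    """
    corrected = [row["parallax"] - zero_point(row) for row in rows]
    return sum(corrected) / len(corrected)

def toy_zero_point(row):
    # Placeholder assumption: a constant bias in mas. A real model
    # would depend on magnitude, ecliptic latitude and colour.
    return -0.017

# Toy quasar sample whose raw parallaxes scatter around the toy bias.
toy_sample = [{"parallax": -0.017}, {"parallax": -0.007}, {"parallax": -0.027}]
residual_bias = mean_corrected_parallax(toy_sample, toy_zero_point)
```

For a clean quasar sample and an adequate model, the residual should be consistent with zero. A residual that stays significantly positive, as reported for GPQ, points to contamination, mismatches, or a model applied outside the regime in which it was calibrated.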

4.2.Quasar Candidates with Abnormal Astrometric Behavior

Many spectroscopically identified quasars are contained in the catalogs listed in Tables 1 and 3. However, the astrometric parameters of some quasars are significantly abnormal. For example, some quasars have large biases in proper motion and parallax, or show obvious position differences between Gaia DR2 and EDR3. This indicates that Gaiaʼs high-precision observations could detect the jitter of these quasars, or, as mentioned in Shen et al. (2021) and Chen et al. (2022), that they might be special quasars, such as quasar pairs and lensed quasars, which are important in exploring the evolution of galaxies and finding double black holes. On the other hand, these quasars are not suitable for establishing the celestial reference frame. Therefore, it is necessary to exclude them from the quasar candidates used for the celestial reference frame.

Table 5 Parallaxes of GPQs, Gaia-FRS, LAMOST DR7Q and Gaia EDR3 AGN before and after Correction

Table 6 Proper Motions of GPQs, Gaia-FRS, LAMOST DR7Q and Gaia EDR3 AGN

We made a preliminary attempt to select such astrometrically abnormal quasars. We first selected SDSS quasars with more than one corresponding source in Gaia EDR3 within 1″. These quasars may be affected by nearby sources and therefore have large positional errors. For the sources with a single Gaia match within a 1″ radius of the SDSS position, inspired by Lindegren et al. (2021b), we selected abnormal astrometric quasar candidates based on several astrometric parameters. Large values of these parameters indicate a bad fit for the source, or that the source is probably an unresolved double star. We checked the SDSS images of these spectroscopically confirmed quasars with such astrometric behavior and found that most of them are obviously extended sources. Figure 16 shows four such quasars: (A) and (B) are SDSS quasars with two Gaia matches within a 1″ radius, while (C) and (D) are two arbitrary quasars among the sources selected based on the above parameters. Among them, Figure 16 (D) is an image of J013958.43+321631.6, whose astrometric_excess_noise and astrometric_excess_noise_sig are 10.673754 mas and 5.370112, respectively. Although most of them have near-zero parallaxes and proper motions, their positions are not reliable because of the large astrometric jitter observed by Gaia or the presence of other sources very close to them (<1″). Therefore, these sources should be removed when establishing the celestial reference frame. More details can be found in Wu et al. (2022). Based on this study, we found 284 astrometrically abnormal quasars in QCC and flagged them in our catalog.
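The two-stage selection described above can be sketched as follows. The 1″ radius comes from the text; everything else is an assumption made for illustration, including the function names, the label strings, and the threshold of 2 on astrometric_excess_noise_sig. The text does not state the thresholds or the full list of parameters actually used.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula.
    Inputs are in degrees, the result is in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0

def classify_quasar(quasar, gaia_sources, radius_arcsec=1.0, noise_sig_limit=2.0):
    """Label a quasar by its Gaia matches within radius_arcsec.
    noise_sig_limit is a hypothetical threshold, not the paper's value."""
    matches = [g for g in gaia_sources
               if angular_sep_arcsec(quasar["ra"], quasar["dec"],
                                     g["ra"], g["dec"]) <= radius_arcsec]
    if len(matches) > 1:
        return "multiple_matches"   # possibly disturbed by a close neighbour
    if len(matches) == 1:
        if matches[0]["astrometric_excess_noise_sig"] > noise_sig_limit:
            return "abnormal_fit"   # single match, but a poor astrometric fit
        return "normal"
    return "unmatched"

# Toy example: one quasar with two Gaia sources 0.5 arcsec apart.
toy_quasar = {"ra": 10.0, "dec": 20.0}
toy_gaia = [
    {"ra": 10.0, "dec": 20.0, "astrometric_excess_noise_sig": 0.5},
    {"ra": 10.0, "dec": 20.0 + 0.5 / 3600.0, "astrometric_excess_noise_sig": 0.3},
]
toy_label = classify_quasar(toy_quasar, toy_gaia)
```

A brute-force loop over all Gaia sources is adequate only for small samples. For catalog-scale matching a spatial index or a dedicated cross-match tool such as TOPCAT would be the practical choice.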

Figure 16. Images of four SDSS DR16 quasars with abnormal astrometric behavior.

4.3.Limitation of this Work

Selecting quasar candidates with astrometric and mid-infrared data has proved to be effective and reliable. On this basis, we have selected about 1.5 million quasar candidates with Gaia and AllWISE data. These quasar candidates, with high completeness and purity, will be important in realizing the celestial reference frame.

However, there are still some limitations in our approach. First, as we emphasized in Section 3.3, Gaia EDR3 uses only 34 months of observations, which limits the quality of the astrometric solutions and hence the purity of the quasar sample selected by astrometric methods. With longer observations in the future, this situation will improve. Second, the number of quasars identified by this method depends heavily on the number of objects with infrared data provided by AllWISE, which means that our quasar candidate catalog excludes many sources that have not been observed by AllWISE. As discussed in Section 3.1, the completeness and purity decrease with fewer WISE observations. Therefore, our quasar sample is less complete and less pure than the quasar candidates provided by Gaia (Gaia Collaboration et al. 2022a). Third, because of the heavy dust in crowded areas such as the Galactic plane and the LMC/SMC, the WISE color loses its effectiveness in distinguishing brown dwarfs and dust-reddened stars from quasars. In such cases, the selection results of our method should be used with caution. Further efforts are required to identify quasars in these crowded areas. Carrying out spectroscopic surveys of these areas is the most reliable way. Using machine learning methods that explicitly account for Galactic extinction and reddening to provide external catalogs specifically for the Galactic plane is also a feasible approach.

As claimed by Høg (2014), astrometric detection of quasars, i.e., identifying quasars only from their zero proper motion and parallax, is unbiased by any assumptions on spectra and might lead to the discovery of a new kind of extragalactic point source (Heintz et al. 2015). This can be verified with more Gaia observations and more accurate astrometric data in the future.

5.Summary and Conclusions

Quasars are one type of active galactic nuclei. Because of their bright centers and point-like appearance, quasars are ideal objects for establishing the celestial reference frame. The ICRF3 is established from 4588 sources at radio wavelengths, and about 22% of these sources show large offsets between their optical and radio positions (Charlot et al. 2020). With the high-precision astrometric parameters of more than 1.8 billion sources in Gaia EDR3, many quasar candidates can be identified. To establish a non-rotating celestial reference frame in the optical band, we need a reliable catalog with a large number of quasars. In this paper, we used the astrometric data of Gaia EDR3 and the color data of AllWISE to identify quasar candidates and made a comprehensive evaluation of them.

A QCC of 1,503,373 sources (about 90% purity) is obtained by astrometric and mid-infrared methods in Gaia EDR3, of which 1,186,690 (78.9%) candidates are in common with Gaia EDR3ʼs AGN catalog. The purity of the 316,683 newly identified quasar candidates is about 42.2%–58.5%. Compared with LQAC5, the completeness of our catalog is around 80%. We also randomly selected 4171 sources common to our catalog and SDSS DR16; according to the SDSS spectra, about 99.7% of them are quasars and 94.0% are point-like sources. Compared to previous similar research (Guo et al. 2018), we have selected more quasar candidates (1,503,373 versus 662,753) with higher purity (90% versus 77%). Stellar contamination is present in the newly identified subset, and the purity of this subset improves significantly when more stringent astrometric and color conditions are applied. In addition, we found that the purity of quasar candidates selected by mid-infrared and astrometric data decreases around the LMC/SMC, near the Galactic Equator, and at fainter magnitudes.

We find that the parallax correction model of Gaia EDR3 cannot be directly applied to sources near the Galactic plane, especially to the six-parameter sources. We also select the quasars with abnormal astrometric behavior, which are not suitable for establishing the celestial reference frame and should be excluded from the quasar candidates used for that purpose. We foresee that with future Gaia data releases, the identification of quasars using astrometric methods will become increasingly reliable.

Although Gaia has provided more than six million quasar candidates in Gaia DR3 (Gaia Collaboration et al. 2022b), the reliability of this quasar candidate list needs to be tested against quasar catalogs obtained by other methods. The quasar candidate catalog obtained by astrometric and mid-infrared methods will play an essential role in verifying future Gaia data releases.

Acknowledgments

We thank the anonymous reviewers for their valuable suggestions. We used data from AllWISE in this work: AllWISE makes use of data from WISE, which is a joint project of the University of California, Los Angeles, and the Jet Propulsion Laboratory/California Institute of Technology, and NEOWISE, which is a project of the Jet Propulsion Laboratory/California Institute of Technology. WISE and NEOWISE are funded by the National Aeronautics and Space Administration. This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement. We are also very grateful to the developers of the TOPCAT (Taylor 2005) software. This work has been supported by the Youth Innovation Promotion Association CAS with Certificate Number 2022259, the Natural Science Foundation of Shanghai through grant 21ZR1474100, and the National Natural Science Foundation of China (NSFC) through grants 12173069 and 11703065. We acknowledge the science research grants from the China Manned Space Project with NO. CMS-CSST-2021-A12 and NO. CMS-CSST-2021-B10.

Appendix The Astrometric System Error of Different Sky Regions

In Section 4.1, we found possible stellar contamination in GPQ, especially among the six-parameter sources in this catalog. These conclusions are based on the assumption that sources in different sky regions have similar systematic errors. The sources in GPQ are close to the Galactic Equator, so we investigated the variation of the astrometric systematic error with Galactic latitude. We found that most of the sources in GPQ are located at |b|>5°, and only 3993 (3.51%) sources have a Galactic latitude between −5° and 5°. This means that the GPQ and EDR3_AGN catalogs overlap in the sky region 5°≤|b|≤20° (overlap-region hereafter). Therefore, we calculated the mean proper motion and corrected parallax of sources at different Galactic latitudes. Figure A1 shows the distributions of the astrometric systematic errors. First, both five-parameter and six-parameter sources with |b|<5° in GPQ show obvious systematic errors. However, the number of sources in these regions is very small (about 3000), so the average values are easily affected by a few extreme values and these systematic errors might be unreliable. Second, in the overlap-region, the systematic errors of most sources in GPQ are larger than those of sources in EDR3_AGN, especially for six-parameter sources. Third, for EDR3_AGN, the systematic error of the sources shows no obvious change at high Galactic latitudes.
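The latitude dependence examined here amounts to a binned mean. The Python sketch below is an illustration only: binning in |b| with 5° bins and a plain arithmetic mean are assumptions, the function and key names are hypothetical, and the rows are toy data. The per-bin count is returned with each mean because, as noted above, bins with few sources are easily dominated by extreme values.

```python
from collections import defaultdict

def binned_means(rows, key, bin_width_deg=5.0):
    """Mean of rows[key] in bins of |b|.

    Returns {lower bin edge in degrees: (mean value, number of sources)}.
    """
    bins = defaultdict(list)
    for row in rows:
        lower_edge = int(abs(row["b"]) // bin_width_deg) * bin_width_deg
        bins[lower_edge].append(row[key])
    return {edge: (sum(vals) / len(vals), len(vals))
            for edge, vals in sorted(bins.items())}

# Toy rows: Galactic latitude b (deg) and corrected parallax (mas).
toy_rows = [
    {"b": 2.0, "corrected_parallax": 0.10},
    {"b": -3.0, "corrected_parallax": 0.30},
    {"b": 7.0, "corrected_parallax": 0.02},
    {"b": -12.0, "corrected_parallax": -0.01},
]
toy_profile = binned_means(toy_rows, "corrected_parallax")
```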

Figure A1. The astrometric systematic error at different Galactic latitudes for five-parameter sources (A) and six-parameter sources (B).