Wang Chuang
(School of Business Administration, Shenyang Pharmaceutical University, Shenyang 110016, China)
Abstract Objective To provide references for improving the authenticity and reliability of retrospective study results, thereby improving the quality of real-world evidence and strengthening drug supervision and decision-making. Methods A literature review was used to study the data sources and characteristics of retrospective research, and the sources and correction of selection bias in the real world. Results and Conclusion Selection bias in retrospective studies mainly comes from admission rate bias, patient rate bias, survivor bias, healthy user bias and symptom bias.
Keywords: retrospective studies; selection bias; real-world study
After the concept of the real-world study (RWS) emerged, the United States and many European countries conducted a series of studies on RWS. The theoretical system, technical methods and experimental design of real-world data (RWD), real-world evidence (RWE) and RWS have taken shape through a large number of clinical explorations and accumulated experience. RWE generated by RWS based on RWD is widely applied in new drug research and development, registration and marketing, expansion of drug indications, changes to drug instructions, adverse reaction research, clinical efficacy, and the safety of drug combinations. RWS has been recognized and applied by enterprises and governments because it closely reproduces real medical settings, maximizes the authenticity and extrapolation of research results, shortens the research cycle, and saves research costs. At the same time, domestic experts, scholars and regulatory authorities have also explored RWE and RWS. The Wu Jieping Medical Foundation and the China Chest Cancer Research Collaboration Group (2018) released research guidelines covering RWS-related concepts, ideas, processes, research design types, data sources and quality control, and the control of bias and confounding[1]. ChinaREAL (2019) published a series of theoretical results on database construction, the design of observational and effectiveness research, the generation and control of bias, and statistical analysis methods, which built the theoretical framework and relevant technical specifications for RWS in China[2,3]. RWS points to a new direction of research. It is not a single type of study but a collection of different study types[3]. According to the problem addressed and the research design, it can be divided into observational research and effectiveness research. Because RWS resembles clinical practice as closely as possible, and because of the way research objects are selected and the differences among sample populations, there will always be a variety of biases or confounding factors in observational research and effectiveness research. These affect the quality of RWE, so using appropriate statistical analysis methods to avoid and weaken bias or confounding in RWS has become important. This paper explains the application, data sources and characteristics of retrospective studies within observational research, and the sources of selection bias in research design. It then discusses methods for correcting selection bias in retrospective studies.
Observational studies include retrospective and prospective studies. Retrospective research, which starts from the present outcome and traces back to the past, is one of the most convenient and most commonly used methods in the real world, and one of the closest to clinical practice. It also saves time and cost. Its applications can be grouped by the problem addressed. First, it is applied to etiological studies: existing recorded results are used to infer the cause of a disease through the proper causal sequence, such as the retrospective analysis of the common pathogenic factors of chronic prostatitis[4]. Secondly, it is applied to the study of therapeutic effect: it can be used to study the effect of an intervention, exposure or drug treatment and to provide high-quality evidence about the safety and effectiveness of treatment, such as the retrospective study on the effectiveness of calcineurin inhibitors in the treatment of idiopathic membranous nephropathy[5] and the study on the clinical application and safety of PD-1 monoclonal antibody in the treatment of non-small cell lung cancer[6]. Thirdly, it is applied to adverse reaction studies of intervention programs or drugs, such as the analysis of adverse reactions to “Xianling Gubao” preparations based on data from the national ADR spontaneous reporting system[7], and the study of adverse reaction signals of influenza IV vaccine based on data from the US adverse event reporting system[8]. Fourthly, it is applied to drug combination analysis: it can be used to study the clinical characteristics of patients and the drugs used in combination, such as a retrospective study of the clinical characteristics and drug combinations of 12 554 patients treated with “Ciwujia” injection[9]. Fifthly, it is applied to the pharmacoeconomic evaluation of intervention programs, such as using a retrospective database to evaluate the pharmacoeconomics of Agkistrodon halys hemocoagulase for injection in surgery[10], and using retrospective data from the hospital information system (HIS) to compare the safety, effectiveness and economy of azithromycin in the treatment of Mycoplasma pneumoniae pneumonia in children[11]. Retrospective research has become an essential part of RWS.
The RWD in a retrospective study can be derived from electronic medical data or paper case reports that medical institutions have already recorded in their information systems, either in the course of retrospective observational studies or with approval for clinical research on preparations. These sources include the electronic medical record (EMR), the HIS and the case report form (CRF). Commonly used retrospective databases include SuValue®, the large-scale inpatient electronic database established by the Institute of Essential Clinical Medicine of the Chinese Academy of Chinese Medical Sciences, and the spontaneous reporting system (SRS) of the National Center for ADR Monitoring.
Existing health care data are not collected to answer research questions defined in advance. The RWD used in retrospective studies have the following characteristics. They are generated for medical and management purposes rather than for a specific research purpose. They exist before the study starts, and some researchers have already verified the safety and effectiveness of drugs from such data and the evidence they generate. The amount of data is very large, but the data are scattered and heterogeneous, and their integrity and accuracy suffer from problems such as missing, abnormal and contradictory values. The covered population is restricted to patients in medical institutions, who are few relative to the general population. The data contain much information about medication, inspection, examination, diagnosis and treatment. Because the coverage period is long, the number of cases covered is large. The data are accessible and can be closely matched to the research problem.
The first task of retrospective research is to establish the inclusion/exclusion criteria for research objects and to select appropriate samples in the research design. Selection bias in choosing research objects is a bias to be expected in retrospective research based on RWD. It refers to the systematic error caused by differences in some characteristics between the selected patient sample and the target population (or the excluded patients). According to its cause, selection bias can be divided into admission rate bias, patient rate bias, healthy user bias, symptom bias and survivor bias.
Admission rate bias arises when inpatients or outpatients are used as research objects: differences in disease severity and in the proportions of prescriptions among patients in different hospitals may bias the research results[12]. In a retrospective study, the inclusion/exclusion criteria should therefore be well designed. For instance, in one study the data of 23 575 patients who had used “Suxiao Jiuxin” pills at least once in 37 top-class hospitals in China from 2001 to 2015 were extracted from the EMR and HIS, and the clinical application characteristics and drug combinations of “Suxiao Jiuxin” pills were then analyzed retrospectively. Since most of the included cases came from top-class hospitals, selection bias may have resulted[13].
Patient rate bias arises in database-based studies when no distinction is made between existing and new cases[1]. The patient rate indicates the frequency of new cases of a particular disease in a certain group, usually within a limited range and a short period. It applies to local outbreaks of disease, such as food poisoning, infectious diseases and occupational poisoning[14]. For example, COVID-19, which broke out in China and other countries from the beginning of 2020, shows a high infection rate and fast transmission. If a retrospective study based on the RWD from combating COVID-19 is conducted to develop a new vaccine, and it includes only the existing cases without promptly incorporating new cases or new drugs, patient rate bias is bound to undermine the authenticity and effectiveness of the research results.
Survivor bias means that current drug users reflect only those who can tolerate the treatment and for whom it is likely to be effective. For example, suppose a research database holds the data of 3 000 patients, and after the inclusion criteria are applied only 500 patients remain available for analysis. If some patients are excluded from the study because of missing data, the question is whether the patients with complete data are comparable to those excluded. If they are not, survivor bias may result. If the patients with missing information withdrew or even died because of the poor efficacy of some drugs, excluding their data will seriously affect the accuracy, reliability and extrapolation of the research[1].
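One way to check whether the patients with complete data are comparable to those excluded is to compare a baseline characteristic between the two groups. The following is a minimal Python sketch under stated assumptions, not a method taken from the cited guidelines: it computes a standardized mean difference for a single covariate. The function name, the illustrative ages and the 0.1 threshold (a common rule of thumb) are all assumptions made for this example.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(included, excluded):
    """Standardized mean difference of one baseline covariate.

    included: values for patients kept in the analysis
    excluded: values for patients dropped because of missing data
    """
    pooled_variance = (variance(included) + variance(excluded)) / 2
    if pooled_variance == 0:
        raise ValueError("covariate has no variation in either group")
    return (mean(included) - mean(excluded)) / sqrt(pooled_variance)


# Hypothetical ages, for illustration only.
ages_included = [50, 55, 60, 65, 70]
ages_excluded = [60, 65, 70, 75, 80]

smd = standardized_mean_difference(ages_included, ages_excluded)
groups_differ = abs(smd) > 0.1  # assumed threshold, a common rule of thumb
print(f"SMD = {smd:.2f}, groups differ: {groups_differ}")
```

A large absolute value suggests that the excluded patients differ systematically from those analyzed, which is the situation in which survivor bias becomes a concern.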
Healthy user bias means that patients with specific health behaviors also tend to follow their doctor’s advice on effective medication, diet and physical activity. Some of the selected subjects have such health behaviors, together with a more positive attitude towards disease and health or better living habits. Compared with the excluded patients who have unhealthy behaviors, they are not representative of the general population, which leads to bias. For example, in a study of the effectiveness of drugs for the treatment of coronary heart disease, including a large number of patients who do not smoke or drink alcohol because they are more concerned about their health will cause selection bias. This can lead to overestimation of the effectiveness of the drugs and undermine the authenticity of the research results.
Symptom bias arises when, in addition to the disease and the exposure, there is a symptom factor: a clinical symptom or sign that is not a risk factor for the disease. People see a doctor because of this symptom, which raises the detection rate of early cases but leads to an overestimation of the degree of exposure. In the end, we may wrongly conclude that the symptom factor is related to the disease. This bias tends to occur in etiological studies of cancer that use the case-control design. A typical example is a 1975 case-control study on the relationship between women taking compound estrogen and endometrial cancer. Women taking compound estrogen are more likely to be diagnosed with early endometrial cancer because they are prone to bleeding. Critics of the study reinvestigated the cases of endometrial cancer in the oncology and gynecology departments of the same hospital and found that 79% of the cases taking estrogen were early cases, compared with only 58% of the cases not taking estrogen. This indicates that case selection was affected by the exposure factor: cases with the exposure factor would show clinical symptoms early. There are apparent systematic differences between the selected cases and the non-selected cases, resulting in selection bias[15].
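The size of the imbalance in this example can be made concrete with a short calculation. The sketch below, written in Python, takes the two proportions quoted above (79% and 58%) and expresses them as an odds ratio of being an early case. The odds ratio is computed here only for illustration and is not a figure reported in the cited study; the function names are assumptions.

```python
def odds(proportion):
    """Convert a proportion strictly between 0 and 1 into odds."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return proportion / (1 - proportion)


def odds_ratio(p_exposed, p_unexposed):
    """Odds of the outcome among the exposed relative to the unexposed."""
    return odds(p_exposed) / odds(p_unexposed)


# Proportions of early cases quoted in the text.
early_among_estrogen_users = 0.79
early_among_non_users = 0.58

early_stage_or = odds_ratio(early_among_estrogen_users, early_among_non_users)
print(f"Odds ratio of being an early case: {early_stage_or:.2f}")
```

An odds ratio well above 1 is consistent with the argument that exposed women came to medical attention earlier, so the selected cases over-represent the exposure.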
The quality and characteristics of retrospective real-world data, together with the inclusion and exclusion criteria and the selection of samples, often bring a great deal of bias and confounding into retrospective research. Selection bias is only one of the inevitable systematic errors in retrospective research, but it can affect the authenticity of research conclusions. It gives rise to information bias and more confounding factors in retrospective studies, which further affects the validity and reliability of research conclusions, reduces the quality of RWE and causes systematic bias when conclusions are extrapolated. Therefore, more attention should be paid to the application of statistical methods in retrospective research, and reducing and controlling selection bias should be regarded as the primary task in research design. Only by controlling bias at its source and reducing its degree can we ensure the authenticity, reliability and extrapolation of research conclusions as far as possible. In the research design, the above five biases should be controlled according to the specific problem.
To address possible admission rate bias, the representativeness of the database population relative to the source population should be considered[8]. For some uncommon diseases, if the sample cases in the information system are drawn from too many hospitals, selection bias tends to arise from the lack of representativeness and from differences in disease severity and in the level of diagnosis and treatment across hospitals. If the number of hospitals is too small, however, the sample size of research objects may be insufficient. It is therefore necessary to balance representativeness against sample size. The research design should include not only samples from top-class hospitals but also more sample data from other hospitals. The potential differences in treatment effect among hospital groups should be analyzed objectively to reduce admission rate bias, so that the results reflect the actual effect of disease treatment.
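The comparison of hospital groups can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the group labels, the patient records and the population shares are hypothetical, and direct standardization is used here as one possible way to reweight the groups, not as the method prescribed by the cited sources.

```python
def stratum_rates(records):
    """Response rate per hospital group.

    records: iterable of (hospital_group, improved) pairs, improved being a bool.
    """
    counts = {}
    for group, improved in records:
        total, responders = counts.get(group, (0, 0))
        counts[group] = (total + 1, responders + int(improved))
    return {group: r / n for group, (n, r) in counts.items()}


def standardized_rate(rates, population_share):
    """Weight each group's rate by its assumed share of the source population."""
    if abs(sum(population_share.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(rates[group] * share for group, share in population_share.items())


# Hypothetical sample dominated by top-class hospitals.
records = (
    [("top_class", True)] * 6 + [("top_class", False)] * 2
    + [("other", True)] * 1 + [("other", False)] * 1
)
rates = stratum_rates(records)
crude_rate = sum(improved for _, improved in records) / len(records)

# Assumed shares of each hospital group in the source population.
adjusted_rate = standardized_rate(rates, {"top_class": 0.3, "other": 0.7})
print(rates, crude_rate, adjusted_rate)
```

When the crude and the reweighted rates differ noticeably, the sample's hospital mix is likely driving part of the apparent treatment effect.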
To address possible patient rate bias, we should design new retrospective studies for new drug users and incorporate new research objects in time. In particular, retrospective studies should keep pace with rapid and noticeable changes in an epidemic, since the reliability of their results depends on it, and lagging results limit the extrapolation of the research to a certain extent. It is therefore recommended that retrospective studies of epidemic and infectious diseases focus on the variation of pathogenic factors and use the RWD of as many new patients or new drug users as possible. This expands the sample size, reduces patient rate bias, and reflects the safety and effectiveness of drug treatment for epidemic and infectious diseases more accurately.
To address possible survivor bias, we should consider grouping patients into new drug users, current drug users and past drug users and comparing the potential bias in each group[8]. We should also strengthen the follow-up of patients whose data are missing because of poor treatment effects. When their data are supplemented in time, the number of samples excluded from the study for lack of data decreases, which reduces survivor bias and improves the reliability and extrapolation of research results.
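The grouping of drug users can be illustrated with a short Python sketch. The cut-offs below (90 days for a new user, a 90-day gap for a past user), the field layout and the patient records are assumptions chosen for this example; an actual study would define them in its protocol.

```python
from datetime import date


def classify_user(first_rx, last_rx, index_date, new_days=90, gap_days=90):
    """Label a patient as a new, current or past drug user (assumed cut-offs)."""
    if (index_date - last_rx).days > gap_days:
        return "past"      # last prescription ended long before the index date
    if (index_date - first_rx).days <= new_days:
        return "new"       # treatment started shortly before the index date
    return "current"       # long-standing user still on treatment


def event_rate_by_group(patients, index_date):
    """Outcome event rate within each user group."""
    totals, events = {}, {}
    for first_rx, last_rx, had_event in patients:
        group = classify_user(first_rx, last_rx, index_date)
        totals[group] = totals.get(group, 0) + 1
        events[group] = events.get(group, 0) + int(had_event)
    return {group: events[group] / totals[group] for group in totals}


# Hypothetical patients: (first prescription, last prescription, had event).
index_date = date(2020, 12, 31)
patients = [
    (date(2020, 11, 1), date(2020, 12, 20), False),
    (date(2020, 12, 1), date(2020, 12, 30), True),
    (date(2018, 1, 1), date(2020, 12, 15), False),
    (date(2018, 1, 1), date(2019, 6, 1), True),
]
group_rates = event_rate_by_group(patients, index_date)
print(group_rates)
```

If event rates differ sharply between long-standing current users and the other groups, the current users may be a tolerant, responsive subset, which is the pattern survivor bias produces.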
To address possible healthy user bias, the research design should take patient compliance into account and include a general survey of patients’ health behaviors in advance. Patients’ health behaviors and bad habits are then classified and their frequencies estimated. According to the probability distribution of health behaviors and the overall sample size, the research objects are selected and the sample size for patients with each kind of health behavior is determined. In this way the distribution of research objects with different behaviors can match that of the real world as closely as possible, making the research more representative and generalizable. This can correct the overestimation of treatment effect caused by including more patients with healthy behaviors and higher compliance.
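Determining the sample size for each behavior group amounts to a proportional allocation. The Python sketch below shows one way to do it, assuming hypothetical survey counts and behavior categories; leftover units after rounding down are assigned by the largest-remainder rule, which is a choice made for this example.

```python
def proportional_allocation(survey_counts, total_sample):
    """Split total_sample across strata in proportion to surveyed frequencies."""
    surveyed = sum(survey_counts.values())
    if surveyed <= 0:
        raise ValueError("survey counts must sum to a positive number")
    exact = {s: c * total_sample / surveyed for s, c in survey_counts.items()}
    allocation = {s: int(v) for s, v in exact.items()}  # round down first
    leftover = total_sample - sum(allocation.values())
    # Hand remaining units to the strata with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda s: exact[s] - allocation[s], reverse=True)
    for stratum in by_fraction[:leftover]:
        allocation[stratum] += 1
    return allocation


# Hypothetical survey of health behaviors among 1 000 patients.
survey_counts = {
    "no_smoking_no_alcohol": 450,
    "smoking_only": 250,
    "alcohol_only": 200,
    "smoking_and_alcohol": 100,
}
allocation = proportional_allocation(survey_counts, total_sample=500)
print(allocation)
```

Sampling each group to these sizes keeps patients with healthy behaviors from being over-represented relative to the surveyed population.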
To address possible symptom bias, the period over which research objects are collected should be extended as far as possible, so that it fully covers the time over which the disease progresses and every stage of its development. Meanwhile, patients’ cooperation with treatment should be improved so that the exposure factors of the disease can be deduced from patients’ past medical and medication histories. Lastly, the criteria in the research design should be set strictly. Cases whose selection was affected by the exposure factor should be excluded from the sample, so that the excessive proportion of exposure is corrected as far as possible. The systematic difference between selected and non-selected cases can thus be reduced, and the reliability of the study improved.
Selection bias is a kind of systematic error, and no method can eliminate it completely. Only through full consideration and reasonable design before the research begins can one or more possible selection biases be reduced to a reasonable range, so that the conclusions and evidence obtained from the research can be widely recognized.