Kousick BISWAS
· Biostatistics in psychiatry (10) ·
Prevention and management of missing data during conduct of a clinical study
This paper is the second in a 3-part series focusing on missing data.
In a clinical study missing data can occur for various reasons, with or without any actual loss of study participants because of drop-outs. Poorly designed assessment schedules or use of inefficient data collection tools can result in missed visits or omissions of questions and, thus, missing data. Some simple adjustments in the assessment schedule or minor modifications in the data collection tools can reduce these types of problems and, at the same time, reduce participants’ burden and inconvenience. If implemented properly, an efficient missing data monitoring system can retrospectively salvage the majority of the missing data items except those that are the result of the drop-out of participants. The effectiveness of such a monitoring system depends on how quickly the missing data items are identified and how promptly participating site personnel are notified of the need to provide missing information.
During the conduct of a clinical study missing data can occur because of the actions or inaction of both the study participants and the research personnel. A common cause of missing data is excessively burdensome requirements on participants, which lead to decreased willingness to participate fully or to attempts by participants to short-circuit data collection by providing incomplete information.[1] It is important to understand the reasons for missing data in a particular study before deciding on the measures that can be put in place to prevent them. The main reasons for missing data in clinical studies can be classified into the three broad categories listed below.
a) Data missing because of participant action or inaction:
i. discontinuing the investigational intervention while remaining in the study;
ii. dropping out of the study;
iii. missing some scheduled clinical assessments because of inadequate compensation, an unnecessarily large number of visits, inconvenient timing, long travel times, and so forth;
iv. refusing to undergo some required study procedures;
v. refusing to provide data in a face-to-face interview or on a self-completion questionnaire because the participant considers the information too sensitive; and
vi. unintentionally missing items on long self-completion questionnaires.
b) Missing data because of investigator action or inaction:
i. failure to fully explain the full extent of the requirements of the study to participants;
ii. failure to effectively encourage the active participation of participants in the study procedures;
iii. asking leading questions that limit the likelihood that participants respond fully; and
iv. insufficient oversight of self-completion instruments to ensure that questionnaires are satisfactorily completed.
c) Missing data because of study design:
i. each assessment or clinical intervention is too long for participants to sustain attention;
ii. more assessment visits are scheduled than needed to test the study hypothesis;
iii. unnecessary information is collected;
iv. self-completion questionnaires are not fully understood by all participants because of complicated or confusing language;
v. questionnaires have complicated skip patterns; and
vi. insufficient resources are available to collect the required information (staff time, private clinic space, parking for participants, etc.).
Most of these problems can be addressed by relatively simple alterations to the design of the study and to the instruments and procedures used to collect data. Developing a formal operational plan for data collection based on the assessment schedule for the study can help minimize missing data.
2.1 Improvement in study design and implementation that will reduce missing data
Every study should be designed with the objective of complete data collection in mind. Experience from previous studies should be used to identify the design issues that could potentially result in missed visits and other types of missing data. Study participants most often miss evaluation or treatment visits because the time, location, and/or frequency of visits are inconvenient, or because the evaluations themselves are physically or psychologically stressful. To maximize participation rates, steps need to be taken to decrease the inconvenience and burden of the study procedures:
a) design a ‘lean’ study, that is, limit the number of assessments to the minimum needed to achieve the study objective;
b) consolidate instruments with overlapping questions by merging assessments to make them more efficient and less time-consuming to administer;
c) minimize the number of in-person clinic visits by supplementing in-person visits with telephone surveys, e-mail messaging, and use of data on the individual available in hospital databases;
d) allow reasonable time for each visit and have flexible scheduling based on participant availability;
e) compensate well for the time and effort demanded from each participant;
f) in long-term follow-up studies, provide increasing incentives to the participants for longer retention in the study; and
g) selectively collaborate with investigators and research sites that have a track record of faithfully following the research protocol and of high study-completion rates.
2.2 Improve data-collection instruments to minimize missing data
Missing data can also occur because of poorly designed questionnaires and data-collection instruments. All instruments used in the study should be made as user-friendly as possible for the respondent, the interviewer, and the data entry personnel.[2] Language on self-completion forms needs to be simple and unambiguous. If respondents enter code numbers for different responses, all possible non-overlapping choices should be provided on the form, as close as feasible to the respective item. The flow of questions should be logical and instructions should be clear and in simple language. The amount of information collected needs to be minimized to limit the length of the forms without compromising the data needed to achieve the goals of a study. ‘Lean’ data collection limits data points to those that will be directly used in the assessment of outcomes. For electronic forms, particularly for those with internal skips, automatic hiding or unhiding of questions should be used to eliminate respondent confusion about which questions need to be answered and to simplify data entry. Pop-up messages that remind users to complete certain sections of an electronic form can also eliminate the unintentional omission of questions.
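The automatic hiding and unhiding of questions described above can be sketched as follows. This is only an illustration, not the design of any particular electronic data capture system: the `Question` class, the `show_if` rule, the helper functions and the example items are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Answers = Dict[str, object]


@dataclass
class Question:
    key: str
    text: str
    # Rule evaluated on the answers given so far; None means "always show".
    show_if: Optional[Callable[[Answers], bool]] = None


def visible_questions(form: List[Question], answers: Answers) -> List[Question]:
    """Return only the questions the respondent should currently see."""
    return [q for q in form if q.show_if is None or q.show_if(answers)]


def unanswered(form: List[Question], answers: Answers) -> List[str]:
    """Keys of visible questions still blank (candidates for a pop-up reminder)."""
    return [q.key for q in visible_questions(form, answers)
            if answers.get(q.key) in (None, "")]


# Hypothetical three-item form with one skip: the follow-up question is
# shown only to respondents who report that they smoke.
FORM = [
    Question("smokes", "Do you currently smoke? (yes/no)"),
    Question("cigs_per_day", "How many cigarettes per day?",
             show_if=lambda a: a.get("smokes") == "yes"),
    Question("sleep_hours", "How many hours did you sleep last night?"),
]
```

Because a hidden question is never counted as unanswered, a skipped follow-up item does not trigger a reminder and is not later mistaken for an omission.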
Wherever possible researchers should record the reasons for dropping out of the study. It is also possible to collect information about respondent’s intention to continue participating in the study (e.g., stated intention of attending next clinic visit and, if not planning to attend, reasons for not doing so). This information is important when determining appropriate statistical methods to use for treating missing data (i.e., can it be considered missing at random (MAR)?) and can provide useful covariates to be used in missing data models.
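For reference, the MAR assumption is commonly written as below. The notation is introduced here for illustration and is not taken from this paper: $R$ is the indicator of whether an outcome is observed, $Y_{obs}$ and $Y_{mis}$ are the observed and unobserved parts of the outcome data, and $X$ denotes fully observed covariates, which could include recorded drop-out reasons or stated intentions to attend the next visit.

```latex
% Missing at random: given what was observed, missingness
% does not depend on the unobserved values.
P(R \mid Y_{obs}, Y_{mis}, X) = P(R \mid Y_{obs}, X)
```

The more such auxiliary information is collected and included in $X$, the more plausible this assumption tends to become.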
Use of modern data capturing technologies, within the scope of organizational IT regulations, can greatly decrease missing data. In-person clinic visits can be burdensome for study participants. If the participants are equipped with PDAs, tablets or other data recording instruments capable of electronic data collection and of generating reminders and alerts, the requirements for in-person visits can be dramatically reduced. Participants can stay at home or at work to do the assessments using the electronic device and bring the devices back to the clinic for data transfer during their next planned visit.
The following measures can help to maximize the completeness and accuracy of electronic data collection:
a) use simple, short questions and options with built-in data checks (e.g., range checks for numeric variables) for each data point;
b) require completion of ALL items so that omissions are only accepted if questions about the reasons for the omission are answered;
c) enable off-line completion of the data collection module (i.e., without internet connection) and temporary storage of data in the device prior to synchronization with the master database.
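Measures (a) and (b) above can be sketched as a simple entry check. The item names, allowed ranges and function name are hypothetical and chosen only for illustration; a real study would take them from its own data dictionary.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data dictionary: item name -> (minimum, maximum) allowed value.
RANGES: Dict[str, Tuple[float, float]] = {
    "age_years": (18, 90),
    "systolic_bp": (70, 250),
}


def validate_entry(values: Dict[str, Optional[float]],
                   omission_reasons: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the entry is accepted."""
    problems = []
    for item, (low, high) in RANGES.items():
        value = values.get(item)
        if value is None:
            # An omission is accepted only if a reason has been recorded.
            if not omission_reasons.get(item, "").strip():
                problems.append(
                    f"{item}: missing, and no reason for omission recorded")
        elif not (low <= value <= high):
            # Built-in range check for numeric variables.
            problems.append(
                f"{item}: value {value} outside allowed range {low}-{high}")
    return problems
```

For example, `validate_entry({"age_years": 45}, {"systolic_bp": "participant declined"})` returns an empty list, because the omitted blood pressure reading is accompanied by a reason.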
2.3 Role of study personnel in minimizing missing data
Missing data in a study is directly dependent on retention of the participants for the duration of the study and on their commitment to complete all the procedures specified in the study. Site investigators and other supporting site personnel play a major role in maximizing participant retention in the study and in highlighting the importance of completing the protocol. The following are some of the measures that site personnel can implement to minimize drop-outs and missing data.
a) Study participants often lose interest in the study over time so study staff need to emphasize the importance of completing the study protocol when obtaining consent to participate in the study and repeatedly highlight the importance of their continued participation at every clinic visit.
b) Create a welcoming and caring atmosphere for the participants and their family members: the entire study staff must be trained to treat participants with respect and to create an atmosphere where they can feel comfortable.
c) Offer appropriate alternate treatment to participants who need to discontinue the study treatment for any reason. Participants need to believe that study staff consider participants’ healthcare needs a higher priority than the study itself.
d) Keep in touch (e.g., by telephone contacts) with the participants in between the in-person visits; periodically updating contact information and asking about well-being and other health issues helps to develop a bond between the study personnel and the study participants.
e) Be flexible, within the allowable time window, in scheduling in-person visits, and provide transportation and child care facilities if the participant needs them. This shows that study personnel are appreciative of difficulties participants need to overcome to participate in the study.
f) Continue ongoing educational efforts highlighting the importance of sustained engagement with the study to help move scientific knowledge in the disease area forward. These efforts help participants feel that their contribution is important and encourage them to identify with the study’s needs and objectives.
One of the core data management tasks in a study is to monitor the data collection and the quality of the collected data. It is common practice to run data checking programs to identify missing data items along with other data quality indicators throughout the course of the study. The objective of these quality control procedures is to capture the missing data items or missing forms very early in the process so that the majority of these missing items can be salvaged. In general, data checking programs are run regularly over the span of the study and the resulting missing-data reports are shared with the participating study sites to speed up the turnaround time for making necessary corrections. For multicenter studies, a data coordinating center should be established. Sites are encouraged to make an honest effort to salvage identified missing data items if this is allowed by the study informed consent. To give sites enough time to react to these reports before they lose contact with the participants (either through completion of the study protocol or through drop-out), the study management team needs to process incoming data as quickly as possible and to generate the missing-data reports at regular intervals.
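A data checking program of the kind described above might, in its simplest form, compare the forms expected under the assessment schedule with the forms actually received and list the gaps by site. The sketch below assumes a hypothetical data layout (site codes, participant IDs, visit numbers, item names) and is not the procedure of any specific coordinating center.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

VisitKey = Tuple[str, int]  # (participant_id, visit_number); hypothetical layout


def missing_data_report(
    expected: List[Tuple[str, str, int]],  # (site, participant_id, visit_number)
    received: Dict[VisitKey, Dict[str, Optional[object]]],
    required_items: List[str],
) -> Dict[str, List[Tuple[str, int, str]]]:
    """Group missing forms and missing items by site for follow-up."""
    report = defaultdict(list)
    for site, participant_id, visit in expected:
        record = received.get((participant_id, visit))
        if record is None:
            # The whole form for this scheduled visit has not arrived.
            report[site].append((participant_id, visit, "FORM NOT RECEIVED"))
            continue
        for item in required_items:
            # The form arrived, but a required item was left blank.
            if record.get(item) in (None, ""):
                report[site].append((participant_id, visit, item))
    return dict(report)


# Made-up example data for illustration only.
expected = [("site_A", "P001", 1), ("site_A", "P001", 2), ("site_B", "P002", 1)]
received = {
    ("P001", 1): {"weight_kg": 70.5, "mood_score": None},
    ("P002", 1): {"weight_kg": 82.0, "mood_score": 4},
}
report = missing_data_report(expected, received, ["weight_kg", "mood_score"])
for site, gaps in report.items():
    print(site, gaps)
```

Running such a check on a fixed schedule, and sending each site only its own entries, keeps the interval between the occurrence of a gap and the site's notification short, which is what determines how much of the missing data can still be recovered.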
The author reports no conflict of interest related to this manuscript.
1. National Research Council of the National Academies. The Prevention and Treatment of Missing Data in Clinical Trials. Washington DC: National Academies Press, 2010.
2. Edwards P. Questionnaires in clinical trials: guidelines for optimal design and administration. Trials 2012; 11: 2. (www.trialsjournal.com/content/11/1/2) (accessed 31 July 2012)
Dr. Biswas is currently serving as the Deputy Director of the Perry Point Coordinating Center, Cooperative Studies Program, U.S. Department of Veterans Affairs. He is also an Assistant Professor at the Department of Epidemiology and Public Health, School of Medicine, University of Maryland. He is the assigned biostatistician for various national and international VA, NIDA and University of Maryland clinical trials. During his tenure at the VA CSP he has published widely in peer-reviewed journals and presented talks about clinical trials and biostatistics at national and international conferences.
10.3969/j.issn.1002-0829.2012.04.008
US Department of Veterans Affairs Cooperative Studies Program Coordinating Center, VA Maryland Health Care System, Perry Point, MD, USA *Correspondence: Kousick.Biswas@va.gov