Friedman et al (1998) asserted that ‘no study is better than the quality of its data’. This quote highlights the importance of recording and storing data which is of good quality; that is, valid, auditable, accurate data that can easily be replicated, and that measures the intended variables of the research question.
As a result, systems for managing data must be in place. Data management in clinical research relates to the processes of gathering, recording, monitoring, analysing and reporting on data. The process of data management should be ongoing and should begin at the early stages of protocol development, and end only when statistical analysis is complete.
This article seeks to clarify the concept of data management as applied to clinical trials.
Various guidelines standardise the specific requirements for data management during clinical research. For example, the Good Clinical Practice (GCP) guidelines, harmonised in 1996 following the International Conference on Harmonisation, aim to standardise the design and management of clinical trials to ensure that participant rights are protected and that data generated from the trial is gathered in a valid and replicable manner.
The ICH-GCP guidelines offer a list of essential documentation which should be gathered during the trial, and highlight the importance of documentation for each activity in the trial process to create a clear audit trail. Developed particularly with Westernised countries in mind, the ICH-GCP guidelines are well-aligned with other standards such as the 2004 Clinical Trials Regulations (UK/European), and the Code of Federal Regulations (US) for investigational drug trials. Patient privacy and confidentiality are addressed by the ICH-GCP, and are also covered by national legislation, such as the Health Insurance Portability and Accountability Act of the United States and the Protection of Personal Information Act of South Africa.
However, guidelines are required which are suited to clinical research other than investigational drug trials, and also suited to a wider range of locations, including resource-poor countries. This article aims to provide practical information which is applicable in a wide range of settings.
The ICH-GCP guidelines highlight the importance of ongoing documentation to create a clear audit trail. Documentation should show evidence of what researchers and staff intend to do (for example, through Standard Operating Procedures, or SOPs), and evidence that these intended actions were carried out (for example, through logs and reports which are signed and dated). The ICH-GCP guidelines also state that it is necessary to have an ‘irrefutable link between electronic record and electronic signature (11.10(j))’, for example through the use of secure signatures which cannot be forged and are tightly coupled to an audit trail.
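One way to picture a signature that is tightly coupled to an audit trail is a hash-chained log in which every entry is signed with a key held by the user who made it. The sketch below is illustrative only: the class `AuditTrail`, its methods, and the use of HMAC-SHA256 as a stand-in for an electronic signature are assumptions made for this example, not something the guidelines prescribe.

```python
import hashlib
import hmac
import json


class AuditTrail:
    """Append-only log; each entry is chained to the previous one by hash."""

    def __init__(self):
        self.entries = []

    def append(self, user, user_key, action, timestamp):
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        record = {"user": user, "action": action,
                  "timestamp": timestamp, "prev_hash": prev_hash}
        payload = json.dumps(record, sort_keys=True).encode()
        # The signature ties this exact record to the signing user's key.
        signature = hmac.new(user_key, payload, hashlib.sha256).hexdigest()
        record["signature"] = signature
        record["hash"] = hashlib.sha256(payload + signature.encode()).hexdigest()
        self.entries.append(record)
        return record

    def verify(self, keys):
        """Return True only if every entry is intact and correctly chained."""
        prev_hash = ""
        for entry in self.entries:
            record = {k: entry[k]
                      for k in ("user", "action", "timestamp", "prev_hash")}
            payload = json.dumps(record, sort_keys=True).encode()
            expected_sig = hmac.new(keys[entry["user"]], payload,
                                    hashlib.sha256).hexdigest()
            expected_hash = hashlib.sha256(
                payload + expected_sig.encode()).hexdigest()
            if (entry["prev_hash"] != prev_hash
                    or not hmac.compare_digest(entry["signature"], expected_sig)
                    or entry["hash"] != expected_hash):
                return False
            prev_hash = entry["hash"]
        return True
```

Changing any field of an earlier entry makes `verify` fail, which is the property an auditor relies on. A real system would also need secure key storage and protection against the most recent entries being dropped, neither of which this sketch covers.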
Data handling protocol
Ideally, all information pertaining to data management should be detailed in a data handling protocol – that is, a protocol specifically related to data management. This document should include:
· signature page
· protocol abstract
· primary and secondary aims of the study
· study personnel
· timelines and key activities
· database design and validation and monitoring guidelines
· data flow and tracking
· data entry procedures
· electronic data transfer
· query handling
· database back-up and recovery
· database close and lock
· archiving and security
Data management staff cover a range of roles, including (but not limited to) filing clerks, data capturers, monitors and data managers. To ensure high quality data and to comply with general GCP principles, these staff members should be suitably trained for their role, with up-to-date CVs and, ideally, evidence of continued professional development. An up-to-date list of data management staff should also be included in the data handling protocol, detailing the level of access to data and records that each staff member is allowed.
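As a rough illustration, the staff list and access levels could be held as a simple lookup that the database consults before any action. The level names, the staff identifiers and the assumption that access levels form a single ordered scale are all hypothetical; a real study may need finer-grained permissions.

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels, ordered from least to most privileged."""
    NONE = 0
    READ = 1
    ENTER = 2
    LOCK = 3


# Illustrative staff list: role and the highest level each person is allowed.
STAFF = {
    "filing_clerk_01": ("filing clerk", Access.READ),
    "data_capturer_01": ("data capturer", Access.ENTER),
    "data_manager_01": ("data manager", Access.LOCK),
}


def is_allowed(staff_id, required):
    """True if the staff member is listed and holds at least the required level."""
    _, level = STAFF.get(staff_id, ("unlisted", Access.NONE))
    return level >= required
```

Anyone missing from the list is treated as having no access, so the list in the data handling protocol remains the single place where access is granted.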
Standard Operating Procedures
A Standard Operating Procedure (SOP) is a document which outlines the standard practice for performing a task or system action. Every activity and system that is part of a clinical research trial or project should be based on an SOP. SOPs are encouraged by GCP guidelines, and aim to reduce variability in the data collected by ensuring that all staff members perform processes in a similar manner, thereby promoting the replicability of the data. SOPs required for a data management system would include:
· Data entry guidelines (who is responsible, how this is performed, how errors are caught etc.)
· Database backup and storage (details on how this is done, how often and staff responsible)
· CRF tracking and storage (details of storage scheme, confidentiality, responsible staff, etc.)
· Data monitoring, verification and correction (who is responsible for monitoring the data, how often this is done and methods used)
· Importing electronic data
· Database close, lock and unlock (information on how this is performed, how security is maintained, who gives the instruction to lock, the pre-conditions required for the database to be locked, who can lock and unlock the database etc.)
A document which details the available SOPs, their version number, date, author and the person who approved them should also be maintained and updated whenever necessary.
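Such a register can be kept as a plain list of records, one per SOP version. The sketch below is a minimal example under assumed names (`SopRecord`, `current_versions`); it simply picks out the most recently dated version of each SOP.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SopRecord:
    title: str
    version: str
    effective: date
    author: str
    approved_by: str


def current_versions(register):
    """Return the most recently dated record for each SOP title."""
    latest = {}
    for rec in register:
        if rec.title not in latest or rec.effective > latest[rec.title].effective:
            latest[rec.title] = rec
    return latest
```

Older versions stay in the register rather than being overwritten, so an auditor can see which version was in force on any given date.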
Databases and software
Databases are critical to recording and maintaining high quality data. Spreadsheets do not have the features required for this purpose. The database is developed according to the protocol of the individual research project, and should be verified in a test environment to prove functionality and validity.
All test data should be filed with the data management documents, and any changes to the database structure must be versioned and re-tested.
Various Clinical Data Management Systems (CDMS) are designed to meet this need, for example EpiData, OpenClinica and Oracle Clinical. These systems provide the software features needed to meet the various regulatory requirements identified above.
Microsoft Access is one of a number of low-cost databases in which the functionality needed to meet the regulatory requirements can be built, but this will require programming resources to implement.
Microsoft Excel is a spreadsheet application that is useful for the analysis of the data, but it is a poor choice for the long-term recording and management of clinical trial data and it cannot meet the regulatory requirements, even with substantial programming.
A note on Double Data Entry
Double Data Entry (DDE) is the technique whereby data transcribed on a form is captured twice by different people and programmatically compared, thereby eliminating (or limiting) errors in the data capture process. This can reduce error rates in the data capture process to less than 1 in 1000 keystrokes. While this is an excellent technique for resolving errors made while keying in the data, it does nothing for errors made when the data was first transcribed onto the form, or for other errors that may affect the quality of the data. Reynolds-Haertle and McBride [2] found that this technique did not improve data accuracy overall; normal data monitoring procedures would uncover such errors. For this reason, DDE need not be implemented for clinical trials with a well-managed monitoring process, although DDE does have its place in other business applications.
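The programmatic comparison at the heart of DDE can be sketched in a few lines. The function name and the field names used here are hypothetical; the example assumes each capture of a form is held as a dictionary of field values.

```python
def compare_entries(first, second):
    """Compare two independent captures of the same form.

    Returns a dict mapping each field that disagrees to the pair of values,
    so that the disagreement can be resolved against the paper form.
    """
    mismatches = {}
    for field in sorted(set(first) | set(second)):
        a, b = first.get(field), second.get(field)
        if a != b:
            mismatches[field] = (a, b)
    return mismatches


first_pass = {"participant_id": "P001", "weight_kg": "72.5", "sex": "F"}
second_pass = {"participant_id": "P001", "weight_kg": "75.2", "sex": "F"}
print(compare_entries(first_pass, second_pass))
```

Note what the comparison cannot see: if the value on the form itself is wrong, both people will key in the same wrong value and no mismatch is reported, which is the limitation described above.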
Data queries occur when there is a discrepancy in data – for example, data appears to be inconsistent with expectations (e.g. when a male is listed as being pregnant). Errors can occur at several stages of the data entry process, and these errors must be captured, checked and amended through an audit-trailed system.
A range of quality assurance steps should be in place to catch errors prior to data analysis, for example performing monthly edit checks following data entry, and monitoring data during review and after the initial ‘soft’ database lock. Also, a random sample of CRFs should always be reviewed for accuracy and completeness against the source.
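Edit checks and the random sampling of CRFs both lend themselves to simple scripts. The sketch below is a minimal example; the field names, the coding of values ("M", "Y") and the 0 to 120 age range are assumptions chosen for illustration and would come from the study protocol in practice.

```python
import random


def edit_checks(record):
    """Return a list of query messages for one participant record."""
    queries = []
    if record.get("sex") == "M" and record.get("pregnant") == "Y":
        queries.append("sex is M but pregnant is Y")
    age = record.get("age_years")
    if age is None:
        queries.append("age_years is missing")
    elif not 0 <= age <= 120:
        queries.append("age_years out of range")
    return queries


def sample_for_source_review(crf_ids, fraction, seed):
    """Pick a reproducible random sample of CRFs to check against the source."""
    ids = list(crf_ids)
    if not ids:
        return []
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return sorted(random.Random(seed).sample(ids, k))
```

Recording the seed alongside the sample means the selection can be reproduced later, which helps show an auditor that the CRFs reviewed were not hand-picked.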
When errors are found, they should be checked following the Data Verification and Correction SOP using a Data Clarification Form (DCF). The DCF should contain information such as the protocol number, discrepancy, detail about where the discrepancy occurred in the CRF, and should include the names of both the requester and the investigator, as well as space to sign and date the form (see attached DCF). Once the requester has filled in the form, it should be sent to the clinical team, resolved, signed and dated, before amendments are made on the CRF. The DCF should be kept and filed with the CRF.
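Where DCFs are tracked electronically, the fields listed above map naturally onto a small record type. The following is a sketch under assumed names (`DataClarificationForm`, `resolve`, `may_amend_crf`); it enforces only the ordering described above, namely that the CRF is not amended until the form has been resolved, signed and dated.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataClarificationForm:
    protocol_number: str
    crf_id: str
    crf_field: str          # where in the CRF the discrepancy occurred
    discrepancy: str
    requester: str
    investigator: str
    resolution: Optional[str] = None
    signed_by: Optional[str] = None
    signed_on: Optional[date] = None

    def resolve(self, resolution, signed_by, signed_on):
        """Record the clinical team's answer together with signature and date."""
        self.resolution = resolution
        self.signed_by = signed_by
        self.signed_on = signed_on

    def may_amend_crf(self):
        """The CRF may be amended only once the DCF is resolved, signed and dated."""
        return None not in (self.resolution, self.signed_by, self.signed_on)
```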
The data review processes and details should be kept for auditing purposes, for future system enhancements, and for training.
Electronic Data Capture
In some trials/clinical research, electronic data capture is becoming a popular option. There are benefits to capturing data immediately onto an electronic system; it is easier to monitor, and can be faster (because there is no need to type up results following paper data capture). It saves on staffing, space, transportation and cost because of the lack of paper documentation.
However, there are also some drawbacks to consider. First, it is not as easy to check the source of the data when it is entered electronically, so extra monitoring has to take place relating to recording the authorship of each file. Second, there is no record of the visit or measurement from which the data was taken. Third, each staff member needs to be trained in computer skills and the relevant programmes, which can take additional time and budget and may be less appropriate for resource-poor settings. Finally, there is a suggestion [3,4] that error rates can be higher during electronic data capture as compared with written data capture.
The process of importing electronic data (usually laboratory data) into the database that holds the clinical data should be documented as an SOP. The import facility must meet the regulatory requirements, such as keeping an audit trail against the electronic signature of the person who imports the data. There must also be an irrefutable trail from the source of the test result to the database.
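A simple way to keep a trail from the source file to the database is to fingerprint the file at import time and store that fingerprint with every imported row and with the audit entry. The sketch below assumes a CSV export from the laboratory and uses hypothetical column and function names; recording who ran the import is shown as a plain identifier here and is not a substitute for a proper electronic signature.

```python
import csv
import hashlib
import io


def import_lab_results(csv_bytes, source_name, imported_by, imported_at):
    """Parse a lab export and return (rows, audit_entry).

    The SHA-256 digest of the raw file lets a reviewer later confirm that
    the imported rows came from exactly this source file.
    """
    digest = hashlib.sha256(csv_bytes).hexdigest()
    reader = csv.DictReader(io.StringIO(csv_bytes.decode("utf-8")))
    rows = [dict(row, source_file=source_name, source_sha256=digest)
            for row in reader]
    audit_entry = {
        "action": "import",
        "source_file": source_name,
        "source_sha256": digest,
        "row_count": len(rows),
        "imported_by": imported_by,
        "imported_at": imported_at,
    }
    return rows, audit_entry
```

If the archived source file is later re-hashed and the digest no longer matches, the file has been changed since the import.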
End of study
At the end of the study, the data should be reviewed for one final time and ‘soft-locked’ while final queries are resolved and documented. Following the resolution of all queries, the database can be finally locked and archived according to the procedures in the SOPs/Data handling protocols.
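The sequence of open, soft-locked and finally locked can be modelled as a small state machine that refuses the final lock while queries remain open. This is a sketch with assumed names (`LockState`, `StudyDatabase`); the actual pre-conditions for locking would be those set out in the study's own SOPs.

```python
from enum import Enum


class LockState(Enum):
    OPEN = "open"
    SOFT_LOCKED = "soft-locked"
    LOCKED = "locked"


class StudyDatabase:
    def __init__(self):
        self.state = LockState.OPEN
        self.open_queries = set()

    def soft_lock(self):
        """After the final review: queries can still be resolved."""
        if self.state is not LockState.OPEN:
            raise RuntimeError("can only soft-lock an open database")
        self.state = LockState.SOFT_LOCKED

    def resolve_query(self, query_id):
        if self.state is LockState.LOCKED:
            raise RuntimeError("database is locked")
        self.open_queries.discard(query_id)

    def final_lock(self):
        """Final lock is refused while any query is still open."""
        if self.state is not LockState.SOFT_LOCKED:
            raise RuntimeError("soft-lock the database first")
        if self.open_queries:
            raise RuntimeError("unresolved queries remain")
        self.state = LockState.LOCKED
```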
Visit Global Health Trials' extensive Templates Library for relevant resources, guidelines, SOPs and checklists that you can download and adapt for your trial.
2. Single vs. double data entry in CAST. Reynolds-Haertle RA, McBride R. Control Clin Trials. 1992 Dec;13(6):487-94.
3. Evaluating the Accuracy of Data Collection on Mobile Phones: A Study of Forms, SMS, and Voice. Patnaik S, Brunskill E, Thies W. International Conference on Information and Communication Technologies and Development. IEEE/ACM, April 2009.
4. Comparison of Electronic Data Capture (EDC) with the Standard Data Capture Method for Clinical Trial Data. Walther B, Hossin S, Townend J, Abernethy N, Parker D, Jeffries D. PLoS ONE 6(9): e25348. doi:10.1371/journal.pone.0025348
This article was originally posted on Global Health Trials. Please visit us for more resources relating to trials.