SDTM

SDTM (Study Data Tabulation Model) defines a standard structure for human clinical trial (study) data tabulations and for nonclinical study data tabulations that are to be submitted as part of a product application to a regulatory authority such as the United States Food and Drug Administration (FDA). The Submission Data Standards team of the Clinical Data Interchange Standards Consortium (CDISC) defines SDTM.

On July 21, 2004, SDTM was selected as the standard specification for submitting tabulation data to the FDA for clinical trials, and on July 5, 2011, for nonclinical studies. Eventually, all data submissions will be expected to conform to this format. As a result, clinical and nonclinical data managers will need to become proficient in the SDTM to prepare submissions and to apply the SDTM structures, where appropriate, for operational data management.

Background

SDTM is built around the concept of observations collected about subjects who participated in a clinical study. Each observation can be described by a series of variables, corresponding to a row in a dataset or table. Each variable can be classified according to its Role. A Role determines the type of information conveyed by the variable about each distinct observation and how it can be used. Variables can be classified into four major roles: Identifier, Topic, Timing, and Qualifier.

A fifth type of variable role, Rule, can express an algorithm or executable method to define start, end, or looping conditions in the Trial Design model.

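The set of Qualifier variables can be further categorized into five sub-classes: Grouping Qualifiers, Result Qualifiers, Synonym Qualifiers, Record Qualifiers, and Variable Qualifiers.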

For example, in the observation, 'Subject 101 had mild nausea starting on Study Day 6,' the Topic variable value is the term for the adverse event, 'NAUSEA'. The Identifier variable is the subject identifier, '101'. The Timing variable is the study day of the start of the event, which captures the information, 'starting on Study Day 6', while an example of a Record Qualifier is the severity, the value for which is 'MILD'.
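The nausea example can be written out as a small data structure. This is only an illustrative sketch: the variable names (USUBJID, AETERM, AESTDY, AESEV) follow the usual adverse-event naming but are assumptions not stated in the text above, and the `Variable` class and `values_by_role` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str      # variable name (assumed AE-style names, not given in the text)
    role: str      # the Role the variable plays in the observation
    value: object  # the value recorded for this observation


# "Subject 101 had mild nausea starting on Study Day 6"
observation = [
    Variable("USUBJID", "Identifier", "101"),
    Variable("AETERM", "Topic", "NAUSEA"),
    Variable("AESTDY", "Timing", 6),
    Variable("AESEV", "Record Qualifier", "MILD"),
]


def values_by_role(variables):
    """Group the values of one observation (one dataset row) by Role."""
    grouped = {}
    for var in variables:
        grouped.setdefault(var.role, []).append(var.value)
    return grouped


roles = values_by_role(observation)
print(roles)
```

Each `Variable` corresponds to one column of the row, so adding further Timing or Qualifier variables only means appending more entries to the list.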

Additional Timing and Qualifier variables could be included to provide the necessary detail to adequately describe an observation.

The SDTM addition to PROC CDISC does not convert existing SDS 2.x content to SDTM 3.x representations.

Datasets and domains

Observations are normally collected for all subjects in a series of domains. A domain is defined as a collection of logically-related observations with a topic-specific commonality about the subjects in the trial. The logic of the relationship may relate to the scientific subject matter of the data, or to its role in the trial.

Typically, each domain is represented by a dataset, but it is possible to have information relevant to the same topicality spread among multiple datasets. Each dataset is distinguished by a unique, two-character DOMAIN code that should be used consistently throughout the submission. This DOMAIN code is used in the dataset name, the value of the DOMAIN variable within that dataset, and as a prefix for most variable names in the dataset.
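The three uses of the DOMAIN code can be checked mechanically. The following is a minimal sketch, not a validation tool: `check_domain_usage` is a hypothetical helper, rows are modelled as plain dictionaries, and the set of variables exempt from the prefix (STUDYID, DOMAIN, USUBJID) is an assumption standing in for the "most variable names" wording above.

```python
def check_domain_usage(code, dataset_name, rows,
                       unprefixed=frozenset({"STUDYID", "DOMAIN", "USUBJID"})):
    """Return a list of problems with how a two-character DOMAIN code is used.

    `unprefixed` is an assumed set of variables that do not carry the
    domain prefix; the real list depends on the implementation guide.
    """
    problems = []
    if len(code) != 2:
        problems.append(f"DOMAIN code {code!r} is not two characters")
    if code.lower() not in dataset_name.lower():
        problems.append(f"dataset name {dataset_name!r} does not contain {code!r}")
    for i, row in enumerate(rows):
        if row.get("DOMAIN") != code:
            problems.append(f"row {i}: DOMAIN is {row.get('DOMAIN')!r}, expected {code!r}")
        for name in row:
            if name not in unprefixed and not name.startswith(code):
                problems.append(f"row {i}: variable {name} lacks the {code} prefix")
    return problems


good_rows = [{"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "101",
              "AETERM": "NAUSEA", "AESEV": "MILD"}]
bad_rows = [{"DOMAIN": "CM", "TERM": "NAUSEA"}]

good_problems = check_domain_usage("AE", "ae", good_rows)
bad_problems = check_domain_usage("AE", "ae", bad_rows)
print(good_problems)
print(bad_problems)
```

The second call reports both the mismatched DOMAIN value and the unprefixed variable name.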

The dataset structure for observations is a flat file representing a table with one or more rows and columns. Normally, one dataset is submitted for each domain. Each row of the dataset represents a single observation and each column represents one of the variables. Each dataset or table is accompanied by metadata definitions that provide information about the variables used in the dataset. The metadata are described in a data definition document named 'Define' that is submitted along with the data to regulatory authorities.

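The Submission Metadata Model uses seven distinct metadata attributes, which are defined for each dataset variable in the metadata definition document: Variable Name, Variable Label, Type, Controlled Terms or Format, Origin, Role, and Comments.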

Data stored in dataset variables include both raw (as originally collected) and derived values (e.g., converted into standard units, or computed on the basis of multiple values, such as an average). In SDTM only the name, label, and type are listed with a set of CDISC guidelines that provide a general description for each variable used by a general observation class.
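The relationship between a dataset and its variable-level metadata can be sketched as follows. The example models only the three attributes named above (name, label, type); the `VariableMetadata` class, the `undocumented_columns` helper, and the sample variable names and labels are assumptions for illustration, not part of any Define specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMetadata:
    name: str   # variable name
    label: str  # descriptive label
    type: str   # "Char" or "Num" in this sketch


def undocumented_columns(rows, metadata):
    """Return the dataset columns that have no metadata entry, sorted by name."""
    documented = {m.name for m in metadata}
    columns = {name for row in rows for name in row}
    return sorted(columns - documented)


# Hypothetical metadata and data for a vital-signs style dataset.
metadata = [
    VariableMetadata("USUBJID", "Unique Subject Identifier", "Char"),
    VariableMetadata("VSTESTCD", "Vital Signs Test Short Name", "Char"),
]
rows = [
    {"USUBJID": "101", "VSTESTCD": "WEIGHT", "VSORRES": "80"},
]

missing = undocumented_columns(rows, metadata)
print(missing)
```

Here the column holding the collected result has no metadata entry, so it is reported as undocumented.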

Comments are included as necessary according to the needs of individual studies. The presence of an asterisk (*) in the 'Controlled Terms or Format' column indicates that a discrete set of values (controlled terminology) is expected to be made available for this variable. This set of values may be sponsor-defined in cases where standard vocabularies have not yet been defined (represented by a single *) or from an external published source such as MedDRA (represented by **).
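The asterisk convention lends itself to a small lookup. This sketch assumes the column value is available as a string; the function name `terminology_source` and its return labels are hypothetical.

```python
def terminology_source(controlled_terms_or_format):
    """Interpret the asterisk markers in the 'Controlled Terms or Format' column.

    Returns "sponsor-defined" for a single asterisk, "external" for a double
    asterisk, and None when the column holds anything else (for example a format).
    """
    marker = controlled_terms_or_format.strip()
    if marker == "**":
        return "external"         # e.g. a published source such as MedDRA
    if marker == "*":
        return "sponsor-defined"  # no standard vocabulary defined yet
    return None


print(terminology_source("*"))
print(terminology_source("**"))
```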

Special-purpose domains

The CDISC Version 3.x Submission Data Domain Models include special-purpose domains, which have a specific structure and cannot be extended with any qualifier or timing variables other than those specified.

Additional fixed structure, non-extensible special-purpose domains are discussed in the Trial Design model.

The general domain classes

Most observations collected during the study (other than those represented in special-purpose domains) should be divided among three general observation classes: Interventions, Events, or Findings.

In most cases, the identification of the general class appropriate to a specific collection of data by topicality is straightforward. Often the Findings general class is the best choice for general observational data collected as measurements or responses to questions. In cases when the topicality may not be as clear, the choice of class may be based more on the scientific intent of the protocol or analysis plan or the data structure.

All datasets based on any of the general observation classes share a set of common Identifier variables and Timing variables. Three general rules apply when determining which variables to include in a domain:

The CDISC standard domain models (SDTMIG 3.1.2 and SENDIG 3.0)

Special-Purpose Domains:

Interventions:

Events:

Findings:

Trial Design Domains:

Special-Purpose Relationship Datasets:

Limitations and Criticism of standards

One criticism of the SDTM standards is that they change continually, with new versions released frequently. CDISC claims that SDTM standards are backward compatible, but that claim does not always hold. Data cannot be mapped from an EDC DBMS to SDTM until the clinical trial completes. New domains, such as the Exposure as Collected (EC) domain, have recently been added, and backward compatibility with earlier domains is not always possible. [1] According to this criticism, the standards are neither reliable nor well evolved, and the controlled terminology is a very small subset of the National Cancer Institute terminology. [2]

According to the CDISC SDS Team, CDISC standards evolve, as any standard does. The SDTM is the model; the implementation guides provide guidance on implementing the model for specific use cases. The initial publications of the SDTM and the SDTMIG (SDTM Implementation Guide for Human Clinical Trials) in 2004 were by no means complete, and they could not provide guidance for every possible type of data collected in all human clinical trials. Instead, they provided a general model (the SDTM) and an implementation guide (the SDTMIG) for the majority of data seen in most clinical trials, and they have been evolving ever since to cover more and more use cases. The SDTM is a general class model and can, hopefully, be applied to all collected data, but the implementation guides do not yet cover all use cases.

Sponsors with use cases that are not yet covered must apply the principles of the SDTM to create what we refer to as "custom" domains. Over time, as more use cases are represented in the implementation guides, they become covered by what we call "standard" domains ("standard" = published in an implementation guide, "custom" = not yet published). It would be remiss of CDISC not to continue enhancing the standards to cover these additional use cases, and I am sure all would agree that it would have been impossible in 2004 (or now, for that matter) to wait to publish anything until every use case was covered.

So the issue of backward compatibility is more complicated than described in the preceding paragraph. True, there are certainly model changes that may not be backward compatible, but new domains are a different issue. Before a domain is published (standard), sponsors still must submit this data, so they create custom domains. Different sponsors may map the same data differently, and may even give their custom domains different names. This serves their specific use cases and submissions today, but does nothing to help tomorrow, when all sponsors should submit the data in the same way. True, when a new standard domain is published, sponsors should rework their custom domains that hold that type of data, but they are also free to continue using their custom domains in the versions of the standards they were working under. It may of course be beneficial for the sponsor to remap to the standard domain, since this is now the "standard" way all sponsors would submit, and the sponsor benefits from improved reviewability by regulators. So yes, sponsors should transform their custom domains to standard domains when these are published, but this is an improvement for all and should not be seen as a limitation.

The EC (Exposure as Collected) domain that the author above mentions is an excellent example of a domain that makes it easier, not harder, to submit. The EX (Exposure) domain is, and has always been, supposed to be submitted in the Protocol Specified Unit and in the most reviewable manner possible. This means that if a specific drug administration is collected in mg, but the dosage is prescribed (and likely eventually labeled) in mg/kg, then the EX data should be represented in mg/kg, so the collected data must be converted to the Protocol Specified Unit (mg/kg in this case). EC allows the source data collected in mg to be represented and summarized (for lack of a better word) into an EX record in mg/kg. This maintains a clear audit trail from the collected data to the summarized EX data.

Additionally, a single drug administration may be given as multiple injections, in order to maintain a dosage blind or perhaps because, for safety reasons, the dose is better applied to multiple areas of the body. The collected data would likely represent each injection, when what a reviewer really wants to see is the summarized dosage across the multiple injections. EC allows each injection to be represented, and EX is again the summarized administration, which is likely what the protocol and the eventual product labeling represent. So, yes, EC represents a change, but a positive one that lets sponsors do something they could not do within the model before.

In summary, while the statements made by the author of the previous section are true, in this author's opinion they are incomplete and misleading. This should help shed some light on why the CDISC standards must evolve, and are evolving, in a way that is mutually beneficial for regulators and industry.
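The relationship between collected exposure data and the summarized EX record described above can be illustrated with a short calculation. This is a sketch under stated assumptions: the `CollectedDose` record, its field names, the `summarize_exposure` helper, and the body-weight lookup are all hypothetical and do not reflect the actual EC or EX variable definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectedDose:
    """Hypothetical EC-like record: one injection as collected, in mg."""
    subject: str
    administration: int  # which protocol administration the injection belongs to
    dose_mg: float


def summarize_exposure(collected, weight_kg):
    """Sum the injections of each administration and express the total in mg/kg.

    `weight_kg` maps each subject to a body weight; using a single weight per
    subject is a simplifying assumption of this sketch.
    """
    totals = {}
    for rec in collected:
        key = (rec.subject, rec.administration)
        totals[key] = totals.get(key, 0.0) + rec.dose_mg
    return [
        {"subject": subject, "administration": admin,
         "dose": total / weight_kg[subject], "unit": "mg/kg"}
        for (subject, admin), total in sorted(totals.items())
    ]


# Two 100 mg injections make up one administration for an 80 kg subject.
collected = [
    CollectedDose("101", 1, 100.0),
    CollectedDose("101", 1, 100.0),
]
exposure = summarize_exposure(collected, {"101": 80.0})
print(exposure)
```

The two collected records remain available as the audit trail, while the single summarized record carries the dose in the protocol-specified unit (2.5 mg/kg in this example).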

References

  1. Phuse, 2011. "SDTM Implementation Guide – Clear as Mud" (PDF). lex jansen. PHUSE. Retrieved 17 December 2015.
  2. CDISC, Terminology. "National Cancer Institute". cancer.gov. NIH. Retrieved 17 December 2015.

This article is issued from Wikipedia - version of the 9/15/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.