Study Notes: SDTM (3). Assumptions for Domain Models


1. General Domain Assumptions

1.1 Order of the Variables

The order of variables in the Define-XML document must reflect the order of variables in the dataset. The order of variables in CDISC domain models has been chosen to facilitate the review of the models and application of the models.

  • Variables for the 3 general observation classes must be ordered by role: Identifiers, then Topic, then Qualifier, then Timing variables.
  • Within each role, variables must be ordered as shown in the variable tables of the SDTM.
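The role-ordering rule can be checked mechanically. The Python sketch below assumes a hypothetical `role_of` dictionary that maps each column to its SDTM role. It only verifies that roles appear in the order Identifiers, Topic, Qualifier, Timing, not the finer ordering within each role.

```python
# Rank of each role in the required ordering: Identifiers, Topic, Qualifier, Timing.
ROLE_RANK = {"Identifier": 0, "Topic": 1, "Qualifier": 2, "Timing": 3}


def roles_in_order(columns, role_of):
    """Return True if the dataset's columns never step back to an earlier role.

    columns -- column names in dataset order
    role_of -- hypothetical dict mapping each column name to its SDTM role
    """
    ranks = [ROLE_RANK[role_of[col]] for col in columns]
    # The sequence of role ranks must be non-decreasing.
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


# Illustrative role assignments for a few Vital Signs (VS) variables.
vs_roles = {
    "STUDYID": "Identifier", "DOMAIN": "Identifier", "USUBJID": "Identifier",
    "VSSEQ": "Identifier", "VSTESTCD": "Topic", "VSTEST": "Qualifier",
    "VSORRES": "Qualifier", "VISITNUM": "Timing", "VSDTC": "Timing",
}

print(roles_in_order(["STUDYID", "DOMAIN", "USUBJID", "VSSEQ", "VSTESTCD",
                      "VSTEST", "VSORRES", "VISITNUM", "VSDTC"], vs_roles))  # True
print(roles_in_order(["STUDYID", "VSDTC", "VSTESTCD"], vs_roles))  # False
```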

1.2 SDTM Core Designations

Three categories are specified in the Core column in the domain models:

  • A Required variable is any variable that is basic to the identification of a data record (i.e., essential key variables and a topic variable) or is necessary to make the record meaningful. Required variables must always be included in the dataset and cannot be null for any record. (In short: must be present and never null.)
  • An Expected variable is any variable necessary to make a record useful in the context of a specific domain. Expected variables may contain some null values, but in most cases will not contain null values for every record. When the study does not include the data item for an expected variable, however, a null column must still be included in the dataset, and a comment must be included in the Define-XML document to state that the study does not include the data item. (In short: must be present; may be null.)
  • A Permissible variable should be used in an SDTM dataset wherever appropriate. Although domain specification tables list only some of the identifier, timing, and general observation class variables listed in the SDTM, all are permissible unless specifically restricted. (In short: may be included unless specifically restricted.)
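A minimal sketch of the Required and Expected rules in Python, assuming each dataset row is a dict and that both None and the empty string count as null. The function name `check_core` and the AE variable lists are illustrative assumptions, not taken from any CDISC tool. Permissible variables need no check because they may simply be omitted.

```python
def check_core(records, columns, required, expected):
    """Return a list of Required/Expected problems for one dataset.

    records  -- list of dicts, one per row; None or "" is treated as null
    columns  -- column names present in the dataset
    required -- Required variables for the domain
    expected -- Expected variables for the domain
    """
    problems = []
    for var in required:
        if var not in columns:
            problems.append(f"Required variable {var} is missing")
        elif any(rec.get(var) in (None, "") for rec in records):
            # Required variables can never be null.
            problems.append(f"Required variable {var} has null values")
    for var in expected:
        # Expected variables must exist as a column, but nulls are allowed.
        if var not in columns:
            problems.append(f"Expected variable {var} is missing")
    return problems


# Illustrative subset of AE variables; the second record has a null AETERM.
ae_columns = ["STUDYID", "DOMAIN", "USUBJID", "AESEQ", "AETERM"]
ae_records = [
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1, "AETERM": "HEADACHE"},
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 2, "AETERM": ""},
]
for problem in check_core(ae_records, ae_columns,
                          required=["STUDYID", "DOMAIN", "USUBJID", "AESEQ", "AETERM"],
                          expected=["AESTDTC"]):
    print(problem)
```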

1.3 Additional Guidance on Dataset Naming

SDTM datasets are normally named to be consistent with the domain code.

For example, the Demographics (DM) dataset is named dm.xpt.

Domain codes beginning with the letters X, Y, and Z have been reserved for the creation of custom domains. Using these letters is optional: custom domains are not required to start with X, Y, or Z.

1.4 Splitting Domains

Sponsors may choose to split a domain of topically related information into physically separate datasets.

  • A domain based on a general observation class may be split according to values in --CAT. When a domain is split on --CAT, --CAT must not be null.
  • The Findings About (FA) domain may alternatively be split based on the domain of the value in --OBJ.

The following rules must be adhered to when splitting a domain into separate datasets to ensure they can be appended back into 1 domain dataset:

  1. The value of DOMAIN must be consistent across the separate datasets, as it would have been if they had not been split (e.g., QS, FA). (In short: the same DOMAIN value everywhere.)
  2. All variables that require a domain prefix (e.g., --TESTCD, --LOC) must use the value of DOMAIN as the prefix value (e.g., QS, FA). (In short: the DOMAIN value is the prefix.)
  3. --SEQ must be unique within USUBJID for all records across all the split datasets. (In short: --SEQ is unique across the splits.)
  4. When relationship datasets (e.g., SUPPxx, FAxx, CO, RELREC) relate back to split parent domains, IDVAR would generally be --SEQ. When IDVAR is a value other than --SEQ (e.g., --GRPID, --REFID, --SPID), care should be taken to ensure that the parent records across the split datasets have unique values for the variable specified in IDVAR, so that related child records do not accidentally join back to incorrect parent records.
  5. Permissible variables included in one split dataset need not be included in all split datasets.
  6. For domains with 2-letter domain codes (i.e., other than SUPPxx and RELREC), split dataset names can be up to 4 characters in length. The 4-character dataset-name limitation allows the use of a Supplemental Qualifier dataset associated with the split dataset.

     If splitting by --CAT, dataset names would be the domain name plus up to 2 additional characters (e.g., QS36 for SF-36).
     If splitting Findings About by parent domain, the dataset name would be the domain code “FA” plus the 2-character domain code of the parent domain (e.g., “FACM”).

  7. Supplemental Qualifier datasets for split domains would also be split. The nomenclature would include the additional 1 to 2 characters used to identify the split dataset (e.g., SUPPQS36, SUPPFACM). The value of RDOMAIN in the SUPP-- datasets would be the 2-character domain code (e.g., QS, FA).
  8. In RELREC, if a dataset-level relationship is defined for a split Findings About domain, then RDOMAIN may contain the 4-character dataset name, rather than the domain name “FA”.
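The append-back rules for split datasets can be sketched as one check over all pieces at once. This assumes each split dataset is a list of row dicts keyed by its dataset name; `check_split`, `supp_name`, and the second dataset name QSHQ are hypothetical. It covers rules 1, 3, and 6 and the SUPP naming in rule 7, not the IDVAR concerns of rule 4.

```python
from collections import Counter


def check_split(domain, split_datasets):
    """Check append-back rules for a domain split into several datasets.

    domain         -- 2-character domain code, e.g. "QS"
    split_datasets -- dict: dataset name -> list of record dicts
    Returns a list of problems (empty if none were found).
    """
    problems = []
    seq_var = domain + "SEQ"
    seen = Counter()
    for name, records in split_datasets.items():
        # Split names start with the domain code and are at most 4 characters.
        if not name.upper().startswith(domain) or len(name) > 4:
            problems.append(f"{name}: bad split dataset name")
        for rec in records:
            if rec["DOMAIN"] != domain:
                problems.append(f"{name}: DOMAIN is {rec['DOMAIN']}, expected {domain}")
            seen[(rec["USUBJID"], rec[seq_var])] += 1
    # --SEQ must be unique within USUBJID across all split datasets together.
    for (usubjid, seq), count in seen.items():
        if count > 1:
            problems.append(f"{seq_var}={seq} repeated for {usubjid}")
    return problems


def supp_name(split_name):
    # Supplemental qualifier datasets follow the split, e.g. QS36 -> SUPPQS36.
    return "SUPP" + split_name.upper()


splits = {
    "QS36": [{"DOMAIN": "QS", "USUBJID": "S1-001", "QSSEQ": 1}],
    "QSHQ": [{"DOMAIN": "QS", "USUBJID": "S1-001", "QSSEQ": 1}],  # hypothetical
}
print(check_split("QS", splits))  # ['QSSEQ=1 repeated for S1-001']
print(supp_name("QS36"))          # SUPPQS36
```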

1.5 Origin Metadata

1.5.1 Origin Metadata for Variables

The origin element in the Define-XML document is used to indicate where the data originated. Its purpose is to unambiguously communicate to the reviewer the origin of the data source. Data may be collected (on the CRF, from a vendor, or from a device), derived, or assigned.

  • CRF data should be traceable to an annotated CRF
  • Derived data should be traceable to some derivation algorithm

1.5.2 Origin Metadata for Records

An origin of "derived" means that all values for that variable were derived; likewise, an origin of "collected on the CRF" applies to all values of the variable. (In short: all values in one column should share the same origin.)
In some cases, both collected and derived values may be reported in the same field. For example, some records in a Findings dataset such as Questionnaires (QS) contain values collected from the CRF, while other records may contain derived values, such as a total score. When both derived and collected values are reported in a variable, the origin is to be described using value-level metadata in the Define-XML document. (In short: if the origins are mixed, describe them at the value level in Define-XML.)
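The choice between variable-level and value-level origin metadata reduces to whether a column has one origin or several. A small sketch, assuming the per-record origins are already known as strings such as "Collected" or "Derived" (illustrative labels, not Define-XML element syntax); the function name is hypothetical:

```python
def origin_metadata_level(record_origins):
    """Decide where a variable's origin should be documented in Define-XML.

    record_origins -- origin of each record's value, e.g. "Collected" or "Derived"
    """
    distinct = set(record_origins)
    if not distinct:
        raise ValueError("no records to describe")
    if len(distinct) == 1:
        # One origin applies to every value: describe it at the variable level.
        return ("variable-level", next(iter(distinct)))
    # Mixed origins (e.g., collected item scores plus a derived total score).
    return ("value-level", sorted(distinct))


print(origin_metadata_level(["Derived", "Derived"]))
print(origin_metadata_level(["Collected", "Collected", "Derived"]))
```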

1.6 Assigning Natural Keys in the Metadata

A sponsor should include in the metadata the variables that contribute to the natural key for a domain. In a case where a dataset includes a mix of records with different natural keys, the natural key that provides the most granularity is the one that should be provided.
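A quick way to confirm that a chosen natural key really identifies each record is to look for duplicate key tuples. The sketch below assumes rows as dicts; the function name and the VS key variables in the example are illustrative. It also shows why the most granular key should be used: a key without the time point variable VSTPTNUM collapses two distinct measurements.

```python
from collections import Counter


def duplicate_keys(records, key_vars):
    """Return natural-key values that occur on more than one record.

    key_vars -- the variables proposed as the domain's natural key
    """
    counts = Counter(tuple(rec[v] for v in key_vars) for rec in records)
    return [key for key, count in counts.items() if count > 1]


vs_records = [
    {"STUDYID": "S1", "USUBJID": "S1-001", "VSTESTCD": "SYSBP", "VISITNUM": 1, "VSTPTNUM": 1},
    {"STUDYID": "S1", "USUBJID": "S1-001", "VSTESTCD": "SYSBP", "VISITNUM": 1, "VSTPTNUM": 2},
]
coarse_key = ["STUDYID", "USUBJID", "VSTESTCD", "VISITNUM"]
print(duplicate_keys(vs_records, coarse_key))                 # one duplicated key
print(duplicate_keys(vs_records, coarse_key + ["VSTPTNUM"]))  # []
```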

2. General Variable Assumptions

2.1 Variable-naming Conventions

SDTM variables are named according to a set of conventions, using fragment names.
Variables with names ending in “CD” are “short” versions of associated variables that do not include the “CD” suffix.

For example, --TESTCD is the short version of --TEST.

Values of --TESTCD must be limited to 8 characters and cannot start with a number, nor can they contain characters other than letters, numbers, or underscores.

Because QNAM serves the same purpose as --TESTCD within supplemental qualifier datasets, values of QNAM are subject to the same restrictions as values of --TESTCD.

Values of other “CD” variables are not subject to the same restrictions as --TESTCD:

  • ETCD (the companion to ELEMENT) and TSPARMCD (the companion to TSPARM) are limited to 8 characters and do not have the character restrictions that apply to --TESTCD.
  • ARMCD/ACTARMCD is limited to 20 characters and does not have the character restrictions that apply to --TESTCD.

Variable descriptive names (labels), up to 40 characters, should be provided as data variable labels for all variables.
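The naming and length limits above translate directly into simple checks. In this Python sketch, "letters" is assumed to mean ASCII letters A to Z in either case, and the function names are hypothetical.

```python
import re

# --TESTCD and QNAM: at most 8 characters, first character not a digit,
# only letters, digits, and underscores (ASCII letters assumed).
TESTCD_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def valid_testcd(value):
    return TESTCD_PATTERN.fullmatch(value) is not None


def valid_length(value, limit):
    # Length-only rule: ETCD and TSPARMCD use limit=8, ARMCD/ACTARMCD use limit=20.
    return len(value) <= limit


def valid_label(label):
    # Variable labels may be up to 40 characters.
    return len(label) <= 40


for code in ["SYSBP", "1HR", "HR-RATE", "TOOLONGCD"]:
    print(code, valid_testcd(code))
```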

2.2 Two-character Domain Identifier

In order to minimize the risk of difficulty when merging/joining domains for reporting purposes, the 2-character domain identifier is used as a prefix in most variable names.

Exceptions:

  • Required Identifiers (STUDYID, DOMAIN, USUBJID)
  • Commonly used grouping and merge keys (e.g., VISIT, VISITNUM, VISITDY)
  • All Demographics (DM) domain variables other than DMDTC and DMDY
  • All variables in RELREC and SUPPQUAL, and some variables in the Comments and Trial Design datasets

Required identifiers are not prefixed because they are usually used as keys when merging/joining observations. The --SEQ and the optional Identifiers --GRPID and --REFID are prefixed because they may be used as keys when relating observations across domains.
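In the domain models, the "--" placeholder stands for the 2-character domain code. A small helper (a hypothetical name) makes the substitution explicit, leaving unprefixed variables such as the required identifiers and VISITNUM unchanged:

```python
def expand_prefix(fragment, domain):
    """Replace the '--' placeholder in a variable name with the domain code.

    Names listed without '--' (e.g., STUDYID, USUBJID, VISITNUM) are returned as-is.
    """
    if fragment.startswith("--"):
        return domain + fragment[2:]
    return fragment


print(expand_prefix("--TESTCD", "LB"))  # LBTESTCD
print(expand_prefix("--SEQ", "AE"))     # AESEQ
print(expand_prefix("VISITNUM", "LB"))  # VISITNUM
```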

2.3 Use of “Subject” and USUBJID

"Subject" is used to generically refer to both patients and healthy volunteers in order to be consistent with the recommendation in FDA guidance.
To identify a subject uniquely across all studies for all applications or submissions involving the product, a unique identifier (USUBJID) should be assigned and included in all datasets.

  • The unique subject identifier (USUBJID) is required in all datasets containing subject-level data.
  • USUBJID values must be unique for each trial participant (subject) across all trials in the submission.
  • The same person who participates in multiple clinical trials (when this is known) must be assigned the same USUBJID value in all trials.
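The two USUBJID uniqueness rules can be expressed as a consistency check, under the assumption that the sponsor has some internal `person_id` that identifies the same human being across trials (a hypothetical field; SDTM itself does not define it). The function name is also hypothetical.

```python
def check_usubjid(enrollments):
    """Check USUBJID assignment across all trials in a submission.

    enrollments -- list of (person_id, studyid, usubjid) tuples; person_id is a
                   hypothetical sponsor-side identifier for the same human being
    """
    problems = []
    usubjid_of_person = {}
    person_of_usubjid = {}
    for person, _studyid, usubjid in enrollments:
        # The same person must keep the same USUBJID in every trial.
        known = usubjid_of_person.setdefault(person, usubjid)
        if known != usubjid:
            problems.append(f"person {person} has USUBJIDs {known} and {usubjid}")
        # A USUBJID must never be shared by two different people.
        owner = person_of_usubjid.setdefault(usubjid, person)
        if owner != person:
            problems.append(f"USUBJID {usubjid} used for {owner} and {person}")
    return problems


print(check_usubjid([("P1", "S1", "S1-001"), ("P1", "S2", "S1-001"), ("P2", "S2", "S1-001")]))
```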

2.4 Variable Lengths

The maximum SAS v5 transport file character variable length of 200 characters should not be used unless necessary.
Sponsors should consider the nature of the data and apply reasonable, appropriate lengths to variables.

For example:

  • The length of flags will always be 1.
  • --TESTCD and IDVAR will never be more than 8, so the length can always be set to 8.
  • The length for variables that use controlled terminology can be set to the length of the longest term.
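Setting a variable to "the length of the longest term" can be automated by measuring the data. A minimal sketch, assuming ASCII values, since SAS transport lengths count bytes while Python's `len()` counts characters; the function name is hypothetical.

```python
def fitted_length(values, max_length=200):
    """Smallest character length that holds every value (at least 1).

    Assumes ASCII strings, where character count equals byte count.
    """
    longest = max((len(v) for v in values), default=0)
    length = max(longest, 1)
    if length > max_length:
        raise ValueError(f"value longer than {max_length} characters")
    return length


print(fitted_length(["Y", "N"]))                       # 1, e.g. a flag
print(fitted_length(["MILD", "MODERATE", "SEVERE"]))   # 8
```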

3. Coding and Controlled Terminology Assumptions

4. Actual and Relative Time Assumptions

4.1 Formats for Date/Time Variables

Date/time values are represented using the ISO 8601 standard.

Spaces are not allowed in any ISO 8601 representations.

  • dates: YYYY-MM-DD.
  • times: hh:mm:ss(.n+)?(((+|-)hh:mm)|Z)?.
  • date & time: YYYY-MM-DDThh:mm:ss (e.g. 2001-12-26T00:00:01).
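Python's standard `datetime.isoformat()` already produces the date-and-time form shown above. The regular expression below is a simplified sketch that accepts only the complete date and complete date-and-time forms listed here; it does not validate calendar values, time zones, fractional seconds, or other ISO 8601 forms. The names `DTC_PATTERN`, `to_dtc`, and `is_dtc` are hypothetical.

```python
import re
from datetime import datetime

# Complete date, optionally followed by T and a complete time (no spaces).
DTC_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?")


def to_dtc(dt):
    # For a naive datetime with no microseconds this gives YYYY-MM-DDThh:mm:ss.
    return dt.isoformat()


def is_dtc(value):
    return DTC_PATTERN.fullmatch(value) is not None


print(to_dtc(datetime(2001, 12, 26, 0, 0, 1)))  # 2001-12-26T00:00:01
print(is_dtc("2001-12-26 00:00:01"))            # False: spaces are not allowed
```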

5. Other Assumptions

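Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reproducing it, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/Mandy_Wang98/article/details/122959894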
