CASE STUDY

SAS to open-source data migration for optimized PK/PD analysis in oncology trials

CLIENT: Confidential
INDUSTRY: Pharmaceutical
DURATION: 4 months

Business case

After acquiring another company, the oncology clinical trial department of a global pharmaceutical leader faced major challenges in transforming and standardizing the acquired clinical trial SAS datasets. The incoming SAS datasets needed to be accessed within the R environment (Posit Workbench) to create Analysis-Ready Datasets (ARD).

Each ARD had to assemble a specific set of attributes from different domain datasets, which were read and parsed in the Posit R environment. Structural inconsistencies within the acquired company’s domain datasets made it difficult to establish a unified approach to data merging and transformation. This variability, combined with the need for manual intervention across different study setups, made the process labor-intensive, error-prone, and hard to scale. In addition, some of the Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM) datasets were a decade old and did not comply with Clinical Data Interchange Standards Consortium (CDISC) guidelines.

To meet internal standards, the client wanted the data restructured into PK/PD Analysis-Ready Datasets (ARD). This required complex data mapping, generation of custom attributes, and strict adherence to in-house standard operating practices.

Amid concerns over timelines, quality, and compliance, the client needed a scalable, automated solution that could handle study variability and ensure data integrity.

Our solution

After identifying the challenges, i2e worked within the Posit environment to write a semi-automated R script that generates the ARD files, create a mapping document, and develop an automated script for quality control of the ARD files.

Our data engineering experts also handled the data processing by:

  • Establishing a process in the Posit environment to retrieve and process data from different domain datasets, such as SDTM and ADaM.
  • Creating a column mapping document based on the ARD specifications set out in the acquiring company’s standards.
  • Developing R scripts to automate the mapping of columns from source datasets and generate ARD files.
  • Designing a programming plan document detailing non-compartmental analysis (NCA) datafile programming requests, source data details, datafile specifications, merging algorithms, and programming notes.
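
The column-mapping step described above can be illustrated with a minimal sketch. The project's scripts were written in R and are not reproduced here; the Python below is only an illustration of the idea. The ARD column names (ID, DV, NTIME), the choice of PC as the base domain, and the function name build_ard are assumptions made for this example, not the client's actual specification.

```python
# Hypothetical mapping document: (source domain, source column) -> ARD column.
# The ARD names on the right are illustrative assumptions only.
COLUMN_MAP = {
    ("PC", "USUBJID"): "ID",       # subject identifier
    ("PC", "PCSTRESN"): "DV",      # numeric concentration result
    ("PC", "PCTPTNUM"): "NTIME",   # planned time point number
    ("DM", "AGE"): "AGE",
    ("DM", "SEX"): "SEX",
}


def build_ard(domains, key="USUBJID", base="PC"):
    """Left-join subject-level domains onto the base domain and rename columns.

    `domains` maps a domain name to a list of row dictionaries.
    Assumes every non-base domain has at most one row per subject.
    """
    # Index each non-base domain by the subject key for fast lookup.
    lookups = {
        name: {row[key]: row for row in rows}
        for name, rows in domains.items()
        if name != base
    }

    ard = []
    for record in domains[base]:
        out = {}
        for (domain, src_col), ard_col in COLUMN_MAP.items():
            if domain == base:
                source = record
            else:
                # Missing subject in a joined domain yields an empty row.
                source = lookups.get(domain, {}).get(record[key], {})
            out[ard_col] = source.get(src_col)  # None when the value is absent
        ard.append(out)
    return ard


if __name__ == "__main__":
    example = {
        "PC": [{"USUBJID": "S1", "PCSTRESN": 1.5, "PCTPTNUM": 0.5}],
        "DM": [{"USUBJID": "S1", "AGE": 60, "SEX": "F"}],
    }
    for row in build_ard(example):
        print(row)
```

Keeping the mapping in a single table, separate from the join logic, mirrors the role of the mapping document: a study with differently named source columns would only need a different table, not different code.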

Challenges overcome

  • Understanding and applying the correct joining logic for different domain datasets.
  • Addressing differences in attributes across studies, including cyclic patient visits in some studies.
  • Bringing two-decade-old legacy study data into line with Clinical Data Interchange Standards Consortium (CDISC) guidelines.
  • Achieving programmatic QC that satisfies a 20-point checklist.
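
A checklist-driven QC pass of the kind mentioned above might be structured as in the sketch below. The actual 20 checks are not described in this case study, so the three checks shown, the column names (ID, NTIME, DV), and the function name run_qc are hypothetical stand-ins, written in Python rather than the R used on the project.

```python
def run_qc(rows, required=("ID", "NTIME", "DV")):
    """Run a few illustrative QC checks and return a list of failure messages.

    An empty list means every check passed. The checks are assumptions
    for this example, not the project's actual checklist.
    """
    failures = []

    # Check 1: every record carries all required columns.
    for col in required:
        if any(col not in row for row in rows):
            failures.append(f"missing column: {col}")

    # Check 2: no duplicate (subject, nominal time) records.
    # In studies with cyclic visits the key would also need a cycle identifier.
    seen = set()
    for row in rows:
        record_key = (row.get("ID"), row.get("NTIME"))
        if record_key in seen:
            failures.append(f"duplicate record: {record_key}")
        seen.add(record_key)

    # Check 3: concentrations must not be negative (missing values are allowed).
    for index, row in enumerate(rows):
        value = row.get("DV")
        if value is not None and value < 0:
            failures.append(f"row {index}: negative DV")

    return failures


if __name__ == "__main__":
    sample = [
        {"ID": "S1", "NTIME": 0.5, "DV": 1.5},
        {"ID": "S1", "NTIME": 0.5, "DV": -2.0},
    ]
    for message in run_qc(sample):
        print(message)
```

Returning a list of failures rather than stopping at the first one lets a single run report against the whole checklist.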

Benefits

  • A fully automated programmatic QC script that scans the Analysis-Ready Datasets (ARD) for data accuracy.
  • Legacy study data optimized to meet current data standards, enabling effective analysis, compliance, and storage.
  • Migration from a proprietary data environment to open source, which eliminated licensing costs and simplified integrations.

Results