1. Project Overview
Clinical Biomarker Data Reconciliation Analysis
This project analyzes a simulated clinical biomarker dataset to identify missing samples, shipment delays, and inconsistencies between subject visit records and laboratory sample tracking data.
The analysis demonstrates how SQL queries and data visualization tools can be used to investigate operational issues commonly encountered in biomarker-driven clinical trials.
Tools Used: SQL, Excel, Power BI, Data Cleaning, Data Visualization
2. Business Context
In clinical trials, biospecimen data is often tracked across multiple systems including laboratory LIMS systems, shipment manifests, and sponsor-provided datasets. Discrepancies between these sources can lead to missing samples, incorrect subject visit mapping, or delays in biomarker analysis.
This project demonstrates how SQL queries can be used to investigate these discrepancies and generate operational metrics that help clinical teams monitor sample completeness and laboratory performance.
3. Dataset Description
The dataset represents a simplified clinical trial biomarker sample tracking system. Each row corresponds to a biospecimen sample associated with a subject visit at a clinical site.
The data includes key operational fields used to monitor sample collection, shipment logistics, laboratory receipt, and biomarker result reporting during a clinical study.
Key fields include:
• sample_id – Unique identifier for each biospecimen sample
• subject_id – Clinical trial subject identifier
• site_id – Clinical site where the sample was collected
• visit_number / visit_date – Scheduled study visit when the sample was expected
• collection_date – Date the biospecimen was collected at the site
• ship_date – Date the sample shipment was sent to the laboratory
• ship_delay_days – Calculated delay between collection and shipment
• temp_excursion_flag – Indicates whether the sample experienced temperature deviation during transport
• protocol_deviation_flag – Indicates whether the sample handling deviated from protocol requirements
• lab_received_date – Date the central laboratory received the sample
• lab_result_date – Date the biomarker result was reported
• lab_tat_days – Laboratory turnaround time in days
• sample_status – Final status of the sample
4. SQL Analysis
Query 1: Core Operational Metrics.
This query calculates core study-level operational metrics including expected samples, missed collection rate, late shipment rate, and temperature excursion rate.
Query 2: Breakdown by Site.
This query summarizes total samples, missing collections, and temperature excursion rates by site to identify operational risk areas.
Query 3: Turn Around Time (TAT) by Site.
This query calculates average laboratory turnaround time and the percentage of samples exceeding seven days by site.
5. Operational Dashboard (Power BI)
Power BI dashboards were created to summarize operational performance across clinical trial sites, including sample collection quality, shipment timelines, and protocol adherence.
6. Key Findings
Sample Collection Completeness
The analysis identified that 8.3% of expected biomarker samples were missing across the dataset. Site-level analysis showed variation in collection performance, with some sites exceeding 12% missing sample rates, suggesting inconsistent sample collection procedures.
Shipment Turnaround Delays
Approximately 43% of samples exceeded a 7-day laboratory turnaround time, indicating potential delays in sample shipment or laboratory processing. Site-level variation suggests that logistics performance may differ across clinical sites.
Temperature Excursions
Temperature excursions occurred in 5.3% of shipments, indicating that a subset of samples may have experienced environmental conditions outside protocol requirements during transport.
Protocol Deviations
Protocol deviations were observed in approximately 18% of samples, with certain sites demonstrating higher deviation rates. This may indicate site-level differences in protocol adherence or operational training.
Operational Monitoring Value
These findings demonstrate how SQL-based analysis and operational dashboards can be used to monitor study execution, identify high-risk sites, and support proactive investigation of sample handling or shipment issues during a clinical trial.