Support the design, development and maintenance of data pipelines for processing Research and Development data from diverse sources (Clinical Trials, Medical Devices, Pre-Clinical, Omics, Real World Data) utilizing the AWS technology platform.
Create and optimize ETL/ELT processes for structured and unstructured data using Python, R, SQL, AWS services and other tools.
Build and maintain data repositories using Amazon S3 and Amazon FSx. Establish data warehousing solutions using Amazon Redshift.
Build and maintain standard data models.
Develop data quality frameworks, validation processes and KPIs to ensure accuracy and consistency of data pipelines.
Implement data versioning and lineage tracking to support data traceability, regulatory compliance and audit requirements.
Create and maintain documentation for data processes, architectures, and workflows.
Implement modern software development best practices (e.g., version control, DevOps, CI/CD).
Collaborate with R&D researchers, data scientists and stakeholders to understand data requirements and deliver appropriate solutions in a global working model.
Maintain compliance with data privacy regulations such as HIPAA and GDPR.
May be required to develop, deliver or support data literacy training across R&D.
Qualifications
Bachelor’s Degree in Computer Science, Statistics, Mathematics, Life Sciences, or other relevant scientific fields; Master’s Degree preferred
3-5 years of experience in data engineering, with at least 1.5 years focused on healthcare, research or clinical data
Strong knowledge of Python, R and SQL for data processing.