Data Scientist (Healthcare) 100% Remote
What you will do:
The Data Scientist will be responsible for driving insights from the vast amounts of patient and environmental data available within our data warehouse.
- Experience with machine learning and statistical analyses is needed.
- Work closely with research teams to design analysis specifications, including input data specifications, data cleaning, algorithms, and interpretation of results.
- Develop and implement algorithms on existing data warehouse records and identify new external data sources to ingest into the data warehouse to strengthen analyses.
- Analyses will address a wide variety of clinical and research outcomes.
- Research and implement AI algorithms, apply off-the-shelf AI and data-centric tools, and collect, store, and maintain data.
- The successful candidate will have demonstrated competence in developing highly scalable artificial intelligence systems with multiple dependencies across teams.
What gets you the job:
Programming Languages
- Python (for preprocessing, data analysis, machine learning, scripting)
- SQL (for database querying and management)
- SAS (common in healthcare data analysis)
- MATLAB (for algorithm development, though less common in healthcare)
- R (for statistical computing and bioinformatics)
Data Science & Machine Learning Frameworks
- TensorFlow, PyTorch, Keras (for deep learning and complex machine learning, including neural networks and advanced AI)
- Scikit-Learn (for classical machine learning)
- XGBoost or LightGBM (for gradient boosting in structured data)
- Large Language Models (LLMs) (for text generation, summarization, etc.)
- AWS SageMaker (for end-to-end machine learning development, training, and scalable machine learning in a managed cloud environment)
- Natural Language Processing (NLP) Tools and Frameworks (e.g., Hugging Face, AWS Comprehend Medical for extracting insights from clinical text data)
- AWS Bedrock (for accessing pre-trained LLMs and foundation models without managing infrastructure)
Healthcare-Specific Knowledge
- HL7 (Health Level Seven International standards for electronic health information exchange)
- FHIR (Fast Healthcare Interoperability Resources standard for exchanging healthcare information electronically)
- ICD-10 Coding (for medical diagnosis and procedure classification)
- HIPAA Compliance (handling sensitive patient data securely)
- Clinical Terminologies (e.g., SNOMED, LOINC)
Data Tools & Platforms
- SQL-based Databases (e.g., PostgreSQL, MySQL, Microsoft SQL Server)
- Data Warehousing (e.g., AWS Redshift, Google BigQuery)
- Data Visualization Tools (e.g., Tableau, Power BI, Plotly)
- NoSQL Databases (e.g., MongoDB, Cassandra)
- Apache Hadoop or Apache Spark (for big data processing)
- ETL Tools (e.g., Informatica, Talend, AWS Glue)
Statistical & Analytical Techniques
- Regression Analysis (linear, logistic, multinomial, ordinal, etc.)
- Descriptive Statistical Tests (correlation, covariance, chi-square, univariate, multivariate analyses)
- Clustering Techniques (e.g., k-means, hierarchical clustering)
- Dimensionality Reduction (e.g., Principal Component Analysis (PCA))
- Natural Language Processing (NLP) (for analyzing clinical notes or electronic health records)
- Predictive Modeling (for patient outcomes, risk analysis)
- Time Series Analysis (useful for patient monitoring, trend analysis)
- Survival Analysis (for patient outcome predictions)
- A/B Testing (for clinical trials or health interventions)
Cloud Computing & DevOps Skills
- AWS, Google Cloud, or Azure (cloud platforms for scalable computing)
- AWS Lambda and Step Functions (serverless computing and workflow automation)
- Docker, Kubernetes, ECS, EKS or AWS Fargate (for containerization and orchestration of data applications and reproducibility)
- CI/CD Pipelines (for automating deployment and monitoring of machine learning models)
Electronic Health Records (EHR) Systems
- Experience with Epic or Cerner (popular EHR systems in healthcare)
Data Governance & Security
- Data Privacy (understanding of privacy laws such as HIPAA, GDPR)
- Data Anonymization or De-identification techniques (for research and compliance)
- Auditing & Compliance Tools (for ensuring secure and compliant data handling)
- Responsible AI Practices (AI governance frameworks ensuring healthcare regulations and ethical standards)
Bachelor's degree in computer science, artificial intelligence, informatics, or a closely related field
Master's preferred