Summary
As the Delivery Lead, you will be the driving force behind building and maintaining robust site reliability engineering (SRE) functions for our growing data organization. You will act as the primary point of contact, ensuring smooth operations and high availability of our data infrastructure and services. This role demands a blend of technical expertise, project management skills, and a passion for data-driven solutions.
Responsibilities
Build and Lead: Develop and implement comprehensive support and SRE processes from the ground up.
Cross-functional Collaboration: Partner with data engineers, data scientists, and IT teams to ensure seamless integration and optimal performance of data systems.
Incident Management: Own incident response, troubleshooting, and resolution, minimizing downtime and impact on data operations.
Monitoring and Optimization: Establish proactive monitoring and alerting mechanisms to identify and address potential issues before they escalate.
Performance Tuning: Continuously optimize system performance, scalability, and reliability to meet the evolving needs of the data organization.
Automation: Drive automation initiatives to streamline support workflows and improve efficiency.
Documentation: Maintain thorough documentation of processes, procedures, and system configurations.
Qualifications
Proven experience in building and managing support and SRE functions in a data-centric environment.
Strong experience with storage technologies such as NFS, HDFS, and Amazon S3, as well as dynamic resource management frameworks such as Kubernetes.
Strong understanding of data infrastructure, cloud technologies, and DevOps principles.
Excellent communication and interpersonal skills to collaborate effectively with diverse teams.
Demonstrated ability to lead and execute complex projects with minimal supervision.
Proactive problem-solver with a passion for continuous improvement.
Experience in the biotech industry is a plus.