Salary: $180,000 to $210,000, plus bonus and stock options
We are seeking a skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in cloud infrastructure, containerization, and software development, with a focus on improving system reliability, performance, and scalability. You will work closely with development and operations teams to build and maintain a robust infrastructure that supports our applications and services.
Key Responsibilities:
- Design, implement, and manage scalable cloud infrastructure on AWS.
- Use Docker for containerization and Kubernetes to orchestrate containerized applications.
- Develop and manage infrastructure as code using AWS CloudFormation.
- Monitor system performance, reliability, and availability; troubleshoot and resolve issues as they arise.
- Collaborate with development teams to optimize application performance and reliability.
- Implement and maintain CI/CD pipelines to streamline deployment processes.
- Work with NoSQL databases, specifically DynamoDB, to manage and optimize data storage solutions.
- Write and maintain scripts and automation tools in Python, Ruby, or Java to improve operational efficiency.
- Participate in the on-call rotation to ensure high availability of services.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 10+ years of experience in a Site Reliability Engineering or DevOps role.
- Proficiency in Docker and Kubernetes, with hands-on experience in container orchestration.
- Strong experience with AWS services, particularly EC2, S3, Lambda, and CloudFormation.
- Familiarity with NoSQL databases, especially DynamoDB.
- Programming skills in Python, Ruby, or Java.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
- Strong problem-solving skills and a proactive mindset.
- Excellent communication and teamwork abilities.