Site Reliability Engineer

Tandym Group • Charlotte, NC, US • 2d ago

Tandym Group is seeking a Site Reliability Engineer to support a financial client based in Charlotte.
Responsibilities:

Run the production environment by monitoring availability and taking a holistic view of system health
Support the applications by participating in the on-call rotation
Provide stability to our applications and facilitate rapid feature development by proactively taking active control of the direction of the service
Automate or eliminate manual work and look for further opportunities for automation
Implement and maintain SLOs, including their adoption and automation
Track production readiness, health scoring, and error budgets
Set runbook standards and keep runbooks maintained and up to date

Qualifications:

Experience using DevOps tools and technologies such as GitLab, and Infrastructure as Code tools such as Terraform
Strong troubleshooting skills, and experience building and enhancing observability using monitoring tools
Proactive approach to observability maturity: identifying problems, performance bottlenecks, and areas for improvement
Leading incident response and supporting application teams
Blameless postmortems
Developer feedback for enhanced logging, runbooks, and addressing technical debt
Promoting observability best practices
Experience with the monitoring tools Dynatrace and Splunk
Experience with public cloud platforms, preferably AWS, and API gateways
Experience developing APIs, microservices, or frontends is a plus
Experience using source version control (SVC) such as Git

Desired Skills