Site Reliability Engineer

Themesoft Inc. • Dallas, TX, US • 2m ago

Site Reliability Engineer / Dallas, TX / Full-Time (FTE) / Hybrid

Job Description:

The Site Reliability Engineer is a core member of the Site Reliability Engineering team, which is accountable for the availability, reliability, and performance of services and platforms in a highly transactional 24x7 environment.

The role

Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation.
Automate tasks that are performed manually, along with any other parts of the system that would benefit from automation.
Troubleshoot OS, networking, and database issues in cloud-based and on-premises environments; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices.
Coordinate with product owners and business representatives to define Service Level Objectives and error budgets for key functionalities of the projects.
Participate in design reviews of software/components with build teams to ensure that they are built right.
Review products prior to production deployments to validate compliance with Service Level Objectives.
Conduct system analysis and configuration management, and develop improvements to system software performance, availability, and reliability.
Work closely with software engineers and QA to ensure the system meets non-functional requirements such as performance, security, and availability.
Document system knowledge as it is acquired, create runbooks, and ensure critical system information is readily available to those who need it.
Maintain and monitor deployments of servers, Docker containers, databases, and general backend infrastructure.
Participate in production feedback sessions and problem management calls to identify opportunities for product improvement.

What you’ll bring

Bachelor’s degree in Computer Science or a related field, or an equivalent combination of education and experience
5+ years of experience in a full-stack application support or SRE role
Experience in JavaScript, TypeScript, and web development technologies
Proficient in scripting languages such as PowerShell and/or Python
Experience troubleshooting complex incidents in applications built on the AWS stack
Experience in conducting design reviews of software components and leading performance, capacity and chaos experiments.
Extensive experience with observability platforms (Datadog) is required. Experience with built-in browser-side diagnostic tools is expected.
Knowledge of DevOps methodologies and the tools involved, such as CI/CD concepts, CI/CD tools (Jenkins, CodePipeline, etc.), and automation and configuration tools (Puppet, Ansible, etc.), is a plus.
Hands-on experience with AWS public cloud is a must; project implementation experience on public cloud is a plus.
Ability and willingness to adapt to new application stacks and new technology concepts as the business evolves over time
Excellent communication skills, both verbal and written
Ability to collaborate with local and remote teams in different time zones
Ability to present and lead technical discussions with product, cloud COE, security, and other support teams

Regards,

Purnima Pobbathy

Technical Recruiter

purnima@themesoft.com

Themesoft Inc.