W2 ONLY - NO 3RD PARTIES - NO CORP TO CORP
We are seeking an experienced and highly motivated Performance & Observability Engineer to assess, strategize, and build solutions for complex application and infrastructure observability needs, ensuring production stability and visibility. The position is responsible for driving performance and stability improvements across critical business applications, their architecture, and their integrations to ensure an optimal end-user experience, working closely with the application development, infrastructure, database, and middleware teams.
Responsibilities:
- Configure and maintain observability capabilities for our applications and infrastructure in partnership with SRE and AIOps; assess the production readiness of code, configuration, and infrastructure changes
- Develop and monitor SLOs and SLIs across customer user journeys; advise on SLAs; establish error budgets
- Strategize, analyze, and tune applications for performance and availability, applying DevSecOps principles and technologies as needed to continuously validate that the application meets its load, stability, scalability, and reliability standards
- Define, create, and maintain monitoring dashboards. Implement application performance management, log analytics, error analytics, and business analytics solutions using tools including, but not limited to, Dynatrace, Splunk, and Akamai
- Monitor application stability trends (cloud and on-prem) and identify opportunities to improve performance and/or availability
- Automate system scalability and continually work to improve system resiliency, performance, and efficiency; recommend design changes to improve reliability
- Work with Incident/Problem managers to help drive blameless RCAs, postmortems, and RCRs by providing engineering and tool expertise
- Define, configure, review, assess, and use AI-based analytics, fine-tuning them to balance faster automatic root cause analysis against false positives
- Develop next-gen, smart, automated monitoring and alerting solutions for software delivery and production environments. Implement an observability and monitoring framework using tools and frameworks such as Dynatrace, ServiceNow, Ansible, Splunk, Akamai, OpenTelemetry, and AWS monitoring capabilities such as CloudWatch
- Independently contribute toward finding game-changing solutions using cutting-edge technologies, and enable next-gen alerting, problem-solving, and self-healing solutions for IT assets
Qualifications and Skillset:
- Position requires a Bachelor’s degree (or foreign equivalent) or advanced degree in Information Technology, Computer Science Engineering, or a related field, plus eight or more (8+) years of experience in the job offered or in a related occupation
- Experience with Dynatrace, LoadRunner, and Splunk for performance analysis and for implementing alerting and monitoring solutions, plus proficiency in performance testing processes and automation
- Expertise in cloud-native technologies, including containerization, microservices architecture, and serverless computing
- Strong analytical, problem-solving, debugging, and troubleshooting skills, including the ability to identify the root cause of issues
- Proficient in utilizing and supporting APM tools in a large enterprise environment
- Proficient with application server instrumentation (Java), Real User Monitoring, and Synthetic Monitoring
- Experience with Agile, DevOps, and Site Reliability Engineering practices
- Strong knowledge of Event Management in ServiceNow preferred
- Exceptional communication and collaboration skills, enabling effective engagement with stakeholders at all levels of the organization
- Ability to coach team members and engineering teams
Certifications preferred:
- Dynatrace Certified Associate
- AWS Certified Solutions Architect - Associate
- Splunk Core Certified Power User
- Splunk Core Certified Advanced Power User
- Secondary AWS certification (optional)