Chandler, AZ
Plano, TX
Rancher/Prometheus/Grafana experience with Kubernetes will get an interview
The manager wants heavy container experience, more so than cloud experience
Lots of hires with this team
Must have skills:
- Container experience is the main focus
- Kubernetes, Rancher/Prometheus/Grafana
- Terraform, including Terraform Enterprise (TFE)
- OpenShift
- Azure
- Strong SRE experience
Interview process: Video call
1st round – technical with a panel
2nd round – technical with a panel
3rd round – hiring manager
Description:
- Responsible for reliability and support of the container platform on-prem and in external clouds (Azure / AWS / Google).
- Monitor and troubleshoot performance, connectivity, security, and other issues in the container platform (OpenShift), Rancher (RKE), and Azure (AKS) environments.
- Perform deep dives into systemic and latent reliability issues; handle incident management and problem management.
- Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
- Perform blameless RCA, partner with engineering and operation teams across the organization to roll out fixes.
- Own application onboarding and provide troubleshooting support throughout the lifecycle of the applications on the container platform.
- Identify and drive opportunities to improve automation, reduce toil, and improve operational excellence.
- Partner with risk and compliance teams to bring visibility, implement the right controls, and remediate vulnerabilities.
- Ensure resiliency during implementation and identify/fix resiliency problems by collaborating with engineering teams.
- Be a key stakeholder in the design of cloud services and work with architecture, engineering, and product teams.
- Participate in 24x7 on-call coverage under a follow-the-sun model.
Qualifications:
- BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.
- 5+ years of hands-on experience supporting Kubernetes / OpenShift / RKE / EKS container platforms.
- Experience with Python, Ansible, Golang, and shell scripting
- Kubernetes / OpenShift / Terraform certifications are a plus
- Strong experience with major services related to compute, storage, network, and security
- Experience with monitoring tools like Prometheus and Dynatrace, as well as cloud-native tools like Azure Monitor and Log Analytics
- Strong understanding of, and background working with, complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and Ping Identity or other SSO solutions.
- Advanced knowledge of Linux OS, DNS, DHCP, Kerberos and Windows Authentication
- Experience with CI/CD tools (Git, Jenkins) and the GitOps model
- Excellent understanding of Linux / Windows operating system administration
- Experience in container security and vulnerability remediation.
- Systematic problem-solving approach, sense of ownership and drive
- Ability to juggle competing priorities and adapt to changes in project scope.
- Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
- Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.
- Experience in OpenShift, RKE, and CSP Kubernetes services such as AKS and EKS
- Experience in Terraform, ArgoCD, Tekton, and Knative technologies.
- Experience in agile deployment methodologies (GitOps)
- Knowledge of various container runtimes
- Familiarity with the operator deployment pattern.
- Experience working in a highly available multi-datacenter environment
- Experience working with monitoring tools such as Prometheus, Splunk, Dynatrace, Sysdig, or similar tools.
- Understanding of cost management, inventory management, and the FinOps model.