Title: Site Reliability Engineer - ON PREM
Company: NVIDIA
Location: HYBRID in Santa Clara
Pay: $50-$65/HR, depending on experience
Interviews: two 45-minute phone screens
Hours: 8-5, on call from 8-8; it is very rare to have to come onsite after hours
Duration: 6-month contract with likely extensions; can convert to full-time employment
Job Description:
NVIDIA is looking for a seasoned SRE to join its multifaceted and fast-paced Infrastructure, Planning and Processes organization. This is an on-prem data center environment, so experience with Linux operating systems is extremely important, along with an understanding of Kubernetes. You will work on systems deployed in NVIDIA's internal cloud, keeping them available and reliable for end users. You will also monitor system performance and troubleshoot issues related to CPU, memory, disk, and network utilization. The team manages roughly 7 data centers with a little under 20,000 machines used internally by NVIDIA employees. One example of an issue this SRE will assist with: each machine has a "boot order", and the startup sequence can change from the original order after the machine has been used by a certain user. Before the machines are refreshed, this SRE will assist in remotely restoring the boot order, developing automation to restore it, and more.
Required Skills:
- 4+ years of SRE experience with systems administration knowledge
- Familiarity with Kubernetes
- Proficiency in Linux systems (and preferably Windows as well)
- Strong understanding of Ansible for configurations and running playbooks
- On-premises data center experience (this is not a cloud environment)
- Visualization and monitoring experience (Kibana, Grafana, Splunk, etc.)
- Experience with BMC (Redfish), KVM, and IPMI tools.
- Ability to run automated tests with code (Python, Bash, etc.)
Pluses:
- Windows server infrastructure
- OpenStack experience (MySQL, Prometheus, Jenkins, etc.)
Exact compensation may vary based on several factors, including skills, experience, and education.
Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.