As a Senior Site Reliability Engineer (SRE) at Striveworks, you will be challenged, and trusted, on day one to take ownership of specific product deployments by maintaining, optimizing, and enhancing our on-premises and cloud computing environments. You will play a crucial role in the successful deployment of our software solutions to clients: you will execute the technical aspects of implementation projects and ensure the seamless integration, customization, and configuration of our software. Your expertise will be critical to the company as we deploy new instances of Striveworks’ machine learning operations (MLOps) capabilities to customer infrastructure.
You are right for this opportunity if you value and possess technical expertise and you enjoy pushing the boundaries of your capabilities. You will be responsible for maintaining Striveworks’ software deployments both on-premises and with various cloud service providers, using Infrastructure-as-Code (IaC) methodologies.
Your day-to-day will include:
- Automating IaC to stand up virtual machines and deploy containers, services, and other infrastructure; leaning on expertise to deploy custom Kubernetes clusters in AWS, Azure, GCP, on-premises, or hybrid cloud environments
- Working with platform developers and DevOps to define requirements and build solutions for customer use cases of the platform
- Deploying software to unclassified, CUI, Secret, and Top Secret DOD networks
- Responding to incidents and performing initial triage of critical system faults
The Senior SRE works on the DevOps team and acts as a liaison between DevOps, platform developers, and professional services teams, taking on operational tasks to ensure the efficient functioning of Striveworks’ customer solutions. The Senior SRE monitors, automates, and improves software reliability, performance, and availability, which supports the IT needs for various projects. They work alongside a team of software engineers and data scientists to help them deploy and operate their work as functional products, learning from them so that building effective AI solutions becomes second nature. They may provide guidance and leadership to junior SRE team members.
You will directly contribute to the success of mission-critical systems for national security and commercial clients. You will be expected to wear multiple hats and to step into vacuums where improvements are needed, and you will be given the breadth to explore new technologies and solutions.
The anticipated base pay range for this position is $175,000 to $220,000/year. Striveworks’ total compensation package includes a competitive base salary, annual performance-based equity grants, and a lucrative yearly cash bonus.
This position offers a hybrid work environment (~30% remote and ~70% on site) but demands proximity to a DOD Sensitive Compartmented Information Facility (SCIF) in Pinehurst, NC, or Tampa, FL.
The Right Fit
During our hiring process, we spend a lot of time discussing shared values.
Why? We passionately believe that fostering an environment where people can self-actualize and pursue greatness is the best way to achieve our individual and collective goals.
What does this mean for you? We want to create an environment where you can thrive and achieve your goals, where you know the team shares your goals, and where you make and accept decisions for the team with humility. At Striveworks, we want your say/do ratio to be 1, and we want you to know that being part of a top-tier team means that there is no smartest person in the room. If that makes sense, we’re already on the same page.
Here’s what we’re looking for:
- 6+ years of direct, hands-on experience in:
  - Microservice deployment in Kubernetes
  - Diagnosing and resolving issues within containerized environments
  - Helm chart and Kustomization development and deployment
  - Python and Bash programming
  - Automation and IaC (e.g., Terraform, Ansible)
  - Cloud infrastructure (e.g., AWS, Azure, GCP, or OpenStack)
  - Managing and troubleshooting Linux systems (e.g., RHEL, Ubuntu, CentOS)
  - Software deployments to on-premises and cloud-based unclassified, CUI, Secret, and Top Secret networks within the DOD
- The ability to work cross-functionally with platform developers to define requirements and build solutions for customer use cases of the platform
- The ability to respond professionally and competently to incident reports and to triage critical system faults
- Active Top Secret security clearance and intimate familiarity with DOD networking, tools, infrastructure, security requirements, and policies
The Wish List
We are very interested in candidates who possess the above qualifications, and we appreciate and consider the addition of:
- Experience deploying, maintaining, or contributing to CNCF projects
- Proficiency with US federal information system security policies, including Security Technical Implementation Guides (STIGs), NIST 800-171, NIST 800-53, CMMC, and ICD 503
- Experience with DevSecOps/DevOps and CI/CD for the administration and deployment of GPU-enabled servers
- Experience with network-attached storage (NAS) and storage area network (SAN) technologies
- Deep knowledge of DevOps principles and practices for deploying and managing service mesh in cloud environments
- Experience with both blue-green and canary deployment strategies
- Experience designing, managing, and optimizing workloads across multiple cloud providers
- Experience with Kubernetes and cloud-native applications and services in denied, disrupted, intermittent, and limited (DDIL) impact environments
- DOD 8570 IAT II certification (Security+ CE); proficiency with security automation and familiarity with API security, container security, and cloud security
The Benefits
- Top-of-market salary and total compensation
- Generous equity plan
- Health/vision/dental insurance
- Unlimited PTO
- Parental leave
Build, Deploy, and Maintain AI for an Unpredictable World
AI is driving a new Industrial Revolution. But most AI tools only work when the world looks the same tomorrow as it did yesterday. That’s rarely the case.
Striveworks was formed to fix this problem. Our platform lets teams build AI models, deploy them into unpredictable environments, and watch them deliver trustworthy results—day after day. Our approach has transformed AI outcomes for organizations where failure is never an option. As a result, Striveworks was recognized as an exemplar in the National Security Commission on Artificial Intelligence Final Report.
In 2023, Striveworks placed on the Deloitte Technology Fast 500 list as one of the fastest-growing technology companies in North America. In 2024, Striveworks was honored with a Built In Best Places to Work award—for the third year running.
Striveworks is an Equal Opportunity Employer and does not discriminate in employment on the basis of race, color, religion, belief, sex (including pregnancy and gender identity or expression), national origin, social or ethnic origin, political affiliation, sexual orientation, marital status, disability, genetic information, age, membership in an employee organization, retaliation, parental status, military service, or other non-merit factors. Striveworks will not tolerate discrimination or harassment of any kind.
If you require assistance or a reasonable accommodation in the application process, please contact Operations at hr@striveworks.us.
In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete an employment eligibility verification form upon hire.
Striveworks is a participating employer in the E-Verify program.