Role Title: Site Reliability Engineer (SRE)
Project: Product Management Transformation
Location: Springfield, IL - remote
Type of contract: W2
Duration: 2 Years
Job Description:
We are seeking a Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of our Microsoft Dynamics 365 platform. In this role, you will optimize platform infrastructure, automate processes, monitor system health, and address potential issues before they impact users. You will work closely with development teams to maintain a reliable platform that aligns with business goals and delivers a smooth user experience.
This position is well-suited for an engineer with experience in both software development and systems engineering who enjoys working in a dynamic environment.
Job Responsibilities:
- Build and maintain scalable, reliable infrastructure for the Microsoft Dynamics 365 platform to ensure high performance and uptime.
- Implement monitoring, alerting, and logging tools to identify and resolve system issues proactively, minimizing downtime.
- Work with development teams to improve application performance and reliability by addressing bottlenecks and optimizing resource usage.
- Automate routine operations such as deployments, scaling, and patching, using tools like PowerShell, Azure DevOps, or Terraform.
- Assist in designing and implementing disaster recovery and failover strategies to ensure business continuity during critical failures.
- Troubleshoot and resolve issues related to platform performance, networking, security, and other infrastructure areas.
- Maintain CI/CD pipelines for Dynamics 365 development, ensuring smooth deployment of updates.
- Optimize cloud infrastructure in Microsoft Azure, including virtual machines, databases, and networking components, to support Dynamics 365.
- Ensure compliance with regulations such as HIPAA and GDPR by implementing appropriate security measures.
- Contribute to capacity planning and performance tuning to support future growth and scalability of the platform.
Required Skills & Qualifications:
- 3-5 years of experience as a Site Reliability Engineer (SRE) or DevOps Engineer, with experience managing cloud-based platforms, ideally Microsoft Dynamics 365.
- Proficiency in cloud environments, particularly Microsoft Azure, and experience managing resources like virtual machines, databases, networking, and storage.
- Experience with automation tools such as PowerShell, Azure CLI, or Terraform for infrastructure as code (IaC).
- Knowledge of monitoring tools such as Azure Monitor, Prometheus, or Grafana for tracking system performance and availability.
- Experience managing CI/CD pipelines using tools like Azure DevOps or similar platforms to ensure smooth software delivery.
- Strong problem-solving and troubleshooting abilities to resolve performance and infrastructure issues in real time.
- Good communication skills for collaboration with cross-functional teams, including developers, IT, and business stakeholders.
Preferred Qualifications:
- Microsoft certifications such as Microsoft Certified: Azure Administrator Associate or Microsoft Certified: Azure Solutions Architect Expert.
- Experience with containerization and orchestration tools like Docker and Kubernetes in the context of running and managing Dynamics 365 applications.
- Familiarity with database performance tuning and optimization, especially with SQL databases in the cloud.
- Experience working in highly regulated industries like healthcare, with a focus on compliance with regulations such as HIPAA and GDPR.