DevOps Engineer, SRE and SaaS Support
Opening / Selling Statement - We are seeking a mid-level DevOps Engineer with Site Reliability Engineering (SRE) experience to contribute to the transition of Crew Management Applications to a web-based SaaS model hosted on AWS. The successful candidate will work under the guidance of a Senior DevOps Engineer, supporting critical system reliability, automation, and monitoring tasks while contributing to the implementation of key deliverables.
Required Skills - DevOps, Site Reliability Engineering (SRE), Kubernetes, AWS EKS
Job Duties -
- Support Key Deliverables: Assist in implementing metrics collection, developing dashboards, conducting reliability audits, and creating runbooks as outlined in the project goals.
- Collaboration: Work closely with the Senior DevOps Engineer, development teams, and support teams to ensure seamless operations and effective communication between stakeholders.
- CI/CD and Automation: Contribute to the development and optimization of CI/CD pipelines and automation scripts to support efficient and consistent deployments.
- Observability Implementation: Assist in configuring and maintaining monitoring solutions using OpenTelemetry and Grafana to enhance system visibility.
- Production Support: Participate in 24/7 Tier II production support on a rotational basis, addressing technical escalations and contributing to system stability.
- Documentation: Collaborate in the preparation of technical documentation, including runbooks, playbooks, and training materials for Tier I and II support teams.
- Dashboards and Metrics: Support the development of Grafana dashboards for monitoring services, including Kubernetes platform components and internally developed services.
- Issue Investigation: Assist in identifying and resolving issues reported by lower-tier support teams, ensuring timely resolution and minimizing customer impact.
- Game Day Scenarios: Participate in the execution of Game Day scenarios to prepare for potential system failures and improve operational readiness.
- Reliability Contributions: Work on tasks related to reliability audits, including submitting merge requests for simpler issues and escalating more complex problems to senior team members.
Job Requirements -
- Experience: 3 to 5 years in DevOps, SRE, or related roles with a focus on cloud-hosted, microservices-based environments.
- Technologies: Familiarity with Kubernetes, AWS EKS, Terraform, ArgoCD, OpenTelemetry, and Grafana.
- DevOps Practices: Basic understanding of CI/CD pipelines and infrastructure-as-code (IaC) principles.
- Incident Management: Experience in troubleshooting and resolving technical issues in production environments.
- Collaboration: Ability to work effectively as part of a team under the direction of senior engineers.
- Documentation: Basic skills in technical writing, including the ability to contribute to incident runbooks and operational playbooks.
- On-Call Readiness: Willingness to participate in 24/7 rotational production support as required.
- Experience contributing to dashboards and monitoring systems for production environments.
- Familiarity with automated remediation processes and system optimization practices.
- Background in supporting SaaS environments or cloud migrations.
This is a high PRIORITY requisition. This is a PROACTIVE requisition
Background Check : No
Drug Screen : No