Lead Site Reliability Engineer

Robert Half | Camden
We are searching for a Lead Site Reliability Engineer to join our team in Camden, New Jersey. This role primarily involves leading and scaling our Site Reliability Engineering team, with a focus on designing, building, and deploying cloud and on-premise infrastructure.

The successful candidate will play a crucial role in improving system reliability, reducing downtime, and enhancing operational efficiency.

Responsibilities:
  • Take the lead in incident response, root cause analysis, and post-mortem reviews to reduce downtime and continually improve system reliability.
  • Design, deploy, and maintain both on-premise and cloud infrastructure, meeting the organization's requirements for scalability, performance, and security.
  • Utilize Infrastructure as Code (IaC) tools to ensure all infrastructure is repeatable and versioned.
  • Develop and maintain automation scripts and tools to improve operational efficiency and manage deployment and monitoring.
  • Provide architectural recommendations and implementations to evolve the Software Development Life Cycle (SDLC).
  • Manage the structure of the Site Reliability Engineering team.
  • Measure and communicate key metrics for each environment.
  • Facilitate knowledge sharing across each team in the engineering organization.
  • Serve as a resource for team members in non-engineering roles.
  • Participate in flexible on-call rotations.

Requirements:
  • Proficiency in Amazon EC2, Ansible, Apache ANT+, Apache Tomcat, Atlassian Jira, A/B Testing, Agile Scrum, Automation, AWS Technologies, and Cluster Analysis is required.
  • Extensive experience in site reliability engineering or similar roles.
  • Strong understanding of system and network architecture.
  • Ability to develop and maintain CI/CD processes for SaaS and cloud applications.
  • Ability to troubleshoot complex system issues.
  • Excellent problem-solving and analytical skills.
  • Strong interpersonal and communication skills.
  • Demonstrated ability to work in a team environment.
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent work experience.
  • Experience in working with cross-functional teams to ensure system reliability, availability, and performance.
  • Knowledge of best practices related to data encryption and cybersecurity.
  • Familiarity with containerization and orchestration technologies.
  • Ability to handle on-call duties on a rotating schedule.
  • Proven track record of designing and implementing scalable and reliable systems.
  • Ability to mentor entry-level engineers and help them grow in their roles.
  • Strong documentation skills to ensure system processes and procedures are recorded for team use.
Urgent

Mechanical Reliability Engineer

Sun Chemical | Wilmington (DE), 27 mi from Camden (NJ)
Mechanical Reliability Engineer - Newport, DE. The Mechanical Reliability Engineer role provides leadership and direction for the mechanical integrity and mechanical reliability programs sitewide. Position directly supervises the Reliability...
Immediate start

Electrical Reliability Engineer

Synerfac Technical Staffing | Morgantown (PA), 45 mi from Camden (NJ)
Our client, a leader in the metal manufacturing industry, is seeking an Electrical Reliability Engineer to add to their team in Morgantown, PA. This person will be responsible for electrical maintenance engineering activities to maximize safety...
High salary

Mechanical Reliability Engineer

Synerfac Technical Staffing | Morgantown (PA), 45 mi from Camden (NJ)
Our client, a leader in the metal manufacturing industry, is looking to add a Mechanical Reliability Engineer to their team in Morgantown, PA. Responsibilities:  •  Responsible for safety design modifications and improvements to equipment...