Cloud Reliability & Operations Engineer
Amber People New York
Base: $160,000 - $170,000
Our client is looking for a senior cloud reliability and operations engineer to join its IT department. This individual will develop the operating model and support the firm’s cloud hosting zones across a number of providers.
This is a key role that focuses on quality, availability, and performance to ensure the firm’s cloud applications and services meet the demands of the firm’s digital users today and in the future. The individual will need to be proficient in a variety of observability technologies, including availability and performance monitoring and tuning, and in automation, to help define and mature our cloud management and reporting capabilities.
This role will also help transition 24x7 operational responsibilities to the standard operations teams by enabling new tooling, capabilities, training, and documentation so that the traditional operations team can take on the new cloud-centric responsibilities.
After the support model is established, this role will serve as the L3 escalation point for cloud-based incidents and admin escalations from ops, appdev, and infrastructure teams.
Key Responsibilities:
- Ongoing production operations of AWS and Azure hosted infrastructure and applications
- Drive the development and use of new and self-service tooling to support the operating model for the cloud
- Improve resiliency for all cloud applications and infrastructure and ensure that HA, DR, Data Protection requirements are appropriately engineered and implemented for each workload
- Stand up cloud environments based on established standards and guard-rails
- Use configuration management, orchestration and management tooling to ensure cloud environments meet operational and security standards
- Be a subject matter expert in reducing and resolving production incidents by identifying preventive controls and driving proactive efforts
- Act as the gatekeeper for all access escalations across all cloud environments
- Drive to a new operating model – enable tooling and process so that all L1/L2 operations can be done by more traditional NOC teams and remain the L3 escalation point for cloud incidents and requests
- Track system uptime and availability and promote incremental increases to change velocity
- Drive innovation, prioritization, and engineering of new cloud capabilities to bolster the operating model
Required:
- 5+ years of reliability and operations experience – Linux, Windows, DevOps, Infrastructure, Network, Cyber
- 3+ years of experience with cloud – AWS, Azure, VMware
- Expertise in troubleshooting cloud environments – finding and fixing critical production issues
- Practical experience with modern scripting languages – Python, PowerShell, Perl, PHP, Shell
- Experience implementing Infrastructure as Code – Terraform, CloudFormation, Ansible, etc.
- Expertise in cloud management and monitoring tooling – CloudWatch, Splunk, Datadog, ELK, Prometheus, CloudTrail, etc.
- Experience with AIOps platforms to automate and shift-left operations functions
- Experience supporting mission critical applications and infrastructure on a 24x7 basis
- Working knowledge of cloud security principles and best practices
- Working knowledge of cloud networking – DNS, SG, NACL, firewalls
- Expertise in driving good hygiene in cloud environments – in-place patching, immutability, compliance monitoring (AWS Config), cleanup of technical debt, IAM
- Experience with a DevOps delivery model for infrastructure, applications, and configuration
- Experience designing operational state to be policy- and automation-driven
- Strong communication skills
- Ability to multitask, work well under pressure and prioritize work against competing deadlines and changing business priorities
Desired:
- Experience with Google Cloud Platform
- Software development experience
- AWS Certifications