Site Reliability Engineer: 2 days from San Jose, CA office 3 da
Act as a bridge between traditional IT operations and software development, bringing a software engineering approach to system administration.

Job Responsibilities
- Creating and supporting automation scripts (shell/Ansible/Python) for infrastructure deployments, validations and monitoring to streamline operational tasks
- Scheduling monitoring scripts using cron and Airflow
- Monitoring using tools including Dynatrace, Apica and Grafana
- Database handling
- Building CI/CD pipelines
- Incident handling and problem management

Mandatory Skills
- Experience in Ansible/Python
- Monitoring tools: Dynatrace/Apica/Grafana

Required Education
- Bachelor's degree in computer science or a related field

Required Experience
- 14+ years of IT infrastructure experience
- Extensive experience working with Linux flavors such as RHEL/CentOS, including shells, filesystems and utilities
- Experience with Python and Ansible
- Knowledge of distributed computing and experience working with container orchestration frameworks, including on-prem and Rancher Kubernetes, with good knowledge of Kubernetes objects
- Experience working with storage (ONTAP preferred): volumes, aggregates, backups, DR planning
- Experience scheduling monitoring scripts using cron and Airflow
- Experience with monitoring tools including Dynatrace, Apica and Grafana
- Database knowledge, including SQL and NoSQL databases
- Experience building CI/CD pipelines (preferred)
- Cloud platform knowledge (specifically AWS) is required