DevOps/Site Reliability Engineer

Seven Seven Softwares, Alpharetta
Responsibilities:
  • Maintain applications once they are live by measuring and monitoring availability, latency, and overall system health with a focus on business activities, and continuously evaluate cost and waste
  • Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation, capacity planning and launch reviews.
  • Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity; this includes automation for various other operational needs.
  • Troubleshoot infrastructure issues, review log files, update documentation, and maintain a knowledge base of resolutions
  • Work closely with the application Development team to understand the platform and create tools/utilities to help with production management
  • Work with upstream data providers and downstream consumers, and reduce the number of escalations to development teams
  • Develop scripts and assist with code changes along with operational tasks/activities.
  • Work closely with Application Development to ensure that the support team has excellent knowledge of the application set; own and maintain the support knowledge base and documents
  • Use of analytical skills to find trends in the environment and drive out problems.
  • Lead effort to determine improvement areas to stabilize the plant.
  • Identify risks, respond with a sense of urgency, and work effectively within a team or independently
  • Test and tune network, hardware, and software configurations to maximize performance
  • Interface with different teams, such as IT Dev managers and Infrastructure teams, and lead as a Subject Matter Expert (SME) for the application(s) supported.
  • Understand the overall business flow of supported application systems and their interfaces with clients
  • Take ownership of and manage production requests, questions, and issues, and perform Root Cause Analysis for outages/incidents
  • Be flexible about joining the weekend on-call rotation and be available as offshore time lead
  • Within the Application Support space, be accountable for the production and non-production environments for the existing GBT team, and be part of 24/7 production support coverage.
Skills Required:
  • 5+ years of experience in a production environment with a solid software development background and understanding of performance tuning, end-to-end troubleshooting, networking fundamentals and appropriate attention to detail.
  • Ability to focus and resolve production issues in a highly demanding, high-pressure environment
  • Requires experience in designing, developing, and implementing technical solutions, or significant experience in deep technical support
  • Strong experience in scripting languages (Shell, Python, Perl, etc.) and cloud-driven development
  • Strong database skills with DB2, Sybase or Oracle
  • Hands-on experience with Autosys or other batch scheduling software
  • Strong experience in Continuous Integration and Continuous Deployment
  • Strong experience with on-demand environments for both virtual machines and containers
  • Knowledge of and hands-on experience with monitoring tools like Splunk, IP Soft, Sockeye
  • Practical experience with Agile methodology (e.g. Scrum)
  • Knowledge or experience with automating deployments using Jenkins, Train or Windeploy
  • Ability to diagnose technical problems, debug, optimize code, and automate routine tasks
  • Hands-on experience in application and database troubleshooting/issue resolution in a fast-paced environment
  • Excellent communication skills and the ability to think outside the box for process improvements.
  • Bachelor's/Master's Degree in Computer Science, Information Systems or related field
Skills Desired:
  • Knowledge of cloud-based deployment, security, and networking concepts in Azure and AWS
  • Knowledge of or experience with algorithms, data structures, complexity analysis, and software design
  • Interest in designing, analyzing, and troubleshooting large-scale distributed systems.