DevOps Developer role at IBM in Durham

IBM in Durham is hiring a DevOps Developer

This job might already be filled.

Introduction
At IBM, work is more than a job – it’s a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you’ve never thought possible. Are you ready to lead in this new era of technology and solve some of the world’s most challenging problems? If so, let’s talk.

Your Role and Responsibilities
The developer and Site Reliability Engineer (SRE) teams both care about reliability, availability, performance, scalability, efficiency, and feature and launch velocity. However, SREs operate under different incentives, mainly favoring the long-term viability of a service over new feature launches. SREs are responsible for ensuring services are resilient, responsive, and have an uptime appropriate to customers’ needs, while controlling capacity and performance. They are also responsible for improving these services in a highly dynamic environment.

In summary, SRE is an engineering discipline that combines software, infrastructure, and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. Day-to-day, SREs use automation to limit time spent on operational work, proactively identify potential risk factors, and convert them into actionable improvements.

Responsibilities:

Build automation to reduce toil and engineer solutions to reliability problems
Take ownership of the monitoring of applications, services, and infrastructure
Ensure consistent and thorough observability and monitoring across all environments (development, beta, production)
Work closely with development teams to capture meaningful and detailed heuristics to measure the health of each application
Design and implement monitoring checks for new services prior to launch
Continuously improve alerting systems by removing noise
Work with others across the team (Developers, DevOps Engineers, Sys Admins and the Release Manager) during software releases
Champion the testability of the monitoring system

Required Technical and Professional Expertise

Background in software engineering (projects and experience in JavaScript, C#, Java, Go)
Experience automating tasks and solving problems through automation to reduce toil (PowerShell, shell, Python, etc.)
Knowledge of building and using observability: defining metrics or measures, building dashboards, and using observability tools (Sysdig, Kibana, Prometheus, Grafana, Zabbix)
Experience with a logging and analytics framework (Splunk, LogDNA, or ELK stack)
System design knowledge (cloud-native architectures, best practices for availability and resiliency, practices and methods for problem isolation)
Experience with pipeline tools for deploying and managing applications (Travis, Jenkins)
Confident with infrastructure-as-code tools (Ansible, Terraform, Blueprints)
Confident with source control (GitHub, Perforce)
Experience with cloud services and platforms (IBM Cloud, AWS, GCP, MS Azure)
General Linux knowledge
Network and security knowledge
Comfortable working with Agile practices and Jira