No developer should have to choose between security and reliability. We will ensure that Snyk is dependable, with a reliable track record that encourages developers to embed Snyk in their workflows. You will join a team that owns the reliability (SLOs) of key customer workflows and runs the applications that most influence Snyk’s reliability. You will be responsible for building a culture of high standards around reliability, observability, and resilience practices.

You’ll Spend Your Time:
- Pair-programming to collaboratively improve the services that power Snyk
- Establishing SLIs and SLOs for the key customer workflows that your team owns
- Diagnosing the factors that most threaten SLOs and identifying necessary improvements
- Improving observability, measurement, and diagnostics for key customer SLIs and SLOs
- Creating and fine-tuning error budgets, with dashboards and alerts to monitor them
- Reducing time to recover through faster deployment lead times
- Improving application design to partition workloads by customer criticality
- Sharing the practices and tooling you develop with other engineering teams
- Implementing capacity management and load testing capabilities for core services
- Working with teams to ensure that monitoring and alerting are instrumented to focus on customer impact. The goal is that no one should get out of bed at 3am for non-customer-facing issues.
- Raising the bar on production readiness and on incident response and analysis, and working with R&D teams to meet this bar
- Participating in our on-call rotation (compensated)

What You’ll Need: