As a Senior Site Reliability Engineer at GoDaddy, you will own reliability outcomes for security platforms by defining SLIs/SLOs and error budgets and by building actionable alerting, dashboards, and runbooks. You will architect and implement high availability, capacity planning, and disaster recovery for IDS/IPS, DDoS mitigation, and supporting services. You will design zero/minimal-downtime maintenance and upgrade strategies for OS, firmware, and signature updates; automate deployments, configuration, and compliance using SaltStack and Python; and operate and improve a heterogeneous stack that includes TrendMicro TippingPoint IPS, Suricata, NetScout/Arbor Sightline/TMS, HAProxy, Nginx, Juniper, Palo Alto, and Kentik/KProxy. You will build and evolve observability with tools such as Icinga alerting, Grafana dashboards, InfluxDB metrics, and rsyslog pipelines, driving SLO-based alerting and noise reduction.

Responsibilities also include leading incident response as incident commander within a 24/7 on-call rotation, driving rapid mitigation, running blameless postmortems that produce durable fixes, and reducing toil through self-service tooling, APIs, and automated health checks. You will champion reliability reviews and game days/chaos testing, ensure audit-ready operations aligned to WebTrust and PCI-DSS, and uphold change management, configuration baselines, and access controls. Collaboration with the Network Engineering, Security Architecture, Hosting, and Product teams is essential, as is mentoring 2-3 contractors. You will also maintain high-quality operational documentation, SOPs, and architectural diagrams.

Your experience should include 5+ years in SRE, production operations, or platform engineering supporting large-scale, mission-critical systems, with a focus on network/security platforms.
You will need expertise in SaltStack or similar tools (Puppet, Ansible); strong Linux administration and troubleshooting skills; a deep understanding of TCP/IP, routing, and L4-L7 load balancing; and proficiency in Python, observability tools at scale, Git-based workflows, and infrastructure as code. Proven effectiveness in 24/7 operations and incident management, excellent technical writing, and mentoring skills are vital. Preferred qualifications include hands-on administration of IDS/IPS and DDoS platforms; HAProxy and Nginx experience; Juniper and Palo Alto administration; a relevant Bachelor's degree; industry certifications (Security+, CISSP, Linux+); experience in web hosting or managed service environments; experience with incident response, change management, and compliance audits; knowledge of hybrid cloud/on-premises environments; and an understanding of WebTrust and PCI-DSS.