At GoDaddy, we are searching for an outstanding Site Reliability Engineer to join our ambitious team in India. This role offers the chance to design, build, and maintain the infrastructure that powers the dreams of millions of entrepreneurs worldwide. You will be at the forefront of driving reliability, observability, and cost efficiency across our large-scale systems. By designing for resilience, automating operations, and proactively preventing incidents, you will keep our systems running reliably.

Responsibilities include implementing end-to-end observability using Prometheus, Grafana, CloudWatch, and ServiceNow while defining and maintaining SLIs/SLOs/SLAs across infrastructure and applications; architecting and automating AWS infrastructure using CDK, CloudFormation, Python, Go, or Bash, with deployments orchestrated via GitHub Actions or Jenkins; managing and troubleshooting containerized workloads across Docker, Kubernetes (EKS), ECS, and Fargate while ensuring configuration consistency through Ansible, Puppet, or Chef; designing, building, deploying, and maintaining large-scale, production-grade systems in AWS with full ownership of system reliability, performance, and availability; driving platform reliability by proactively identifying risks, planning for scale and performance, and collaborating with engineering teams to embed reliability and cost awareness into all builds; leading incident management with blameless postmortems and standardized SOPs for response, deployments, capacity, disaster recovery, and security using tools like BigPanda, Site24x7, and ServiceNow; and enhancing infrastructure and CI/CD pipelines to improve performance and cost-effectiveness, taking ownership of capacity planning, forecasting, and governance.
The role requires 5+ years of proven SRE experience supporting production-scale systems, with a strong understanding of SLIs/SLOs, distributed systems reliability, and troubleshooting complex production issues; deep hands-on expertise with AWS services (EKS, ECS, Fargate, EC2, S3, RDS, SQS, SNS, CloudFormation, CDK, IAM, CloudWatch); proficiency in incident management tools (BigPanda, Site24x7), ServiceNow integration, and configuration management (Ansible, Puppet, Chef); strong automation skills in Python, Go, and Bash, with expertise in CI/CD pipelines using GitHub Actions and Jenkins and in container orchestration; and skills in monitoring and observability tools including Prometheus, Grafana, and CloudWatch. A Bachelor’s degree or equivalent experience in computer science, engineering, or a related technical field is preferred.

The position offers diverse benefits and supports a culture focused on inclusivity, equity, and belonging.