Site Reliability Engineer in IT & DevOps Department
in GoDaddy - Canada, Ontario

Remote
Full-time
Mid
Permanent

Job description


GoDaddy is seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our team. This role centers on automating and maintaining our storage infrastructure, primarily Ceph, to ensure the reliability, scalability, and performance of our systems.

  • Automate and maintain day-to-day operations of storage systems to support application demands
  • Develop and maintain tools and automation scripts to streamline storage operations and improve efficiency
  • Monitor system performance, identify issues, and implement solutions to ensure high availability and reliability
  • Participate in agile practices such as daily stand-up meetings, task tracking boards, design and code reviews, automated testing, and continuous integration and deployment
  • Continuously improve system reliability, performance, and capacity through proactive monitoring, automation, and optimization

Benefits

We offer a range of total rewards that may include paid time off, retirement savings (e.g., 401k, pension schemes), bonus/incentive eligibility, equity grants, participation in our employee stock purchase plan, competitive health benefits, and other family-friendly benefits including parental leave. GoDaddy’s benefits vary based on individual role and

Job requirements

  • 2+ years of professional experience with Ceph in a production environment, including deployment, configuration, and management of Ceph clusters
  • 2+ years of experience in site reliability engineering or a similar role
  • Experience working on Linux/Unix systems, with a focus on automation and operating at scale
  • Proficiency in Python or Bash
  • Experience with Ansible, Terraform, or SaltStack
  • Experience with Nagios-based monitoring tools, such as Icinga2
  • Experience with observability tooling, such as Prometheus, Grafana, Mimir, and Loki
  • Solid understanding of core networking concepts and protocols, particularly in relation to Linux/Unix systems
  • Experience with containerization and orchestration tools (e.g., Docker, Kubernetes)
  • Exposure to and experience working with compute platforms (e.g., OpenStack, AWS)
  • Familiarity with, and ability to contribute to, CI/CD pipelines and automation workflows