Service Reliability Engineer

Location: Sydney (4 days in office, 1 day WFH)
Reports to: Technical Operations Director, APAC
Department: Global Technical Operations

The Opportunity:

A leading music organisation is growing its Global Technical Operations hub in Sydney and is looking for a Service Reliability Engineer (SRE) to join the team.

This is more than a traditional ops role – it’s an opportunity to bring a software engineering mindset to reliability, automation, and scalability in a global, high-impact environment.

What You’ll Do:

You’ll join a collaborative, hands-on team responsible for the stability, performance, and scalability of global platforms. Working closely with development, infrastructure, and security teams, you’ll help build a resilient environment that keeps music flowing – from studio tools to streaming systems.

Design and maintain high-availability, high-performance systems for global applications.
Automate everything – from infrastructure provisioning to deployment and scaling – using tools like Terraform, Ansible, and Python.
Build robust monitoring and observability frameworks with AWS CloudWatch, Dynatrace, Prometheus, Grafana, or Splunk.
Optimize CI/CD pipelines to improve reliability and deployment speed.
Participate in on-call rotations, troubleshoot incidents, and lead post-incident reviews.
Champion SRE principles – embed SLOs, SLIs, and error budgets into everyday engineering.
Collaborate across Dev, Infra, and Security teams to create a culture of continuous improvement and reliability.

About You:

You’re a technically strong and level-headed engineer who loves automation, thrives in complex environments, and knows how to balance pragmatism with perfection.

Background in systems administration (Linux/Windows) in a large-scale environment.
Proficient in at least one programming language (Python, Go, or Java).
Hands-on experience with AWS (GCP or Azure a bonus).
Deep understanding of networking, containers (Docker/Kubernetes), and Infrastructure as Code (Terraform, Ansible).
Experience with monitoring and observability tools such as Dynatrace, Prometheus, Grafana, or Datadog.
Calm, collaborative communicator with strong analytical and problem-solving skills.

Bonus Points For:

Experience with ServiceNow or ITIL processes.
Knowledge of chaos engineering, resilience testing, or advanced capacity planning.
Previous experience managing distributed, global systems in production.

Culture & Perks:

Early Friday finish (1pm)
Annual bonus
Optional 1% additional super with MLC
Global collaboration and career growth opportunities

Interested?
Apply now or contact Sophia Parrelli at Talent International for a confidential chat.