Service Reliability Engineer
Location: Sydney (4 days in office, 1 day WFH)
Reports to: Technical Operations Director, APAC
Department: Global Technical Operations
The Opportunity:
A leading music organisation is growing its Global Technical Operations hub in Sydney and is looking for a Service Reliability Engineer (SRE) to join the team.
This is more than a traditional ops role – it’s an opportunity to bring a software engineering mindset to reliability, automation, and scalability in a global, high-impact environment.
What You’ll Do:
You’ll join a collaborative, hands-on team responsible for the stability, performance, and scalability of global platforms. Working closely with development, infrastructure, and security teams, you’ll help build a resilient environment that keeps music flowing – from studio tools to streaming systems.
- Design and maintain high-availability, high-performance systems for global applications.
- Automate everything – from infrastructure provisioning to deployment and scaling – using tools like Terraform, Ansible, and Python.
- Build robust monitoring and observability frameworks with AWS CloudWatch, Dynatrace, Prometheus, Grafana, or Splunk.
- Optimise CI/CD pipelines to improve reliability and deployment speed.
- Participate in on-call rotations, troubleshoot incidents, and lead post-incident reviews.
- Champion SRE principles – embed SLOs, SLIs, and error budgets into everyday engineering.
- Collaborate across Dev, Infra, and Security teams to create a culture of continuous improvement and reliability.
About You:
You’re a technically strong and level-headed engineer who loves automation, thrives in complex environments, and knows how to balance pragmatism with perfection.
- Background in systems administration (Linux/Windows) in a large-scale environment.
- Proficient in at least one programming language (Python, Go, or Java).
- Hands-on experience with AWS (GCP or Azure experience is a bonus).
- Deep understanding of networking, containers (Docker/Kubernetes), and Infrastructure as Code (Terraform, Ansible).
- Experience with monitoring and observability tools such as Dynatrace, Prometheus, Grafana, or Datadog.
- Calm, collaborative communicator with strong analytical and problem-solving skills.
Bonus Points For:
- Experience with ServiceNow or ITIL processes.
- Knowledge of chaos engineering, resilience testing, or advanced capacity planning.
- Previous experience managing distributed, global systems in production.
Culture & Perks:
- Early Friday finish (1pm)
- Annual bonus $
- Optional 1% additional super with MLC
- Global collaboration and career growth opportunities
Interested?
Apply now or contact Sophia Parrelli at Talent International for a confidential chat.