
Site Reliability Engineer
Job Description
Our client is a federal government organisation with offices throughout Australia. Due to growth, they are seeking a Site Reliability Engineer to join the team in their Richmond or Geelong office.
Opportunity
- Market Day Rate
- 2-year term (12-month initial term + 12-month extension)
- Richmond or Geelong
- Baseline clearance desirable
Key duties and responsibilities
- Working closely with the ICT Service Reliability Technical Lead in the implementation of our Platform on monitoring, alerting, and reporting, Infrastructure and Applications as Code, Developer Experience, and Modern Application Development
- Participating in or leading discovery sessions to develop monitoring and reporting requirements for new services and applications
- Developing and maintaining automation tools for deployment, monitoring, and recovery, as well as long-term architectural strategies for the monitoring tools in use
- Working with development teams to integrate the monitoring suite into the development release cycle (DevOps), including assisting with regression testing
- Assisting in managing new projects relating to monitoring, reporting, and business intelligence, including creating dashboards and alerts
- Implementing and maintaining event monitoring solutions, including onboarding new services, in a multi-vendor environment
- Monitoring system capacity and scalability to handle increasing workloads and developing proofs of concept and value projects
- Apply best practice ITIL methodology for incident management and service requests.
- May be working as part of a team (e.g., Business Analyst, Test Analyst, MuleSoft Developers), actively participating in prioritising and tracking system improvements via monthly sprints.
- Attend daily stand-ups to review and discuss system improvements with developers and other third-party vendors.
- Produce deliverables in accordance with project schedules.
- Identify opportunities for process improvements, and liaise with business areas and teams to recommend enhancements.
- Maintain documentation, processes and decisions to support knowledge transfer and future enhancements including the team’s.
About you?
- Extensive experience with configuring and using monitoring and observability tools such as Dynatrace or Splunk
- Proficiency in scripting languages such as SPL, SQL, Python, and DQL
- Experience working with AWS cloud platform
- Familiarity with containerisation platforms (Kubernetes and Docker)
- Strong knowledge of Linux/Unix systems and networking principles.
- Knowledge of security best practices and compliance standards, preferably in federal government
- 5+ years of experience in a similar role, preferably in a large-scale environment.
- Experience with automation and configuration management tools like Terraform, Ansible, Chef, or Puppet, and knowledge of CI/CD pipelines and tools such as GitLab CI and GitHub Actions.
- Understanding of microservices architecture and service mesh technologies.
- Experience in configuring mobile application monitoring (Dynatrace, Firebase, and Crashlytics)
Application Process
Please send your CV in confidence to samuel.beckett@talentinternational.com
Successful applicants will be required to complete a Key Selection Criteria response, which will be provided if shortlisted. Proof of Australian Citizenship is required, along with national criminal history and federal background checks.