Site Reliability Engineer
Etihad
Date: 11 hours ago
City: Abu Dhabi
Contract type: Full time

Synopsis
The Site Reliability Engineer (SRE) will lead an SRE squad focused on enhancing service reliability, performance, and scalability. They will drive automation to reduce toil, optimize system uptime, and manage incident resolution efforts. Responsible for building monitoring systems, optimizing infrastructure, and implementing safe deployment practices, the SRE will also ensure alignment with SLAs/SLOs and contribute to system development and code reviews. The role requires expertise in large-scale distributed systems, cloud infrastructure, and IT governance, with a focus on continuous service improvement and operational excellence.
Etihad Airways, the national airline of the UAE, was formed in 2003 and quickly went on to become one of the world’s leading airlines. From its home in Abu Dhabi, Etihad flies to passenger and cargo destinations in the Middle East, Africa, Europe, Asia, Australia and North America. Together with Etihad’s codeshare partners, Etihad’s network offers access to hundreds of international destinations. In recent years, Etihad has received numerous awards for its superior service and products, cargo offering, loyalty programme and more. All this ties into Etihad’s ambitious Journey 2030 strategy. The airline plans to double its fleet size and triple the number of customers over the next six years as it sets out to be the airline everyone wants to fly!
To learn more, visit etihad.com
Recruitment Fraud Alert
Beware of fraudulent job offers from individuals or organizations claiming to represent the Etihad group. We will never ask for personal information, bank details, or payment during the recruitment process. Interviews are conducted face-to-face or via video/telephone before any formal offer. If you are asked for money, please treat it as fraudulent.
Accountabilities
- Team Leadership & Reporting: Lead an SRE squad handling operations and automation; represent the team in senior management briefings; produce dashboards and progress reports.
- Toil Reduction & Automation: Identify and eliminate toil through automation of repetitive tasks, enhancing team efficiency and service reliability.
- Service Reliability & Uptime: Maintain and improve service availability by aligning with SLAs/SLOs, designing failover strategies, and hardening systems.
- Performance & Latency Optimization: Enhance service performance and reduce latency using profiling tools, distributed tracing, load testing, and bottleneck analysis.
- Change & Deployment Management: Implement safe deployment practices (e.g., canary releases, blue-green deployments), ensuring minimal risk and rapid rollback options.
- Monitoring & Observability: Build and manage real-time monitoring and alerting systems to ensure service health and proactively detect anomalies.
- Incident Management & RCA: Lead incident resolution efforts, conduct root cause analyses (RCA), and develop response playbooks to reduce MTTR.
- Capacity & Cost Optimization: Perform infrastructure capacity planning and cost-efficient scaling to meet service demands.
- Development & Code Review: Contribute to system development, participate in design/code reviews, and ensure alignment with engineering best practices.
- Governance, Compliance & Documentation: Enforce IT governance standards, maintain documentation, perform quality assessments, and contribute to architecture and risk committees.
Qualifications
- 7+ years of experience with data structures/algorithms and software development in two or more programming languages, and with operating and maintaining platforms, including 3+ years in a DevOps or SRE role.
- Experience working in computing, distributed systems, storage, or networking.
- Expertise in designing, analysing, and troubleshooting large-scale distributed systems.
- Ability to debug and optimize code and to automate routine tasks.
- Systematic problem-solving approach, coupled with effective verbal and written communication skills.
- Strong communication capability, able to articulate technical issues in terms of business risk and opportunity.
- Knowledge of the technical aspects of cloud computing, data centres, networks and virtual infrastructure.
- Strong analytical and problem-solving skills, together with knowledge of TSM processes & tools.
How to apply
To apply for this job, you need to log in on our website. If you don't have an account yet, please register.