Site Reliability Engineer

Make an impact by working for sectors where technology is the enabler, where everything is ground-breaking and there’s a constant need to innovate. Be part of a team that combines business knowledge, technological edge and design experience, whose members complement and help each other in developing solutions and experiences for digital clients. Face challenges and learn other ways of thinking and seeing the world. There’s always room for your energy and creativity.


About the role

We are looking for a Solution Reliability Engineer (SRE) to strengthen our Digital Delivery teams, working under Agile methodologies and contributing to the continuous integration and delivery process defined with our clients, in order to ensure regular product increments and the 360º observability processes that guarantee their reliability and stability.

This person will work inside DevOps squads, ensuring an end-to-end view of all solution architecture components, with a focus on the integration and microservices ecosystem, helping to analyze, build, validate, support, and manage the activities towards the implementation of best practices in the following main areas:

  • Microservices, Backend and integration services;
  • Integration & platform Security;
  • Reliability;
  • Testing Automation;
  • CI/CD;
  • Application and Infrastructure Monitoring;
  • Non-Functional Requirements Management and Implementation;
  • Stabilization.

Main responsibilities:

  • Participate in the solution definition to ensure its operability;
  • Ensure resilient solutions, acting as the single point of contact (SPOC) within the DevOps squads, raising awareness and creating team backlog items to address solution reliability, observability, operational and non-functional requirements:
    • Collaborate in the definition & validation of performance tests;
    • Participate in the definition of resilience tests;
    • Ensure the solution observability;
    • Define monitoring requirements (e.g. log types);
    • Validate performance metrics and monitoring KPIs;
    • Challenge the best practices for the CI/CD solution and its evolution.
  • Address the backlog and develop the features prioritized according to business value, within DevOps squads, related to microservices and integration architecture components, with two major focuses:
    • Work with developers during the software development lifecycle to ensure that developed services are operationalized;
    • Work with Operations stakeholders to fully understand and communicate the Root Cause Analysis and implement the lessons learned.
  • Review monitoring KPIs and logging efficiency to introduce new tools towards a more reliable solution;
  • Drive initiatives to make the solution (and all its components) more reliable, in order to reduce the sources of tickets;
  • Promote continuous improvement practices within the existing delivery model;
  • Promote excellence in applying agile principles within DevOps squads.

What are we looking for?

  • Strong experience with microservices;
  • Experience with integration platforms (TIBCO, WebLogic, Software AG, OSB, OSM);
  • Experience in application reliability practices for internal and client-facing experiences;
  • Experience with Environments & Infrastructure (Unix/Linux);
  • Experience with Cloud (AWS, Oracle, Azure);
  • Experience with Containers (Docker, Kubernetes);
  • Good experience in the following activities:
    • Software implementation and configuration to maintain infrastructure and application solutions;
    • System reliability, quality, and automation;
    • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve;
    • Provide primary operational support and engineering for multiple large distributed software applications;
    • Working with complex OSS & BSS solutions across several technology domains (e.g. Online, Automation, QA);
    • Asset reliability risk evaluation;
    • Problem management processes;
    • Driving the collection of solution reliability metrics and communicating them to internal and external stakeholders;
    • Jira ticketing tool for operational reporting;
    • Real-time solution monitoring tools (e.g. Kibana, Elasticsearch, AppDynamics, Prometheus, Grafana);
    • Business/technical assessments of solution life cycle asset management processes.

Nice to have:

  • Agile certifications;
  • Cloud certifications;
  • ITIL v4;
  • At least 3 years of experience working on large-scale projects with multiple agile teams.


Soft Skills:

  • Ability to adapt to different contexts, teams and clients;
  • Motivation for international projects and willingness to travel;
  • Strong communication skills;
  • Fluent English;
  • Being self-driven and working towards a common team or company purpose;
  • Experience managing internal and external stakeholders’ expectations;
  • Experience working with development teams and operational support teams;
  • Cultivate curiosity and thirst for learning;
  • Believe in the collective: teams and groups.

We want people who like to roll up their sleeves and open their minds. Believe this is you? Come join the Team! 

Or, know someone who would be a perfect fit? Let them know!

Av. Dom João II 34
1990-083 Lisboa


R. Daciano Baptista Marques 245
4400-617 Vila Nova de Gaia

Already working at Celfocus?

Let’s recruit together and find your next colleague.

