Make an impact by working in sectors where technology is the enabler, where everything is ground-breaking and there’s a constant need to be innovative. Be part of a team that combines business knowledge, technological edge and design experience, whose members complement and help each other in developing solutions and experiences for digital clients. Face challenges and learn other ways of thinking and seeing the world. There’s always room for your energy and creativity.
About the role
We are looking for a Solution Reliability Engineer (SRE) for the Vodafone Enterprise and Consumer IoT Program.
This person will help build, support, and manage the activities needed to implement best practices in the following main areas:
- Testing Automation;
- Application and Infrastructure Monitoring;
- Non-Functional Requirements Management and Implementation;
- Participate in the solution definition to ensure its operability;
- Ensure the solution's resilience, acting as the team's single point of contact (SPOC):
  - Collaborate on defining performance tests;
  - Participate in defining resilience tests;
- Ensure the solution's observability:
  - Define monitoring requirements (e.g. log types);
  - Validate performance metrics and monitoring KPIs;
- Challenge and evolve the best practices for the CI/CD solution;
- Work with stakeholders to fully understand and communicate Root Cause Analysis findings and implement the lessons learnt;
- Review monitoring KPIs and logging efficiency, and introduce new tools that make the solution more reliable;
- Drive initiatives to make the solution (and all its components) more reliable, that is, less prone to generating support tickets;
- Work with developers throughout the software development lifecycle to ensure that the services they build are operationalized.
Experience & Technical Skills
- Familiarity with DevOps culture;
- Experience applying reliability practices to client-facing (internal and external) applications;
- Experience with Environments & Infrastructure (Unix/Linux);
- Experience with Cloud (AWS, Oracle, Azure);
- Experience with Containers (Docker, Kubernetes);
- Solid experience with the following activities:
  - Software implementation and configuration to maintain infrastructure and application solutions;
  - System reliability, quality, and automation;
  - Measuring and optimizing system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve;
  - Providing primary operational support and engineering for multiple large distributed software applications;
  - Working with complex OSS and BSS solutions across several technology domains (e.g. Online, Automation, QA);
- Experience working in Operations on the following processes:
  - Asset reliability risk evaluation;
  - Problem management processes;
  - Driving the collection of solution reliability metrics and communicating them to internal and external stakeholders;
  - Using the JIRA ticketing tool for operational reporting;
  - Real-time solution monitoring tools (e.g. Kibana, Elasticsearch, AppDynamics, Prometheus, Grafana);
- Experience in business/technical assessments of solution lifecycle asset management processes;
- Nice to have:
  - Agile certifications;
  - Cloud certifications;
  - ITIL v4;
- At least 3 years of experience working on large scale, multiple agile team projects.
- Speak English fluently;
- Experience managing internal and external stakeholders’ expectations;
- Experience working with development teams and operational support teams;
- Cultivate curiosity and a thirst for learning;
- Believe in the collective: teams and groups.
We want people who like to roll up their sleeves and open their minds. Believe this is you? Come join the Team!