This is an opportunity to join Oddschecker's growing infrastructure team and help with our continued migration to GCP (Google Cloud Platform). As a Site Reliability Engineer, you will play a key part in supporting different squads with the continued running of their applications.
Be responsible for ensuring the systems and applications we launch remain available, reliable and efficient at accomplishing their duties even as their duties scale and evolve.
Design, build and automate tools and processes to ensure and improve scalability, availability and performance across areas of technology. Build, integrate and run tools to inject, predict and identify infrastructure and service failures on an ongoing basis to help optimise our sites.
- Be responsible for the maintenance and upkeep of Oddschecker platforms and services, which will require out-of-hours support and on-call responsibilities.
- Work with our developers and architects to design and integrate systems that respond consistently to failures by gracefully degrading our services.
- Develop tools and procedures to manage high demand on our systems, e.g. degrading services gracefully, user prioritisation, removing low-priority traffic, intelligent banners.
- Measure the capability of our infrastructure and applications to manage failures, from failovers to full site outages. Make recommendations to the business on the levels of service that can be supported during different failure scenarios.
- Execute regular testing and measurement of our infrastructure and platforms to identify improvements in their reliability, e.g. DR, performance and security testing.
- Design and run regular testing of applications in an off-duty state (e.g. located on a standby DR site, behind bannered services) to ensure they meet both functional and performance requirements.
- Instigate planned and spontaneous fire drills to continually test our systems' ability to deal with failures and identify weak points that need improving.
- Work with all other squads to schedule and run the failover of our systems invoking DR and BCP processes as a business.
- Manage capacity, performance and availability for the IT infrastructure to ensure it meets all the needs of the business.
- Provide technical guidance for service upgrades, rollouts and enhancements.
- Utilise tools and intuition to aid support teams in identifying and mitigating potential problems and vulnerabilities.
Develop engineering solutions to failures and all other problems that adversely affect site reliability and uptime, including capacity, performance, stability and security issues.
Key characteristics and experience for the Site Reliability Engineer
You should work well as part of a team and be a self-starter with a keen interest in sports and the betting industry.
Knowledge of the following technologies:
- Working with cloud-based providers (Google Cloud Platform)
- Experience with Linux systems (CentOS).
- Working with HTTP web technologies (Tomcat/Apache/Nginx/HAProxy).
- Hands-on troubleshooting for software and hardware issues
- Good understanding of database administration (MySQL, MongoDB)
- Experience in configuration management (Ansible / Puppet / Chef)
- Infrastructure as code (Terraform, Google Deployment Manager)
- Familiar with CI tools (Jenkins/GitLab)
- Familiar with monitoring tools (Cacti, Icinga/Icinga2, Prometheus, Sensu)
- Strong scripting knowledge in Bash, Python
- Version control best practices (Git, Mercurial)
- Experienced with Kubernetes/Docker
- Familiar with NFS/LDAP/DNS
- Knowledge of CDN providers
- Working knowledge of Elasticsearch
- Experience with Go
- A desire to learn new technologies and apply them where appropriate to improve the quality of our software and processes
You will be entitled to a full Sky benefits package, which includes:
- 25 days' holiday plus bank holidays
- Contributory Pension
- Life Assurance
- Healthcare Plan
- Free Sky Q, broadband and discounted Sky Talk after four weeks and completion of mandatory e-learning
There are also further optional benefits, which you can choose to opt into.
On offer is a competitive salary, a planned career path, a bonus and a great benefits package, so apply now!