A leading financial services company in Central London is seeking a Site Reliability Engineer to join its growing Infrastructure team on a permanent basis, with hybrid working.
Responsibilities:
Expand and fortify the IT architecture for optimal availability.
Implement continuous integration and deployment practices for seamless development workflows.
Leverage state-of-the-art technology for streamlined automation and repeatability.
Engage in Agile practices, including pair programming and daily standups.
Establish a competitive edge by constructing robust infrastructure.
Reduce toil (manual, repetitive operational work) and make informed trade-offs when necessary.
Foster strong relationships with business counterparts to gain deep insights into client needs.
Contribute to shaping a culture centered around Service Level Objectives in the engineering domain.
Proactively address challenges by identifying and confidently mitigating risks, issues, or control weaknesses in day-to-day operations.
Skills and Experience:
Proven track record of working with cloud-based infrastructure, particularly in AWS environments.
In-depth knowledge and hands-on experience with Terraform for infrastructure provisioning and management.
Extensive expertise in building, managing, and maintaining Kubernetes clusters in a high-availability, high-traffic production setting.
Proficiency in one or more programming languages, with a preference for Go, Python, Ruby, or JavaScript (Node.js).
Comfortable troubleshooting in complex environments using a range of monitoring and logging tools, including but not limited to Grafana and Prometheus.