Lead Site Reliability Engineer
Remote | Köln, Nordrhein-Westfalen, Germany | Engineering
...is Germany's best-known AI company. We develop neural networks to help people work with language. With DeepL Translator, we have created the world's best machine translation system and made it available free of charge to everyone online. Over the next few years, we aim to make DeepL the world's leading language technology company.
Our goal is to overcome language barriers and bring cultures closer together.
What distinguishes us from other companies?
DeepL (formerly Linguee) was founded by developers and researchers. We focus on the development of new, exciting products, which is why we spend a lot of time actively researching the latest topics. We understand the challenges of developing new products and try to meet them with an agile and dynamic way of working. Our work culture is very open because we want our employees to feel comfortable. In our daily work we use modern technologies - not only to translate texts, but also to create the world's best dictionaries and to solve other language problems.
When we tell people about DeepL as an employer, reactions are overwhelmingly positive. Maybe it's because they have enjoyed our services, or maybe they just want to get on board with our quest to break down language barriers and facilitate communication.
We are constantly looking for outstanding employees! Currently, we offer remote work in Germany, the Netherlands, the UK, and Poland. Whether you would like to work from home in one of these countries or from one of our offices in Amsterdam, London, Cologne, or Paderborn: the choice is yours. No matter where you choose to work from, our way of working is designed to make you an essential part of the team.
What will you be doing at DeepL? Your Responsibilities
We are looking for a Site Reliability Engineer (SRE) to kickstart our SRE team. As an SRE, you will be the bridge between the Kubernetes platform and software development teams, ensuring the reliability, scalability, and performance of our systems. You will work closely with 1-2 cross-functional software development teams, cultivating a deep partnership to help them deploy and manage their applications on our in-house platform infrastructure, implementing and teaching best practices.
The SRE team's responsibility will be to continuously improve our reliability, with the goal of taking over on-call duty for all production services.
You will work on standardizing our deployments, mentoring developers and collaborating with our internal developer platform team on improving the Kubernetes-based platform.
The ideal candidate will have a deep passion for collaboration and a background in software development.
- Develop and maintain a deep partnership with 1-2 software development teams to understand their application requirements and to help them deploy and manage their applications on the platform infrastructure
- Work closely with the platform team to ensure that the platform infrastructure provided to the SRE team is reliable, scalable, and meets the requirements of the organization's applications
- Develop and maintain the necessary tooling, such as monitoring, alerting, and logging systems, to ensure the health and availability of the platform infrastructure
- Collaborate with the platform team to continuously improve the reliability and performance of the platform infrastructure through automated testing, monitoring, and proactive maintenance
- Troubleshoot issues related to the platform infrastructure and work with the platform team to resolve them in a timely manner
- Participate in incident response and post-mortem activities to identify and address the root cause of any platform-related incidents, and implement preventative measures to avoid similar issues in the future
What we offer
- We are a distributed workforce: you can work from the comfort of your home office in Germany, Poland, the Netherlands, or the UK, or from one of our comfortable offices
- State-of-the-art equipment for your workplace
- Almost completely open-source technology on the inside - if we run it, we can fix it ourselves
- Operation at scale for products used by more than 100 million people worldwide
- A friendly, international, and highly committed team with a lot of trust and concise decision-making processes
- Meaningful work: We break down language barriers worldwide and bring different cultures closer together
About you
- You bridge the gap between developing software and running infrastructure, bringing solid experience in both
- You are a good communicator and have the desire to help the teams you are working with
- You're not afraid to get your hands dirty - DeepL is scaling rapidly, and there is always something to do
- You're experienced with Kubernetes, Prometheus and Grafana
- You have a background in software development, ideally in backend applications
- Security isn't just a buzzword for you
If you don't tick all of the boxes above but feel like you're the right person: Don't worry, give it a shot!
We are looking forward to your application!