Site Reliability Engineer

Permanent
Full time
Hybrid (Vilnius, Lithuania)
Platform Ops

We impact the lives of over 40 million consumers daily by working with clients in the Baltics, the USA, Central and South America, and the Caribbean. We operate one of the largest big data environments in the Baltics, with one of the most diverse data sets, and we pursue some of the most challenging analytical projects. Exacaster’s team is looking for a Site Reliability Engineer (SRE) to work on big data solutions for our clients in the telecommunications and finance industries and join our Platform Ops team!

About the role:

We are seeking a Site Reliability Engineer to build and manage our big data platforms across cloud and on-premise environments. This role focuses on ensuring the stability, scalability, and performance of the infrastructure supporting large-scale data processing systems. The ideal candidate will be experienced in building, maintaining, and optimizing distributed data platforms, both in the cloud and on-prem, while utilizing modern DevOps tools and practices.

Who should apply?

You have at least 2 years of hands-on experience building, maintaining, and optimizing modern infrastructure. You are confident working with Linux systems (especially RHEL), scripting, and troubleshooting, and you have solid experience with monitoring tools like Zabbix or Prometheus. If you’ve worked with Kubernetes to manage applications, used Terraform and Ansible to automate infrastructure, and are comfortable operating in AWS and/or Azure environments, you’ll fit right in.

Your daily tasks will include:

Maintain infrastructure for big data platforms, including Cloudera on-prem solutions and cloud-based environments on AWS and/or Azure.
Implement, manage, and optimize monitoring solutions using Zabbix and Prometheus to ensure the performance, availability, and reliability of data platforms.
Troubleshoot and resolve platform-related issues, focusing on system performance, reliability, and scalability when processing large data sets.
Use Terraform and Ansible for infrastructure automation, configuration management, and to ensure repeatable, scalable deployments.
Maintain CI/CD pipelines for seamless deployments and continuous integration (including static code analysis) for both infrastructure and applications.
Manage the reliability, performance, and scalability of databases such as PostgreSQL, RDS, MySQL or similar.
Manage Kubernetes clusters for orchestrating Big Data workloads and ensuring efficient resource utilization.
Proactively identify and resolve performance bottlenecks within Big Data platforms, including resource management, cluster tuning, and workload optimization.
Collaborate with network teams to ensure seamless communication between data services by optimizing data traffic and enhancing security practices.

We are looking for a person who has:

2+ years of hands-on experience in DevOps or SRE roles.
Proficiency in monitoring systems using Zabbix or Prometheus for tracking and alerting system metrics.
Knowledge of Linux (RHEL) systems, including scripting, system administration, and troubleshooting.
Hands-on experience with cloud environments, particularly AWS and/or Azure, including the deployment of cloud-native services and infrastructure.
Expertise in managing applications using Kubernetes.
Proficiency in using Terraform and Ansible for infrastructure automation and configuration management.
Experience with CI/CD pipelines and tools such as GitLab CI, ArgoCD, or similar.
Understanding of networking concepts including security, VPNs, and performance tuning in hybrid environments.
Knowledge of best practices and tools for optimizing system and application performance in large-scale distributed environments.

Nice to have:

Experience managing and maintaining Cloudera or similar on-prem big data platforms.
Experience with Hadoop, Spark, or similar data processing frameworks.
Experience managing and maintaining Airflow.
Certifications in cloud platforms (AWS, Azure), Kubernetes, RHCSA or RHCE.
Experience in security best practices and tools for both cloud and on-prem environments.

We promise:

A gross monthly salary from 3602 EUR to 5037 EUR for a full-time role.
Participation in the company’s stock options program.
Flexible benefits and a personal learning budget.
10 Growth Days per year - dedicated time for learning and development.
Ownership and dynamics in your role.
A hybrid work environment, preferably with at least 1 day per week in the office.
All the support you need from our experienced team to become an even better professional.
And the most important thing – you will be part of a great international team!
