Talent.com
Site Reliability Engineer - Infra

Taboola, Israel
More than 30 days ago
Job description

Realize your potential by joining the leading performance-driven advertising company!

As a Site Reliability Engineer on our Infrastructure team in the Tel Aviv (TLV) office, you will play a key role in ensuring the reliability, scalability, and performance of our critical systems. You will be responsible for managing and improving our core infrastructure, with a focus on automation, monitoring, and incident response. You will work with a wide range of technologies, including Kubernetes, monitoring and observability tools, configuration management systems, and core networking services.

To thrive in this role, you’ll need:

  • 5+ years of experience in a Site Reliability Engineering, Systems Engineering, or similar role.
  • Deep understanding of Site Reliability Engineering principles and practices.
  • Extensive experience with Kubernetes, including deployment, management, and troubleshooting.
  • Strong experience with monitoring and observability tools such as Sensu Go, Zabbix, VictoriaMetrics, Prometheus, and ELK.
  • Proficiency in configuration management tools such as Puppet and Ansible.
  • Solid understanding of Linux internals and networking.
  • Experience managing and maintaining core services such as DNS and networking.
  • Strong programming skills in Python and/or Go.
  • Experience with both on-premises and cloud environments.
  • Experience with KubeVirt.
  • Excellent troubleshooting and problem-solving skills.
  • Strong communication and collaboration skills.
  • Ability to work in a fast-paced, dynamic environment.
  • Ability to participate in on-call rotations, including weekends.

Preferred Qualifications:

  • Experience with large-scale, distributed systems.
  • Experience with other cloud providers (e.g., AWS, Azure, GCP).
  • Contributions to open-source projects.

How you’ll make an impact:

As a Site Reliability Engineer, you will:

  • Ensure the reliability, availability, and performance of our infrastructure services.
  • Manage and maintain our Kubernetes infrastructure, including KubeVirt.
  • Design, implement, and maintain our monitoring and observability stack (Sensu Go, VictoriaMetrics, Prometheus, ELK).
  • Automate infrastructure provisioning, configuration, and deployment processes using Puppet and Ansible.
  • Manage and maintain core services such as DNS and networking.
  • Troubleshoot and resolve complex infrastructure issues in a timely and efficient manner.
  • Participate in on-call rotations and incident response.
  • Develop and maintain infrastructure-as-code (IaC).
  • Identify and implement proactive measures to prevent incidents and improve system reliability.
  • Collaborate with development teams to ensure smooth and reliable deployments.
  • Contribute to the design and implementation of new infrastructure solutions.
  • Drive improvements in system architecture, processes, and tools.
  • Mentor and coach other team members.