Site Reliability Engineer
3 days ago
Skills and experience you will bring:
At least 3 years of hands-on experience managing critical, high-availability production infrastructure, with demonstrated success in maintaining reliability and maximizing application uptime.
Proficiency in at least one programming language (such as Python, Java, or Rust), with experience designing and building production-quality automation, tools, or software libraries.
At least 3 years working with monitoring, log aggregation, and observability platforms such as Datadog, CloudWatch, Honeycomb, Splunk, or New Relic, using data-driven insights to proactively identify and resolve issues.
Excellent analytical skills with the ability to understand end-to-end use cases, map system flows, debug complex issues, and anticipate potential failure points.
Proven track record of translating SLOs and SLIs into actionable improvements. Reliability, monitoring, and observability are not just words to you.
At least 3 years of experience with cloud technologies, in particular AWS services and tools such as CloudFormation, Lambda, DynamoDB, SQS, SNS, EC2, S3, AWS CLI, and Boto3.
Solid foundation in Linux systems administration, networking, and security.
Familiarity with the use and configuration of CI/CD pipelines such as Jenkins and AWS CodePipeline.

Additional skills and experience that will be useful:
Experience architecting and deploying serverless applications in cloud environments.
Experience with infrastructure-as-code tools like Terraform or CloudFormation, enabling reproducible and scalable environments.
Previous participation in production on-call rotations, with direct involvement in incident management and post-incident reviews.
Demonstrated expertise in performance optimization for core AWS services, including Lambda, DynamoDB, API Gateway, SQS, EventBridge, and EC2.
Experience supporting and improving systems with frequent, high-velocity deployment cycles.
Familiarity with security compliance frameworks (e.g., OWASP, ISO, CSA, PCI), and hands-on experience conducting threat assessments and implementing remediation plans.
Background in security practices, including penetration testing, threat modeling, and usage of both open-source and commercial security tools.
Experience developing and implementing advanced deployment strategies for web application infrastructures, such as canary, A/B testing, blue/green deployments, or red/line patterns.
Hands-on experience with chaos engineering: intentionally testing systems under extreme conditions to improve reliability and fault tolerance.
Track record of championing system reliability, continuous improvement, and operational excellence throughout an organization.

We’re looking for highly motivated, passionate site reliability engineers to join our growing team. At evertz.io, our teams are building services that are used by the biggest names in the exciting broadcast and media industry. Our services are hosted in AWS, with a Serverless First mindset. As part of this role you will work with our talented teams to help harden our multi-tenant SaaS platform. Using best-in-class observability tooling, you will debug incidents while also identifying and implementing improvements to the platform to ensure its continued reliability. Your drive to eliminate toil will see you automating processes and building the tools to do so.

Evertz Microsystems (TSX:ET) is a leading global manufacturer of broadcast equipment and solutions that deliver content to television sets, on-demand services, WebTV, IPTV, and mobile devices (like phones and tablets). Evertz has expertise in delivering complete end-to-end broadcast solutions for all aspects of broadcast production, including content creation, content distribution, and content delivery. Considered an innovator by its customers, Evertz delivers cutting-edge hardware and software solutions that are unmatched in the industry. Evertz products and solutions can be found in major broadcast facilities on every continent. Evertz’ customer base also includes telcos, satellite, cable TV, and IPTV providers.

We work in agile, low-bureaucracy, cross-functional teams spread across the world. It’s a highly creative work environment where the team is built on trust and is relaxed, open, and welcoming to all. Evertz has engineering offices in Canada, England, Scotland, and India, and now it's time for Poland.

Responsibilities:
Ensure platform reliability and uptime by monitoring, maintaining, and optimizing multi-tenant SaaS infrastructure.
Investigate and resolve production incidents, perform root cause analysis, and drive long-term fixes to improve system resilience.
Develop tools and automation to streamline daily operational processes and improve efficiency.
Implement and maintain observability solutions (monitoring, logging, alerting) to enable proactive issue detection and quick response.
Define and refine SLOs and SLIs, turning performance metrics into actionable reliability improvements.
Design, maintain, and optimize CI/CD pipelines to ensure fast, consistent, and secure deployments.
Design, deploy, and maintain scalable cloud and application infrastructure.
Apply Infrastructure as Code practices to ensure scalable and reproducible environments.
Enhance system security and compliance, conducting assessments and implementing mitigation strategies aligned with industry standards.
Participate in on-call duty.
Requirements: Python, Java, Rust, Observability platforms, Datadog, CloudWatch, Splunk, AWS, Linux, Networking, system administration, CI/CD, SLOs / SLIs, Terraform, CloudFormation, AWS services, Security, Serverless computing, Security Compliance, Performance Optimization
Additionally: Private healthcare, International projects, team-building events.
-
Site Reliability Engineer
7 days ago
Remote, Warszawa, Czech Republic Connectis_ Full time
At least 5 years of hands-on experience as a Site Reliability Engineer. Hands-on knowledge of Infrastructure as Code (Terraform) for automating the creation and management of resources. Hands-on knowledge of AKS (Azure Kubernetes Service) for managing Kubernetes clusters. Ability to write Python scripts to automate...
-
Staff Site Reliability Engineer
1 week ago
Remote, Czechia, Czech Republic Veeam Software Full time 1,200,000 - 2,400,000 per year
Veeam, the #1 global market leader in data resilience, believes businesses should control all their data whenever and wherever they need it. Veeam provides data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust Veeam to keep...
-
Remote, Katowice, Czech Republic Jamf Full time
Minimum of 5 years of experience in IT. (Required) Experience identifying, tuning, and fixing issues with software in Java. (Required) Experience working with containerization and Kubernetes. (Required) Experience utilizing system monitoring tools, such as Grafana & Prometheus. (Required) Experience working within a form of the Agile development framework...
-
Remote, Czech Republic Superdevs Full time
What you bring: 5+ years of experience in SRE, System Engineering, or DevOps. Experience in the administration and operation of production systems. Strong knowledge of Linux administration, database management, and cloud infrastructures (AWS). Practical experience with databases such as PostgreSQL, MySQL, or Aerospike. Experience working with version control...
-
Site Reliability Engineer II
3 days ago
Remote, Kraków, Czech Republic Akamai Technologies Full time
Have academic or industrial experience and PhD or master's degree in computer science or another quantitative field. Have an analytical background with experience in statistical data analysis. Demonstrate experience with coding in one or more of the following languages (Python, Java, C/C++). Possess experience with Internet protocols (DNS/HTTP/TLS/TCP). Have...
-
Director of Engineering, Platform @ Glia
5 days ago
Remote, Czech Republic Glia Full time
Previous hands-on experience in software engineering and technical operations in the cloud. Passion for technology: staying up to date with the newest tools, trends, and best practices in the engineering domain (cloud infrastructure, SRE, and Developer Experience). Experience managing multiple teams and Engineering Managers. The role: As a Director of...
-
Site Reliability Engineer
7 days ago
Remote, Czech Republic Stackmine Full time
Min. 5 years of experience in a similar position, experience monitoring applications and distributed systems, knowledge of observability tools: Datadog, New Relic, Dynatrace, Grafana + Tempo/Faro/Alloy, hands-on knowledge of distributed tracing, APM, RUM, and Synthetic monitoring, experience defining and monitoring SLIs/SLOs, good...
-
Senior Design Engineer
5 days ago
Remote, Warszawa, Czech Republic Antal Full time
Minimum of 5 years of experience in designing protection, control, SCADA, and auxiliary systems for substations. Master’s Degree in Electrical Engineering from an accredited University. Proficiency in AutoCAD, MicroStation, and EPLAN. Strong knowledge of Protection Relays Related Codes (IEC, IEEE, ANSI) and local Power and Energy Systems (PES) regulations....
-
Senior Software Engineer, CI/CD
5 days ago
Remote, Czech Republic Glia Full time
Requirements: System Design: Strong experience in building and maintaining reliable and highly available systems. Infrastructure Background: Experience with cloud infrastructure (AWS), running containerised applications (e.g., Kubernetes), and modern CI/CD platforms (e.g., GitHub Actions, GitLab CI). Project Leadership: Prior experience leading complex...
-
DevOps Engineer @ Link Group
3 days ago
Remote, Warszawa, Czech Republic Link Group Full time
Requirements: Proven experience as a DevOps Engineer with a focus on application delivery. Strong knowledge of CI/CD tools (e.g., Jenkins, Azure DevOps, GitHub Actions, or similar). Experience with cloud-based environments, ideally Microsoft Azure. Good understanding of mobile app deployment processes (Android/iOS). Familiarity with containerization...