
Principal Site Reliability Engineer
3 days ago
Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date, we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis.
Groupon is on a radical journey to transform our business with relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company. We're big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.
Role Overview:
Are you ready to take your expertise to the next level and make a meaningful impact on the reliability and scalability of mission-critical systems? As a Principal Site Reliability Engineer (SRE Level V/VI), you will play a central role in ensuring the performance, availability, and resilience of our platforms. In this position, you will go beyond maintaining systems by leading initiatives that redefine operational excellence. You will collaborate with diverse teams to implement cutting-edge technologies and best practices, foster a culture of reliability, and mentor others in their growth as engineers. This is an exceptional opportunity for someone passionate about solving complex challenges and shaping the future of platform reliability in a high-impact role.
Key Responsibilities:
- Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher.
- Drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools.
- Create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery.
- Build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack.
- Collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs.
- Lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues.
- Design and execute performance testing, capacity planning, and scalability strategies for evolving workloads.
- Proactively identify and resolve bottlenecks, increasing system performance and developer efficiency.
- Mentor junior engineers, fostering a collaborative and growth-oriented team environment.
- Guide architectural decisions that drive innovation and enhance system reliability.
Qualifications:
- 10+ years in systems engineering, with at least 5 years in SRE or DevOps roles.
- Expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker).
- Proficiency in programming and scripting languages like Python, Go, and Bash.
- Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Deep understanding of networking, DNS, load balancing, and security principles.
- Proven track record of managing high-availability systems in demanding environments.
- Exceptional analytical and problem-solving skills.
Preferred Qualifications:
- Certifications in cloud or container technologies (e.g., AWS/GCP/Azure, Kubernetes CKA).
- Experience in industries like eCommerce, FinTech, or SaaS.
- Familiarity with Agile development processes and frameworks.
What We Offer:
- The opportunity to work with cutting-edge technologies in a transformative environment.
- A collaborative and innovative work culture that values your expertise and contributions.
- Professional growth and leadership development pathways tailored to your aspirations.
- A chance to leave a lasting impact by shaping the future of reliable and scalable systems.
Join us to push the boundaries of platform reliability and drive meaningful change in a fast-evolving digital world.
Groupon is an AI-First Company
We're committed to building smarter, faster, and more innovative ways of working—and AI plays a key role in how we get there. We encourage candidates to leverage AI tools during the hiring process where it adds value, and we're always keen to hear how technology improves the way you work. If you're passionate about AI or curious to explore how it can elevate your role—you'll be right at home here.
Groupon's purpose is to build strong communities through thriving small businesses. To learn more about the world's largest local e-commerce marketplace, click here. You can also find out more about us in the latest Groupon news and learn about our DEI approach. If all of this sounds like a great fit for you, then click apply and join us on a mission to become the ultimate destination for local experiences and services.
Beware of Recruitment Fraud: Groupon follows a merit-based recruitment process without charging job seekers any fees. We've noticed an increase in recruitment fraud, including fake job postings and fraudulent interviews and job offers aimed at stealing personal information or money. Be cautious of individuals falsely representing Groupon's Talent Acquisition team with fake job offers. If you encounter any suspicious job offers or interview calls demanding money, recognize these as scams. Groupon is not responsible for losses from such dealings. For legitimate job openings (and a sneak peek into life at Groupon), always check our official career website at Groupon Careers.
-
Principal Site Reliability Engineer
19 hours ago
Remote, Czech Republic Akamai Full time
12 years of relevant experience and a Bachelor's degree or its equivalent in work experience. Possess expertise in Linux internals, deep understanding of hardware and best practices enabling HW features in Linux. Possess advanced level experience with the Linux kernel, OS, and optimization of their configurations for KVM/QEMU virtualization. Possess expert level...
-
Senior Site Reliability Engineering Lead
19 hours ago
Remote, Czech Republic Akamai Full time
Have 5 years of relevant experience and a Bachelor's Degree in Computer Science or its equivalent. Possess expert level experience in a DevOps, Development, or SysAdmin role working with large scale distributed systems. Have experience with building tools for automation and infrastructure at scale (Python/Go, Terraform, SaltStack, Jenkins). Be able to work in...
-
Site Reliability Engineer Mid
19 hours ago
Remote, Czech Republic Akamai Full time
Have relevant experience and a Bachelor's diploma in Computer Science or its equivalent. Possess high level experience in a SysAdmin (Linux/Unix Administration), DevOps or Software engineering role, working with large scale distributed systems, ability to understand complex infrastructure. Possess knowledge of Linux Kernel, performance tuning, hardening...
-
Site Reliability Engineer
1 week ago
Remote, Budapest, Czech Republic Cision Hungary Full time
Essential Skills & Experience
- 3–5 years' experience in SRE, DevOps, or Software Engineering roles.
- Extensive Kubernetes expertise at scale, with strong containerization knowledge.
- Hands-on proficiency in Infrastructure as Code (IaC) tools such as Ansible, Puppet and Terraform.
- Strong coding skills in an OOP language plus the ability to develop effective...
-
Software Principal Engineer
2 days ago
Remote, Warszawa, Czech Republic Dell Technologies Full time
Essential Requirements
- Bachelor's degree in Computer Science or related field
- 10+ years of proven experience in complicated system, e.g. critical telecom product, OS (multi-threading, locks, scheduling), storage protocols (NFS, CIFS, iSCSI), storage technologies (SAN, NAS, RAID, OSD, snapshot, replication), networking, device drivers, clustering, etc.
- Strong...
-
Remote Principal Engineer
4 days ago
Remote, Kraków, Czech Republic uSoftware Full time
Minimum Qualifications:
- At least 5 years of proven experience in software engineering including roles like Senior/Staff/Principal Engineer, Lead Developer, or Software Architect
- Experience in driving projects which influenced at least 2-3 engineering teams
- Excellent problem-solving and communication skills
- Solid understanding of software fundamentals (Data...
-
Highly Skilled Computer Systems Specialist
7 hours ago
Remote, Czech Republic beBeeSkillset Full time €80,000 - €120,000
Job Title: Site Reliability Engineer Mid
Key Responsibilities:
We are seeking a highly skilled professional to fill the role of Site Reliability Engineer. This individual will be responsible for designing, developing, and managing applications and infrastructure that support our products and services.
Requirements:
- Strong understanding of computer...