
Site Reliability Engineer

We keep on growing, and we would like you to be our new Site Reliability Engineer.

Type of employment: Full-time 

Job summary

Do you want to build software that impacts millions of customers around the world, tackling some of the world’s most complex e-commerce challenges? We are looking for talented infrastructure backend developers with strong MySQL, Python and Kafka skills to join our partner’s Core Infrastructure department, based at its Amsterdam HQ.
In Core Infrastructure they design, build and operate all the technology that their product development teams need in order to deliver great travel products to their customers.
This includes, for instance, their on-premises data centers, their cloud-hosted Kubernetes clusters, MySQL/Cassandra/Elasticsearch database environments, HAProxy load balancers, Envoy service mesh, Apigee gateway, Kafka streaming service, Hadoop big data storage, Graphite time series, Grafana dashboard platform, monitoring & alerting tools, CI/CD tooling, Perl/Java/Node.js language frameworks and more…
In Application Data Services they operate a fleet of thousands of database instances in hundreds of replication hierarchies, some with hundreds of members, some with sizes of up to a hundred terabytes, and some with a transaction rate that pushes the boundaries of what the hardware can do. They also take care of developer needs, providing services to automate grant management, data ownership, online schema management, and monitoring and alerting. They use Python and Go, CI/CD in GitLab, Puppet, and some bits and pieces in other languages and systems.
Of course, this is only possible because the provisioning, maintenance and operation of these servers are automated. Maintaining, improving and refactoring this automation is what the SRE job is about: SREs code their way out of problems where operations are concerned, addressing availability, scalability, latency, and efficiency challenges within the vast infrastructure of the client.

In this role:

  • You will impact millions of people all over the globe with your creative solutions
  • You will be working in one of the biggest e-commerce companies in the world
  • You will solve interesting problems at scale by writing and deploying code across tens of thousands of servers
  • You will have the opportunity to collaborate with many of the world’s leading SREs
  • You will be free to launch your own ideas and solutions within our complex production environment
  • You will work on automation written in Python and Go that interfaces with a number of systems, among them our Puppetry, OpenStack, Kubernetes, PowerDNS, Graphite, Prometheus, ZooKeeper, and many more

Important aspects of the job include:

  • It’s MySQL: thousands of instances in hundreds of replication hierarchies, some of them seeing substantial load, forming the foundation of our Application Data Infrastructure
  • It’s automated. But as our systems evolve, this automation needs improvement, extension and refactoring to meet the changing requirements of a different environment
  • It’s Python and Go. And being at the center of most, if not all, applications, it is literally talking to everything else
  • It’s moving to all the platforms, including OpenStack, Kubernetes and the public cloud
  • It’s dynamic. With automated capacity testing, restore testing, failover testing and disaster recovery testing, it needs to be able to adapt to planned and unplanned changes in the production conditions and environments
  • Sometimes it has problems. Sometimes their customers cause problems. Good monitoring and alerting are required to be aware of problems as they develop, or ideally before they develop
  • It’s in multiple data centers, ours and in the public cloud. Replication and communication over long distances pose their own scaling and performance problems
  • As an SRE in the data infrastructure team, you will be responsible for planning, building, improving and refactoring solutions that solve these problems. You will also share the on-call rotation and be an escalation contact for incidents. You will be working in close collaboration with multi-functional teams in Core Infrastructure and in the Application Teams

What will you bring to the role?

  • Experience managing a production Kafka cluster in a very large-scale environment. That is, you know how to scale up and how to react to an issue such as “disk space quickly filling up”
  • For background: the cluster in the current infrastructure moves 1 TB of data per second
  • Operational experience maintaining a Kafka cluster like the one above, and familiarity with ITIL best practices (incident, problem and change management)
  • Experience writing production Java code to extend a control plane written in Java
  • Familiarity with the best practices that come with the SRE role

Nice to have:

  • Experience in automation and capacity management using Java, Puppet, Terraform, etc.
  • Participation in a stand-by rotation schedule providing 24/7 support
  • Exposure to coding at scale
  • Experience with private cloud solutions
  • Knowledge of Python programming and scripting

We want you to love Mondays, so:

  • You can count on private medical insurance
  • You can count on a conference budget
  • You can count on 23 vacation days per year
  • You can count on an education budget
  • You can count on personal and professional growth opportunities
  • You can count on a friendly working environment
  • You can count on flexible working hours
  • You can count on a competitive salary
  • You can count on coffee, tea, and refreshments when in the office
  • You can count on working on a great product
  • You can count on a fully remote mode (or join us on-site)
  • You can count on referral fees

Why ProDevs?

ProDevs is a fast-growing software company headquartered in Belgrade, Serbia. We build web, desktop, and mobile applications for businesses, from small startups to large enterprises, and we’re looking for a motivated developer to help us grow.

We’re a medium-sized team that’s growing fast, so everyone who joins has a direct impact on the direction and success of the company. We strive for an open, flat, collaborative, work-hard, play-hard environment.

Curious?

Join us and advance your career. Apply now! Send your CV to career@prodevs.rs.

Vesna Radović

Business Development Manager

Address

Njegoseva 45, 5th floor 11000 Belgrade

Email

info@prodevs.rs

Office

+38160 033-38-07

Our focus is on building and maintaining long-term partnerships and business relationships, while taking care of our and your business reputation.
