DevOps/Site Reliability Engineer

Cisco Systems Inc.
Ottawa, ON K1P

Who we are

Developer Experience (DevX) at Cisco is on a mission to provide best-in-class tools, infrastructure, and services for a frictionless developer experience, so that teams can deliver the highest-quality on-prem, hybrid, and multi-cloud products and software services to our customers. We are a horizontal organization serving all three major businesses: Enterprise Networking & Cloud, Data Center, and Service Provider.


What you will do

We are looking for a DevOps/SRE Engineer for our Workflow Orchestration Tools team, which drives pre-commit and post-commit workflows for developers. This is an exciting opportunity to join a fast-paced team focused on innovation and customer success, and to design, build, deliver, and operate Workflows/Builds-as-a-Service and developer tools that provide a world-class experience to the 10,000+ developers across Cisco. In this high-impact role, you will have the freedom to embrace and extend cutting-edge OSS tools and technologies while working with a very talented group of engineers.


As part of a development team, you'll work closely with application feature developers, IT system administrators, and storage engineers.

  • Monitor and maintain a pool of 700 build hosts in conjunction with IT. Identify operational issues and implement long-term solutions to prevent recurrence. During system outages and degradations, identify and root-cause faults and work closely with development and IT teams on quick resolutions
  • Identify and implement technical solutions to optimize compute and storage requirements for our hardware fleet
  • Measure the performance and operational reliability of a large-scale, distributed commit automation tool. Identify system failures, performance bottlenecks, and regressions
  • Analyze and understand the performance and scale characteristics of a large commit automation system requiring hundreds of physical compute nodes and petabytes of network storage
  • Demonstrate Strong Ownership & Relevance
  • Participate in Requirements, Planning, Design, Review discussions
  • Analyze customer feedback and issues, and participate actively in customer support activities, including triage and providing resolutions

Who you are


Someone with a systematic problem-solving approach, strong communication skills, and a sense of ownership and drive

Bachelor's in Computer Science or equivalent with 5-7 years' experience in designing, analyzing, and troubleshooting large-scale distributed systems

3+ years of DevOps/SRE experience managing and operating large-scale projects, including thousands of servers and storage infrastructure. Experience managing public cloud (compute/storage services) is a plus

5 years of programming experience in any of the following languages: Python, Perl, C, C++, Java

In-depth working knowledge of build systems (make, CMake, distcc) and CI/CD systems (such as Jenkins and CircleCI)

Good understanding of the internals of, and hands-on experience with, Linux (RHEL 8), networking, storage, and security

Experience managing on-call rotations and issue resolution, and establishing SLAs/SLOs/SLIs for the service



Cisco Covid-19 Vaccination Policy
The health and safety of Cisco's employees, customers, and partners is a top priority. Our goal is to protect and mitigate the spread of COVID-19 infection for strong business resiliency during the pandemic. Therefore, Cisco requires all new hires to be fully vaccinated against COVID-19 in the U.S., unless otherwise prohibited by applicable law, and in countries where COVID-19 vaccination is legally required. The company will consider legally required accommodations/exceptions for medical, religious, and other reasons as per the requirements of the role and in accordance with applicable law. Additional information will be provided to candidates about the requirements and accommodation process at the offer time based on region.


Posted: 2022-05-13 Expires: 2022-06-12
