Site Reliability Engineer

Cisco Systems Inc.
San Jose, CA 95113

What You'll Do

  • You are a deeply motivated Site Reliability Engineer with a background in DevOps/SRE software development. The ideal candidate has experience building, shipping, and operating a software-as-a-service (SaaS) product, has a cloud-ready mindset, and has exposure to cloud-native technologies. This position enables the monitoring, maintenance, and management of public and private cloud infrastructure while providing timely responses within designated SLA times.
  • As an SRE, you will work closely with our Managed Services Team to diagnose and characterize issues, drive continuous improvement, develop infrastructure best practices, and build highly scalable, fault-tolerant, easy-to-administer infrastructure. You must be proactive and organized, diligent about documentation, and passionate about monitoring and automating everything.
  • This can only be accomplished by a candidate with substantial real-world experience building, deploying, and operating distributed systems at scale using cloud-native technologies.

Who You'll Work With

Cisco is transforming the networking industry. To make this happen, we are investing heavily in the team responsible for The Network Intuitive. We are disrupting the industry by building a new networking platform that can learn, adapt, and secure itself at the speed of today's businesses. This Digital Network Architecture platform automates network management and provides our customers with state-of-the-art analytics and insights. This team's innovations span artificial intelligence, machine learning, analytics, IoT, security, automation, and more.

Who You Are

This role is primarily about applying your SRE skills to create a complete self-serve Software Delivery Machine. The targeted platform will support a vast number of cloud and hybrid customers. The candidate is expected to have strong hands-on skills and will guide and contribute technically to the Infrastructure platform team.

  • Develop full-fledged software tooling to deliver programmable infrastructure (infrastructure as code)
  • Develop tooling to drive end-to-end micro-services monitoring and management
  • Implement Kubernetes compliance and best practices in terms of security, audits, network policies, reporting
  • Develop a self-service console to provide infrastructure visibility
  • A passion for automating anything and everything by leveraging cutting-edge technologies and best practices from organizations operating at scale


  • Manage the availability, scalability and performance of the platform's infrastructure
  • Create tools and infrastructure leveraged by the rest of the engineering teams
  • Turn other engineering teams' application development bottlenecks into opportunities to automate and scale the platform's infrastructure tooling
  • Design, develop, deploy, document, and demonstrate monitoring and diagnostic tools for the public and private cloud infrastructure (systems, databases, and networks)
  • Triage and troubleshoot infrastructure issues, identify product gaps by using the product as a consumer, and convert each issue into a requirement for the Product Management (PM) team
  • Create and maintain continuous integration and continuous deployment (CI/CD) environments for scaling SaaS applications to multi-region and multi-cloud patterns
  • Work is generally expected to take place during normal working hours; however, because the Platform Operations Team provides Tier 2 and Tier 3 services, candidates should be flexible with their schedules to meet the needs and demands of the business.


  • Strong knowledge of core Enterprise Linux (Red Hat/CentOS), with a focus on building, maintaining, securing, and performance-tuning systems.
  • Experience with container management and microservices architectures in Kubernetes, Helm, Docker, and other virtual infrastructure platforms.
  • Experience with scaling web, application, and data systems horizontally and vertically.
  • Building, automating, and maintaining infrastructure in Amazon Web Services.
  • Strong experience with Python, Go, Ansible, and Terraform, plus working experience with Java and Node.js, is required.
  • Hands-on experience with CI/CD tooling (GitHub/GitLab, Jenkins/Spinnaker, ArgoCD/GoCD) is highly preferred.
  • Expertise with monitoring, alerting, and incident management tools such as Grafana, Prometheus, Alertmanager, Kibana, and PagerDuty.
  • Operating experience with real-time data processing pipelines, including data ingestion, Kafka, Flink, and Elasticsearch.
  • Experience with SQL/NoSQL and messaging systems such as PostgreSQL, MongoDB, Cassandra, RabbitMQ, or Redis.
  • Experience in the development of operational procedures, processes, and scripts
  • Proven experience with capacity planning, performance tuning, and infrastructure architecture.

The candidate is expected to have strong hands-on skills and will guide and contribute technically to the product.

  • BS/MS in Computer Science or related area
  • Four or more years of relevant work experience
  • Hands-on experience working with Kubernetes infrastructure in AWS
  • Kubernetes and AWS certification is highly preferred
  • Knowledge of Kubernetes internals (clustering, scheduling, controllers, API server, etc.)
  • Excellent understanding of container networking and microservices architecture

Why Cisco

At Cisco, each person brings their unique talents to work as a team and make a difference. Yes, our technology changes the way the world works, lives, plays and learns, but our edge comes from our people. We connect everything: people, process, data and things, and we use those connections to change our world for the better. We innovate everywhere, from launching a new era of networking that adapts, learns and protects, to building Cisco Services that accelerate businesses and business results. Our technology powers entertainment, retail, healthcare, education and more, from Smart Cities to your everyday devices. We benefit everyone. We do all of this while striving for a culture that empowers every person to be the difference, at work and in our communities.


Posted: 2021-03-26 Expires: 2021-05-21
