DevOps Manager

Company: ECS

Location: Fairfax, VA 22030

Description:

ECS is seeking a DevOps Manager to work in our Fairfax, VA office.

ECS is seeking talented professionals to join our successful and growing team in building the next-generation Continuous Diagnostics and Mitigation (CDM) Cyber data solution. The CDM Program is the Cybersecurity and Infrastructure Security Agency's (CISA) dynamic approach to strengthening the cybersecurity of Federal networks and systems through better awareness and visibility into their security posture and cyber threats. ECS is responsible for designing, building, deploying, operating, and maintaining a complete 'Data Services' solution that includes the collection, normalization, visualization, and sharing of cyber data from more than 100 Federal agencies. The CDM Data Services product is an integrated suite of Commercial Off-the-Shelf (COTS) products, software configuration packages, and custom code that work together as a single solution tailored to Department of Homeland Security (DHS) requirements.

We are seeking professionals who thrive in a dynamic, fast-paced, and highly collaborative environment where problem-solving, critical thinking, and a holistic approach to serving the mission are key. Our program operates within the Scaled Agile Framework (SAFe). An aptitude and enthusiasm for continuous learning, improvement, and cybersecurity is a must!

The DevOps Manager role encompasses two closely related disciplines for our program: Release Engineering and Site Reliability Engineering. Release Engineering is accountable for producing a repeatable process for building and deploying solutions. Site Reliability Engineering ensures the reliability, availability, and performance of our critical production environments. The successful candidate will work closely with development and operations teams to implement best practices in DevOps, automate infrastructure, and maintain scalable and resilient systems.

This role involves designing, implementing, and maintaining systems that are resilient, highly available, and performant. It includes setting up comprehensive monitoring and logging systems using the Elastic Stack, Prometheus, Grafana, and other tools to track service performance continuously. The DevOps Manager will also respond to incidents, perform root cause analysis, and implement solutions to prevent recurrence.

The DevOps Manager will be responsible for developing and managing infrastructure as code (IaC) using tools like Terraform and CloudFormation. They will also design and implement CI/CD pipelines to enable reliable and repeatable processes for building, packaging, releasing, and deploying software. This work requires close collaboration with software engineers to integrate reliability and observability into the software development lifecycle.

Continuous improvement is a key focus of the role: identifying areas for enhancement and driving initiatives to improve system reliability, scalability, and efficiency. Responsibilities include creating and maintaining detailed documentation, training team members on reliability best practices, and ensuring that the team is well-equipped to maintain high standards for system performance and reliability. Ensuring that systems adhere to security policies and compliance requirements is also crucial.

Leadership and team management are core aspects of this role. The DevOps Manager will lead, mentor, and manage a team of DevOps Engineers and Site Reliability Engineers, fostering a culture of continuous improvement and professional growth. Regular performance evaluations, constructive feedback, and career development support for team members are essential.

Required Skills:

  • US citizenship with ability to obtain Public Trust Suitability
  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
  • 8+ years of experience in DevOps, Site Reliability Engineering, Release Engineering, or a related field performing the responsibilities described above
  • 2+ years of recent experience building and managing a team of 5 or more engineers with differing levels of experience and expertise
  • 2+ years of experience deploying and monitoring solutions for Federal customers including experience with FISMA compliance and Federal configuration and change management policies
  • Expertise in configuration management tools (Ansible, Puppet, Chef)
  • Experience with infrastructure as code (Terraform, CloudFormation)
  • Experience with test automation frameworks (Cucumber, JUnit, Selenium)
  • Experience with CI/CD pipelines (Jenkins, GitLab CI, CircleCI)
  • Proficiency in cloud platforms (AWS, GCP, Azure)
  • Proficiency in scripting and programming languages (Python, Bash, Go)
  • Strong knowledge of containerization and orchestration (Docker, Kubernetes)
  • Comprehensive understanding of monitoring and logging tools (Elastic Stack, Prometheus, Grafana)
  • Strong problem-solving and analytical skills
  • Excellent communication and collaboration abilities
  • Ability to work independently and as part of a team
  • Attention to detail and a proactive approach to identifying and solving issues