Sr. Linux Administrator / Cloud Engineer
Company: Diamondpick
Location: Dallas, TX 75217
Description:
Job Summary:
We are seeking a skilled and experienced Linux Administrator with expertise in SUSE Pacemaker cluster management and multi-cloud administration. The ideal candidate will manage, maintain, and optimize our Linux-based infrastructure, ensuring high availability, scalability, and security across on-premises and multi-cloud environments. This role requires a deep understanding of Linux systems, clustering technologies, and cloud platforms to support our mission-critical applications and services.
Key Responsibilities:
Linux System Administration:
- Install, configure, and maintain Linux servers (SUSE, Red Hat, CentOS, Ubuntu, etc.).
- Perform system monitoring, troubleshooting, and performance tuning.
- Manage user accounts, permissions, and access controls.
- Apply patches, updates, and security configurations to ensure system integrity.
- Develop and maintain scripts (Bash, Python, etc.) to automate routine tasks and improve operational efficiency.
- Implement Infrastructure as Code (IaC) practices for consistent and repeatable deployments.
- Design and implement disaster recovery plans for Linux systems and cloud environments.
- Manage backup solutions and ensure data integrity and availability.
- Implement and enforce security best practices for Linux systems and cloud environments.
- Conduct regular security audits and vulnerability assessments.
- Ensure compliance with industry standards and regulations (e.g., GDPR, HIPAA, PCI-DSS).
Pacemaker Cluster Management:
- Design, implement, and manage high-availability (HA) clusters and disaster recovery (DR) setups using Pacemaker and Corosync.
- Configure and maintain resource agents, constraints, and failover mechanisms.
- Monitor cluster health and resolve issues related to node failures, resource allocation, and quorum.
- Perform regular testing and failover drills to ensure cluster reliability.
- Integrate Pacemaker with other technologies such as DRBD, iSCSI, NFS, and Apache.
Multi-Cloud Administration:
- Manage and optimize workloads across multiple cloud platforms (e.g., AWS, Azure, GCP, Oracle Cloud).
- Implement and maintain cloud infrastructure, including virtual machines, storage, and networking.
- Automate cloud deployments and management using tools like Terraform, Ansible, or CloudFormation.
- Ensure seamless integration between on-premises and cloud environments.
Experience:
- 10+ years of experience in Linux system administration.
- 6+ years of experience managing multi-cloud environments spanning OCI and AWS.
- 5+ years of hands-on experience with Pacemaker and Corosync for HA clustering.
Technical Skills:
- Proficiency in Linux operating systems (SUSE, OEL, Red Hat, CentOS, Ubuntu).
- Strong knowledge of Pacemaker, Corosync, and resource agents.
- Experience with cloud platforms (AWS, Azure, GCP) and their core services (e.g., EC2, S3, and VPC on AWS).
- Familiarity with automation tools like Ansible, Terraform, or Puppet.
- Scripting skills in Bash, Python, or similar languages.
- Knowledge of networking, storage, and virtualization technologies.
Soft Skills:
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.
- Ability to work independently and as part of a team.
- Proactive and self-motivated with a focus on continuous improvement.