Software Engineer, System Software

Company: Etched

Location: San Jose, CA 95123

Description:

We are seeking a highly skilled and motivated System Software Engineer to join our team, responsible for the foundational software that powers our server infrastructure. This role focuses on the development, integration, and debugging of critical system software components, including BIOS, BMC firmware, boot processes (including NetBoot), root of trust implementations, advanced system logging, and kernel-mode drivers. You will play a pivotal role in ensuring the reliability, security, and performance of our server platforms, and contribute to the integration of data center orchestration technologies at the node level.

Key Responsibilities:
  • Firmware and Boot Process Development: Design, develop, and maintain BIOS and BMC firmware, ensuring robust and efficient server boot processes, including NetBoot implementations.
  • System Performance Measurement and Tuning: Analyze DRAM timings, PCIe configurations, power state transitions, and related settings to ensure high performance and maximum reliability.
  • Root of Trust and Security: Implement and maintain security features, including root of trust mechanisms, to protect system integrity and data security.
  • Kernel-Mode Driver Development and Debugging: Develop and debug kernel-mode drivers, ensuring seamless hardware integration and optimal system performance.
  • Advanced System Logging and Diagnostics: Design and implement advanced system logging and diagnostic capabilities to facilitate efficient troubleshooting and performance analysis.
  • Data Center Orchestration Integration: Integrate and optimize node-level data center orchestration technologies, such as Kubernetes and Docker, into the system software stack.
  • System Validation and Testing: Develop and execute comprehensive test plans to validate system software functionality, stability, and performance.
  • Collaboration and Troubleshooting: Collaborate with hardware and software teams to diagnose and resolve complex system-level issues.


Representative Projects:
  • Implement and validate secure boot processes, including root of trust verification.
  • Develop and debug kernel-mode drivers for new hardware peripherals.
  • Design and implement advanced system logging and monitoring solutions.
  • Optimize BIOS and BMC firmware for improved boot times and system stability.
  • Integrate node-level container orchestration capabilities into the system software.
  • Analyze and resolve complex system-level issues related to boot failures, hardware errors, and performance degradation.
  • Analyze and optimize system-level logging for large-scale server deployments.
  • Implement and debug NetBoot processes for large server deployments.


Must-Have Skills and Experience:
  • Proficiency in C/C++.
  • Strong understanding of BIOS and BMC firmware architectures.
  • Experience with server boot processes (EFI, UEFI) and NetBoot technologies.
  • Knowledge of root-of-trust and security principles.
  • Experience with kernel-mode driver development and debugging.
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures.
  • Experience with advanced system logging and diagnostic tools.
  • Ability to analyze complex technical problems and provide effective solutions.
  • Excellent communication and collaboration skills.
  • Experience with version control systems (e.g., Git).
  • Experience with reading and interpreting hardware logs.


Nice-to-Have Skills and Experience:
  • Experience with data center orchestration technologies (Kubernetes, Docker).
  • Experience with hardware diagnostic tools and techniques.
  • Knowledge of server virtualization.
  • Experience with tracing tools such as perf, eBPF, and ftrace.
  • Experience with performance testing and benchmarking tools (gprof, VTune, Wireshark, etc.).
  • Experience with CI/CD pipelines.
  • Experience with Rust.


Ideal Background:
  • Candidates with experience in developing and debugging BIOS and BMC firmware.
  • Candidates with experience in implementing root of trust and security features.
  • Candidates with experience in kernel-mode driver development and debugging.
  • Candidates with experience in integrating data center orchestration technologies at the node level.
  • Candidates with experience in large-scale server deployments.
  • Candidates who have debugged complex server boot issues.

    Benefits
    • Full medical, dental, and vision packages, with 100% of premiums covered
    • Housing subsidy of $2,000/month for those living within walking distance of the office
    • Daily lunch and dinner in our office
    • Relocation support for those moving to West San Jose


    How we're different

    Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.
