Senior Staff Engineer, Cloud Site Operations
Must-Have Requirements
- 10+ years in Data Center Operations, Systems Engineering, or HPC hardware
- Expert-level understanding of x86/GPU server architecture
- Expert-level understanding of electrical distribution
- Experience with hardware maintenance at scale
- Proven experience in operational governance
Description
Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack, from electrons to tokens, to power the world's most ambitious AI workloads.
When you join Crusoe, you join a team that is building the future, faster. We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.
We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved: people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.
If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
We're crafting the engine that powers a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability. Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that's setting the pace for responsible, transformative cloud infrastructure.
About the Role
As the Senior Staff Engineer for Data Center Operations, you are the technical architect and strategic "right hand" to the Director of Data Center Operations. You will bridge the gap between high-level hardware engineering and ground-level execution, ensuring our AI fleet—from our current H200 and Blackwell (GB200) clusters to upcoming GB300 and Rubin architectures—is the most reliable and maintainable in the world. This is a high-impact role focused on operational maturity, technical governance, and the systems that power our global white space.
What You'll Be Working On
Operational Governance & Metrics
Oversee the technical health of our global ticket queue. Partner with internal teams to develop real-time dashboards and track the KPIs/SLAs (MTTR, fleet availability, sparing accuracy) that measure our operational maturity.
Fleet Supportability & Tooling
Partner with the Fleet Engineering team to define the software access, diagnostic hooks, and physical tooling required for maximum repair efficiency. Act as the primary advocate for "serviceability" within the white space.
Power Topology Strategy
Lead the initiative to map end-to-end "Power Strings," from main distribution down to cabinet PDUs. Lead the Build vs. Buy analysis to determine whether we develop internal mapping tools or procure a third-party solution.
Operational Resilience
Architect the framework for our Business Continuity (BCP) and Disaster Recovery (DR) plans. Define the technical protocols for hardware recovery and site-level failovers to ensure minimal disruption to our AI Cloud customers.
Technical Advisory & Documentation
Provide expert guidance and architectural "sign-off" to the internal Documentation Committee. Ensure all break-fix SOPs and technical playbooks are accurate, safe, and optimized for global scale.
Advanced Escalation & Mentorship
Serve as the final technical authority for systemic or complex hardware failures. Mentor senior technicians and site leads, elevating the collective technical IQ of the global operations team.
What You'll Bring to the Team
Technical Mastery
10+ years in Data Center Operations, Systems Engineering, or HPC hardware, with an expert-level understanding of x86/GPU server architecture and electrical distribution.
The "Supportability" Mindset
Proven experience in hardware maintenance at scale. You know how to translate field challenges into technical requirements for Engineering and Fleet teams to minimize downtime.
Hardware Expertise
Deep familiarity with high-density AI infrastructure, including current NVIDIA H200 and Blackwell (GB200) systems, with the ability to architect support strategies for the transition to GB300 and Rubin platforms.
Data-Driven Leadership
Expert proficiency in defining operational KPIs and building dashboards (e.g., Tableau, Grafana) to drive "Operational Maturity."
Strategic Decision Making
Experience performing Build vs. Buy analyses for technical tools and infrastructure software, justifying decisions with clear ROI and technical requirements.
Communication
Exceptional ability to distill complex technical risks, ticket-queue trends, and infrastructure hurdles into clear, actionable strategies for senior leadership.
Benefits
- Competitive compensation
- Restricted Stock Units
- Paid time off & paid holidays
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA account
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) Retirement plan with company match up to 4% of salary
- Volunteer time off
Compensation Range
Compensation will be paid in the range of $179,000 to $218,000, plus bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.