Robotics Infrastructure Engineer
Description
Systems, Infrastructure & Reliability
About the Role
We build robots that run 24/7 in production environments. We're looking for a hands-on engineer to own the reliability, infrastructure, and developer tooling that keep our fleet running and our engineering team fast. You'll split your time between robot-side systems work, cloud infrastructure, and building automation that multiplies the team's output. A significant portion of this role involves working with AI coding agents. You'll direct autonomous agents to diagnose CI failures, triage production issues, run automated security and compliance checks, and execute multi-step engineering tasks. Knowing how to scope work for an agent, review its output critically, and build tooling that agents can use effectively is as important as writing the code yourself.
What You'll Do
Own robot-side software (Python)
Maintain the on-robot codebase that orchestrates arms, cameras, sensors, and I/O. Debug production hardware/software failures and ship fixes fast
Build and maintain infrastructure as code
Manage cloud infrastructure — identity and access management, CI/CD credentials, secrets, container registries, cluster autoscaling — using declarative configuration and reproducible builds
Drive build system and packaging migrations
Own the transition of robot software packaging to reproducible, hermetic build systems. Maintain machine images, dev environments, and deployment pipelines
Build simulation and testing infrastructure
Develop end-to-end simulation systems that validate robot behavior without physical hardware — camera projection, kinematics, placement validation, fleet-wide calibration
Develop and operate AI-powered engineering automation
Build autonomous agents that run nightly CI triage, security audits, infrastructure compliance checks, and code quality sweeps. Design the interfaces and instructions that make agents effective at real engineering work
Improve observability and health monitoring
Instrument robot software with metrics and structured telemetry. Build alerting that catches problems before humans notice them
Work across the stack
Touch frontend, backend, protobuf definitions, deployment tooling, and cloud services as needed. No part of the system is someone else's problem
What We're Looking For
3+ years of Python in a systems context
Not web/ML Python, but the kind where you deal with processes, hardware I/O, async, and real-time constraints
Strong Linux systems knowledge
Memory management, device management, systemd, containers, networking, kernel tuning
Infrastructure as code experience
Declarative infrastructure and configuration management tools. You've managed IAM, CI runners, secrets, and machine images programmatically
Experience with real hardware
Robot arms, depth cameras, grippers, force/torque sensors, pneumatics, or similar
CI/CD ownership
You've not just used CI — you've owned it. Runner infrastructure, flaky test triage, build caching, GPU-enabled pipelines
Comfort with AI coding agents
You've used tools like Claude Code, Cursor, Copilot Workspace, or similar to do real engineering work — not just autocomplete, but directing agents through multi-step debugging, refactoring, and infrastructure tasks. You understand their failure modes and know when to trust vs. verify
Strong debugging instincts
You can go from a vague production symptom to root cause across hardware, OS, network, and application layers
Bias toward shipping over perfecting
You fix, monitor, iterate. Your commit history has more "fix" than "feat", and you're proud of that
Nice to Have
NixOS or reproducible build system experience
Experience building or operating autonomous engineering agents/bots
Robotics simulation (kinematics, camera models, physics)
gRPC / Protocol Buffers
Managed network infrastructure, VPNs, overlay networks
Time-series databases and observability stacks
About the Work Style
This is a high-autonomy, high-output role. On a typical day you might direct an AI agent to triage overnight CI failures while you debug a production robot issue, then spend the afternoon migrating a package to a new build system. You'll write a lot of code, but you'll also write a lot of prompts, and the best candidates will see those as the same skill.