Staff DevOps Engineer — Developer Infrastructure - US (Remote)
Must-Have Requirements
- 6-10+ years in DevOps, Platform Engineering, or SRE roles building and operating production systems at scale
- Expertise with Kubernetes (EKS) and AWS (IAM, VPC, ECR, SSM/Secrets Manager, S3, SQS, Lambda, RDS/Aurora)
- Strong IaC experience (Terraform preferred) and GitOps workflows (Argo CD or similar)
- CI/CD depth (CircleCI, GitHub Actions, or similar) including caching/parallelism, artifact management, test reliability, and pipeline observability
- Excellent cross-team communication skills
Nice to Have
- Active user of AI development tools (Claude Code, Codex, etc.) in infrastructure workflows
- Proven track record building ephemeral environments, developer tooling, or internal platforms
- Experience with load testing frameworks (k6, Locust, Gatling, or similar)
- Examples of building mock or stub infrastructure for integration testing at scale
- Experience with release strategies (canary/blue-green, automated rollbacks) and progressive delivery
- Observability fundamentals (Datadog, OpenTelemetry) with the ability to define SLIs/SLOs
Description
Luxury Presence is building the AI growth platform for real estate. Backed by Bessemer Venture Partners and other top investors, we're a Series C company on track to hit $100M in annual recurring revenue in the next six months. More than 87,000 real estate professionals, including over 30% of the WSJ Real Trends top 100 agents in the United States, use us to run and grow their business.
What You’ll Do
Agent and developer environment infrastructure
Design and operate ephemeral, pre-warmed development environments that AI agents and engineers can spin up on demand. Extend our internal CLI (luxp) so that a new engineer or an AI agent can run luxp local start and have a working, validated environment in minutes, with service discovery, dependency management, and local configuration handled automatically. Build environment parity monitoring to ensure dev environments match production behavior.
Pre-production quality gates
Own the infrastructure-level gates that prove a deploy is safe before it reaches production. Build and operate automated load testing, performance benchmarking, and security scanning gates in the pipeline. Partner with QA and engineering to expand gate coverage across services — the gates apply equally to all contributions regardless of author (or agent).
Pre-PR validation infrastructure
Build containerized mock services (generated from OpenAPI specs) so contributors can validate integration code against realistic third-party dependencies locally. Stand up Playwright-based UI validation in agent and CI loops. Create the infrastructure that supports iterative self-refinement — where an agent or engineer can run their output, capture what failed, and iterate before opening a PR.
Internal tooling and dashboards
Build the review tooling, metrics dashboards, and operational controls that make our pipelines observable and improvable, especially at higher throughput. Surface scoring signals, approval rate trends, gate pass rates, and common failure modes. Create the policy layer that defines approval requirements per component or per task type.
What We’re Looking For
- 6–10+ years in DevOps, Platform Engineering, or SRE roles building and operating production systems at scale.
- Active user of AI development tools (Claude Code, Codex, etc.) in your infrastructure workflow. We use AI assistants daily for Terraform changes, Kubernetes debugging, automation scripting, and operational investigations. You should be someone who reaches for these tools naturally and has opinions about where they help and where they don't.
- Expertise with Kubernetes (EKS) and AWS (IAM, VPC, ECR, SSM/Secrets Manager, S3, SQS, Lambda, RDS/Aurora).
- Strong IaC experience (Terraform preferred) and GitOps workflows (Argo CD or similar).
- Proven track record building ephemeral environments, developer tooling, or internal platforms (CLIs, scaffolding tools, developer portals).
- Experience with load testing frameworks (k6, Locust, Gatling, or similar) and automating performance gates in CI/CD pipelines.
- Examples of building mock or stub infrastructure for integration testing at scale: containerized services, API mocking, dependency isolation.
- CI/CD depth (CircleCI, GitHub Actions, or similar) including caching/parallelism, artifact management, test reliability, and pipeline observability.
- Experience with release strategies (canary/blue-green, automated rollbacks) and progressive delivery.
- Observability fundamentals (Datadog, OpenTelemetry) with the ability to define SLIs/SLOs and wire them to delivery decisions.
- Excellent cross-team communicator who can translate platform constraints into developer-friendly solutions and documentation.
Tech Stack
- Infrastructure: AWS, EKS, Terraform, ArgoCD, Docker, Vault
- CI/CD: CircleCI, ArgoCD (GitOps), GitHub Actions
- Messaging: Kafka (Confluent Cloud)
- Observability: Datadog, OpenTelemetry
- Languages/Apps: Node.js/TypeScript microservices, Python jobs, React front-ends
How You’ll Succeed Here
- You think about infrastructure as a product: you talk to the engineers using your tools, measure adoption, and iterate based on what you learn.
- You're energized by building systems that multiply other people's output, not just keeping the lights on.
- You bias toward automation, reproducibility, and measurable outcomes. If a human is doing it repeatedly, you build a gate or a tool.
- You operate with high ownership across team boundaries: Infrastructure, DevEx, QA, and product engineering are all your collaborators.
- You use AI tools to move faster without sacrificing rigor. You know when to trust the output and when to verify, and you help the team develop better patterns for AI-assisted infrastructure work.