Forward Deployed Reliability Engineer

Palantir Technologies
New York, NY · Hybrid · Feb 10, 2026 · Posted 2 months ago

Tech Stack

Python, Java, SQL, Spark, Palantir Foundry

Must-Have Requirements

  • Background in Computer Science, Engineering, Information Systems, or other technical field
  • U.S. citizen or green card holder
  • Proficiency in Python, Java, and SQL
  • Excellent written and verbal communication skills

Nice to Have

  • Familiarity with parallel data processing and Spark job optimization
  • Experience with root cause analysis and documenting solutions
  • Experience working independently and collaboratively on technical challenges

Description

A World-Changing Company

Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.

The Role

As a Forward Deployed Reliability Engineer (FDRE), you ensure the stability and reliability of mission-critical workflows built on Palantir software. You gather signal by going on call, resolving problems before the customer is impacted, and use those learnings to drive product change, shape our internal tooling, and refine our operational processes so that we deliver a steadily higher quality of service to a growing number of customers.

Your approach is hands-on and pragmatic: you’ll rapidly address issues as they arise with quick and effective solutions, then advocate for workflow or product improvements once the immediate issue is resolved. You are energized by engaging directly with problems, whether that means writing a script to automate a manual task, finding a creative workaround, or building a case for a product enhancement. You don’t just fix issues; you look for opportunities to simplify, automate, and make the entire system more resilient.

An FDRE synthesizes learnings from support into best practices for others to follow. These are captured into documentation and shared with the team and broader organization. In this way, you raise the bar for reliability and efficiency across Palantir.

Core Responsibilities

  • Develop a deep understanding of Palantir's products and operational processes
  • Go on-call, responding quickly and effectively to mission-critical incidents
  • Diagnose, resolve, and proactively prevent issues encountered in the field
  • Collaborate with internal stakeholders to increase the scalability and reliability of Foundry workflows for our customers
  • Identify recurring pain points and inefficiencies, and take initiative to automate or streamline workflows
  • Advocate for and implement product enhancements based on insights gleaned from the field
  • Create clear, actionable documentation and share best practices to elevate team and company-wide reliability

Note: While active work is not required on weekends or outside business hours, you must be available to respond to critical outages during assigned on-call weeks.

What We Value

  • Ability to work independently and collaboratively to solve ambiguous technical and operational challenges
  • Excellent written and verbal communication skills, capable of interacting effectively with both technical and non-technical stakeholders
  • Proficiency in Python, Java, and SQL
  • Familiarity with parallel data processing and Spark job optimization
  • Strong organizational skills and attention to detail, with the ability to prioritize effectively
  • Resourcefulness and creativity in fast-paced dynamic environments
  • Experience with root cause analysis and documenting solutions for broader impact
  • Enthusiasm for hands-on problem solving, continuous improvement, and knowledge sharing

What We Require

  • Background in Computer Science, Engineering, Information Systems, or other technical field
  • Must be a U.S. citizen or green card holder