Senior Data Engineer (Data + Applied AI)
Description
About Plume
Plume is a passion-fueled, mission-driven company that is trans-founded with a vision to transform healthcare for every trans life. We hope to make gender-affirming hormone therapy easily accessible at the touch of a button in every state of the US.
This work is deeply personal and heart-driven, and we want teammates who care about the mission and the people we serve. For the right candidates, we present a rare opportunity to do well by doing good. Plume offers an affirming, trans-centered, culturally inclusive, and fun work environment filled with purpose.
Responsibilities
Building and maintaining production-grade data pipelines in cloud data warehouses such as Google BigQuery or equivalent, following architectural standards set by the Director of Data and AI.
Designing and developing dbt models across bronze, silver, and gold layers, with a focus on quality and governance through automated tests, documentation, and incremental load strategies.
Creating and optimizing Airflow DAGs for data workflow orchestration, including scheduling, dependency management, error handling, and alerting.
Implementing dimensional data models and data mart structures, guided by the team's modeling standards, that support clinical BI and ML feature consumption.
Crafting clear visualizations and dashboards in Looker or equivalent BI tools that follow common business analytics conventions, in close collaboration with product analytics, finance, operations, growth, and clinical stakeholders.
Integrating healthcare data from sources such as EHRs, Stripe, third-party APIs, and application database feeds, and normalizing it into the unified data platform.
Applying HIPAA-compliant data handling practices, including PHI/PII masking, tokenization, audit logging, and role-based access controls, across all pipeline and AI system work.
Architecting and implementing RAG pipelines, including document ingestion, chunking, embedding generation, and retrieval, using frameworks such as LangChain or LangGraph.
Supporting MLOps workflows, including model training pipeline maintenance, deployment support, performance monitoring, and retraining triggers.
Reviewing teammates' pull requests, providing constructive technical feedback, and upholding the team's engineering standards.
Collaborating closely with product managers to understand requirements and deliver reliable data and AI products.
Monitoring and triaging assigned pipeline and data quality failures, escalating architectural issues as appropriate.
Documenting pipeline designs, data models, and technical decisions in alignment with the team's governance and lineage tracking standards.
Evaluating new tools and frameworks through hands-on prototyping and technical assessments.
Requirements
Must-Have
5+ years of hands-on experience in data engineering, analytics engineering, or a closely related role.
2+ years of experience in the healthcare industry, including working knowledge of healthcare data standards, clinical workflows, regulated data environments, and domain-specific data visualizations.
Working knowledge of HIPAA — including PHI/PII classification, data masking, audit logging, and access control requirements.
Proven production experience with at least one major cloud data warehouse (BigQuery, Snowflake, or Redshift), including advanced SQL and query optimization.
Strong hands-on experience with dbt (Core or Cloud), including incremental models, tests, documentation, and multi-environment workflows.
Deep experience with Apache Airflow for workflow orchestration, including DAG design, scheduling, monitoring, and failure handling.
Demonstrated knowledge of dimensional data modeling: star/snowflake schemas, SCD Types 1/2, and fact and dimension table design.
Hands-on experience delivering dashboards and reports in at least one enterprise BI tool (Looker, Power BI, Tableau, Qlik, etc.).
Proficiency in Python for data pipeline development, API integrations, and automation (Pandas, PySpark, or similar).
Practical exposure to RAG pipeline development and LLM integration using LangChain, LangGraph, or LlamaIndex.
Hands-on exposure to MLOps concepts: model deployment, monitoring, and retraining workflows.
Knowledge of CI/CD tooling for data and AI workloads (GitHub Actions, dbt Cloud CI).
Strong understanding of data quality and governance principles (lineage, access controls, data contracts, and automated testing), plus experience with data governance tools such as OpenMetadata.
Excellent written and verbal communication skills, with the ability to collaborate effectively across engineering, analytics, and clinical teams.
Ability to work independently on assigned workstreams while keeping the Director and team informed of progress, blockers, and risks.
Nice-to-Have
Experience with real-time or streaming data pipelines using Kafka, Kinesis, or Pub/Sub, particularly for ADT or clinical event feeds.
Knowledge of vector databases such as Pinecone, Weaviate, FAISS, or Chroma.
Familiarity with responsible AI principles, including bias detection and model explainability in a healthcare context.
Experience with data observability tools such as Monte Carlo, Bigeye, or Soda.
Familiarity with data lakehouse patterns (Delta Lake, Iceberg, Apache Hudi).
Experience working toward or maintaining SOC 2 or HITRUST certification.
Familiarity with semantic layer tools (Looker LookML, dbt Semantic Layer).
Experience with population health, revenue cycle, or clinical quality reporting datasets.
Exposure to Kubernetes or containerized ML workloads.