Senior Data Scientist - Data for Perception Machine Learning
Must-Have Requirements
- Master's or PhD degree in a field relevant to autonomous driving (computer science, robotics) or the analysis of human data (computational neuroscience, cognitive science) or a related field
- Proficient using data query languages (SQL and/or Spark/Scala) to quickly build complex yet efficient data queries at scale
- Proficient in Python to build production-quality code
- Proficient in exploratory data analysis (EDA) and data visualization
- Background in statistical modeling and analysis
- Experience with data-centric ML development and data curation
Nice to Have
- Experience with experiment design and statistical comparisons (A/B testing, parametric/non-parametric statistics, etc.)
- Experience with human data collection, including annotation task design
Description
We are seeking an experienced and highly skilled data scientist to join the Perception Data and Labeling team. The team is responsible for the training and evaluation data that power the perception (vision, lidar, and other modalities) ML models at Zoox. The candidate will work alongside data ops partners, ML engineers, software developers, and data engineers to improve model performance through high-quality human- and auto-labeled data.
In this role, you will
- Define and implement scalable data quality measures across complex, multimodal data labeling pipelines
- Drive data-centric ML model improvements to achieve critical Zoox milestones
- Support an org-wide data ontology and class structure for perception models
- Determine trade-offs and integrations between human-labeled, human-in-the-loop, and zero-shot autolabeled data
- Build metrics to quantify labeling throughput, capacity, and annotator/vendor quality
Qualifications
- Master's or PhD degree in a field relevant to autonomous driving (computer science, robotics) or the analysis of human data (computational neuroscience, cognitive science) or a related field
- Proficient using data query languages (SQL and/or Spark/Scala) to quickly build complex yet efficient data queries at scale and using Python to build production-quality code
- Proficient in exploratory data analysis (EDA) and data visualization to understand and present trends and their implications for the business
- Background in statistical modeling and analysis, including experience making data-driven decisions that connect point and uncertainty estimates to business impact
- Experience with data-centric ML development and data curation
Bonus Qualifications
- Experience with experiment design and statistical comparisons (A/B testing, parametric/non-parametric statistics, etc.)
- Experience with human data collection, including annotation task design