Remote Opportunity

Research Scientist, AI Evaluation Science

Join Apple as a senior professional working remotely from anywhere in the world. Explore the role, benefits, and apply in one place.

Full Time
$120,000 - $180,000*
4 months ago
Worldwide
AI Governance & Programs
Senior
Python
Machine Learning
Data Science
+4 more

Job Description

AI systems are only as trustworthy as the methods used to evaluate them. At Apple, where AI powers experiences for billions of people, getting evaluation right is not a support function; it is a foundational science. Our team, part of Apple Services Engineering, is building that scientific foundation: rigorous, scalable evaluation methodology for LLMs, agentic systems, and human-AI interaction.

What makes this team unusual is its interdisciplinary core. You will work alongside measurement scientists (psychometrics, validity theory), ML researchers, and platform engineers, bringing together ML research, statistical rigor, and production engineering.

We are looking for a Research Scientist who treats evaluation methodology itself as a first-class research problem: someone with deep technical fluency in preference learning, reward modeling, or calibration theory, and the drive to advance the field while solving real problems at scale. We're hiring at multiple levels, from early-career to senior researchers. At every level, we are looking for depth of thinking about evaluation as a research problem.

DESCRIPTION

This is primarily a research role. You will formulate open problems in evaluation science, design experiments, publish findings, and drive projects from conception through completion. Our research team brings together ML scientists and measurement scientists to tackle evaluation as both a machine learning problem and a measurement problem, building methods that are technically innovative and scientifically valid. You will also partner closely with a platform engineering team that turns research into production-ready SDKs and APIs used across Apple, but the focus of the role is original research.

The successful candidate will have a strong publication record in evaluation-adjacent ML areas and a demonstrated ability to implement complex methods from recent papers, run large-scale experiments, and communicate results to both technical and non-technical audiences.

MINIMUM QUALIFICATIONS

  • Ph.D. in Computer Science, Machine Learning, or a closely related field, with a research focus in evaluation-adjacent areas (preference learning, RLHF, human feedback, calibration, automated assessment)
  • Strong publication record at top-tier conferences (NeurIPS, ICML, ICLR, ACL, EMNLP), including first-author publications demonstrating independent research contributions
  • Deep technical expertise in at least one evaluation-adjacent ML area, with strong mathematical foundations: preference learning and reward modeling (RLHF, DPO, reward hacking, specification gaming); OR calibration theory, proper scoring rules, and statistical reliability; OR human-AI interaction methodology (active learning, annotation quality, preference elicitation)
  • Demonstrated ability to implement complex methods from recent papers and run large-scale experiments
  • Track record of translating research into practical systems (prototypes, tools, or methods adopted by others)
  • Excellent written and verbal communication skills, including the ability to write clear research papers and explain complex concepts to diverse audiences

PREFERRED QUALIFICATIONS

  • Publications specifically on evaluation methodology: papers about how to evaluate, not just papers that use evaluation to demonstrate model improvements
  • Strong hands-on experience with modern ML frameworks (PyTorch, JAX, or TensorFlow) and with training or fine-tuning large language models
  • Experience with the theoretical foundations of evaluation: measurement theory and validity frameworks, statistical learning theory (calibration, reliability, decision theory), or preference elicitation and aggregation
  • Specific research experience in one or more of: reward modeling and RLHF for alignment; LLM-as-judge approaches (calibration, rubric design, bias mitigation); benchmark design and validation (IRT, contamination detection); human evaluation methodology (protocol design, quality control); or agentic and multi-agent system evaluation
  • Demonstrated passion for evaluation as a research area: conference presentations, workshops, or tutorials on evaluation topics; open-source contributions to evaluation tools or benchmarks; active engagement with the evaluation research community
  • Experience with cross-disciplinary research, such as collaboration with social scientists, psychometricians, or domain experts

Requirements

  • Ph.D. in Computer Science, Machine Learning, or a closely related field
  • Strong publication record at top-tier conferences
  • Deep technical expertise in at least one evaluation-adjacent ML area
  • Demonstrated ability to implement complex methods from recent papers
  • Track record of translating research into practical systems (prototypes, tools, or methods adopted by others)

Benefits

  • 401k Matching
  • Health Insurance
  • Paid Time Off
  • Flexible Hours
  • Remote Work
  • Stock Options
  • Training Budget
  • Wellness Programs

Skills

Python
Machine Learning
Data Science
Calibration Theory
Reward Modeling
Preference Learning
Human-AI Interaction

About AI-Estimated Salary

The salary range shown was not provided by the employer. Our AI has estimated it based on the job title, required experience, location, and industry standards (confidence: 80%). This estimate should be used as a general guide only and may not reflect the actual compensation. Always confirm salary details directly with the employer during the application process.

