Remote Opportunity

Director of Engineering - AI Evaluations & Experimentation

Join Salesforce as a senior professional working remotely from the United States. Explore the role and benefits, and apply in one place.

Full Time
$237.7k - $344.7k
4 months ago
United States
AI Governance & Programs
Senior
AI
Machine Learning
Python

Job Description

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.

Job Category

Software Engineering

Job Details

About Salesforce

Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn’t a buzzword — it’s a way of life. The world of work as we know it is changing and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all.

Ready to level-up your career at the company leading workforce transformation in the agentic era? You’re in the right place! Agentforce is the future of AI, and you are the future of Salesforce.

Overview of the Role

We are seeking a Director of Engineering to lead our AI Agent Evaluation and Experimentation Platform team. In this role, you'll own the end-to-end evaluation and experimentation lifecycle for both agentic systems and traditional ML models. You'll be part of Salesforce's AI Engineering organization, working at the forefront of the agentic era as we build Agentforce—the future of AI-powered CRM. Your team will be responsible for building the critical infrastructure that ensures we ship high-quality, safe, and performant AI systems with confidence.

Responsibilities

  • Define and execute the technical vision for evaluation and experimentation across AI agents and traditional ML models

  • Own offline evaluation, regression testing, scenario-based simulations, and multi-turn agent testing infrastructure

  • Build automated evaluation systems including LLM-as-Judge, rule-based scoring, and hybrid evaluation approaches

  • Design and operate online evaluation, observability, and continuous performance monitoring for agent behavior

  • Lead development of self-service evaluation and experimentation tooling for agent workflows, tool use, memory, and planning

  • Support experimentation for both real-time agents and batch or online traditional ML models

  • Integrate evaluation and experimentation pipelines into CI/CD workflows and release quality gates

  • Drive adoption of evaluation and experimentation best practices across engineering and AI teams

  • Set technical direction, review designs, and raise the bar on engineering quality

  • Lead and develop a senior engineering team, fostering innovation and excellence

  • Partner with AI research, product, security, and Responsible AI teams on evaluation and experimentation strategy


Through this role, you'll gain deep experience building large-scale AI infrastructure, shape the future of how Salesforce evaluates and ships AI systems, and make a direct impact on the quality and reliability of AI products used by millions of customers worldwide.

Required Qualifications

  • A related technical degree required

  • 10+ years of engineering experience, with 5+ years leading AI/ML teams

  • Proven ability to lead senior engineers and engineering managers

  • Experience building and operating experimentation platforms for AI systems or ML products

  • Strong understanding of LLM-based agentic architectures and traditional ML systems

  • Experience designing experimentation frameworks for online and offline ML workflows

  • Experience building evaluation systems for models and agents, including offline tests, regression suites, online monitoring, and LLM-as-a-Judge-style approaches

  • Strong background in AI agents and LLM systems, including tool use, multi-step workflows, RAG, prompt and policy management, and common agent failure modes

  • Experience evaluating agent behavior across multi-step workflows and tool-using systems

  • Hands-on experience designing evaluation frameworks for AI systems

  • Experience with offline benchmarking, regression testing, and scenario-based evaluation

  • Experience with automated evaluation approaches such as LLM-as-Judge and hybrid scoring systems

  • Experience with online experimentation methods including A/B testing, shadow testing, and canary deployments

  • Experience integrating evaluation and experimentation into CI/CD pipelines and release gating

  • Experience with data pipelines, metrics systems, and observability tooling

  • Strong cross-functional communication and stakeholder alignment skills


Preferred Qualifications

  • A master's or Ph.D. degree in computer science, machine learning, artificial intelligence, or related field

  • Experience with data and ML platforms (e.g., Snowflake-centric workflows, feature stores, training pipelines)

  • Experience working in high-scale production AI/ML environments


Benefits & Perks

Check out our benefits site, which explains our various benefits, including wellbeing reimbursement, generous parental leave, adoption assistance, fertility benefits, and more.

Unleash Your Potential

When you join Salesforce, you’ll be limitless in all areas of your life. Our benefits and resources support you to find balance and be your best, and our AI agents accelerate your impact so you can do your best. Together, we’ll bring the power of Agentforce to organizations of all sizes and deliver amazing experiences that customers love. Apply today to not only shape the future — but to redefine what’s possible — for yourself, for AI, and the world.

Accommodations

If you require assistance due to a disability when applying for open positions, please submit a request via the Accommodations Request Form.

Posting Statement

Salesforce is an equal opportunity employer and maintains a policy of non-discrimination with all employees and applicants for employment. What does that mean exactly? It means that at Salesforce, we believe in equality for all. And we believe we can lead the path to equality in part by creating a workplace that’s inclusive, and free from discrimination. Know your rights: workplace discrimination is illegal. Any employee or potential employee will be assessed on the basis of merit, competence and qualifications – without regard to race, religion, color, national origin, sex, sexual orientation, gender expression or identity, transgender status, age, disability, veteran or marital status, political viewpoint, or other classifications protected by law. This policy applies to current and prospective employees, no matter where they are in their Salesforce employment journey. It also applies to recruiting, hiring, job assignment, compensation, promotion, benefits, training, assessment of job performance, discipline, termination, and everything in between. Recruiting, hiring, and promotion decisions at Salesforce are fair and based on merit. The same goes for compensation, benefits, promotions, transfers, reduction in workforce, recall, training, and education.

In the United States, compensation offered will be determined by factors such as location, job level, job-related knowledge, skills, and experience. Certain roles may be eligible for incentive compensation, equity, and benefits. Salesforce offers a variety of benefits to help you live well including: time off programs, medical, dental, vision, mental health support, paid parental leave, life and disability insurance, 401(k), and an employee stock purchasing program. More details about company benefits can be found at the following link: https://www.salesforcebenefits.com.

At Salesforce, we believe in equitable compensation practices that reflect the dynamic nature of labor markets across various regions. The typical base salary range for this position is $237,700 - $344,700 annually. In select cities within the San Francisco and New York City metropolitan areas, the base salary range for this role is $237,700 - $344,700 annually. The range represents base salary only, and does not include company bonus, incentive for sales roles, equity or benefits, as applicable.

Skills

AI
Machine Learning
Python
Java
Cloud Computing
APIs
Data Science
