Remote Opportunity

AI Evaluation Scientist

Join BMO as a senior professional working remotely from Canada.

Full Time
CAD 103.2k - CAD 192k
2 weeks ago
Canada
Worldwide
AI Governance & Programs
Senior
Python
Machine Learning
Deep Learning

Job Description

Application Deadline:

04/29/2026

Address:

100 King Street West

Job Family Group:

Data Analytics & Reporting

About the Team

BMO’s Applied AI team is responsible for building high‑performing, safe, and reliable AI systems that power real banking experiences. The Evaluations group within Applied AI develops the methods, datasets, and tooling that measure quality, safety, and performance across the full AI lifecycle. Working closely with product, engineering, and research partners, the team ensures evaluation signals are deeply embedded into training loops, deployment workflows, and continuous monitoring processes. This group operates at the intersection of data science, machine learning, and responsible AI, enabling scalable, repeatable, and trustworthy evaluation of advanced AI systems.

About the Role

The AI Evaluation Scientist is an individual contributor role focused on delivering the data science stream of AI evaluations. This includes designing, implementing, and productionizing evaluation methods, metrics, and datasets that directly influence modeling decisions, product quality, and the safety posture of AI systems across the bank. You will work hands‑on with complex models—particularly LLMs and deep learning systems—developing rigorous empirical analyses that surface model weaknesses, performance trends, and risk signals.

In this role, you will translate evaluation standards into robust, maintainable evaluation code and workflows. You will collaborate with engineers to integrate evaluation signals into CI/CD and training pipelines, and work with product and research partners to ensure evaluation insights meaningfully shape model improvements. This position is highly technical, experimental, and delivery‑oriented, with a strong emphasis on applied data science, reproducible experimentation, and responsible AI practices.

Key Responsibilities

  • Design and implement advanced evaluation methods for LLMs and ML systems, including metrics for robustness, reliability, fairness, explainability, calibration, safety, and performance.
  • Build and maintain high‑quality evaluation datasets, golden sets, challenge sets, and red‑teaming corpora tailored to real banking workflows.
  • Develop reusable evaluation harnesses and pipelines that support multi‑agent workflows, tool use, and retrieval‑augmented generation scenarios.
  • Conduct empirical analyses, including statistical tests, error analysis, and ablation studies, to identify model weaknesses and guide model and product improvements.
  • Integrate evaluation metrics and signals into model training loops, deployment gating checks, and continuous monitoring processes.
  • Prototype and validate novel evaluation algorithms inspired by current research in LLM safety, interpretability, and reliability, and convert prototypes into maintainable components.
  • Produce clear, actionable evaluation reports that translate technical findings into insights for engineering, modeling, product, and business stakeholders.
  • Collaborate with engineering, research, and product teams to align evaluation requirements and deliver production‑ready evaluation capabilities.
  • Ensure reproducibility and reliability of evaluation results through dataset versioning, configuration control, testing practices, and documentation.

Qualifications

  • 7+ years of experience in data science, machine learning, or AI development, with at least 3 years focused on evaluation, safety, reliability, or model performance analysis.
  • Master’s or PhD in Computer Science, Data Science, Statistics, Engineering, or a related quantitative field, or equivalent practical experience.
  • Strong proficiency in Python and SQL, with experience using PyTorch or TensorFlow, scikit‑learn, and modern data science libraries.
  • Demonstrated experience building evaluation pipelines for LLMs or ML systems, including metric implementation, dataset creation, and CI/CD integration.
  • Solid understanding of statistical testing, calibration, sampling design, and error analysis.
  • Experience with evaluation of RAG systems, tool‑use workflows, long‑context scenarios, adversarial/jailbreak attacks, toxicity/bias detection, or privacy/PII leakage tests.
  • Familiarity with MLOps/LLMOps practices, including experiment tracking, artifact management, and cloud‑based ML infrastructure.
  • Strong communication skills with the ability to translate complex evaluation findings for both technical and non‑technical audiences.
  • Experience with interpretability or fairness techniques (e.g., SHAP, counterfactuals, model probing) is an asset.
  • Contributions to research or open‑source projects in evaluation, safety, reliability, or interpretability are an asset.

Salary:

$103,200.00 - $192,000.00

Pay Type:

Salaried

The above represents BMO Financial Group’s pay range and type.

Salaries will vary based on factors such as location, skills, experience, education, and qualifications for the role, and may include a commission structure. Salaries for part-time roles will be pro-rated based on the number of hours regularly worked. For commission roles, the salary listed above represents BMO Financial Group’s expected target for the first year in this position.

BMO Financial Group’s total compensation package will vary based on the pay type of the position and may include performance-based incentives, discretionary bonuses, as well as other perks and rewards. BMO also offers health insurance, tuition reimbursement, accident and life insurance, and retirement savings plans. To view more details of our benefits, please visit: https://jobs.bmo.com/global/en/Total-Rewards

About Us

At BMO we are driven by a shared Purpose: Boldly Grow the Good in business and life. It calls on us to create lasting, positive change for our customers, our communities and our people. By working together, innovating and pushing boundaries, we transform lives and businesses, and power economic growth around the world.

As a member of the BMO team you are valued, respected and heard, and you have more ways to grow and make an impact. We strive to help you make an impact from day one – for yourself and our customers. We’ll support you with the tools and resources you need to reach new milestones, as you help our customers reach theirs. From in-depth training and coaching, to manager support and network-building opportunities, we’ll help you gain valuable experience, and broaden your skillset.

To find out more visit us at https://jobs.bmo.com/ca/en.

BMO is committed to an inclusive, equitable and accessible workplace. By learning from each other’s differences, we gain strength through our people and our perspectives. Accommodations are available on request for candidates taking part in all aspects of the selection process. To request accommodation, please contact your recruiter.

Note to Recruiters: BMO does not accept unsolicited resumes from any source other than directly from a candidate. Any unsolicited resumes sent to BMO, directly or indirectly, will be considered BMO property. BMO will not pay a fee for any placement resulting from the receipt of an unsolicited resume. A recruiting agency must first have a valid, written and fully executed agency agreement contract for service to submit resumes.

Benefits

  • Retirement Savings Plans
  • Certification Support
  • Flexible Hours
  • Gym Membership
  • Health Insurance
  • Home Office Budget
  • Learning Budget
  • Paid Time Off

Skills

Python
Machine Learning
Deep Learning
LLMs
Data Science
Statistics
Data Analysis
Programming
