Remote Opportunity

Head of AI Evaluation & Reliability Engineering

Join Codvo.ai as a senior professional working remotely from anywhere in the world. Explore the role and its benefits, and apply, all in one place.

Full Time
$120,000 - $180,000*
22 hours ago
Worldwide
AI Governance & Programs
Senior
Python
Machine Learning
AI

Job Description

Head of AI Evaluation & Reliability Engineering
Location: Flexible / Hybrid
Reports To: Head of Engineering

Role Mission

Build and scale Codvo’s AI Evaluation & Reliability Engineering capability as a core engineering function supporting the design, validation, and continuous improvement of enterprise AI systems in production. You will architect the frameworks, tooling, benchmark assets, and operational processes required to ensure that AI systems deployed by Codvo and its customers meet enterprise standards for reliability, safety, performance, and governance. This role is deeply embedded within engineering and serves as the quality and reliability backbone of Codvo’s AI platform and delivery organization.

Why This Role Matters

As AI systems move from pilots to business-critical workflows, reliability and evaluation become core engineering disciplines, not optional afterthoughts. Codvo is building the infrastructure and operational rigor required to ensure that every AI deployment is measurable, governed, and production-ready.

Core Responsibilities

Engineering Ownership
- Build Codvo’s AI Evaluation & Reliability Engineering function as a core platform/engineering capability.
- Define engineering standards for AI evaluation, testing, release gating, and runtime monitoring.
- Integrate evaluation/reliability frameworks into Codvo’s engineering and delivery lifecycle.

Evaluation Architecture
- Design reusable evaluation frameworks for:
  - LLM / multimodal quality
  - RAG grounding / evidence fidelity
  - Agent reasoning / decision quality
  - Tool / workflow execution success
  - Safety / policy / compliance adherence
  - Cost / latency / production economics

Benchmark Infrastructure
- Build benchmark packs, golden datasets, and regression suites for priority enterprise workflows.
- Define benchmark coverage and versioning standards.
- Establish processes for edge-case capture and benchmark expansion.

Runtime Reliability Systems
- Design systems/processes for:
  - Runtime drift / degradation monitoring
  - Failure mode analysis / incident diagnostics
  - Human review / escalation pathways
  - Continuous evaluation and improvement loops

Technical Leadership
- Partner closely with platform, product, and solution engineering teams.
- Serve as the internal SME on AI reliability, benchmark design, and evaluation methodology.
- Help shape architecture standards for AI-native product and workflow delivery.

Team Leadership
- Build and lead a team of:
  - Evaluation Engineers
  - Benchmark / QA Engineers
  - Reliability / Observability Engineers
  - Domain Review / Feedback Ops Specialists

Required Qualifications
- 10+ years in engineering / AI / ML leadership roles.
- 5+ years building or operating production AI / ML systems.
- Proven experience designing or operating:
  - AI/LLM evaluation frameworks
  - Benchmark / regression systems
  - AI QA / testing / validation infrastructure
  - Production ML / observability / monitoring systems
  - Reliability engineering / quality engineering organizations

Technical Expertise
- LLM / multimodal evaluation methodologies
- Benchmark / golden dataset design
- Agent / tool-use / workflow evaluation
- RAG evaluation / grounding analysis
- AI observability / telemetry / tracing
- Human-in-the-loop feedback systems
- AI safety / governance / policy testing
- Release gating / CI/CD / engineering quality systems

Preferred Backgrounds
- AI Infrastructure / Evaluation Platforms
- AI Observability / MLOps Companies
- Enterprise AI Platform Teams
- Applied AI Product / Platform Organizations
- Reliability / QA Engineering Leadership in Complex Systems

Success Metrics
- Establish Codvo-wide AI evaluation/reliability standards.
- Integrate evaluation frameworks into the engineering lifecycle.
- Launch reusable benchmark packs for target workflows.
- Reduce AI production failure / exception rates across deployments.
- Improve release confidence and deployment velocity for AI systems.
- Increase benchmark/evaluation asset reuse across customers.

Ideal Candidate Profile
- Systems/reliability engineer mindset with strong AI depth
- Product-minded builder who can create reusable engineering frameworks
- Obsessed with operational excellence and measurable quality
- Comfortable driving standards across engineering organizations

Note: Please apply only via our official careers portal; applications sent directly to executives may not be considered.

Requirements

  • 10+ years in engineering / AI / ML leadership roles
  • 5+ years building or operating production AI / ML systems
  • Proven experience designing or operating:
      • AI/LLM evaluation frameworks
      • Benchmark / regression systems
      • AI QA / testing / validation infrastructure

Benefits

  • 401k Matching
  • Health Insurance
  • Paid Time Off
  • Remote Work
  • Stock Options

Skills

Python
Machine Learning
AI
Engineering
Evaluation Frameworks
Reliability Engineering
Benchmarking

About AI-Estimated Salary

The salary range shown was not provided by the employer. Our AI has estimated it based on the job title, required experience, location, and industry standards (confidence: 80%). This estimate should be used as a general guide only and may not reflect the actual compensation. Always confirm salary details directly with the employer during the application process.


