Remote Opportunity

Head of AI Evaluation & Reliability Engineering

Join Codvo.ai as a senior professional working remotely from anywhere in the world.

Full Time
$120,000 - $180,000*
3 months ago
Worldwide
AI Governance & Programs
Senior
Python
Machine Learning
AI

Job Description

Location: Flexible / Hybrid
Reports To: Head of Engineering

Role Mission

Build and scale Codvo’s AI Evaluation & Reliability Engineering capability as a core engineering function supporting the design, validation, and continuous improvement of enterprise AI systems in production. You will architect the frameworks, tooling, benchmark assets, and operational processes required to ensure AI systems deployed by Codvo and its customers meet enterprise standards for reliability, safety, performance, and governance. This role is deeply embedded within engineering and serves as the quality and reliability backbone for Codvo’s AI platform and delivery organization.

Why This Role Matters

As AI systems move from pilots to business-critical workflows, reliability and evaluation become core engineering disciplines, not optional afterthoughts. Codvo is building the infrastructure and operational rigor required to ensure every AI deployment is measurable, governed, and production-ready.

Core Responsibilities

Engineering Ownership
- Build Codvo’s AI Evaluation & Reliability Engineering function as a core platform/engineering capability.
- Define engineering standards for AI evaluation, testing, release gating, and runtime monitoring.
- Integrate evaluation/reliability frameworks into Codvo’s engineering and delivery lifecycle.

Evaluation Architecture
- Design reusable evaluation frameworks for:
  - LLM / multimodal quality
  - RAG grounding / evidence fidelity
  - Agent reasoning / decision quality
  - Tool / workflow execution success
  - Safety / policy / compliance adherence
  - Cost / latency / production economics

Benchmark Infrastructure
- Build benchmark packs, golden datasets, and regression suites for priority enterprise workflows.
- Define benchmark coverage and versioning standards.
- Establish processes for edge-case capture and benchmark expansion.

Runtime Reliability Systems
- Design systems/processes for:
  - Runtime drift / degradation monitoring
  - Failure mode analysis / incident diagnostics
  - Human review / escalation pathways
  - Continuous evaluation and improvement loops

Technical Leadership
- Partner closely with platform, product, and solution engineering teams.
- Serve as internal SME on AI reliability, benchmark design, and evaluation methodology.
- Help shape architecture standards for AI-native product and workflow delivery.

Team Leadership
- Build and lead a team of:
  - Evaluation Engineers
  - Benchmark / QA Engineers
  - Reliability / Observability Engineers
  - Domain Review / Feedback Ops Specialists

Required Qualifications
- 10+ years in engineering / AI / ML leadership roles.
- 5+ years building or operating production AI / ML systems.
- Proven experience designing or operating:
  - AI/LLM evaluation frameworks
  - Benchmark / regression systems
  - AI QA / testing / validation infrastructure
  - Production ML / observability / monitoring systems
  - Reliability engineering / quality engineering organizations

Technical Expertise
- LLM / multimodal evaluation methodologies
- Benchmark / golden dataset design
- Agent / tool-use / workflow evaluation
- RAG evaluation / grounding analysis
- AI observability / telemetry / tracing
- Human-in-the-loop feedback systems
- AI safety / governance / policy testing
- Release gating / CI/CD / engineering quality systems

Preferred Backgrounds
- AI Infrastructure / Evaluation Platforms
- AI Observability / MLOps Companies
- Enterprise AI Platform Teams
- Applied AI Product / Platform Organizations
- Reliability / QA Engineering Leadership in Complex Systems

Success Metrics
- Establish Codvo-wide AI evaluation/reliability standards
- Integrate evaluation frameworks into the engineering lifecycle
- Launch reusable benchmark packs for target workflows
- Reduce AI production failure / exception rates across deployments
- Improve release confidence and deployment velocity for AI systems
- Increase benchmark/evaluation asset reuse across customers

Ideal Candidate Profile
- Systems/reliability engineer mindset with strong AI depth
- Product-minded builder who can create reusable engineering frameworks
- Obsessed with operational excellence and measurable quality
- Comfortable driving standards across engineering organizations

Note: Please apply via our official careers portal only, as applications sent directly to executives may not be considered.

Requirements

  • 10+ years in engineering / AI / ML leadership roles
  • 5+ years building or operating production AI / ML systems
  • Proven experience designing or operating:
    ◦ AI/LLM evaluation frameworks
    ◦ Benchmark / regression systems
    ◦ AI QA / testing / validation infrastructure

Benefits

  • 401k Matching
  • Health Insurance
  • Paid Time Off
  • Remote Work
  • Stock Options

Skills

Python
Machine Learning
AI
Engineering
Evaluation Frameworks
Reliability Engineering
Benchmarking

About AI-Estimated Salary

The salary range shown was not provided by the employer. Our AI has estimated it based on the job title, required experience, location, and industry standards (confidence: 80%). This estimate should be used as a general guide only and may not reflect the actual compensation. Always confirm salary details directly with the employer during the application process.
