Remote Opportunity

AI Evaluation & Reliability Engineer (Agents & LLM Systems)

Join abra as a senior professional working remotely from anywhere in the world. Explore the role and benefits, and apply in one place.

Full Time
$120,000 - $180,000*
3 days ago
Worldwide
AI Governance & Programs
Senior
Python
LLMs
Evaluation Frameworks

Job Description

abra R&D is looking for an AI Evaluation & Reliability Engineer (Agents & LLM Systems) to take part in building the next-generation agentic analytics platform, the first real-time database optimized for AI agents at scale.

We’re looking for a Senior AI Evaluation & Reliability Engineer to define and build how AI agents are measured, validated, monitored, and improved in production. This role sits at the intersection of LLM systems, evaluation research, and production-grade engineering. You will design evaluation methodologies, build LLM-as-a-judge systems, and develop agent-based testing frameworks to ensure the correctness, robustness, and reliability of complex multi-agent workflows operating on real-time data.

What You’ll Do:

  • Design and implement evaluation frameworks for AI agents and multi-agent systems
  • Build LLM-as-a-judge pipelines to assess correctness, reasoning quality, and output quality
  • Develop agent-based evaluation systems (agents evaluating agents) for scalable testing
  • Define metrics, benchmarks, scorecards, and methodologies for agent reliability and performance
  • Build data-driven evaluation pipelines using synthetic and real-world datasets
  • Identify and analyze failure modes, edge cases, and non-deterministic behaviors
  • Improve agent robustness, consistency, and reliability in production environments
  • Work with tools such as Google ADK, Opik, and related evaluation frameworks
  • Collaborate closely with AI, platform, and database teams to shape agent–data interaction quality

Requirements

Must have:

  • 4–8+ years of experience in software engineering, AI systems, or evaluation/QA engineering
  • Strong programming skills in Python
  • Hands-on experience working with LLMs in production environments
  • Experience building evaluation systems, automation frameworks, or testing infrastructure
  • Strong understanding of prompt engineering, tool use, and agent behavior
  • Ability to think in terms of metrics, correctness, and system reliability

Nice to have:

  • Experience with LLM evaluation frameworks (Opik, LangSmith, etc.)
  • Experience with Google ADK / agent frameworks
  • Experience implementing LLM-as-a-judge or ranking systems
  • Background in data systems, analytics, or real-time pipelines
  • Experience with multi-agent systems
  • Familiarity with statistical evaluation methods or experimentation (A/B testing, scoring systems)

Benefits

  • 401k Matching
  • Certification Support
  • Flexible Hours
  • Health Insurance
  • Home Office Budget
  • Learning Budget
  • Paid Time Off
  • Remote Work

Skills

Python
LLMs
Evaluation Frameworks
Google ADK
Opik

About AI-Estimated Salary

The salary range shown was not provided by the employer. Our AI has estimated it based on the job title, required experience, location, and industry standards (confidence: 80%). This estimate should be used as a general guide only and may not reflect the actual compensation. Always confirm salary details directly with the employer during the application process.

