Remote Opportunity

AI Evaluation & Reliability Engineer

Join abra as a senior professional working remotely from anywhere in the world. Explore the role and benefits, and apply in one place.

Full Time
$120,000 - $180,000*
1 day ago
Worldwide
AI Governance & Programs
Senior
Python
LLMs
Evaluation Frameworks

Job Description

abra R&D is looking for a Reliability Engineer who will take part in building the next-generation agentic analytics platform, the first real-time database optimized for AI agents at scale.

We’re looking for a Senior AI Evaluation & Reliability Engineer to define and build how AI agents are measured, validated, monitored, and improved in production. This role sits at the intersection of LLM systems, evaluation research, and production-grade engineering. You will design evaluation methodologies, build LLM-as-a-judge systems, and develop agent-based testing frameworks to ensure correctness, robustness, and reliability of complex multi-agent workflows operating on real-time data.

What You’ll Do:

  • Design and implement evaluation frameworks for AI agents and multi-agent systems
  • Build LLM-as-a-judge pipelines to assess correctness, reasoning quality, and output quality
  • Develop agent-based evaluation systems (agents evaluating agents) for scalable testing
  • Define metrics, benchmarks, scorecards, and methodologies for agent reliability and performance
  • Build data-driven evaluation pipelines using synthetic and real-world datasets
  • Identify and analyze failure modes, edge cases, and non-deterministic behaviors
  • Improve agent robustness, consistency, and reliability in production environments
  • Work with tools such as Google ADK, Opik, and related evaluation frameworks
  • Collaborate closely with AI, platform, and database teams to shape agent–data interaction quality

Requirements

Must have:

  • 4–8+ years of experience in software engineering, AI systems, or evaluation/QA engineering
  • Strong programming skills in Python
  • Hands-on experience working with LLMs in production environments
  • Experience building evaluation systems, automation frameworks, or testing infrastructure
  • Strong understanding of prompt engineering, tool use, and agent behavior
  • Ability to think in terms of metrics, correctness, and system reliability

Nice to have:

  • Experience with LLM evaluation frameworks (Opik, LangSmith, etc.)
  • Experience with Google ADK / agent frameworks
  • Experience implementing LLM-as-a-judge or ranking systems
  • Background in data systems, analytics, or real-time pipelines
  • Experience with multi-agent systems
  • Familiarity with statistical evaluation methods or experimentation (A/B testing, scoring systems)

Benefits

  • 401k Matching
  • Certification Support
  • Flexible Hours
  • Health Insurance
  • Home Office Budget
  • Learning Budget
  • Paid Time Off
  • Remote Work

Skills

Python
LLMs
Evaluation Frameworks
Google ADK
Opik

About AI-Estimated Salary

The salary range shown was not provided by the employer. Our AI has estimated it based on the job title, required experience, location, and industry standards (confidence: 80%). This estimate should be used as a general guide only and may not reflect the actual compensation. Always confirm salary details directly with the employer during the application process.
