Remote Opportunity

AI Evaluation & Reliability Engineer

Join abra as a senior professional working remotely from anywhere in the world. Explore the role and benefits, and apply in one place.

Full Time
$120,000 - $180,000*
2 months ago
Worldwide
AI Governance & Programs
Senior
Python
LLMs
Evaluation Frameworks

Job Description

abra R&D is looking for a Reliability Engineer who will take part in building the next-generation agentic analytics platform, the first real-time database optimized for AI agents at scale.

We’re looking for a Senior AI Evaluation & Reliability Engineer to define and build how AI agents are measured, validated, monitored, and improved in production. This role sits at the intersection of LLM systems, evaluation research, and production-grade engineering. You will design evaluation methodologies, build LLM-as-a-judge systems, and develop agent-based testing frameworks to ensure correctness, robustness, and reliability of complex multi-agent workflows operating on real-time data.

What You’ll Do

  • Design and implement evaluation frameworks for AI agents and multi-agent systems
  • Build LLM-as-a-judge pipelines to assess correctness, reasoning quality, and output quality
  • Develop agent-based evaluation systems (agents evaluating agents) for scalable testing
  • Define metrics, benchmarks, scorecards, and methodologies for agent reliability and performance
  • Build data-driven evaluation pipelines using synthetic and real-world datasets
  • Identify and analyze failure modes, edge cases, and non-deterministic behaviors
  • Improve agent robustness, consistency, and reliability in production environments
  • Work with tools such as Google ADK, Opik, and related evaluation frameworks
  • Collaborate closely with AI, platform, and database teams to shape agent–data interaction quality

Requirements

Must have:

  • 4–8+ years of experience in software engineering, AI systems, or evaluation/QA engineering
  • Strong programming skills in Python
  • Hands-on experience working with LLMs in production environments
  • Experience building evaluation systems, automation frameworks, or testing infrastructure
  • Strong understanding of prompt engineering, tool use, and agent behavior
  • Ability to think in terms of metrics, correctness, and system reliability

Nice to have:

  • Experience with LLM evaluation frameworks (Opik, LangSmith, etc.)
  • Experience with Google ADK / agent frameworks
  • Experience implementing LLM-as-a-judge or ranking systems
  • Background in data systems, analytics, or real-time pipelines
  • Experience with multi-agent systems
  • Familiarity with statistical evaluation methods or experimentation (A/B testing, scoring systems)

Benefits

  • 401k Matching
  • Certification Support
  • Flexible Hours
  • Health Insurance
  • Home Office Budget
  • Learning Budget
  • Paid Time Off
  • Remote Work

Skills

Python
LLMs
Evaluation Frameworks
Google ADK
Opik

About AI-Estimated Salary

The salary range shown was not provided by the employer. Our AI has estimated it based on the job title, required experience, location, and industry standards (confidence: 80%). This estimate should be used as a general guide only and may not reflect the actual compensation. Always confirm salary details directly with the employer during the application process.
