Remote Opportunity

AI Evaluation Engineer (Knowledge & Research)

Join C-Serv as a senior professional working remotely from anywhere in the world. Explore the role and benefits, and apply in one place.

Contract
$120,000 - $180,000*
2 weeks ago
Worldwide
AI Governance & Programs
Senior
Python
Docker
Machine Learning

Job Description

About Us

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role overview

We are looking for an AI Evaluation Engineer with a strong research background to design and evaluate complex, multi-agent tasks used to benchmark next-generation AI systems. In this role, you will work at the intersection of research, data structuring, and AI evaluation, building high-quality tasks that require deep document understanding, structured reasoning, and multi-step synthesis. You will create datasets and evaluation frameworks that test whether AI agents can truly read, reason, and extract knowledge from large-scale unstructured data. This is a high-precision, detail-oriented role requiring strong analytical thinking, structured problem decomposition, and the ability to translate research content into measurable evaluation tasks.

  • Commitments required: 8 hours per day, with an overlap of 4 hours with PST
  • Employment type: Contractor assignment (no medical/paid leave)
  • Duration of contract: 5 weeks+
  • Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam
  • Interview: take-home assessment (60 min)

Responsibilities

  • Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
  • Curate real-world research corpora (academic papers, case studies, technical reports) and design questions that require comprehensive analysis
  • Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
  • Design LLM judge prompts that evaluate agent output field-by-field against the oracle
  • Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)

Requirements

  • 5+ years of experience in research (academic or industry) in a scientific, technical, or analytical domain
  • Strong ability to read, analyze, and extract structured information from unstructured documents
  • Experience designing or working with structured data formats (JSON, schemas, validation)
  • Proficiency in Python scripting (data processing, validation, or evaluation scripts)
  • Experience with AI evaluation, coding benchmarks, or structured reasoning tasks (e.g., SWE-bench, Terminal-bench, or similar)
  • Experience working with Docker (building images, debugging containers)
  • Strong attention to detail, especially when defining exact, verifiable outputs
  • Ability to design complex, multi-step problem-solving workflows

Skills

Python
Docker
Machine Learning
AI evaluation
JSON

About AI-Estimated Salary

The salary range shown was not provided by the employer. Our AI has estimated it based on the job title, required experience, location, and industry standards (confidence: 80%). This estimate should be used as a general guide only and may not reflect the actual compensation. Always confirm salary details directly with the employer during the application process.
