Remote Opportunity

Test & AI Evaluation Lead

Join C-Serv as a senior professional working remotely from the United Kingdom.

Full Time
$120,000 - $180,000*
1 day ago
United Kingdom
Worldwide
AI Governance & Programs
Senior
AI
Machine Learning
Python

Job Description

Salary: Competitive depending on experience
Location: 2-3 days on-site at our Harwell office, with travel to client sites when required
Contract type: Full-time permanent, 37.5 hours per week

A note from the Founders

Oxford Dynamics is at an inflection point. We operate in some of the most complex and high-stakes environments in the world: defence, national security, AI and robotics. The decisions we make now will define not just how fast we grow, but who we become.

You will work closely with the whole team. You will be trusted with judgment calls. You will influence the business. And you will see the impact of your work every day. If you are excited by ownership, pace and purpose, and by building something that genuinely matters, we would love to hear from you.

Who We Are

Founded in 2020, Oxford Dynamics (OD) is a fast-growing UK deep-tech company developing AI and robotic systems designed to operate in mission-critical environments. Our flagship AVIS (A Very Intelligent System) AI framework fuses multi-modal data (text, imagery, telemetry and sensor feeds), enabling operators to interrogate complex information at speed and make better decisions under pressure. Our STRIDER robotic platform performs autonomous tasks in hazardous environments, protecting people while extending operational reach.

Our ambition is simple but demanding: to converge AI and robotics so machines can sense, understand and act in complex, real-world environments. We work with defence and security organisations internationally to help protect nations, infrastructure and lives.

What you will be doing here / why this role matters

Oxford Dynamics is a small team that relies on a collaborative and positive approach, so the right attitude for this role is as important as experience. We are at an important stage in our growth, and as our Test & AI Evaluation Lead you will be an essential part of our success.
You'll work at the cutting edge of agentic and generative AI, building systems that move beyond lab demos and into real-world deployment at pace. At Oxford Dynamics, you'll have the freedom to experiment in a fast-moving environment, the responsibility to deliver, and the opportunity to shape how multi-agent AI systems operate in complex, constrained, and high-trust environments. If you're excited by agent orchestration, VLLMs, and deploying AI where it matters, this role is built for you!

Role Summary

We're hiring a Test & AI Evaluation Lead to own how Oxford Dynamics validates its AI-driven, mission-critical systems, from multi-agent orchestration and LLM outputs through to cloud infrastructure and real-time user-facing applications. You'll design and lead test approaches where correctness, resilience, and security matter as much as feature velocity.

Working embedded with AI, Backend, Frontend, and DevOps, you'll shape how we validate agent behaviours, data pipelines, and end-to-end operational workflows, from research prototypes through to production deployments for Defence and Security customers. Quality is built in from day one, not inspected at the end.

Key Responsibilities

Test Strategy & Leadership
  • Define and own the end-to-end test strategy across AI, backend, frontend, and infrastructure layers.
  • Establish testing standards appropriate for agentic AI systems, including non-deterministic behaviour and probabilistic outputs.
  • Ensure testing aligns with mission-critical, safety-conscious, and security-first delivery expectations.
  • Act as the primary quality authority across projects, advising engineering and product leadership on risk and readiness.

AI & Data-Focused Testing
  • Design approaches for testing multi-agent workflows, including orchestration logic, memory/state handling, and tool integrations.
  • Define validation strategies for LLM outputs, including groundedness, hallucination detection, task success rates, and regression testing.
  • Work with AI Engineers to embed evaluation metrics and pass/fail thresholds into pipelines.
  • Validate data ingestion, transformation, and inference pipelines across structured and unstructured data sources.

Automation & Tooling
  • Drive a test-automation-first mindset, integrating tests into CI/CD pipelines (GitHub Actions, Argo CD).
  • Oversee automated testing across API and service layers, UI (E2E and accessibility), and infrastructure and deployment workflows.
  • Select, implement, and evolve testing tools and frameworks appropriate to modern cloud-native and AI systems.

Non-Functional Testing
  • Own performance, scalability, reliability, and resilience testing for distributed systems.
  • Coordinate security testing activities in line with secure-by-design principles (e.g. IAM, secrets handling, data boundaries).
  • Validate backup, disaster recovery, and failover scenarios alongside DevOps and Backend teams.

Delivery & Collaboration
  • Embed with delivery teams to ensure testing is planned early and executed continuously.
  • Work closely with Product and Engineering to define clear acceptance criteria and a definition of done.
  • Provide clear, decision-ready quality reporting to technical and non-technical stakeholders.
  • Support customer-facing demonstrations, trials, and operational readiness assessments.

Required Skills & Experience

  • Proven experience as a Test Manager, Senior Test Lead, or equivalent on complex software systems.
  • Strong track record of taking applications into production in regulated environments.
  • Strong background in automated testing across APIs, services, and UIs, integrated into CI/CD pipelines.
  • Experience testing distributed, cloud-native systems (AWS, GCP, or Kubernetes), including performance, reliability, and resilience.
  • Awareness of compliance frameworks (e.g. ISO 27001, NIST, OWASP).
  • ISTQB Advanced / Test Manager certification or equivalent practical experience.
  • SC Clearance or eligibility to obtain UK SC Clearance.
Preferred Experience

  • Experience in UK defence, public sector, or security environments.
  • Experience testing AI/ML/LLM-based systems, including non-deterministic outputs.
  • Exposure to agent-based or workflow-driven architectures.

Soft Skills

  • A pragmatic, delivery-focused mindset, able to balance speed with rigour.
  • Comfortable operating in fast-moving, ambiguous, R&D-heavy environments.
  • Confidence in challenging assumptions and raising quality risks early.
  • Strong written and verbal communication, especially around complex technical risk.

Why Oxford Dynamics?

Join the most exciting growth area in the UK: AI and Robotics! Every member of the Oxford Dynamics team has a major impact on the products and services we provide. Regardless of job title, you'll get to make a real difference and learn from colleagues about all areas of our business.

Benefits include:

  • Salary: negotiable based on experience and attitude
  • Rapid career progression with meaningful ownership of core systems
  • Opportunity to shape the future of a fast-growing, successful, early-stage business
  • Flexible working hours
  • Hybrid working model
  • Company pension (UK Government NEST scheme) with company contributions at 4%
  • Private healthcare
  • 29 days' holiday in addition to public holidays (full-time equivalent)

Oxford Dynamics is committed to creating an inclusive team experience for all. Regardless of race, gender, religion, sexual orientation, age, disability, or parental status, we believe our work is at its best when everyone feels free to be their authentic self.

Why This Role?

You'll play a critical role in shaping how Oxford Dynamics delivers trustworthy, production-ready AI systems into some of the most demanding operational environments there are. If you enjoy working close to the technology, influencing how systems are built (not just tested), and tackling the realities of validating AI-driven software, this role gives you genuine ownership and impact.

Requirements

  • Design and lead test approaches for AI-driven, mission-critical systems
  • Validate agent behaviours, data pipelines, and end-to-end operational workflows
  • Work embedded with AI, Backend, Frontend, and DevOps teams
  • Collaborate with the team to ensure correctness, resilience, and security of AI systems

Benefits

  • Flexible Hours
  • Gym Membership
  • Health Insurance
  • Home Office Budget
  • Learning Budget
  • Performance Bonus
  • Remote Work
  • Remote Work Stipend

Skills

AI
Machine Learning
Python
Java
C++
Cloud Infrastructure
Robotics

About AI-Estimated Salary

The salary range shown was not provided by the employer. Our AI has estimated it based on the job title, required experience, location, and industry standards (confidence: 80%). This estimate should be used as a general guide only and may not reflect the actual compensation. Always confirm salary details directly with the employer during the application process.
