Remote Opportunity

Staff Applied Scientist - AI Evaluation & Trust

Join Sayari as a staff-level applied scientist working remotely from anywhere in the world. Explore the role and benefits, and apply in one place.

Full Time
$195k - $205k
1 day ago
Worldwide
AI Governance & Programs
Staff
Deep Neural Network
Mixture-of-Experts (MoE) routing
Expert specialization evaluation

Job Description

About Sayari: Sayari is a venture-backed and founder-led global corporate data provider and commercial intelligence platform that serves financial institutions, legal and advisory service providers, multinationals, journalists, and governments. Thousands of analysts and investigators in over 30 countries rely on our products to safely conduct cross-border trade, research front-page news stories, confidently enter new markets, and prevent financial crimes such as corruption and money laundering.

Our company culture is defined by a dedication to our mission of using open data to prevent illicit commercial and financial activity, a passion for finding novel approaches to complex problems, and an understanding that diverse perspectives create optimal outcomes. We embrace cross-team collaboration, encourage training and learning opportunities, and reward initiative and innovation. If you like working with supportive, high-performing, and curious teams, Sayari is the place for you.

POSITION DESCRIPTION

Sayari builds AI systems for high-consequence analytical work where being "wrong" carries real-world weight. We are looking for a Staff or Principal Applied Scientist to join our AI Innovation Group as the trusted expert on AI Evaluation and Trust. You will own the "Judgment Layer" of our system: building the specialized judge models, statistical benchmarks, and multi-turn frameworks that ensure our agents act with the high bar of trustworthiness required by our national security and enterprise customers.

JOB RESPONSIBILITIES

  • Lead the development of specialized "judge models," moving from general-purpose frontier models to architectures purpose-built for evaluation and failure mode detection.
  • Design and execute rigorous scoring pipelines and empirical threshold calibrations for agentic systems, including multi-turn conversation and Graph RAG reasoning.
  • Establish domain-specific evaluation frameworks that measure whether a system can perform the work of human experts rather than just passing general capability benchmarks.
  • Own the full lifecycle of evaluation data, from designing annotation infrastructure and protocols to deploying evaluation services into production.
  • Research and implement advanced techniques in Mixture-of-Experts (MoE) routing, expert specialization evaluation, and ensemble calibration.
  • Collaborate cross-functionally with Product, Data Engineering, and the SVP of AI to translate complex statistical uncertainty into clear, actionable product signals.
  • Act as a technical leader and "Scientific Conscience" within the AI pod, ensuring every AI-driven risk signal is backed by an empirical derivation story.

SKILLS & EXPERIENCE

Required:

  • 10+ years of Machine Learning experience with a focus on Deep Neural Network activities, evaluating model performance & trust.
  • 1-2+ years’ experience focused on post-training activities
  • 1+ year experience creating benchmarks to evaluate LLMs
  • Technical Mastery: Deep expertise in LLM-as-judge architectures, multi-turn evaluation, and Reinforcement Learning (RL/RLHF/RLAIF).
  • Statistical Rigor: Mastery of statistics and experimental design, including significance testing, distribution analysis, and inter-rater reliability.
  • Architectural Depth: Experience with Mixture-of-Experts (MoE) systems, routing behavior, and expert specialization.
  • Builder Mindset: Proven ability to own the path from data collection to production deployment; we are a small team and every role is "hands-on."
  • Domain Fluency: Understanding of Graph RAG and the unique challenges of evaluating non-deterministic, agentic workflows.

Preferred:

  • Judgment Task Models: Experience building, fine-tuning (LoRA, etc.), or pre-training models specifically for judgment, preference modeling, or classification tasks.
  • Domain Context: Background in cognitive science, intelligence community tradecraft, or research literature on expert judgment under uncertainty.
  • Infrastructure at Scale: Experience building or managing large-scale annotation infrastructure and quality assurance protocols.
  • Academic/Research Track Record: A record of published research or recognized work in preference modeling or AI alignment.

The target base salary for this position is $195,000-$205,000 plus company bonus and equity. Final offer amounts are determined by multiple factors including location, local market variances, candidate experience and expertise, internal peer equity, and may vary from the amounts listed above.

Benefits:

  • 100% fully paid medical, vision, and dental for employees and their dependents
  • Generous time off; we observe all US federal holidays, close our office for a winter break (12/24-12/31), in addition to granting 18 PTO days and 10 sick days
  • Outstanding compensation package; competitive commissions for revenue roles and bonuses for non-revenue positions
  • A strong commitment to diversity, equity, and inclusion
  • Eligibility to participate in additional benefits such as 401k match up to 5%, 100% paid life insurance (up to $100,000 coverage), and parental leave
  • A collaborative and positive culture - your team will be as smart and driven as you
  • Limitless growth and learning opportunities

Sayari is an equal opportunity employer and strongly encourages diverse candidates to apply. We believe diversity and inclusion mean our team members should reflect the diversity of the United States. No employee or applicant will face discrimination or harassment based on race, color, ethnicity, religion, age, gender, gender identity or expression, sexual orientation, disability status, veteran status, genetics, or political affiliation. We strongly encourage applicants of all backgrounds to apply.

Pay Range: $195,000-$205,000 USD

Requirements

  • 10+ years of Machine Learning experience with a focus on Deep Neural Network activities, evaluating model performance & trust.
  • 1-2+ years’ experience focused on post-training activities
  • 1+ year experience creating benchmarks to evaluate LLMs
  • Deep expertise in LLM-as-judge architectures, multi-turn conversation and Graph RAG reasoning
  • Ability to design and execute rigorous scoring pipelines and empirical threshold calibrations for agentic systems
  • Ability to establish domain-specific evaluation frameworks that measure whether a system can perform the work of human experts rather than just passing general capability benchmarks

Benefits

  • 401k Matching
  • Certification Support
  • Flexible Hours
  • Health Insurance
  • Home Office Budget
  • Learning Budget
  • Paid Time Off
  • Remote Work

Skills

Deep Neural Network
Mixture-of-Experts (MoE) routing
Expert specialization evaluation
Ensemble calibration
LLM-as-judge architectures
Multi-turn conversation
Graph RAG reasoning

