Scalable Human Oversight for Aligned LLMs
web
A 2025 peer-reviewed paper from Babcock University (Nigeria) that proposes a hybrid oversight framework for LLM alignment. It is relevant to scalable oversight research, but it appeared in a mid-tier journal and its experimental rigor warrants scrutiny.
Summary
This paper proposes a Scalable Hybrid Oversight (SHO) framework that combines selective human feedback, proxy reward modeling, behavioral auditing, and alignment metrics into a closed-loop system for LLM alignment. The framework targets limitations of existing methods such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), particularly high annotation costs and poor generalization to real-world settings. The authors report that, in experiments on five datasets covering truthfulness, ethics, and adversarial prompts, SHO outperforms conventional approaches in safety and oversight efficiency.
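The closed loop described above can be illustrated with a small sketch. It is not the SHO implementation: the paper's actual routing rule, reward model, auditing procedure, and metrics are not reproduced here, and every name below (`oversight_step`, `adjust_threshold`, `OversightLog`, the confidence threshold, audit rate, and toy scorers) is hypothetical. The sketch assumes a proxy reward model that returns a score plus a confidence, sends only low-confidence items to a human, spot-checks a random fraction of proxy decisions as a stand-in for behavioral auditing, and raises the routing threshold when audits disagree too often.

```python
# Hypothetical sketch of selective human oversight in a closed loop.
# Not the SHO paper's code: all names, thresholds, and scorers are assumptions.
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class OversightLog:
    human_labels: int = 0          # items scored by a human annotator
    proxy_labels: int = 0          # items scored by the proxy reward model alone
    audits: int = 0                # proxy-scored items re-checked by a human
    audit_disagreements: int = 0   # audits where human and proxy differ too much
    scores: List[float] = field(default_factory=list)

    def human_share(self) -> float:
        """Fraction of items that needed a human label (lower = cheaper oversight)."""
        total = self.human_labels + self.proxy_labels
        return self.human_labels / total if total else 0.0


def oversight_step(
    outputs: List[str],
    proxy: Callable[[str], Tuple[float, float]],  # returns (score, confidence)
    human: Callable[[str], float],                # returns a human score
    confidence_threshold: float = 0.8,
    audit_rate: float = 0.1,
    disagreement_tol: float = 0.5,
    rng: Optional[random.Random] = None,
) -> OversightLog:
    """Score each output, asking a human only when the proxy is unsure."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so audit sampling is reproducible
    log = OversightLog()
    for text in outputs:
        score, confidence = proxy(text)
        if confidence < confidence_threshold:
            # Selective feedback: low-confidence items go to a person.
            score = human(text)
            log.human_labels += 1
        else:
            log.proxy_labels += 1
            # Audit: spot-check a random fraction of proxy decisions.
            if rng.random() < audit_rate:
                log.audits += 1
                if abs(human(text) - score) > disagreement_tol:
                    log.audit_disagreements += 1
        log.scores.append(score)
    return log


def adjust_threshold(threshold: float, log: OversightLog,
                     max_disagreement: float = 0.1, step: float = 0.05) -> float:
    """Close the loop: if audits find the proxy unreliable, route more to humans."""
    if log.audits and log.audit_disagreements / log.audits > max_disagreement:
        return min(1.0, threshold + step)
    return threshold


# Toy scorers for illustration only.
def toy_proxy(text: str) -> Tuple[float, float]:
    return (1.0, 0.95) if text.startswith("clear:") else (0.0, 0.4)


def toy_human(text: str) -> float:
    return 1.0 if "helpful" in text else -1.0


if __name__ == "__main__":
    batch = ["clear: helpful answer", "clear: harmful", "vague: helpful"]
    log = oversight_step(batch, toy_proxy, toy_human, audit_rate=1.0)
    print(log.human_share(), log.audit_disagreements, adjust_threshold(0.8, log))
```

Here the share of items routed to humans (`human_share`) serves as one crude proxy for oversight efficiency; the paper's own efficiency and intent-fidelity metrics may be defined quite differently.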
Key Points
- Proposes Scalable Hybrid Oversight (SHO), which combines selective human feedback, proxy reward modeling, and behavioral auditing in a closed-loop alignment system.
- Addresses key limitations of RLHF and SFT: high annotation costs and poor generalization to ethically sensitive or ambiguous real-world contexts.
- Evaluated on five datasets covering truthfulness, ethics, and adversarial prompts; reported to outperform conventional alignment baselines.
- Introduces "intent fidelity" as a core alignment metric, measuring whether LLM outputs reliably reflect human values and intentions.
- Targets sustainable, scalable deployment of LLMs in dynamic environments where oversight resources are constrained.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Alignment | Approach | 91.0 |
Cached Content Preview
Scalable Human Oversight for Aligned Large Language Models: A Hybrid Framework for Intent Fidelity | IIETA
Journal: ISI (IIETA)
JOURNAL METRICS
CiteScore 2024: 2.4
SCImago Journal Rank (SJR) 2024: 0.247
Source Normalized Impact per Paper (SNIP) 2024: 0.582
Scalable Human Oversight for Aligned Large Language Models: A Hybrid Framework for Intent Fidelity
Folasade Y. Ayankoya*, Shade O. Kuyoro, Olubukola D. Adekola, Oluwasefunmi B. Famodimu
Department of Computer Science, Babcock University, Ilishan-Remo 121003, Nigeria
Department of Software Engineering, Babcock University, Ilishan-Remo 121003, Nigeria
Corresponding Author Email: ayankoyaf@babcock.edu.ng
Pages: 2011-2020 | DOI: https://doi.org/10.18280/isi.300807
Received: 17 May 2025 | Revised: 4 August 2025 | Accepted: 16 August 2025 | Available online: 31 August 2025
© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license ( http://creativecommons.org/licenses/by/4.0/ ).
OPEN ACCESS
Abstract: Large lan
... (truncated, 55 KB total)
311a21a10c96b10d | Stable ID: sid_q84v4Ctluk