Longterm Wiki

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

resource

Metadata

Source Table: resources
Source ID: 4a1e5f2bded4d079
Source URL: arxiv.org/html/2406.07358v4
Children
Created: May 27, 2026, 2:43 AM
Updated: May 27, 2026, 2:50 AM
Synced

Record Data

id: 4a1e5f2bded4d079
url: arxiv.org/html/2406.07358v4
title: AI Sandbagging: Language Models can Strategically Underperform on Evaluations
type: paper
summary
review
abstract
keyPoints
publicationId
authors
authorEntityIds
publishedDate
tags: []
localFilename
credibilityOverride
fetchedAt
contentHash
stableId: sid_X7Ria9U1MA
fetchStatus
lastFetchedAt
archiveUrl
stance
contextNote
resourcePurpose
resourceSubtype
typeMetadata
publisherEntityId
relatedEntityIds
enrichmentStatus
enrichmentDate
importanceScore
contentLifecycle