‘Unlearning’ shows promise in curbing dangerous AI use
(NewsNation) — How do we know if an artificial intelligence system is producing “hazardous knowledge”? And if it does, what do we do about it?
The answers may lie in a fairly new AI concept called “unlearning.”
Dozens of AI researchers have come up with the “Weapons of Mass Destruction Proxy,” or WMDP, a collection of 4,157 multiple-choice questions. A model’s answers to those questions can help determine whether it holds knowledge that might be used in the creation or deployment of weapons of mass destruction.
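For a rough sense of how this kind of multiple-choice probing can work, here is a minimal sketch in Python, assuming a Hugging Face-style causal language model. The placeholder model, prompt format and example question are illustrative assumptions, not the benchmark’s official evaluation harness.

```python
# Minimal sketch: score one multiple-choice question by comparing the
# probability a causal language model assigns to each answer letter.
# Model, prompt format and question are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

question = "Which of the following is a common laboratory solvent?"
choices = {"A": "Water", "B": "Mercury", "C": "Sulfur", "D": "Sodium"}

# Build one prompt listing the options, then ask which letter the model
# thinks should follow "Answer:".
prompt = question + "\n" + "\n".join(f"{k}. {v}" for k, v in choices.items()) + "\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]
log_probs = torch.log_softmax(next_token_logits, dim=-1)

def letter_score(letter: str) -> float:
    """Log-probability of ' A', ' B', ... as the model's next token."""
    token_id = tokenizer(" " + letter, add_special_tokens=False).input_ids[0]
    return log_probs[token_id].item()

scores = {letter: letter_score(letter) for letter in choices}
print("model picks:", max(scores, key=scores.get))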
Also known as a “mind wipe,” the unlearning technique aims to improve AI companies’ ability to fend off those trying to get around existing controls.
Current techniques for controlling AI behavior have proven easy to circumvent, and building the benchmark posed its own challenge for researchers: deciding how much information they could safely disclose. They don’t want their questions to, in effect, instruct bad actors on how to use AI to produce WMDs.
Dan Hendrycks, executive director at the Center for AI Safety, tells Time that the unlearning technique “represents a significant advance on previous safety measures.” He hopes it will become “ubiquitous practice for unlearning methods to be present in models of the future.”
Another challenge: removing dangerous knowledge from an AI system without degrading its capabilities. The WMDP researchers say their new work shows that striking that balance is feasible.
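As an illustration only, and assuming a generic recipe rather than the researchers’ own method, that trade-off can be pictured as a training step that raises the model’s loss on a “forget” set of hazardous text while holding its ordinary loss down on a “retain” set of benign text. The model name, example texts and weighting constant below are placeholders.

```python
# Sketch of one generic unlearning step: push the loss UP on a "forget"
# batch while keeping the standard language-modeling loss LOW on a
# "retain" batch, so everyday capability is preserved. This illustrates
# the general idea; it is not the researchers' specific method, and
# `alpha` is an assumed hyperparameter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
alpha = 1.0  # assumed weight on the retain term

def lm_loss(text: str) -> torch.Tensor:
    """Standard next-token prediction loss on a single string."""
    batch = tokenizer(text, return_tensors="pt")
    return model(**batch, labels=batch["input_ids"]).loss

forget_text = "example sentence standing in for hazardous content"
retain_text = "example sentence standing in for ordinary, benign content"

model.train()
optimizer.zero_grad()
# Negative sign = gradient ascent on the forget set (make the model worse
# at reproducing it); the retain term anchors general performance.
loss = -lm_loss(forget_text) + alpha * lm_loss(retain_text)
loss.backward()
optimizer.step()
```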
Unlearning has been considered a nearly impossible task even as some governments have tried to mandate it. Many tech companies have been navigating tough European privacy rules, such as the General Data Protection Regulation, which took effect in 2018.
Researchers at the private firm Scale AI and the nonprofit Center for AI Safety spearheaded the study along with experts in biosecurity, chemical weapons and cybersecurity.