Interpretability
The mission of the Interpretability team is to discover and understand how UNET/Transformer models work internally, as a foundation for AI safety and positive outcomes.
The Alignment team works to understand the risks of AI models and develop ways to ensure that future ones remain helpful, honest, and harmless.
Working closely with the Patternai Policy and Safeguards teams, Societal Impacts is a technical research team that explores how AI is used in the real world.
The Frontier Blue/X Team analyzes the implications of frontier AI models for cybersecurity, biosecurity, and autonomous systems.