Research

Research Areas

Interpretability

Understanding how AI systems make decisions and what they learn from data.

Governance

Developing frameworks for the responsible development and deployment of AI systems.

Evaluations

Developing methods to assess AI capabilities, alignment, and safety properties.

Oversight / Control

Ensuring meaningful human oversight and control over AI systems.

AI Agency

Understanding and managing autonomous AI behavior and decision-making.

Security

Protecting AI systems from adversarial attacks and malicious use.

Research Opportunities

Undergraduates

We can advise and support you on dissertation and individual study projects.

Faculty

We can signpost promising research directions and funding opportunities, and support you throughout your research.

Recent Research by Durham AISI Members

Academic Publications

Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models

Leask, P., & Al Moubayed, N. (2025, July). Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models. Presented at the International Conference on Machine Learning (ICML 2025), Vancouver, Canada.

Interpretability · ICML 2025

Sparse Autoencoders Do Not Find Canonical Units of Analysis

Leask, P., Bussmann, B., Pearce, M. T., Bloom, J. I., Tigges, C., Al Moubayed, N., Sharkey, L., & Nanda, N. (2025, April). Sparse Autoencoders Do Not Find Canonical Units of Analysis. Presented at the Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore.

Interpretability · ICLR 2025

Probing by Analogy: Decomposing Probes into Activations for Better Interpretability and Inter-Model Generalization

Leask, P., & Al Moubayed, N. (2025). Probing by Analogy: Decomposing Probes into Activations for Better Interpretability and Inter-Model Generalization. Presented at the Mechanistic Interpretability Workshop at NeurIPS 2025.

Interpretability · NeurIPS 2025

Order by Scale: Relative-Magnitude Relational Composition in Attention-Only Transformers

Farrell, T., Leask, P., & Al Moubayed, N. (2025). Order by Scale: Relative-Magnitude Relational Composition in Attention-Only Transformers. Presented at the Socially Responsible and Trustworthy Foundation Models Workshop at NeurIPS 2025.

Interpretability · NeurIPS 2025

Other Research & Projects

Ghost Marks in the Machine: A Critical Review of SynthID for Code Provenance Monitoring

Sherratt-Cross, E., Farrell, T., Ogden, S., & Ryley, O. (2025, November). Ghost Marks in the Machine: A Critical Review of SynthID for Code Provenance Monitoring. Presented at an Apart Research Sprint.

Security · Apart Research