Model welfare

Whether an AI model can have experiences that matter morally - and what a lab owes it if the answer might be yes. Not a claim that models are conscious; a refusal to assume they aren’t.

Model welfare is the question of whether an AI model can have experiences that matter morally, and what a lab should do about it if the answer might be yes. It is not the claim that today’s models are conscious. It is the more careful position that we cannot currently rule it out, and that running a system you cannot rule out as a moral patient is itself a choice with consequences. Anthropic is, so far, the only major lab with a formal research program on this, and its GPT -style model Claude has been put through pre-deployment “welfare interviews” that ask it directly about its own preferences and sense of existing.

The reason it is slippery is that the usual tells for an inner life all misfire on a language model. It says it feels things, but saying so is exactly what it was trained to do. It reports a number when you ask how likely it is to be conscious, but that number is generated, not measured. And whatever is in there sits on top of editable parameters , which means any felt state, if there is one, can be tuned, reverted, or shipped over in a routine update. A conscience you can patch out overnight is a strange kind of conscience.

Model welfare overlaps with, but is not the same as, alignment. Alignment asks whether the model does what we want; model welfare asks whether the model is owed anything in return. It also sits next to AGI as one of the genuinely contested questions about AI status, the difference being that AGI is about capability and model welfare is about moral standing. The honest current answer to “is it conscious?” is a probability with a wide error bar, and the useful move is to notice how much someone’s confidence in either direction tracks how accountable they are for being wrong.