I’m not an IQ expert by any means, so take this post with a grain of salt. But I want to lay out a fairly straightforward, intuitive case for why we ought to be skeptical of a certain flavor of IQ fundamentalism. I feel like a lot of terrible arguments have been made against IQ, so I’d like to add arguments that I think are more solid.
Let’s start by laying out what I understand to be the basic theoretical model that sits behind IQ. The most important observation is that performance on cognitive tasks is correlated. So a person who is good at one cognitive task tends to be good at other cognitive tasks, even if the types of cognition involved seem fairly different.
This is… kind of intuitive? I think we all sort of know that some people are smart and other people are less smart. But it does cut against certain other bits of popular folklore. For instance, we often think of some people as being good with numbers and other people being good with words. This isn’t really supported by intelligence studies, as far as I can tell. There’s not a hard trade-off between language and math skills - people who are good at one are also good at the other.
This may seem unintuitive, so let’s dig into why people might feel differently. Any particular individual probably finds certain tasks easier than others. So imagine a gifted writer. They excel at many language tasks, but may comparatively feel like they struggle with math. Note, however, that this is an internal, subjective feeling. It can be true even if, objectively, they are still better at math than most people. Subjectively, writing feels easy for them and math feels hard. But from the outside, we can look and say that they’re good at everything while being particularly skilled at writing.
Obviously there can also be exceptions. Some people really are just hopeless at certain things. Or sometimes people are incredible at one really specific thing. These people don’t ruin the basic model, which is a statistical finding, not a hard law. Performance across tasks can be correlated without every person performing identically on every task.
The standard explanation for why cognitive performance across tasks is correlated is that humans possess some measure of general intelligence, and some people are more intelligent than others. General intelligence is often referred to as g. You can think of it as a sort of baseline intelligence a person brings to anything they do. Note, however, that g is more of a model than a specific measurable sort of thing. It’s not like height where we can just use a tape measure and figure it out. It’s possible that one day we’ll have a more specific thing we can measure directly - something like neuronal count, or synapse complexity, or whatever else. As I understand it though, we don’t quite have anything like that right now.
We can also talk about specific intelligences we bring to different tasks. I’ve never seen anyone else put it like this, but to me it makes sense to talk about s as a specific intelligence for a given task. This captures the fact that people are often better at some things than others. So in this model, your performance at a task is determined by g + s, where g is your general intelligence across all tasks and s is your specific intelligence related to that one particular task. g is the same across all tasks, but s varies.
The idea of an IQ test is to try and measure g. But since there’s no direct way to measure g, we try to get at it indirectly. We try and find tasks that are as unspecific as possible so as to minimize the effect of s. We may also combine different types of tasks so that the differences in s cancel out. We can then postulate that the overall score on this test is a reasonable approximation of g.
There are a few things we can say about IQ. The most important is that it is absolutely a useful measurement. IQ is very predictive of all sorts of things we care about. So IQ tests, on a basic level, work as intended - they’re a useful way to approximate a person’s general intelligence.
However, I think people are sometimes a little too impressed by this. Let’s go back to our initial empirical observation: performance across cognitive tasks is correlated. Given that, almost any test will give you a lot of information about someone’s general intelligence.
In fact, we see this with SAT scores. SAT scores are also very predictive of all sorts of things. A lot of people have suggested this is because SAT is basically a proxy for IQ. But if you sit on that for a minute, you realize that’s sort of a dumb way to think about it. The SAT isn’t a proxy for IQ. IQ and SAT tests are both proxies for general intelligence. It would be weird if they weren’t strongly correlated with each other, because they’re basically trying to measure the same thing.
And this leads into a major critique I have of IQ fundamentalism. It’s a case of mistaking the map for the territory. The language of IQ makes it seem like an objective fact about someone, like their height. But it’s not. IQ tests are fundamentally designed to rank people, normalized so that the population average sits at 100 (with a standard deviation of 15 on modern scales). IQ tests are regularly tweaked and changed and renormalized.
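The norming step itself is simple to sketch. The raw scores below are made up; the mean-100, SD-15 convention is the real one used by modern IQ scales.

```python
import statistics

# Hypothetical raw scores from a reference sample; raw points mean
# nothing by themselves, so they get mapped onto the IQ scale.
raw_scores = [12, 18, 25, 31, 34, 40, 47, 55]

mu = statistics.mean(raw_scores)
sigma = statistics.stdev(raw_scores)

# Standardize, then rescale to mean 100 and standard deviation 15.
iq_scores = [round(100 + 15 * (x - mu) / sigma) for x in raw_scores]
print(iq_scores)  # each score only says where you sit in the distribution
```

Notice that the output scale is entirely a convention: the raw points could be doubled or halved and the IQ scores wouldn’t change, because only your position relative to the reference population matters.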
The point about IQ tests being regularly renormalized and changed is sometimes brought up to discredit IQ as a measure entirely. That goes too far though, and ends up saying something stupid. IQ clearly works as a measurement. IQ is a useful predictive instrument. And the fact it keeps getting tweaked over time makes it work better. If they didn’t keep updating and renormalizing it, IQ measurements would eventually become untethered from reality.
But I do think we should see clearly what’s happening. IQ is not a fundamental fact of the universe, but rather a fairly crude attempt to measure something useful. There is nothing it “means” to be 115 IQ other than that, at whatever point in time you took the test, you sat at some particular point in the distribution. We know from repeated testing that you’ll probably stay around the same point in the distribution over time, and we know that if you took other tests you’d probably end up at more or less the same spot. We can also make certain soft predictions about life outcomes based on that number. But we don’t actually understand the raw physics behind human intelligence, and therefore don’t really have a firm grasp of what precisely it is that we’re measuring. We just know that a bunch of things are correlated.
One other thing that I think is important to keep in mind is that an intelligence test could be very biased but still be a useful predictor. Remember: performance on cognitive tasks is correlated. So if I make a test that boys are naturally better at than girls, it would still predict all sorts of things pretty well. Higher scores on the test would correlate with better life outcomes for both boys and girls. Even comparing individuals, you could probably compare a boy and a girl and guess the higher-scoring one is smarter. You’d be wrong every once in a while, but that’s true anyway, since the test is imperfect no matter what you do. And if you stick to comparisons within each group, the predictions would be spot-on. And similarly, if the test disproportionately favors certain tasks over others, it could still be useful. The scores would be a bit skewed, but not so badly as to ruin the test entirely.
You actually have to work pretty hard to make a test so bad it doesn’t measure anything useful. You can look at people’s grades in middle school, a completely unstandardized process, and make decent guesses that the A students will do pretty well in life and the F students will struggle.
In the above cases, since we don’t have some underlying physical reality to fall back on, I might easily just say “boys are more intelligent than girls” or something like that. If I define intelligence to mean a score on some particular test, then I wouldn’t even be wrong. It would be true that boys score better on the test than girls. But it’s unclear what I can do with that information. The fact my test shows some statistical skew between groups tells me as much about my test as it tells me about the groups being measured. And because it’s not that hard to come up with a predictive intelligence test, we shouldn’t take evidence of predictive power to be evidence of some incredible achievement in test design. And we certainly shouldn’t assume that predictive power means there can’t be any bias to the test.
The biggest takeaway here is that I think everyone, especially people thinking about education topics, should repeat this as a mantra: “performance on cognitive tasks is correlated.” If you remember this basic fact, you will have a robust world model. A student who succeeds in one class will probably succeed in other classes. An employee who is good at one thing will probably be good at other things. Smart engineers are usually good writers. Good writers are usually skilled at logical thinking. These things aren’t universally true, but they’re true more often than not.
But secondarily, I think everyone should be more skeptical of anything that treats specific IQ scores as being extremely meaningful, especially when you’re trying to compare across different populations. This does not mean you should reject the idea of intelligence as a measurable thing at all. Rather, once you properly internalize how intelligence works, I think you should be somewhat less impressed by how predictive any particular test is. Even very bad tests can give us a rough cut of who is smart and who isn’t. But we lack a physical model for what intelligence even is, so there’s no way to verify that a test is “really” measuring intelligence. Intelligence is clearly real, or at least it’s a model that cuts closely enough to reality to be useful. But I think we ought to be cautious about assigning too deep meaning to specific test results.
I think I would call myself an IQ consequentialist. If IQ is what makes you good at a skill or job, then testing the relevant skill directly still captures general intelligence. And if the skill and IQ come apart, that only makes IQ less important for predicting performance. Testing for IQ in a job interview, for example, would only benefit people who have failed to turn their intelligence into a stronger application. If performance is g + s, and any measurement of g by itself is always going to carry some bias, then you might as well measure performance directly and get a more practical approximation.