In the MISO soup


Robin Hanson declares that thanks to Big Data, we will soon discover the SUPER FACTORS that drive all of human differences:
In a factor analysis, one takes a large high-dimensional dataset and finds a low dimensional set of variables that can explain as much as possible of the total variation in that dataset. A big advantage of factor analysis is that it doesn’t require much theoretical knowledge about the nature of the variables in the data or their relations – factors are mostly determined directly by the data... 
[P]eople vary in far more ways than intelligence, ideology, and personality, and factor analyses have been applied to many of these other human feature categories. For example, there have been factors analyses of jobs, brands, faces, body shape, gait, accent, diet, leisure behavior, friendship networks, physical health, mortality, demography, national cultures, and zip codes. 
[F]actors found in different feature categories are often substantially correlated with one another. This suggests that if we put together a huge super-dataset describing many individual people in as many ways as possible, a factor analysis of this dataset may find important new super-factors that span many of these features domains. Such super-factors would be promising candidates to use in a wide range of social research, and social policy... 
I’d guess that the super-factors found in a super dataset of human details will instead be revolutionary. We will afterward see uncovering them as a seminal milestone in our progress in understanding human variation. A Nobel prize worthy level of seminality, or more. All it will take is lots of tedious work to collect a super dataset, and then do some straightforward number crunching.
Here's an object lesson in the perils of analyzing data without theory to guide you! Yes, it's easy to do a principal component analysis on a multidimensional data set and find some relatively small set of "factors" that "explain" most of the data. If we do what Robin says and throw everything we know about human characteristics into one massive data set and hit the PCA button, the Stata of the future will pop out our "super-factors" in short order.

One of the biggest super-factors will be income.

See, factor analysis doesn't tell you whether the factors cause all the other stuff, or are effects of the other stuff. In the world, there can be effects with multiple causes, and causes with multiple effects. In signals theory (a very different kind of signaling than the kind Robin is used to thinking about!), this might be called Multiple-Input-Single-Output and Single-Input-Multiple-Output, or MISO and SIMO.

An example of SIMO would be anxiety disorder. A penchant for severe anxiety is going to affect your working life, your interpersonal life, your hobbies, etc. in statistically predictable ways. One cause, many effects.

An example of MISO would be income. Our marvelous market economy allows people to make money using a dizzying myriad of talents, skills, and resources. Some people make money by hitting a ball with a stick and running around a field. Some people make money by making big macro bets in financial markets, getting the first one right by luck, and then taking in billions of dollars in management fees. Some people make money by being friends with the right politician. Some people make money by inventing new kinds of semiconductors. And so on, and so on. One effect, many causes.

Since money can buy a ton of stuff, everyone wants money. And since money can buy a ton of stuff, almost anything valuable can be sold for money. So if income is among the set of characteristics in Robin's ultimate data set, it will undoubtedly emerge as one of the most important factors.
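To make the MISO point concrete, here is a toy simulation, not anything Robin or anyone else has actually run. It assumes ten mutually independent "traits" and an "income" variable that is just their noisy sum, then runs a PCA on the correlation matrix. The function names, the number of traits, and the noise level are all made up for illustration.

```python
import numpy as np


def simulate_miso(n_people=5000, n_traits=10, noise_sd=0.5, seed=0):
    """Toy MISO data: independent traits (many inputs), one income (single output).

    All parameters are illustrative assumptions, not estimates of anything real.
    """
    rng = np.random.default_rng(seed)
    # Each trait is drawn independently, so no trait "explains" any other trait.
    traits = rng.standard_normal((n_people, n_traits))
    # Income is an effect of every trait, plus luck.
    luck = noise_sd * rng.standard_normal(n_people)
    income = traits.sum(axis=1) / np.sqrt(n_traits) + luck
    # Income goes in the last column.
    return np.column_stack([traits, income])


def first_component(data):
    """Return the loadings of the first principal component and its variance share."""
    corr = np.corrcoef(data, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last one is the largest.
    eigvals, eigvecs = np.linalg.eigh(corr)
    loadings = eigvecs[:, -1]
    share = eigvals[-1] / eigvals.sum()
    return loadings, share


data = simulate_miso()
loadings, share = first_component(data)

income_idx = data.shape[1] - 1
# The sign of an eigenvector is arbitrary, so compare absolute loadings.
top_variable = int(np.argmax(np.abs(loadings)))

print("variable with the largest loading on PC1:", top_variable)
print("is it income?", top_variable == income_idx)
print("share of total variance on PC1:", round(float(share), 3))
```

In this setup the traits have nothing in common except that each one helps produce income, yet the first component is dominated by income (a loading of roughly 0.7, against roughly 0.2 for each trait). The PCA output alone gives no hint that the "factor" is an effect of the traits and not their cause.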

You can already see evidence of this in the media. Barely a day goes by without an announcement by Quartz or the Huffington Post that income differences predict differences in...you name it. School success, romance, self-confidence, frequency of weird eyebrow twitches. The assumption, of course, is that wealth privileges people in innumerable ways - i.e., that income is a SIMO kind of thing. But whether or not that's true, it's also likely true that income is a MISO kind of thing, where almost any positive or desirable trait can be leveraged - or is correlated with something that can be leveraged - to produce income. That, really, is why income is going to be correlated with almost any desirable human trait, no matter how little "privilege" remains in society.

So Robin's "super-factors" are quite possibly going to be very mundane things. MISO processes will cause a few desirable goals to be highly correlated with a large number of human traits that are useful in obtaining those goals.

Interesting, but hardly worthy of a Nobel. And a reminder that pure statistical analysis, without explicit theory to guide it, will be guided by implicit, simplistic theories.


P.S. - One thing Robin wrote that I didn't understand was the following:
As many people know, intelligence is the main factor explaining variation in cognitive test performance, ideology is the main factor explaining variations in political positions, and personality types explain much of the variation in stable attitudes and temperament.
Aren't these basically just labels? "Intelligence" is our word for cognitive test performance. "Personality type" is our word for stable attitudes and temperament. Seems to me that simply isolating a principal component and labeling it is a far cry from actually understanding what you're looking at.
