Measuring what matters: slides from the Happiness Conference

I’m moderating the “Measuring what matters” panel at the Happiness Conference, and giving a very short presentation as part of that. My slides never stand up well on their own because I avoid putting text on them that overlaps with what I’m going to say, so rather than simply posting the PowerPoint file I decided to turn it into a blog post. Here’s roughly what I’ll be saying at the panel, fleshed out with more supporting detail than I’ll have time to go into at the conference:

I was part of the team that put together the Happiness Initiative’s survey last year, and I’ve been looking at the data it’s collected. I won’t go into the process behind it, because Ryan Howell gave a good account of that yesterday, but I do want to very briefly show you the sort of data it gives us, some evidence to support my faith in this tool, and some open questions that we have to address.

The survey starts by asking three very broad life evaluation questions:

  1. [“Cantril’s Ladder”] Please imagine a ladder with steps numbered from zero at the bottom to ten at the top. Suppose we say that the top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible. If the top step is 10 and the bottom step is 0, on which step of the ladder do you feel you personally stand at the present time?
  2. All things considered, how satisfied are you with life as a whole nowadays?
  3. Taking all things together, how happy would you say you are?

The mean of a person’s answers to those three questions is cited as their Satisfaction With Life (SWL) score, and is arguably the headline number from the survey. Everything I’m interested in doing with this tool is ultimately aimed at increasing people’s SWL, so the importance of all the other questions stems from their relevance to SWL. We also ask for a lot of demographic information, so one of the things we can do is look at how different groups of people vary in their life satisfaction. As an example, here’s a map of SWL in the Seattle area:

Click on the image for the interactive version. Note that only ZIP codes with 10 or more responses are included, to protect individual respondents’ privacy.
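For the curious, the aggregation behind a map like this is straightforward. Here’s a minimal sketch in Python with pandas, using hypothetical column names of my own choosing (`cantril`, `satisfied`, `happy` for the three questions above, and `zip` for the respondent’s ZIP code) rather than the survey’s real field names:

```python
import pandas as pd

def swl_by_zip(df: pd.DataFrame, min_n: int = 10) -> pd.Series:
    """Mean SWL per ZIP code, dropping ZIPs with fewer than min_n responses."""
    df = df.copy()
    # SWL is defined as the mean of the three life-evaluation questions.
    df["swl"] = df[["cantril", "satisfied", "happy"]].mean(axis=1)
    grouped = df.groupby("zip")["swl"]
    means, counts = grouped.mean(), grouped.count()
    # Privacy filter: suppress ZIP codes with too few respondents.
    return means[counts >= min_n]
```

The same group-and-suppress pattern works for any of the demographic splits the survey supports.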

After the obvious SWL questions, the survey asks about 12 “domains of happiness”, each of which is an area of people’s lives that contributes to overall satisfaction with life. We didn’t pluck these out of the air—they’re all based on extensive prior research and Ryan’s validation study—but it’s still important to test the validity of that assumption. We’ve now had over 12,000 responses to our survey, so we can look at how they interact in a nice big sample. Here are the correlations between each domain and Satisfaction With Life:

To put these numbers in context: a correlation coefficient runs from -1 to 1, with 1 meaning that the two scores vary completely in step with each other, 0 meaning they’re completely unrelated, and -1 meaning that one is the inverse of the other. Some of these may not look like particularly strong effects, but because the sample is so huge they’re all significant at p<0.005 or better, meaning there’s less than a 0.5% chance we’d see a correlation this strong if there were really no relationship.
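If you want to play with these ideas yourself, Pearson’s r is easy to compute from first principles. Here’s a self-contained sketch using synthetic stand-in data rather than the real survey responses, so the numbers are illustrative only:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y over the product of their SDs."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Synthetic stand-in for one domain and SWL: the domain partly drives SWL,
# so the two should correlate at roughly 0.4 / sqrt(0.4**2 + 1), about 0.37.
rng = np.random.default_rng(0)
domain = rng.normal(size=12_000)
swl = 0.4 * domain + rng.normal(size=12_000)
print(round(pearson_r(domain, swl), 2))
```

With a sample of 12,000, even correlations far weaker than that clear the p<0.005 bar comfortably.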

I’m not going to present a chart of it, but I also checked the correlation of every individual question with SWL, and that provided more support for the validity of the measure. All but one of the questions that we had expected to contribute to SWL correlate significantly at p<0.05 or better, which is the usual minimum threshold for calling a result “significant” in a psychology paper. The one question that doesn’t asks people how much they agree with the statement “[over the past week] I have had plenty of spare time”, and I’m not sure what to make of that, but missing on only one out of 67 questions is good enough for me until I have time to explore it further.

After all that patting myself on the back, though, it’s time to start in with the caveats. We’re not just measuring happiness for the sake of it – this is explicitly a tool to promote and support policy changes that improve everybody’s well-being. To that end, it’s disappointing that the most strongly correlated domains are things that governments either can’t influence, or have no business messing with:

  • Governments can provide the kind of support that helps the people who score worst on psychological well-being, but I’m not sure what they can do to move people from the middle of that scale to the top.
  • “Social Support” as a domain asks about pretty personal stuff: I don’t want even the most well-meaning of governments getting involved in how many friends I have or whether I feel loved.

I don’t see these as major weaknesses of the initiative—all this personal stuff is also worth measuring and teaching people about—but it does affect where we need to focus our attention if we’re going to influence policy. I’m more worried about the next caveat, which is that we have to do a lot of conjecturing about what causes differences in scores. As an example, here’s a map of the Community Vitality scores for the Seattle area, which assess people’s trust in their neighbours, feeling of safety where they live, and involvement in their community:

Note the three areas I’ve given nametags to. They all scored below average, and my best guess for why is quite different in each case. Bothell’s the sort of suburb where people typically drive a long way for work and for most services, and it’s well documented that people who live in that kind of place are less engaged with their community and trust their neighbours less. The University District has a particularly large number of short-term residents because it’s home to a huge university, and they tend to be less engaged with their neighbourhood precisely because they’re short-term residents. And the Central District combines some real poverty, a bit of a crime problem, a major image problem and ongoing controversy about gentrification. To make things murkier still, at least two of the neighbourhoods I’ve described account for only a subset of their ZIP code, but I’m assuming that they’re a big enough chunk to drag down the overall score. The real problem is that word assuming: backing up my assumptions with hard data would need more information than we have from this survey. This is a solvable problem, but it needs much more labour-intensive research than throwing a survey at the web and seeing what comes out, and it needs people with local knowledge to interpret the data.

The next caveat also relates to this issue of throwing a survey at the web. Most of our data comes from what we in the trade call a “convenience sample”: we simply launched the survey, promoted it as well as we could, and let whoever felt like taking it take it. The beautiful thing about this is that we can get huge quantities of data that let us pick up even relatively weak relationships between aspects of the survey. But the cost is that we get systematic sampling biases that make our data less representative of the population at large than it would be if we had a proper scientifically sound sample. Some of the expected biases are that people are more likely to want to take the survey if they are wealthier, more highly educated, have more free time, and are more interested in happiness as a subject. All of those make me expect a convenience sample to overestimate happiness, and I’ve actually been able to test that intuition. As part of Eau Claire, WI’s happiness initiative, the University of Wisconsin-Eau Claire gathered a convenience sample and a scientific sample in parallel, and here are the results:

The differences aren’t huge, and I must confess I haven’t done any significance testing on these, but the consistency is telling. On every single domain the convenience sample scores higher than the scientific sample, which we can expect to be more accurate. So why do we use convenience samples at all? Money: collecting a scientific sample is a costly process requiring a certain amount of survey expertise and labour, and the Happiness Initiative has been struggling for money from the outset. In an ideal world, we’d base all of our discussion on scientific samples only. Until we can get there, the more comforting news is that the relative pattern of the domains is pretty consistent between the two sampling techniques, so I still think we can have a meaningful conversation about convenience sample data. We just have to remember that it’s probably a bit overoptimistic.
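One cheap way to put a number on that consistency, short of a proper significance test on each domain, is a sign test. Assuming all 12 domains showed the gap in the same direction, as the chart suggests, the calculation is a one-liner:

```python
# Back-of-the-envelope sign test: under the null hypothesis that the
# convenience sample is equally likely to score higher or lower on each
# domain, the chance of all 12 gaps pointing the same way is tiny.
n_domains = 12
p_same_direction = 2 * 0.5 ** n_domains  # two-sided: all higher OR all lower
print(p_same_direction)  # → 0.00048828125
```

This assumes the domains are independent, which they aren’t quite (they correlate with each other), so treat it as a rough bound rather than a rigorous test—but it does suggest the pattern is no fluke.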

So that’s an overview of the survey, some of my concerns about it, and why none of those concerns are show-stoppers. Now for some detail. Here are the individual questions that correlate most strongly with satisfaction with life:

| Question(s) | Pearson’s r (correlation coefficient) | Significance threshold (probability of a false positive) |
| --- | --- | --- |
| Entire Psychological Well-Being section (5 questions) | each >0.58 | p < 0.0005 |
| How satisfied are you with your personal relationships? | 0.55 | p < 0.0005 |
| How often did you feel lonely during the past week? | 0.54 | p < 0.0005 |
| How often did you feel loved during the past week? | 0.52 | p < 0.0005 |
| For how much of the past week did you have a lot of energy? | 0.49 | p < 0.0005 |
| How satisfied were you with your ability to perform your daily living activities? | 0.48 | p < 0.0005 |
| How satisfied are you with your work life? | 0.48 | p < 0.0005 |
| The conditions of my job allow me to be about as productive as I could be. | 0.45 | p < 0.0005 |
| How satisfied are you with the support you get from your friends? | 0.45 | p < 0.0005 |

But as you might expect from what I wrote above, most of these aren’t things that City Council is ever likely to take action on. So here’s a possibly more useful set. The next table shows the questions, among those I’d expect local government to take an interest in, that correlate most strongly with SWL:

| Question(s) | Pearson’s r | Significance threshold |
| --- | --- | --- |
| How would you describe your feeling of belonging to your local community? | 0.43 | p < 0.0005 |
| I have enough money to buy things I want. | 0.36 | p < 0.0005 |
| In my daily life, I seldom have time to do the things I really enjoy. | 0.33 | p < 0.0005 |
| How satisfied are you with your access to well-paying job opportunities? | 0.32 | p < 0.0005 |
| How healthy is your physical environment? | 0.32 | p < 0.005 |
| How much do you trust businesses in your community? | 0.31 | p < 0.005 |
| How satisfied are you with your access to activities to develop skills through informal education? | 0.31 | p < 0.005 |
| How satisfied are you with your access to sports and recreational activities? | 0.30 | p < 0.005 |

Now this looks more like data I can hand to the mayor and challenge him to improve. But there are still two big issues left to discuss, and these are the ones I think give us the most trouble.

Although each of these has a correlation coefficient with SWL of roughly a third, that does not mean that each contributes a third of people’s happiness. If only it did! We could just ask the mayor to fix three things and we’d all be happy. The serious challenge for us is that each of these things is individually such a small factor that it’s very difficult to pick out the signal of an improvement in well-being caused by the city doing something good from all the noise of everything else going on in everybody’s lives. That makes it harder to enlist politicians, who always want to know they’ll get credit for the things they do, and harder to counter the inevitable opposition that any change attracts.

The other issue is more of a general problem with happiness research: we can quote correlations until the cows come home, but correlations do not prove causality. This is so important that I’m going to repeat it, and the next time you see a correlation presented in a news story as proof that A causes B please repeat this ten times like a mantra: correlation does not prove causality. For example, knowing that there’s a 0.55 correlation between how satisfied someone is with their personal relationships and how satisfied they are with their life as a whole does not tell us which, if any, of these explanations holds:

  • Satisfaction with personal relationships is a significant driver of satisfaction with life.
  • Being happier in general makes people feel better about their personal relationships.
  • People who have a general tendency to use positive words to talk about their own lives rate everything that we ask them higher.
  • Both measures are in fact driven by the phase of the moon when a person was born, and they just happen to both be influenced in the same way.
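The third and fourth explanations are both versions of a confounder, and it’s easy to simulate how one produces a correlation with no causal link at all. Here’s a toy sketch in which a hidden “positivity” trait (entirely my invention, for illustration) drives both ratings:

```python
import numpy as np

rng = np.random.default_rng(1)
positivity = rng.normal(size=10_000)                # hidden common cause
relationships = positivity + rng.normal(size=10_000)
swl = positivity + rng.normal(size=10_000)          # never reads `relationships`

r = np.corrcoef(relationships, swl)[0, 1]
print(round(r, 2))  # roughly 0.5, despite no direct causal path between them
```

The correlation alone can’t distinguish this world from one where relationship satisfaction genuinely drives SWL; telling them apart takes longitudinal or experimental data.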

These two issues, taken together, set up what I think is the grand challenge for measuring what matters: if we’re serious about using happiness data to influence public policy, we need to go beyond reporting distributions and correlations and start actually proving that the things we say influence happiness do so.
