Thinking fast, testing slow

Many years ago I read Thinking, fast and slow by Daniel Kahneman. It seemed an interesting read at the time. It was also something of a wake-up moment. It makes many claims that, while not untestable, would not be easy to test. However, one claim it makes, in the introduction of the book, is easy to test.

In one of our studies, we asked participants to answer a simple question about words in a typical English text:

Consider the letter K.
Is K more likely to appear as the first letter in a word OR as the third letter?

As any Scrabble player knows, it is much easier to come up with words that begin with a particular letter than to find words that have the same letter in the third position. This is true for every letter of the alphabet. We therefore expected respondents to exaggerate the frequency of letters appearing in the first position—even those letters (such as K, L, N, R, V) which in fact occur more frequently in the third position.

— Kahneman, D. (2012) Thinking, fast and slow.

This statement is provided as evidence of a bias he calls availability bias. This is the idea that we are biased towards things that come more easily to mind. It seems like a reasonable guess.

I put this question to my brother, who is an enthusiastic Scrabble player. He was sure that there were more words beginning with K. But who is more likely to be correct: my brother, or Nobel Prize-winning academic and best-selling author Daniel Kahneman?

As it turns out, my brother was correct, as others have since pointed out.

Where does this claim originate?

The reference given in the book is Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H. and Simons, A. (1991) Ease of Retrieval as Information: Another Look at the Availability Heuristic, Journal of Personality and Social Psychology, 61 (2), pp. 195–202.

That paper states:

In a related study (Tversky & Kahneman, 1973, Experiment 3), subjects were found to overestimate the number of words that began with the letter r but to underestimate the number of words that had r as the third letter.

— Journal of Personality and Social Psychology

And the reference from that paper leads to Tversky, A. and Kahneman, D. (1973) Availability: A heuristic for judging frequency and probability, Cognitive Psychology, 5 (2), pp. 207–232.

Study 3: Judgment of Word Frequency

Suppose you sample a word at random from an English text. Is it more likely that the word starts with a K, or that K is its third letter? According to our thesis, people answer such a question by comparing the availability of the two categories, i.e., by assessing the ease with which instances of the two categories come to mind. It is certainly easier to think of words that start with a K than of words where K is in the third position. If the judgment of frequency is mediated by assessed availability, then words that start with K should be judged more frequent. In fact, a typical text contains twice as many words in which K is in the third position than words that start with K.

According to the extensive word-count of Mayzner and Tresselt (1965), there are altogether eight consonants that appear more frequently in the third than in the first position. Of these, two consonants (X and Z) are relatively rare, and another (D) is more frequent in the third position only in three-letter words. The remaining five consonants (K,L,N,R,V) were selected for investigation.

— Tversky, A. and Kahneman, D. (1973) Availability: A heuristic for judging frequency and probability

This is a slightly different claim from the one in the book. Here they are talking about how often words occur in running text, where every occurrence counts, rather than the number of distinct words. The paper also states: “Words of less than three letters were excluded from the count.”
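To make the distinction concrete, here is a minimal sketch (not part of the original analysis) that counts the same letter two ways: once over every word occurrence in a text, and once over distinct words only. The function name `position_counts` and the sample sentence are made up for illustration.

```python
def position_counts(words, letter, min_length=3):
    """Return (first, third): how many of the given words have `letter`
    in the first and in the third position. Words shorter than
    `min_length` are skipped, mirroring the exclusion quoted above."""
    letter = letter.upper()
    first = third = 0
    for word in words:
        if len(word) < min_length:
            continue
        if word[0].upper() == letter:
            first += 1
        if word[2].upper() == letter:
            third += 1
    return first, third


# Made-up sample text, purely for illustration.
text = "I like to make cake and I like to bake but the king likes kale"
tokens = text.split()                  # every occurrence counts
types = {w.lower() for w in tokens}    # each distinct word counts once

print("by occurrence:   ", position_counts(tokens, "K"))
print("by distinct word:", position_counts(types, "K"))
```

For this toy sentence the count by occurrence is (2, 6) and the count by distinct word is (2, 5), because the repeated "like" is counted twice in the first case and once in the second. On a real corpus the two measures can diverge, which is why it matters which one a claim refers to.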

The following Python program goes through each text in the Open American National Corpus and counts the words with each consonant in the first and in the third position. It excludes words with fewer than three letters, as required above, and it reports every consonant that appears more often in the third position than in the first.

count.py
import os
from collections import Counter


def files_in_directory(directory):
    """Return the paths of all .txt files under directory, recursively."""
    files = []
    for root, _, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith('.txt'):
                files.append(os.path.join(root, filename))
    return files


# A Counter returns 0 for a missing key, so a consonant that never occurs
# in one of the positions cannot raise a KeyError in the comparison below.
first = Counter()
third = Counter()
for file in files_in_directory('OANC-GrAF'):
    with open(file, 'r', encoding='utf8') as text_file:
        for line in text_file:
            # split() breaks on whitespace only, so punctuation stays attached.
            for word in line.split():
                # Words of fewer than three characters are excluded.
                if len(word) > 2:
                    first[word[0].upper()] += 1
                    third[word[2].upper()] += 1

letters = []
for letter in "BCDFGHJKLMNPQRSTVWXYZ":
    # Both counts share the same total, so comparing them directly is
    # equivalent to comparing the two relative frequencies.
    if third[letter] > first[letter]:
        letters.append(letter)

print(f"{len(letters)} consonants appear more frequently in the third rather than in the first position: ({','.join(letters)})")

The result:

Console
$ python count.py
7 consonants appear more frequently in the third rather than in the first position: (D,L,N,R,V,X,Z)

According to this count, they are wrong in two cases: D is more frequent in the third position when words of three or more letters are counted together, not only in three-letter words, and K is not more frequent in the third position at all. In fact, of the five letters they selected, K is the only one that isn’t found more frequently in the third position. And yet K is the letter Kahneman chose to illustrate the point in the book. Perhaps it showed a stronger bias in their experiment. If so, the error actually counts against their thesis: at least for this corpus, respondents who judged K to be more common in the first position were right.
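The paper's remark that D leads in the third position "only in three-letter words" could be checked more directly by splitting the counts by word length. What follows is only a sketch of such a breakdown, not the program that produced the result above. The helper name `counts_by_length` and the sample words are assumptions, and it would have to be fed the corpus words before it said anything about real text.

```python
def counts_by_length(words, letter, min_length=3):
    """Map word length -> [first_count, third_count] for `letter`."""
    letter = letter.upper()
    table = {}
    for word in words:
        n = len(word)
        if n < min_length:
            continue
        row = table.setdefault(n, [0, 0])
        if word[0].upper() == letter:
            row[0] += 1
        if word[2].upper() == letter:
            row[1] += 1
    return table


# Made-up sample words, purely for illustration.
sample = "and did the old dog find my odd red model under a wide bed".split()
for length, (first, third) in sorted(counts_by_length(sample, "D").items()):
    print(f"length {length}: first={first} third={third}")
```

If D were ahead in the third position only among three-letter words, the rows for longer lengths would show the first-position count ahead once real corpus data is supplied.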

The source of these claims about letter frequency in words is Mayzner, M. S. and Tresselt, M. E. (1965) Tables of Single-letter and Digram Frequency Counts for Various Word-length and Letter-position Combinations, Psychonomic Monograph Supplements, 1 (2), pp. 13–32. Did Tversky and Kahneman misread or misinterpret the data from these tables, or is the source data wrong?

I have tracked down this paper and I am awaiting an interlibrary loan. We shall see.

