I just finished listening to Security Now Episode 303, in which Steve Gibson talks about his concept of password haystacks. The idea is that rather than making strong passwords in a purely theoretical sense, you design them to resist nearly all possible attacks.
I recommend you go and read Steve's page on the topic or listen to the podcast, as it really is a good idea, one which I will implement in a few key places after writing this post. In fact, you should read it right now because I'm assuming knowledge of the concept. There's two reasons that password padding works for us: one psychological, and one computer scientific. As a Computer/Cognitive Scientist, password usability is something I've taken a stab at before, and I'll even show you my implementation, but Steve's is even better, and I'll explain why.
Before I begin, I'd like to point out that Steve's reasoning about password padding is completely correct. Without any knowledge of the pattern you use to create a password, it absolutely forces an attacker to brute-force your password, at which point (alphabet size)^length is your friend. So we're going to take it as a given that the passwords are strong against attacks. What I'd like to focus on is why are they better for us as humans than 8, 10, or 12 characters of random gibberish.
Let's start with a discussion of the psychological. It has been known for 50 years that memory stores things in "chunks." For example, if I gave you the following list of letters: F-R-G-T-H-I-O-F-W-C-A-Q-N-M-F-K-I-P, and then took it away and asked you to repeat it, the research suggests that you'd be able to get about 7 letters in a row right, plus or minus 2. However, if I gave you this list A-B-C-Q-R-S-E-F-G-L-M-N-X-Y-Z-I-J-K you'd probably do much better because your brain is able to break it into six chunks of three consecutive letters each.
It's immediately obvious to see how chunking helps you remember a long, padded password. You only have to remember a word and a few algorithmic steps to remember the password, instead of all 25 characters (for example).
But there's lots of techniques like this. In fact, it was chunking that inspired my old password generation technique1 (no longer in use since I started using LastPass): Take 2 random words between 5 and 8 characters, and 2 random other characters. Arrange them randomly. Randomly capitalise one of the letters. It creates passwords of between 12 and 18 characters, but you only have to remember 5 chunks (2 words, 2 symbols, and where the capital letter is). Much easier, but very strong nonetheless.
I begin the Computer Science side of this post with an anecdote: Let's say I flipped a coin 1000 times, and it came up heads every single time. You'd probably look to see if the coin was rigged, because "What are the chances of that?" Actually, the chances are exactly the same as any other possible outcome, but for some reason we humans regard this outcome as special. Why is that?
In computer science, particularly formal languages theory 2, there's a concept known as Kolmogorov Complexity. Roughly speaking, the Kolmogorov Complexity of a string is the size (number of bits) in the shortest program that will print the string (the language used is provably irrelevant, no input allowed).
So what does this mean? Well, I can write a program that spits out 1000 heads in a row easily:
1000.times { |x| print 'H' }
But a program that printed a more complex string like 'HTTHTHTHHTHHHTHTTHTHHTHHHTHHHTHHHTHTHHHTHHTHTHHHT...' (imagine 1000 characters of that) would be considerably longer, and maybe the shortest possible program is:
print 'HTTHTHTHHTHHHTHTTHTHHTHHHTHHHTHHHTHTHHHTHHTHTHHHT...'
That's why we see 1000 heads in a row as special: its Kolmogorov Complexity is low. As you can imagine, this concept has huge implications in several fields such as the compressibility of strings.
When Steve and Leo were discussing ways to pad your password out on the podcast, they were choosing algorithmic steps that were "Kolmogorov-ically simple", such as adding 20 dots, or surrounding it with parentheses and six dashes. All of those steps are simple enough that our brains can store them into a single chunk.
But that's the cool part: you can express any length string with a "chunk-able" algorithm step, whereas the length you can express by using words as your chunks is limited by your vocabulary. Since we have a limited number of chunks that we can store, algorithmic steps can lead to longer passwords than words.
That's why the system Steve came up with is demonstrably better than the one I came up with. Both of our systems force an attacker into brute-force mode, but in mine the length of the password is limited by the length of words. With Steve's password padding, you can get much longer passwords, and in a brute force attack increasing alphabet size and length are the only things that matter.
1: It's not secure, so don't use it to get real passwords, it's just there as a demo.
2: I'd like to give a shoutout to my Formal Languages and Parsing prof, Jeffrey Shallit. His course was incredibly difficult, but he taught me a lot of stuff that is now part of my intuitive understanding of computer science. I can't remember a single theorem from the course, but I understand computing a lot better.