Writing multithreaded code is like juggling chainsaws; amazing when it works and truly sucky when it doesn't.
That truly summarizes the feeling of having to deal with a multithreaded system. He argues that probably one of the most difficult things to achieve is "avoiding doing nothing". I agree with him in the sense that if you are even considering multithreading something, you are trying to achieve maximum utilization (i.e. not wasting resources). But I'm not sure that getting maximum utilization is the hardest part by itself.
Besides the usual problems, like avoiding deadlocks or protecting shared data, I have always found that the hardest part is, to paraphrase Andrew, "avoiding doing something".
What I mean is that in multithreaded applications (and this applies to distributed applications as well), probably the hardest part is coming up with ways to avoid needing synchronization in the first place.
At least in my experience, figuring out ways to make the system exhibit consistent and predictable behavior without relying on atomicity has always been where most of the design effort is invested and, when done properly, where the greatest gains in scalability are achieved.
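As a minimal, hypothetical sketch of the idea: suppose several threads need to sum a large list. The obvious design has every worker update one shared total under a lock, synchronizing on every addition. The alternative below partitions the work so each thread writes only to its own private slot, and the partials are combined once after the threads are joined; the joins become the single point of coordination. (The function name, chunking scheme, and worker count here are all invented for illustration.)

```python
import threading

def parallel_sum(data, n_workers=4):
    """Sum `data` across threads without a shared, lock-protected total."""
    # Partition the input so each worker owns a disjoint slice.
    chunks = [data[i::n_workers] for i in range(n_workers)]
    # One result slot per worker: no slot is ever written by two threads.
    partials = [0] * n_workers

    def worker(idx):
        # Writes only to its own slot, so no lock is needed here.
        partials[idx] = sum(chunks[idx])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The joins above are the only synchronization point in the design.
    return sum(partials)
```

The same shape (partition, compute privately, merge once) is what you end up reaching for at larger scales too; the design effort goes into finding a partitioning that makes the private work independent.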
Take for example the Google File System, a massive, multi-petabyte distributed filesystem. It is designed to work on clusters of several thousand machines, distributed across several datacenters (even in different countries).
To achieve the expected performance, they had to throw away the usual file semantics and think completely outside the box. But don't take my word for it: go fetch the paper (if you haven't already done so) and read it yourself.