Archive for February 2012
Warren Buffett publishes a letter to Berkshire Hathaway shareholders every year, and this year’s letter is coming up soon. I’ve read a few of them in the past and recommend you read one or two; they are written in a very straightforward, readable, folksy style, without sounding arrogant or gloomy or sales-y or all the other negative things that you sometimes encounter in the financial press. (On the other hand, don’t be lulled into thinking that Buffett is an everyday Joe investor just like you and me. He may have labeled derivatives “financial weapons of mass destruction,” but he sure seems to wield them effectively at opportune times – see the bottom of page 18 of that link.)
In Fortune he has published a guest column adapted from his forthcoming letter, and an idea he promulgates is that currency-denominated investments, while often perceived as the least risky type of investment, are in fact very risky. The most important point is right in the first paragraph, where he frames risk in terms of purchasing power, and I think it’s a point worth considering whether you agree with the rest of Buffett’s column or not. (I have some disagreements which I’ll get to.)
What is the least risky possible investment for you or me? It’s an asset whose value matches whatever we want to purchase at different points in time. A financial instrument that magically pays me a dividend stream of three meals a day wherever I am for whatever I happen to be hungry for at the time would be an extremely low-risk asset for me to hold. I have to eat for the rest of my life, and having cash instead of food will actually subject me to fluctuations in the price of food, which I would not have if I had my magic Happy Meal asset instead.
The point here is that cash is a means to an end, and not the end itself. You want cash because it lets you buy food, pay the rent, take vacations, and send your kids to college. In the end, investments that will let you reliably purchase these goods and services will be the least risky for you. These investments may not have the best returns, and with any investment you also need to weigh liquidity as a major risk factor. (Buffett treats liquidity risk separately in his column and provides minimum thresholds of US Treasuries that he insists on keeping on hand.) But they will come closest to meeting your life’s demands.
While I buy this point, I don’t think Buffett actually does a great job of convincing me that cash investments are actually risky. He says that from 1965, a portfolio of rolling Treasury bills would have paid 5.7% nominal return before taxes, but after taxes and inflation the real return would be zero. That means that it was a poor-returning investment. But it does not mean that it was a risky one. In fact, if the real return of this strategy over time was stable with respect to the real prices of goods and services that you wanted to buy over this period, then it was a good investment for someone with absolutely zero appetite for risk.
I’ve started to read Thomas Kuhn’s The Structure of Scientific Revolutions, the book that introduced the term “paradigm shift” into our lexicon. I’m only through the first few chapters at this point, but those chapters describe the destruction of old paradigms and the creation of new ones as a recurrent process, and contrast this with what Kuhn considers the conventional view of science as a gradual accumulation of knowledge.
On its surface, I find this to be a bit of a straw man. To me, it’s quite clear that historically, established scientific theory has been wrong from time to time and needs to be torn up and redone when we make new discoveries that do not fit it. I do think observed data accumulate, and since theory hangs on the scaffolding of data and becomes established by its ability to repeatedly and accurately predict relevant data, I think it unlikely that we would ever need to genuinely scrap decades and decades of scientific progress. We may have abandoned theories of ether and humors and phlogiston, but because these theories are only believable insofar as they explain observed data, even if a theory turns out to be completely wrong, we still have valuable accumulated data with which to construct new ones. And, as Kuhn points out, many experiments that produced valuable data or were useful in later theoretical developments were done by scientists who wished to corroborate, refine, or challenge the existing paradigm. So data inspire theory, which inspires further experiments, which lead to more data, and even wrong theories in this cycle can lead to progress in the end through the production of good data.
This feeling reminded me a bit of when I first learned of punctuated equilibrium as an evolutionary theory opposed to gradualism. I’m no expert on the evolutionary debate, but it comes as no surprise to me that the pace of evolution may be variable and may increase rapidly in stressful environments. In stable environments, natural selection’s ability to distinguish beneficial phenotypes from non-beneficial ones is surely diminished; in unstable environments, where genes’ abilities to survive and reproduce are constantly held to the flame, it is surely reasonable to expect natural selection to operate more efficiently. We induce very rapid evolution in domesticated animals and plants through selective breeding, which can be thought of as an extremely stressful environment for the species in question; survival depends on organisms’ abilities to satisfy our specific needs, for large fruit, strong muscles, or cute floppy doggy ears.
Now, in both cases, I am writing from the comforts of an armchair in my apartment in the year 2012. Kuhn’s book was first published in 1962, and Niles Eldredge and Stephen Jay Gould published their landmark paper on punctuated equilibrium in 1972. It is impossible for me to really know what it was like then, what people did and did not believe and know and what did or did not seem like plausible theory. For me to call this a straw man is rather unfair. I grew up in a world where the term “paradigm” was name-dropped in a very popular book I read in fifth grade, and where punctuated equilibrium theories were covered in my ninth grade biology textbook. I am already steeped in new paradigms, and a proper appreciation of history requires you to understand that those who came before you may have looked at the same world differently than you do.
As understanding history is important to understanding the present day, in science and in other fields alike, this caveat has ramifications far beyond simply having sympathy for our predecessors. Recently I’ve read a number of books covering the Middle East (The Prize, Persepolis, Shah of Shahs, and I have started Ghost Wars). I have been struck by how many decisions the US made in the second half of the 20th century that were motivated by anti-Soviet sentiment. Here we are in the 1980s funneling money and arms to the mujahedin in Afghanistan in the name of the Cold War; these people are our enemies’ enemies but hardly our friends, and many of them bear ill will not only to Russia but to the US, to Pakistan, to Israel or anyone not a Muslim, to the West in general, and to each other (Ghost Wars makes clear how balkanized the Afghans were). It seems like a bad idea, no? But people then thought in the paradigm of the Cold War, a Manichean universe where the Third World was the battleground for the First and the Second. They weren’t thinking about things like whether some of the mujahedin might hate America as badly as they hated the Soviets, to the point where they might crash a pair of passenger planes into a pair of Manhattan skyscrapers someday.
The Cold War was a dominant paradigm for so long in American foreign relations that it is understandable that people back then, even bright military and political minds, thought within its bounds. At the same time as we might sympathize with our predecessors, we must take the lesson to be aware when paradigms shift and to recognize when we ourselves might be steeped in established paradigms that direct and constrain our thoughts and may lead us to make decisions that one day we will rue.
(I started writing a post related to principal components analysis, and tried to write a brief layman’s explanation of it at its start. But I wasn’t able to come up with something short that was still adequate for the purposes of understanding the post. So I expanded my layman’s explanation to a full post, and will write my originally intended post next.)
Principal components analysis (PCA) is a statistical method in which you re-express a set of random data points in terms of basis components chosen to explain as much of the variance in the data as possible. For the layman, I think it is easiest to understand with an example data set. Below is some basic World Bank 2009 data for the G20 countries (19 data points, since one of the G20 “countries” is the EU):
| Country | GDP per capita ($) | Life expectancy (years) | Forested land area (%) |
|---|---|---|---|
Each data point (GDP per capita, life expectancy, forested land area) can be expressed in terms of a linear combination of vectors (1,0,0), (0,1,0) and (0,0,1), which I’ll refer to as components. For example, Argentina’s data can be represented as 7665 * (1,0,0) + 75 * (0,1,0) + 10.7 * (0,0,1). Using these components as our “basis” is very straightforward, since the coefficients simply correspond to the values of the data points.
However, it is an algebraic fact that we could have used any three linearly independent vectors as our components (a set of vectors is “linearly independent” if none of them can be written as a combination of multiples of the others). For example, if our vectors had been (1,1,0), (1,0,1), and (0,1,1), then we could also have represented Argentina as 3864.65 * (1,1,0) + 3800.35 * (1,0,1) – 3789.65 * (0,1,1). These coefficients are not especially intuitive, but the components do work; we could re-express all of the countries’ data points in terms of this basis instead.
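As a sanity check on the arithmetic, re-expressing a point in a new basis is just a small linear solve. Here is a sketch in Python with NumPy (the post’s own computations were done in R; the numbers below are Argentina’s figures from the text):

```python
import numpy as np

# Argentina's data point: (GDP per capita, life expectancy, forested land area)
argentina = np.array([7665.0, 75.0, 10.7])

# The three alternative basis vectors, stacked as the columns of a matrix
B = np.column_stack([(1, 1, 0), (1, 0, 1), (0, 1, 1)]).astype(float)

# Solving B @ coeffs = argentina recovers the coefficients in the new basis
coeffs = np.linalg.solve(B, argentina)
# coeffs is (3864.65, 3800.35, -3789.65), matching the text above
```

The same solve works for any invertible basis matrix, which is why any three linearly independent vectors will do.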
PCA provides us with a way of finding basis vectors that explain the largest amount of variance in the data. For example, as you might expect, GDP per capita and life expectancy are correlated. Therefore a basis vector like (10000,4,0) would be useful because variation in its coefficient would explain a lot of the variation in the overall data. PCA produces a set of component vectors where the first vector is the one that explains the most variance possible, the second vector explains the most variance after accounting for the variance explained by the first vector, and so on.
We often standardize the data by dividing each variable by its standard deviation first, to avoid overweighting numerically larger variables; for example, we wouldn’t want to give undue weight to GDP per capita over life expectancy just because GDP figures are in the thousands and life expectancy figures are all below 100. (The component vectors that PCA then produces all have length equal to 1.) Running a standardized PCA on the data above in R (using the function prcomp()) yields the following three component vectors:
| | First component | Second component | Third component |
|---|---|---|---|
| GDP per capita ($) | 0.6539131 | -0.35020818 | -0.6706355 |
| Life expectancy (years) | 0.6925541 | -0.07977085 | 0.7169418 |
| Forested land area (%) | 0.3045760 | 0.93326980 | -0.1903749 |
Variation in the coefficients of the first vector explains 60.3% of the variance of the data; adding the second vector explains an additional 31.5%, and adding the third explains the remaining 8.2%. (Since, as we discussed, the data can be fully re-expressed with three vectors, the variance should be fully explained by the time we include the third vector.)
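Mechanically, a standardized PCA like the prcomp() run above amounts to an eigendecomposition of the data’s correlation matrix. A minimal sketch in Python with NumPy, using made-up stand-in data rather than the actual World Bank figures:

```python
import numpy as np

# Made-up stand-in data: 19 "countries" x 3 correlated variables.
# (Illustrative only -- not the World Bank figures used in the post.)
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8, 0.1],
                   [0.0, 0.6, 0.2],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(19, 3)) @ mixing

# 1. Standardize each variable so no column dominates merely because
#    its values are numerically large (e.g. GDP vs. life expectancy).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. The correlation matrix of the standardized data.
corr = (Z.T @ Z) / len(Z)

# 3. Eigenvectors of the correlation matrix are the principal components;
#    eigenvalues give the variance each component explains.
eigvals, components = np.linalg.eigh(corr)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]            # re-sort, largest variance first
eigvals, components = eigvals[order], components[:, order]

# Fractions of variance explained, analogous to the 60.3% / 31.5% / 8.2% split
explained = eigvals / eigvals.sum()
```

The components come out orthonormal (each of length 1), and the explained-variance fractions always sum to 100% when you keep as many components as variables, just as in the example in the text.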
This analysis tells us that the most important explanatory axis is that of GDP per capita and life expectancy, although forested land area is also correlated with these two to a weaker extent. You can see this in the first principal component, which has positive weights on all three variables and very similar, large weights on GDP per capita and life expectancy. If we had to simplify our data down to one single number per country while losing the least amount of information, the coefficient of the first principal component would be it.
The second principal component tells us that the variation that remains after the first component is best explained by variation in forested land area, with some negative weight given to GDP per capita. This is as we might expect; once variation along the GDP-life expectancy axis is accounted for, the remaining variation is mostly in forested land area. (I included it specifically to be poorly correlated with the other two.) The fact that GDP per capita has a negative weight on the second component suggests that it is less correlated with forested land area than the first component alone would imply. This is indeed true; forested land area in our data set has a 28% correlation with life expectancy but only an 8% correlation with GDP per capita.
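The pairwise correlations quoted above are ordinary Pearson correlations, which np.corrcoef computes in one call. A sketch with hypothetical values standing in for the three data columns (not the real World Bank figures):

```python
import numpy as np

# Hypothetical stand-in columns for five countries (not the real data)
gdp_per_capita  = np.array([7665.0, 42240.0, 8230.0, 46280.0, 3650.0])
life_expectancy = np.array([75.0, 81.0, 73.0, 80.0, 66.0])
forested_area   = np.array([10.7, 34.1, 61.4, 31.8, 23.0])

# np.corrcoef treats each input row as a variable and returns the full
# 3x3 correlation matrix; entry [i, j] is the correlation of variables i and j.
corr = np.corrcoef([gdp_per_capita, life_expectancy, forested_area])
forest_vs_life = corr[2, 1]
forest_vs_gdp  = corr[2, 0]
```

Running this on the actual three columns of the table above is how figures like the 28% and 8% correlations would be obtained.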
The third component shows that the remaining variance mostly reflects how life expectancy and GDP per capita differ beyond what is predicted by the first two components. Keep in mind, though, that by this point we have already explained 91.8% of the data’s variance; it is less valuable to read meaning into the least significant principal components.