A while ago I wrote about the importance of broadly familiarizing yourself with the federal budget before forming any opinions about government spending. The subject was brought up several times during the last presidential debate. When watching one of these debates, in which the candidates are specifically interested in scoring points over each other even if it comes at the expense of sound reasoning, it is particularly important to keep budget figures in front of you when evaluating claims. It is also important to standardize all numbers into one unit, as un-standardized statistics can be very psychologically misleading. I’ll use billions.
One memorable sound bite involved Romney labeling PBS as a potential spending cut (I’m sure if he could do it again he would have phrased it differently, as the audience has generally remembered it as a threat to fire Big Bird). PBS, NPR, and other public broadcasting entities are funded in part by the Corporation for Public Broadcasting (private donors also contribute). Its federal appropriation in 2012 was $0.444 billion. Total enacted federal spending in 2012 (see “Outlays” under table S-1) was approximately $3,796 billion. So defunding the CPB, which would include not just PBS’s funding but all federal public broadcasting subsidies, would cut about 0.01% of federal spending. I was talking with a friend after the debates and compared it with trying to lose weight by getting a haircut, which in terms of actual proportion of one’s hair weight to one’s overall weight is probably reasonably accurate.
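Spelled out, the arithmetic is:

$$\frac{\$0.444\ \text{billion}}{\$3{,}796\ \text{billion}} \approx 0.000117 \approx 0.01\%$$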
I think Romney’s “borrowing from China” test is a reasonable one, but at the same time anyone who wants to make government spending more manageable has to start off by looking at where we actually spend our money. And that means you’re basically looking at two things: defense and social safety nets (Social Security, Medicare, and Medicaid). Everything else is a sideshow in comparison.
Cricket’s Twenty20 World Cup has just started. I certainly didn’t follow cricket growing up but a few years ago a co-worker who grew up in cricket-mad India taught me the rules and got me interested in the sport. I still only know the basics and I only really tune in at major events such as this one, but I think it’s actually a sport that American baseball fans can really learn to enjoy. (Conversely, I think cricket fans could learn to enjoy baseball as well.)
The Twenty20 format is the shortest format of cricket, with matches lasting a few hours rather than all day or all week. I understand that it represents cricket’s attempts to reach a broader, global audience and to make matches easier to televise. Below, I wrote a twenty-step process to transform a game of baseball into a game of cricket. Of course there are simplifications, and there are no discussions of strategy (perhaps you can deduce them from the rule changes?), but this offers a little bridge to cricket for the typical American, who is already familiar with the rules of baseball.
- Eleven players to a side.
- The cricket bat is flatter, like a long paddle. The cricket ball has a single seam going in a circle around its middle.
- Make the field oval-shaped.
- Eliminate two bases and move the remaining two bases to the center of the oval. Replace the physical bases with lines behind which runners/batsmen are safe. The areas behind the lines are the “creases.”
- Eliminate foul territory.
- Keep runners on both bases at all times. As there are only two bases, the batting team scores a run every time they switch places.
- Eliminate the pitcher’s mound; the pitcher (“bowler”) throws from behind one of the bases.
- Give the bowler a running start, but also require him to throw with his arm unbent.
- Give the batsman more freedom to move around, rather than remaining constrained in a batter’s box. However, if he runs too far forward he does risk being put out, as he will be out of his crease.
- Instead of an umpire calling three strikes and you’re out, put wickets in the ground at both bases (“ends”). When these are struck by the ball (even if the ball hits the bat first), the batsman is out.
- If the ball hits you and the umpire judges that it would have otherwise hit the wicket, you’re out (“leg before wicket,” or “lbw”).
- If a bowled ball is not batted and passes reasonably close to the wickets, play simply continues (the batsmen can try to score runs, called “byes”).
- If a bowled ball is too far away to be reasonably batted, the umpire calls it a “wide,” the batting team gets a free run, and the bowler redoes the delivery (it does not count towards the over; see points 18 and 20 below).
- A ball that strikes the batsman but is not lbw is treated like a batted ball; runs scored then are “leg byes.”
- A batted ball that leaves the grounds on one or more bounces (like a ground-rule double) is called a “four” and is worth four runs. A ball that leaves the grounds in the air (like a home run) is called a “six” and is worth six runs. A bowled ball that leaves the grounds as a “wide” is worth five runs and the bowler redoes the delivery. There are no fences on the boundaries.
- Instead of fielders tagging or forcing out runners, the fielders get runners out by striking the wickets at each end with the ball before the runners can reach safety.
- Ten outs (“wickets”) end the innings. Note that since there are eleven players to a side, once you’re out, you will not return for the remainder of the innings.
- Every six bowled balls is an “over,” and after every over the fielding team must bring in a different bowler (no bowler may bowl two consecutive overs). A bowler who is relieved can bowl again later.
- When a bowler is substituted, he switches places with a fielder. The bench only has injury substitutes.
- In Twenty20 the number of overs is limited to 20, and each team bats only once. This differs slightly under the one-day international format (50 overs) and substantially in the traditional Test format (no overs limit, but teams can declare their innings over) but I won’t get into that here.
I recently spent a lot of money on healthcare-related expenses. I had a bad ankle sprain about 6 weeks ago that necessitated a couple of trips to an orthopedist and some physical therapy, and I also had an eye checkup and bought a new pair of glasses because my current ones aren’t very good (they get bent out of shape easily and even in shape don’t fit that well on my face and slide off a lot). Paying for healthcare is, in my opinion, a very confusing and opaque process. I think some very un-burdensome laws requiring insurers to provide clearer information could go a long way towards improving the industry.
If you’ve read Nudge, by Richard Thaler and Cass Sunstein, this concept should be familiar. For those who haven’t read Nudge, one of its recurrent themes is that people should be free to make their own economic choices, but that policymakers should do what they can to encourage people to make good decisions. One way in which they can do this is to require clear presentation of information: prices, pros and cons, simple recommendations, and so on. This approach emphasizes that the government’s role is never to tell people what they have to do but to encourage them to do the right thing, and it is a philosophy with which I broadly agree (see “Libertarian paternalism”).
Here’s the information on my medical insurance card (I have Aetna):
- The name of my insurer, the name of my insurance plan, and the employer through which I have obtained this plan
- My name, ID number, and insurance group number
- Some phone numbers, the insurer’s website address, and the insurer’s physical address
- Some cryptic figures: “PCP 20%” and “SPC 20%”
- Some fine print on the back
The point of carrying around a card like this is to have a quick reference for anything you might need during a medical procedure, so that you’re as well-informed as possible at the clinic, hospital, or wherever your point of care is. It’s great to have a nicely organized website you can look things up on at home, but insurers should still provide customers with something they can check on the fly. And it’s absurd to waste the real estate on the card with fine print. We should all be able to accept that an insurance card does not contain the full, definitive details of the insurance plan; a simple reference to an online document with the definitive details would do fine.
In its place, insurers ought to include more information about the plan itself. My plan is described as “BASIC – DED. $2,100/$4,200.” Does that mean my annual deductible is $2,100 or $4,200? Answer from looking on the website: it’s $2,100; the $4,200 refers to family coverage. What do those “PCP 20%” and “SPC 20%” numbers mean? Answer: I think this refers to the fact that in general I pay 20% for both primary care physicians (PCPs) and specialists (SPCs) after my annual deductible is met. Why not add a few extra words and spell this out more clearly on the card? There is blank space that could be used.
That’s it, by the way, for information about the plan. Left unstated on the card is the out-of-pocket maximum, which is $5,000 for me; I think it’s important to know the maximum amount of money you can spend on health care per year under your insurance plan. Other nice things to know: what are the coverage levels and frequency limits, if any, on physical exams/immunizations, cancer screenings, hospitalizations, and urgent care centers? I think coverage on these four services, with some brief explanations, would fit nicely on the card.
There’s only so much text you can fit on a card, but you can fit much more in non-human-readable code, and you can fit much more code in a magnetic stripe or chip. If insurance formulas were standardized into a coded format, you could encode the full plan data into a magnetic stripe on an insurance card. Then, to calculate the cost to a patient, a medical services provider with a computer program capable of reading this format could simply enter the services that the patient will require, swipe the card, and have it output the total dollar price. This would solve an issue I find very annoying about medical bills, which is that I generally don’t know how much I’m going to be charged until days or weeks later, when I get a claim in my email. Part of the difficulty in telling patients what they’ll owe right away is variation in coverage levels. But if insurance plan coverage were standardized into a coded format, the receptionist would just need to itemize the bill and swipe, and a bill would appear for the patient right then and there. The program could even automatically send the claim information to the insurer as well. This would be more convenient for the patient and the medical service office and, I think, less error-prone (I have an error on one of my bills that I’m going to have to sort out on the phone next Monday, and I can’t say that I am looking forward to the back-and-forth calls that this will entail).
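To illustrate the kind of calculation a standardized, machine-readable plan format would make possible, here is a toy sketch. Everything in it is simplified and invented (real plans have copays, network tiers, and so on); the figures in the example are just the ones from my own card.

```python
# Hypothetical sketch of a point-of-service cost estimate, assuming a standardized
# plan format. The waterfall here (deductible -> coinsurance -> out-of-pocket max)
# is a simplification; real plans are more complicated.

def patient_cost(itemized_charges, deductible_remaining, coinsurance_rate, oop_remaining):
    """Estimate what the patient owes for a visit, given plan parameters."""
    total = sum(itemized_charges)
    # The patient pays 100% of charges until the deductible is exhausted.
    deductible_portion = min(total, deductible_remaining)
    # After the deductible, the patient pays the coinsurance percentage.
    coinsurance_portion = (total - deductible_portion) * coinsurance_rate
    # The out-of-pocket maximum caps the patient's total annual spending.
    return min(deductible_portion + coinsurance_portion, oop_remaining)

# Example with my plan's figures ($2,100 deductible, 20% coinsurance, $5,000 OOP max),
# assuming none of it has been used yet this year:
print(patient_cost([150.0, 85.0, 600.0],
                   deductible_remaining=2100.0,
                   coinsurance_rate=0.20,
                   oop_remaining=5000.0))
```

The receptionist’s “itemize and swipe” step would amount to filling in the first argument; everything else would come off the card.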
This post may be easier to read if you have some comfort with financial mathematics.
Thousands of people across the history of finance have dutifully memorized one of the most famous results in financial mathematics, the Black-Scholes formula for pricing a European option. For the sake of completeness (skip ahead if you like), here is the formula for pricing a European call (C) or put (P) on a non-dividend-paying asset, which you can also find in countless textbooks and on countless websites:

$$C = S\,N(d_1) - K e^{-rt}\,N(d_2)$$

$$P = K e^{-rt}\,N(-d_2) - S\,N(-d_1)$$

$$d_1 = \frac{\ln(S/K) + (r + \sigma^2/2)\,t}{\sigma\sqrt{t}}, \qquad d_2 = d_1 - \sigma\sqrt{t}$$

where S is the underlying asset price, K is the strike price of the option, t is the time to option expiry, r is the interest rate out to time t, σ is the volatility of the underlying asset, and N() represents the cdf of a standard normal distribution.
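If code is easier to read than notation, the formula translates directly; here is a minimal Python sketch (using scipy for the normal cdf), with arbitrary example inputs:

```python
from math import exp, log, sqrt
from scipy.stats import norm

def black_scholes(S, K, t, r, sigma):
    """Black-Scholes prices of a European call and put on a non-dividend-paying asset."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    call = S * norm.cdf(d1) - K * exp(-r * t) * norm.cdf(d2)
    put = K * exp(-r * t) * norm.cdf(-d2) - S * norm.cdf(-d1)
    return call, put

# Example: at-the-money option, one year to expiry, 2% rate, 30% volatility.
print(black_scholes(S=100.0, K=100.0, t=1.0, r=0.02, sigma=0.30))
```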
It is important to remember that while this is a ubiquitous formula used to price options, so much so that option prices are thought of by many traders in terms of their Black-Scholes volatility rather than their dollar price, it is only a mathematical model and is only correct insofar as its assumptions are met. And as with all models, real life matches the model assumptions imperfectly. You could come up with another option pricing model based on different assumptions and in some sense it would be no more “right” or “wrong” than Black-Scholes; the area of debate would be how well those assumptions fit reality.
For example, let’s say that you had an option on a small pharmaceutical company that was awaiting FDA approval on its only product, a drug upon which the entire firm’s fortunes rested. If the FDA approved, the stock would go to $100, and if not, the stock would go to $0. In this case Black-Scholes’s assumptions about the dynamics of the stock price are very poorly met, and it would not be a great model to use.
Some financiers who are particularly dutiful have also memorized formulas for the basic Black-Scholes greeks. For example, the deltas (sensitivities to underlying asset price) of a call and a put are

$$\Delta_{\text{call}} = N(d_1), \qquad \Delta_{\text{put}} = N(d_1) - 1$$
The relationship between the delta of a call and a put of the same strike and expiry is therefore: call delta – put delta = N(d1) – (N(d1) – 1) = 1. The formulas for the deltas themselves are strictly Black-Scholes; you get them by taking the derivative of the Black-Scholes pricing formula with respect to the asset price, and they might not be accurate under a different option pricing model. But the relationship between the two is not specific to Black-Scholes; it depends solely on put-call parity.
Put-call parity states that the price of a call minus the price of a put (same strike, same expiry) equals the current asset price minus the discounted value of the strike price. It is a much weaker assumption than those that underlie Black-Scholes. You don’t need to say anything about volatility, or Brownian motion, or continuous-time hedging. Not only that, it’s very intuitive and logical: if you have the right to buy a stock at $100 at some point in the future, and someone else has the right to sell that stock to you at $100 at that same point in time, then together the two positions amount to a forward agreement to buy the stock at $100. At expiry that agreement is worth the stock price less $100, and today it is worth the stock price less the discounted value of $100 at expiry. It’s much harder to imagine scenarios in which put-call parity would be violated than scenarios in which Black-Scholes assumptions are violated (in fact the Black-Scholes assumptions imply put-call parity).
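Written out, with the same notation as above (non-dividend-paying asset):

$$C - P = S - K e^{-rt}$$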
What this means is that any options model that accepts the weak and almost always realistic assumption of put-call parity must have the same relationship between call delta and put delta. Let’s look at another slightly trickier example, regarding vega (sensitivity to volatility) and theta (sensitivity to the passage of time). The Black-Scholes formulas for vega and theta of a call are:

$$\text{vega} = S\,N'(d_1)\,\sqrt{t}$$

$$\theta = -\frac{S\,N'(d_1)\,\sigma}{2\sqrt{t}} - r K e^{-rt} N(d_2)$$
(The negative sign in the theta is there because I have represented t as time to expiry, and theta is typically thought of as how value changes as time moves forward, in which case t would be decreasing.) Let’s further assume that the interest rate is zero, so that the theta simplifies to:

$$\theta = -\frac{S\,N'(d_1)\,\sigma}{2\sqrt{t}}$$
In this case, the relationship between vega and theta is:

$$\theta = -\frac{\sigma}{2t}\,\text{vega}$$
This relationship, though derived under the further assumption of a zero interest rate, holds under a weaker assumption than Black-Scholes: it requires only that your volatility parameter (however you define that) and your time to expiry enter the price solely through the intermediate parameter σ * sqrt(t). To see this mathematically, let’s write the call price as some unspecified function of this intermediate parameter:

$$C = f(V), \qquad V = \sigma\sqrt{t}$$
Then if we take derivatives with the chain rule:

$$\text{vega} = \frac{\partial C}{\partial \sigma} = f'(V)\,\frac{\partial V}{\partial \sigma} = f'(V)\,\sqrt{t}, \qquad \theta = -\frac{\partial C}{\partial t} = -f'(V)\,\frac{\partial V}{\partial t} = -f'(V)\,\frac{\sigma}{2\sqrt{t}}$$

so that

$$\theta = -\frac{\sigma}{2t}\,\text{vega}$$
and you can see that the relationship holds. If interest rates are zero, Black-Scholes does satisfy this weaker assumption; if we define V = σ * sqrt(t), the d1 and d2 terms can be rewritten as:

$$d_1 = \frac{\ln(S/K)}{V} + \frac{V}{2}, \qquad d_2 = \frac{\ln(S/K)}{V} - \frac{V}{2} = d_1 - V$$
We might call V “total” volatility. The intuition behind tying σ and t together is that an option price depends on the probability distribution of the asset out to time t, which in turn depends on a) what the value of t is and b) how “innately” volatile the asset is, represented by σ. A high-volatility asset will have a wider distribution than a low-volatility asset over the same time frame, but the low-volatility asset will have a wider distribution at some point if you examine it over a sufficiently longer time frame than the high-volatility asset. Combining the two parameters as V = σ * sqrt(t) is to say that you’ve defined your σ as a per-root-time measure of volatility, or, more simply, you’ve defined σ² as a per-time measure of volatility. For those who have taken some stochastic math, you’ll know that this is indeed true of standard Brownian motion: variance at time t is σ²t.
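As a sanity check, here is a small numerical sketch (Python with scipy; the parameter values are arbitrary) that confirms the zero-rate relationship by finite differences rather than by algebra:

```python
from math import exp, log, sqrt
from scipy.stats import norm

def bs_call(S, K, t, r, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return S * norm.cdf(d1) - K * exp(-r * t) * norm.cdf(d2)

S, K, t, r, sigma, h = 100.0, 110.0, 0.5, 0.0, 0.25, 1e-5

# Finite-difference vega, and theta as sensitivity to the passage of time (hence the minus sign).
vega = (bs_call(S, K, t, r, sigma + h) - bs_call(S, K, t, r, sigma - h)) / (2 * h)
theta = -(bs_call(S, K, t + h, r, sigma) - bs_call(S, K, t - h, r, sigma)) / (2 * h)

# With r = 0, the two printed numbers should agree closely.
print(theta, -(sigma / (2 * t)) * vega)
```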
Why might you be interested in this (which otherwise seems like a small mathematical exercise to kick at financial interview candidates)? Of course, the fewer assumptions your models need, the better, and we can more broadly and confidently apply any aspects of our modeling framework that depend on only a subset of the full assumptions. It’s not simply that we need to worry that much less about matching assumptions and reality, but also that these aspects of the model will be robust to changes in a real-world environment. In times of financial crisis, certain assumptions that were a very strong fit to reality for a long time may suddenly fall apart. Rather than either relying on violable assumptions or throwing out a model that does actually work most of the time, we can assess what aspects of our models rely on exactly what assumptions and be aware of what will and will not hold up in a changing environment.
I started work on a second data competition, the Heritage Health Prize, which is well-known in the community as it has a very large purse: $3 million to the winning team. The objective of this competition is to predict hospitalizations for patients, given health insurance claims data for those patients in previous years. It is a tremendous application of data analysis, as I think healthcare is extremely fertile ground for increasing efficiency by being smarter about care, prescriptions, and procedures. I may be off-and-on with this one, working on it for a while and then letting it sit for a while; as before, my objective is to learn as much as I can, not realistically to win, and if I feel like I’m spinning my wheels I’ll drop it for a while.
What I particularly like about this competition is the “Milestone Prizes” that the organizers also award. The competition will last for two years, and every 6 months the top 2 entrants win a much smaller but not insubstantial prize, in the five-digit dollar range. In order to claim the Milestones, the winning teams must submit a write-up of their methodology, to the organizers’ satisfaction. Here are links to the Milestone 1 and Milestone 2 papers. (You can only read those PDFs if you are in the competition, unfortunately, and I don’t intend to re-share them if the organizers don’t want them to be shared.)
Two Milestones have passed, with the third coming up in a few weeks. The papers have been tremendously helpful in getting started; my initial approach has been a highly simplified version of their procedures, and it’s good enough to get to 211th place out of 1268 (though only 818 entries right now clear a naïve-ish benchmark in which every patient is predicted at a single optimized constant value; I say “naïve-ish” because the method for deducing that optimized constant is thoughtful). Unfortunately my efforts to make my models more sophisticated along the lines of the papers have not yielded much improvement beyond my initial go, but hopefully I’ll figure something out.
Although two Milestones have passed, it is helpful to read the first Milestone papers first, because the later ones build on and make reference to the earlier ones. I was surprised by how similar the papers’ structures were, despite their having been written independently:
Features: from the raw data supplied by the competition, what variables became the input into your prediction algorithms? In some cases, there is no transformation; you feed the competition data right through. In other cases, the papers calculated per-patient averages, minimums, maximums, etc. and fed those through.
Algorithms: in general, strong entries use more than one (see “Ensembling” below). This is where some of the ornery mathematics comes into play, and to really do a good job here you need to read some academic papers. But many of the established statistical models have already been implemented in languages such as R, so if you simply want to get an entry on the leaderboard you don’t actually need to know too much about the models; download them and run them as a black box. (I’m still learning about these models and yet I’ve managed to write implementations that use them.) R in particular has strong community development of these statistical models and is what I’ve been using. The algorithms new to me that I’ve been trying to learn so far are gradient boosting and random forests; a toy sketch of how they fit into the overall pipeline appears after the “Ensembling” point below.
Feature selection: a model is a combination of an algorithm and a subset of the available features. You might run the same algorithm on two different subsets of the features and call those two separate models. Models that share an algorithm may benefit less from the ensembling step (see below), because they may perform similarly well or similarly poorly on a given data point, but both papers seem to employ this strategy to generate better predictions.
Ensembling: it seems the established way to get a strong overall model is to harness many different prediction models and ensemble them with a top-level algorithm that weights the models accordingly. The idea is that different models may perform well on different subsets of the data (for whatever reason; the “why” may not be well understood), so if you can combine them in a manner that uses the best suited model for each data point, you’ll have a very strong predictor. I actually find the papers to be a little sparse on some details here (maybe because I’m inexperienced) but I think the procedure followed by the Milestone winners is to run what’s called a ridge regression to calculate weightings for each model and for the final prediction to be a linear combination of the models.
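To make the pipeline concrete, here is a toy sketch of the base-models-plus-ridge-blend idea. I’ve been working in R, and this is not the Milestone winners’ actual procedure; the sketch below uses Python’s scikit-learn, and random noise stands in for the real features.

```python
# Illustration only: two base models blended with a top-level ridge regression.
# The data here is random noise standing in for real features and targets.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 10)), rng.normal(size=500)
X_blend, y_blend = rng.normal(size=(200, 10)), rng.normal(size=200)  # held out for the blender
X_test = rng.normal(size=(100, 10))

# Base models: gradient boosting and a random forest.
gbm = GradientBoostingRegressor().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=200).fit(X_train, y_train)

# The top-level ridge regression learns how to weight the base models' predictions.
blend_inputs = np.column_stack([gbm.predict(X_blend), rf.predict(X_blend)])
blender = Ridge(alpha=1.0).fit(blend_inputs, y_blend)

# The final prediction is a linear combination of the base models.
test_inputs = np.column_stack([gbm.predict(X_test), rf.predict(X_test)])
final_prediction = blender.predict(test_inputs)
```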
Miscellany: One of the Milestone papers interestingly pointed out that the distribution for one feature changed sharply in the last year of available data. In finance we’d call this a “regime change.” The authors decided to toss that feature entirely as a result. They illustrated what clearly does appear to be a change in the nature of the feature’s statistical distribution but did not provide a concrete quantitative test for it, and my own efforts to write such a screen haven’t been successful so far. The issue is that you may not worry about a change in the mean or variance or even a few higher-order moments of a feature’s statistical distribution, but you may be worried if the variable’s family of distributions changed; if something used to be normally distributed and suddenly becomes uniformly distributed, that’s a real problem.
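For what it’s worth, one candidate screen (my own guess, not anything the Milestone authors describe) is a two-sample Kolmogorov-Smirnov test run on standardized samples, so that a pure shift in mean or variance doesn’t trigger it but a change in the shape of the distribution does:

```python
# A possible distribution-change screen (an assumption on my part, not the papers' method):
# standardize each period's sample, then run a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

def shape_change_pvalue(old_sample, new_sample):
    """Small p-value suggests the two samples differ in shape, not just mean/variance."""
    standardize = lambda x: (np.asarray(x) - np.mean(x)) / np.std(x)
    return ks_2samp(standardize(old_sample), standardize(new_sample)).pvalue

# Example: normal vs. uniform data with matched mean and variance still gets flagged.
rng = np.random.default_rng(1)
print(shape_change_pvalue(rng.normal(size=2000), rng.uniform(-1.73, 1.73, size=2000)))
```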
There was little attempt to impose a real-world interpretation on the raw data. The winners generally didn’t try to say something about why their models do what they do with drug prescriptions, hospitalization locations, etc. With minor exceptions, they focused on getting good data and good data mining algorithms. To some degree the selection of features induces some kind of interpretation – why did you calculate this feature? why are you picking this subset of features? – but that is not explained in much depth, and I interpret that to mean that it was not done on the basis of heavy thinking about real-world meaning of the data.
Having already been through a rookie stumbling phase with Amazon EC2, I am pleased to say I’m using it a bit more efficiently now. I’ve already got a “base” snapshot of a Linux install (Ubuntu) sitting around, and I’ve done all my work on a separate EC2 volume. If I ever want to cease work for a while, I can just detach the drive and stop the instance, and only pay for storage. If I want a lot of computing power or I want to try more than one thing in parallel, I can duplicate the volume, create some new higher-powered instances, attach the volumes to the new instances, and go. It’s pleasantly easy at this point to get started.
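For reference, the whole stop/duplicate/attach cycle can be scripted as well as clicked through in the console. Here is a rough sketch using the boto3 Python SDK; every ID below is a placeholder, and in practice you would wait for each resource to finish creating before the next step.

```python
# Sketch of the pause-and-duplicate workflow described above, using boto3.
# All IDs are placeholders; the AWS console accomplishes the same thing.
import boto3

ec2 = boto3.client("ec2")

# Pause work: detach the work volume and stop the instance, paying only for storage.
ec2.detach_volume(VolumeId="vol-11111111")
ec2.stop_instances(InstanceIds=["i-22222222"])

# Duplicate the work volume via a snapshot so experiments can run in parallel.
snap = ec2.create_snapshot(VolumeId="vol-11111111", Description="work volume copy")
copy = ec2.create_volume(SnapshotId=snap["SnapshotId"], AvailabilityZone="us-east-1a")

# Launch a higher-powered instance and attach the copied volume to it.
new = ec2.run_instances(ImageId="ami-33333333", InstanceType="c5.xlarge",
                        MinCount=1, MaxCount=1)
ec2.attach_volume(VolumeId=copy["VolumeId"],
                  InstanceId=new["Instances"][0]["InstanceId"],
                  Device="/dev/sdf")
```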
Over the past few days I consolidated some of what I’ve learned about EC2 and Linux into a Google document. It aims to teach someone who has some comfort with computers but may not necessarily be experienced with Linux (especially from an administrator’s perspective) how to get started with EC2. You can read it here. If you are ever interested in learning about EC2 I hope you find it useful to get your feet wet.