Traveling Lands Beyond

"Beyond what?" thought Milo as he continued to read.

Archive for July 2012

Getting started with Linux and EC2 – online doc

Over the past few days I consolidated some of what I’ve learned about EC2 and Linux into a Google document. It aims to teach someone who has some comfort with computers but may not necessarily be experienced with Linux (especially from an administrator’s perspective) how to get started with EC2. You can read it here. If you are ever interested in learning about EC2 I hope you find it useful to get your feet wet.

Written by Andy

Mon 30 Jul 2012 at 10:56 pm

Posted in Uncategorized

My (very non-expert) two cents on PPACA

The Supreme Court’s recent ruling on the federal individual health care mandate (National Federation of Independent Business v. Sebelius) was necessarily ideological; the validity of the law with respect to the Constitution can hinge on points such as whether health insurance as mandated in the Patient Protection and Affordable Care Act (PPACA) can be interpreted as a tax. But the Court’s ruling should not be thought of as an approval or affirmation of the idea of universal health care but solely of the PPACA’s particular approach to implementing health policy.

Somewhere along the way, as I think sometimes happens in politics, the debate about federally mandated health care insurance became an ideological debate and lost sight of what should be the real goal of health policy, which is to improve healthcare outcomes. Unfortunately the necessarily ideological points of the Supreme Court ruling don’t help the issue; as far as I can tell (being a person of left-wing sympathies), the aspect of the PPACA that gets right-wingers fired up is that it is a tax and they don’t like taxes, in a religiously axiomatic way.

I dislike these fervid anti-tax stances. As the economist Milton Friedman is, I believe, credited with saying, “To spend is to tax.” A true anarchist may support a zero tax rate because he supports zero government, but everyone else should have some level of taxation that he is willing to support in order to keep his desired level of government running. Taxes are a means to the government’s ends as defined by our laws and policies, and I think people who attack taxes rather than expenditures take the wrong approach. Rather than opposing enactments because they are taxes, we should be judging whether their contents are worth the taxes that will be necessary to support them.

I think it’s perfectly fair for people to disagree about whether the benefits of the PPACA justify its costs. Here reasonable left-wingers and right-wingers may differ in their personal opinions of the value of healthcare; one person may believe that an annual expense of up to $400 per person is justified to provide a certain level of universal health coverage, whereas another person may consider the justified amount to be $200 per person. (In March 2012 the CBO estimated the cost of the PPACA to be $1.1 trillion over the next 10 years, which, assuming an average US population over that time of 325-350 million, comes out to roughly $314-$338 per person per year.) For any reasonable person there is some cost that is low enough and some cost that is too high.
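As a quick check on the per-person figure, here is a minimal Python sketch of the arithmetic. The function and variable names are made up for illustration; the inputs are just the $1.1 trillion total, the 10-year window, and the 325-350 million population range quoted above.

```python
def per_person_per_year(total_cost, years, population):
    """Spread a multi-year total evenly over years and people."""
    return total_cost / years / population


TOTAL_COST = 1.1e12  # CBO estimate quoted above, in dollars
YEARS = 10

# A larger population gives the lower per-person figure, and vice versa.
low = per_person_per_year(TOTAL_COST, YEARS, 350e6)
high = per_person_per_year(TOTAL_COST, YEARS, 325e6)

print(f"${low:.2f} to ${high:.2f} per person per year")
```

This treats the cost as spread evenly across the decade and across every resident, which is a simplification but good enough for comparing against the $200 and $400 thresholds.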

The point that people should not disagree on, though, is that any improvement in healthcare efficiency – any policy which can improve the quality of healthcare treatment for the same amount expended – is a desirable outcome. For the same reason that we cheer on the development of more potent drugs, innovative surgical techniques, scientific breakthroughs in biochemistry and genetics, etc., we should also cheer on any government policy that delivers better healthcare for the same cost.

The right-wing response is generally to say that the least government involvement results in the most efficient outcomes, but this is a policy guideline and not a mathematical law. It’s well-known that the US spends the most money in the world per capita on healthcare; this is not inherently problematic, but it does raise the question of whether we are getting the best healthcare bang for our buck. And there’s nothing inconsistent about supporting government action in one sector of the economy while advocating a hands-off approach in others. This sounds obvious, but I think the casually educated free-market pundit has a tendency to shoehorn every industry and every economic situation into whatever toolbox or philosophy she learned in undergraduate economics 101. Understanding basic economic principles is important, but reality does get more complicated.

The big challenge for the left is to keep itself honest about the true goal of healthcare reform, which is to get a healthier populace per dollar spent. It is not my opinion that universal health insurance is an end in and of itself. I think some leftists embrace and rally around the “X for Everyone” mindset but I think that’s far better achieved through the market if possible; rather than mandating “X for Everyone” and supporting it with taxes, let’s get good X so affordable that anyone can buy it. At least I continue to think that’s the ideal way to do things whenever you can, for commoditized products, and I definitely think that inadequate concern for efficiency, or excessive concern for equality over efficiency, leads to serious long-run institutional frailties in most economic situations.

Health insurance is not necessarily a commoditized product for which we can rely on all-private markets (I think comparing the individual mandate to requiring people to buy some random product like a car is totally fallacious). But maybe the individual mandate in general, or the individual mandate as implemented by PPACA, isn’t the right answer either; leftists need to be honest with themselves when monitoring its ongoing success or failure. They should not be content to say that we’ve now achieved universality and can rest on our laurels, and they should not lose sight of the goal of improving outcomes.

Written by Andy

Tue 10 Jul 2012 at 7:13 pm

Posted in Uncategorized

Big data autodidacticism

The aforementioned Facebook data mining contest ends today. The contest was, given a directed graph with missing edges and a list of nodes, to predict up to 10 new edges for each node in the list to point to. This is the first time I’ve tried a Kaggle competition. I picked it up as a way to teach myself about machine learning and data analysis techniques. I’ve also done a bit of reading from Toby Segaran’s Programming Collective Intelligence (I also have Drew Conway and John Myles White’s Machine Learning for Hackers but haven’t really gone beyond the intros yet). And I’ve also been trying out a machine learning course from Coursera, given by Stanford professor Andrew Ng, which is just finishing up as well.

On Kaggle I’m somewhere around the 75th to 80th percentile, although I’m afraid to say my solution is essentially the same as one posted (possibly against the rules?) in the discussion forums, so not really an original idea on my part. For an early description of my attempts, see the previous post. As it turns out, those attempts all fared worse than a PageRank-like algorithm that operated as follows, given a node for which you want to predict outgoing edges:

  1. Every other node is initially scored zero.
  2. Send a value of 1/(# of edges) along each edge to each neighbor, across both outgoing and incoming edges. So nodes that point to this node and nodes that this node points to will both receive this value, and a neighbor node that both points to and is pointed to by the node in question will receive 2x this value.
  3. Add the value received by each neighboring node to its score.
  4. Repeat steps 2 and 3 recursively twice, going out to the neighbors’ neighbors and the neighbors’ neighbors’ neighbors, but in these cases, if a value is sent across an incoming edge (against the direction the edge points), do not add the value received by the neighbor to its score.
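The steps above can be sketched in Python roughly as follows. This is not my actual contest code or the forum-posted implementation, just a minimal illustration under some assumptions that the description leaves open: each node splits whatever value it received equally across all of its edges (outgoing plus incoming), values that cross an incoming edge at the deeper levels still travel onward but are not scored, nodes the source already points to are dropped from the predictions, and ties are broken by node id. All function names here are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index a list of directed (src, dst) pairs by outgoing and incoming neighbors."""
    out_nbrs = defaultdict(list)
    in_nbrs = defaultdict(list)
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)
    return out_nbrs, in_nbrs


def score_candidates(source, out_nbrs, in_nbrs, depth=3):
    """Push value outward from `source` for `depth` hops and collect scores."""
    scores = defaultdict(float)   # step 1: every node starts at zero
    frontier = {source: 1.0}      # value each node will send at this level
    for level in range(depth):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            outgoing = out_nbrs.get(node, [])
            incoming = in_nbrs.get(node, [])
            degree = len(outgoing) + len(incoming)
            if degree == 0:
                continue
            share = value / degree  # step 2: split evenly (assumption)
            for nbr in outgoing:
                next_frontier[nbr] += share
                scores[nbr] += share  # step 3: forward edges always score
            for nbr in incoming:
                next_frontier[nbr] += share
                if level == 0:
                    # step 4: reverse edges only score on the first hop
                    scores[nbr] += share
        frontier = next_frontier
    scores.pop(source, None)  # never recommend the node to itself
    return scores


def predict_edges(source, out_nbrs, in_nbrs, k=10):
    """Return up to k highest-scoring nodes that `source` does not already point to."""
    scores = score_candidates(source, out_nbrs, in_nbrs)
    existing = set(out_nbrs.get(source, []))
    candidates = [n for n in scores if n not in existing]
    candidates.sort(key=lambda n: (-scores[n], n))
    return candidates[:k]


# Tiny made-up graph, just to exercise the functions.
edges = [(1, 2), (2, 3), (4, 1), (3, 5)]
out_nbrs, in_nbrs = build_adjacency(edges)
print(predict_edges(1, out_nbrs, in_nbrs))
```

Summing values per node at each level gives the same result as following every path separately, since the amount forwarded is linear in the amount received. A neighbor connected in both directions shows up once in each list, so it picks up the 2x share from step 2 without special handling.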

Note that this is not a probability distribution across nodes. I avoided looking at the forum-posted solution and implementation for a while, then finally when I thought I was kind of spinning my wheels I read it through and punted around a few random improvements, but none of them really worked. (I did re-implement the solution in my own code framework, of course.)

Prior to starting on Kaggle, I had been sort of following along and plugging away at the examples in Segaran’s book, reproducing the code, running the examples myself, etc. I was learning, but I think it really helps to have some kind of project or target to go after. It’s the difference between, say, learning music by listening to lots of songs and reading scores and charts and theory, and learning by actually picking up an instrument and playing. (During this time I actually picked up guitar as well – it’s not a bad change of pace when you need one, and it’s nice to fiddle around with one while a slow-moving program is running.) I do still plan to return to his book and continue along with more examples, hopefully with a better appreciation and a faster learning rate now that I’ve tried a project.

Participating in the competition was definitely educational, but as mentioned, it does lend itself to some wheel spinning. When you submit predictions, the competition does compute your overall score (using a metric publicly defined in the rules), but it gives no details about what you did right and what you did wrong, as you might actually have in a real-life situation. Obviously they have to do this so that people don’t just submit a solution that is overfit to the test data. But this does mean, I think, that you’re going to learn at a slower pace.

Kaggle did give me the chance to use Amazon EC2 for what is ostensibly its “real” purpose, which is to purchase computing power by the hour. The algorithm described above is slow (at least my implementation of it was slow; maybe someone out there has a smarter and speedier version), and would take hours and possibly days to run on my laptop (a MacBook Air). Once my algorithms started taking this long, I took them to the cloud, spinning up a high-powered Linux instance, uploading the code, and running it there. By the end it would still take a few hours, but that’s a bearable runtime.

To take full advantage of the multiple cores on the high-end EC2 instances I had to rewrite the code to support multithreading, which was something I hadn’t done before, and which was in my opinion generally a frustrating experience, lending itself to unpredictable crashes and more challenging debugging.

A word or two about Coursera, whose machine learning course I’m finishing up now: I liked it enough to try some more courses, but at times it felt like I was just going through the motions. To extend my music analogies, it felt like I was indeed actually playing guitar, but someone was sitting behind me holding my hands, making me strum and finger all the chords. I’m not positive how much I will retain and how much will slide out my ears within the coming weeks. The slides and the presentations are good reads, but the programming exercises aren’t all that. One benefit you get from taking an in-person, structured class is close contact and cooperation with classmates; maybe you realistically can’t do Courseras unless they’re coupled with Meetups.

Written by Andy

Tue 10 Jul 2012 at 2:17 pm
