Paul Snively's Blog

Musings on Computer Science, Physics, Mathematics, and Other Stuff

The Path to Jaynes

| Comments

The Path to Jaynes: Introduction

Genesis

Hi! My name is Paul Snively. Since I’m relaunching my blog, I figure I might as well start from the beginning, so here goes.

I write software for a living. Like many programmers, I’ve been fascinated by the notion of Artificial Intelligence for a long time. In fact, it’s why I became a programmer: I’m old enough to have played the Zork trilogy on a Model I TRS-80 as a teenager, and I wanted to know how a game could understand some pretty sophisticated English imperatives, such as “Take everything from the table but the brown bag.” I’ve also always had a severe itch to understand things at the most fundamental level possible. I credit this both to being a Lutheran pastor’s grandson and the son of two teachers: the former definitely informs my belief that we live in an ordered universe, and the latter definitely informs my belief in the human ability to learn.

I was in the right place at the right time. I grew up (for some definition of “growing up”) in Columbus, Indiana, which is about 40 miles away from Bloomington, Indiana, home of Indiana University. While still in high school in the early 1980s, I enjoyed studying math and physics as well as working with the math department’s computers, and found myself reading The Little Schemer (well, actually, back then it was the SRI edition of “The Little LISPer”) and Gödel, Escher, Bach: An Eternal Golden Braid. As you might imagine, I was hooked, which was pretty convenient given where I was. I went off to IU as a freshman in 1984, with the intention to take a double major in computer science and physics, having met Douglas Hofstadter while visiting the campus. I was very excited! I was pretty good at math—I’d passed an entrance exam that would have allowed me to take a calculus course that would have covered differential calculus one semester and integral the other, and I’d already published some software while in high school, had two years of physics, and a year of calculus, so I was feeling pretty confident.

I got to IU in the fall of 1984. I hung around Lindley Hall a lot, making a nuisance of myself around Dr. Dan Friedman, another IU hero of mine. But Dr. Hofstadter was on sabbatical that year, which was disappointing. Worse, no one replaced him in teaching Honors Intro Computer Science, so I was stuck writing FORTRAN on punch cards to feed to a CDC Cyber 855. My confidence on entering was misplaced: my physics class had us doing things like triple integrals. My homework assignments were taking me an average of an hour per problem, which might have worked out if it had been my only class. IU wouldn’t let me declare a major, and required undergrad courses like psychology, sociology, and macro- and microeconomics. As difficult as it might be to believe, I was even more immature at 18 than I am now. I didn’t deal with my disappointment and frustration well at all.

Crash

The embarrassing, humiliating, and shameful truth is that I flunked out. This wouldn’t be so bad in and of itself, but my wonderful parents were paying for this and had such high hopes for me. I had wasted thousands and thousands of dollars of their hard-earned money, and in 1984-1985, it wasn’t at all obvious how a geeky kid with a Model I TRS-80 was going to make his way in the world of adults. To make an already-long story at least a little shorter, things obviously worked out pretty well. It turns out that being a largely self-educated programmer is feasible, or at least it was in the mid 1980s. But I threw away a huge opportunity in not figuring out how to conform to the requirements of the liberal arts undergraduate program at IU. One of the things that came to an abrupt end was my education in continuous mathematics, which finally brings me to the point.

The Middle Ages

Even as a self-educated programmer, my exposure to discrete mathematics and formal logic has been, I believe, pretty good. Basic set theory, Boolean algebra, even propositional logic with ∀ and ∃ are reasonably familiar and comfortable. As far as AI technology goes, I spent a lot of time with rule systems, frame systems, decision trees… essentially all of the material that’s covered so well in Peter Norvig’s Paradigms of Artificial Intelligence: Case Studies in Common Lisp, which I was proud to have reviewed in manuscript while being the only MacDTS engineer at Apple supporting Macintosh Common Lisp. But then things changed out from under me, as they so often do.

Fast Forward

Peter’s next text with Stuart Russell, Artificial Intelligence: A Modern Approach, has very different goals from PAIP. PAIP was deliberately historical in nature, showing how iconic approaches to AI could be implemented elegantly in Common Lisp. As a treatise on advanced Common Lisp programming, PAIP still holds up quite well. But AIMA is more properly a survey of modern approaches to AI, particularly Bayesian probability. Although definitely oriented toward practicality, with sample code available in multiple languages, AIMA nevertheless is, in my opinion, best thought of as a very extended bibliography, offering pointers to where a deeper dive might be worthwhile. Looking around the landscape for more information about probability theory eventually led me to Probability Theory: The Logic of Science, by the late E.T. Jaynes.

Like Lightning From a Clear Sky

One thing that immediately struck me upon encountering Bayes’ Theorem: is that it reduces to modus ponens: when the support that evidence B gives to proposition A, P(B|A)/P(B), equals 1.0. To me, this gave Bayes’ Theorem some nice intuitive justification. But it never occurred to me to wonder if the observation could be generalized, and if so, how. This is where Jaynes came in:

…neither the Bayesian nor the frequentist approach is universally applicable, so in the present, more general, work, we take a broader view of things. Our theme is simply: probability theory as extended logic. The ‘new’ perception amounts to the recognition that the mathematical rules of probability theory are not merely rules for calculating frequencies of ‘random variables’; they are also the unique consistent rules for conducting inference (i.e. plausible reasoning) of any kind, and we shall apply them in full generality to that end. It is true that all ‘Bayesian’ calculations are included automatically as particular cases of our rules; but so are all ‘frequentist’ calculations. Nevertheless, our basic rules are broader than either of these, and in many applications our calculations do not fit into either category.

Probability is logic? Sold!

There’s Just One Hitch

The following material is addressed to readers who are already familiar with applied mathematics, at an advanced undergraduate level or preferably higher, and with some field, such as physics, chemistry, biology, geology, medicine, economics, sociology, engineering, operations research, etc., where inference is needed.

“Applied mathematics, at an advanced undergratuate level or preferably higher.” Yikes!

Here’s the thing: I want to understand Jaynes. For all kinds of reasons. Partially so I can write well-conceived machine learning software, partially because if it’s true that probability is an extended form of logic then I find that philosophically important, and partially, yes, for its own sake. To get back to some old loves of mine, mathematics and physics. And hopefully to open new doors in subjects I’m a passionate amateur in, like economics.

What Does It Mean to Understand Something?

I often liked to play tricks on people when I was at MIT. One time, in mechanical drawing class, some joker picked up a French curve (a piece of plastic for drawing smooth curves -a curly, funny-looking thing) and said, “I wonder if the curves on this thing have some special formula?”

I thought for a moment and said, “Sure they do. The curves are very special curves. Lemme show ya,” and I picked up my French curve and began to turn it slowly. “The French curve is made so that at the lowest point on each curve, no matter how you turn it, the tangent is horizontal.”

All the guys in the class were holding their French curve up at different angles, holding their pencil up to it at the lowest point and laying it along, and discovering that, sure enough, the tangent is horizontal. They were all excited by this “discovery” -even though they had already gone through a certain amount of calculus and had already “learned” that the derivative (tangent) of the minimum (lowest point) of ANY curve is zero (horizontal). They didn’t put two and two together. They didn’t even know what they “knew”.

They don’t have intelligence. They have what I call “thintelligence.” They see the immediate situation. They think narrowly and they call it “being focused.” They don’t see the surround. They don’t see the consequences.

I’m no Richard Feynman, but I want to do better than those other students. I want to see the consequences.

OK, don’t panic. Let’s see. E.T. Jaynes was a physicist. So what would a good applied mathematics curriculum look like from that point of view? At the very least, it would include analysis (real, complex, Fourier, functional…), linear algebra, and differential equations, probably in that order. So let’s think about analysis first. The good news for someone like me is that it turns out “analysis” is just calculus through the lens of logic! I already have some background in logic, as you may have noticed from my mentioning the modus ponens inference rule above. So this is already encouraging. If you don’t have any background in formal logic, please let me recommend Logic and Structure as a good starting point.

Some poking around Jaynes and browsing the various analysis texts available for the Kindle convinced me that I want to work through A First Course in Real Analysis. It seems nicely poised for someone with some exposure to logic, and doesn’t limit itself to the one-dimensional case the way many other introductory texts do. It also establishes a connection to differential equations, which I intend to pursue in more depth later. So my intention is to work through this text right here, in plain view, warts and all. I’ll discuss the problems and my attempts to solve them. Since this is about logic, I’ll be working with the Coq proof assistant.

Protter and Morrey begins by discussing the axioms defining a field, because the real numbers form a field. The first problem they pose, then is:

  • Show that in Axiom A-5 it is not necessary to assume there is only one number x such that a + x = 0.
Axiom A-5 is:

  • A-5. Existence of a negative. If a is any number, there is one and only one number x such that a + x = 0. This number is called the negative of a and is denoted by - a.
I immediately ran into a minor problem: Coq’s standard library’s field axioms for real numbers don’t have Axiom A-5! Instead, they assume the existence of the negative in order to say that a real plus its opposite is 0. Still, the existence of the negative is obviously in there, so it seems that instead of taking A-5 as an axiom, I can state it as a lemma and prove it using the axiom Coq does have, Rplus_opp_r. Now comes the kind of cool part: this entire blog post is written in Coq. The HTML you’re viewing was generated by coqdoc with a Makefile that massages things slightly so that I can just copy the resulting file into my blogging software and publish it. So without further ado, here is a very simple lemma proving the existence of the negative of any real number:

Require Import Reals Utf8.
Open Scope R_scope.

Lemma negative_exists : x, y, x + y = 0.
proof.
  let x:R.
  take (-x).
  thus thesis by Rplus_opp_r.
end proof.
Qed.

I import the Reals, obviously, and Utf8 because I want to use the formal symbols for “forall” and “exists,” because I’ll be seeing those in texts a lot. I name my lemma and state it: for all real numbers x, there exists a real number y such that x + y = 0. I then write the proof script in Coq’s Mathematical Proof Language, which deliberately strives to be at least somewhat similar in syntax to proofs done in the texts. In this very simple case, I just introduce the universally-quantified x, take its opposite to witness the existence of y, and then the thesis follows by the Rplus_opp_r axiom from Coq’s standard library.

This is, of course, just setting up the first problem, but this post is already quite long and it’s taken a solid couple of days of working on the infrastructure to get it to the point where I like how it works, so I think I’ll stop here. I hope a few people who want to master probability via Jaynes will join me on this journey. I will, of course, not only be publishing this, but putting the git repository with my Coq developments online as well. In the interest of full disclosure, all book links here go through my Amazon affiliate account. I’ve tried to link to Kindle editions where available both for convenience and to save trees, but sometimes important texts, like Jaynes itself, aren’t available for the Kindle. Finally, I should add that my ultimate goal isn’t just to work through Jaynes. I also want to develop software along the way, exploring the pragmatics of probability in software as well as the theory. This is going to be a long road. But I suspect that from both an intellectual and practical point of view, it may very well be one of the most rewarding ones to travel.

Acknowledgements: I’m inspired by the daily “stupid” questions after her first year of programming on @IrisClasson’s blog. Adam Chlipala graciously helped tutor me at U. Penn’s Coq tutorial after POPL a few years ago and is single-handedly proving you really can write a book entirely with coqdoc. Finally, my wonderful wife has encouraged me so often, but in particular recently, specifically to take up writing again.

This post is dedicated to my parents, who believe in me still.

Comments