Paul Snively's Blog

Musings on Computer Science, Physics, Mathematics, and Other Stuff

The Path to Jaynes via Gelfand Chapter 4

| Comments

This book is about algebra. This is a very old science and its gems have lost their charm for us through everyday use. We have tried in this book to refresh them for you.

You can’t do probability without calculus. You can’t do calculus without algebra. So as I mentioned in my last post, I’m revisiting this love of mine, thanks to I.M. Gelfand and A. Shen. But first, let me talk about setting up my computational environment. I’m running Mac OS X 10.6.8 (old skool for sure, and yes, it’s so I can continue to run some beloved games under Rosetta). I have Apple’s Developer tools, LLVM 3.2, and XQuartz 2.7.4 installed. Being an old Lisper, of course I’m using EMACS, and since I’m a Scheme aficionado, I have Quack installed. My tool choices, of course, reflect these prior choices, but should be relatively easy to adapt to other environments.

MIT Scheme

First, I installed MIT Scheme. This was simplicity itself: MIT makes several binary distributions available, including one for Mac OS X x86_64. What may be a bit surprising at first is that the system is distributed as an application bundle. It’s an X11 application, which is one reason I have XQuartz installed. Double-clicking it runs MIT Scheme in an environment provided by its native EMACSalike editor, edwin, in X11. This isn’t a bad thing, and if you’re already familiar with X11 and either aren’t familiar with EMACS or don’t mind learning edwin’s idiosyncracies relative to it, you may want to stay here.

MIT Scheme in Terminal.app

I don’t care for X11 and am already a fairly intense GNU EMACS user, so what I really wanted was the ability to just run MIT Scheme from the command line provided by Terminal.app, like any other language’s REPL. Thankfully, this isn’t at all hard to do, as Dustin Ingram has been kind enough to document here.

scmutils (aka Scheme Mechanics)

scmutils is the library, written in not-very-portable MIT Scheme, for symbolic and numerical computation aimed primarily at classical mechanics, with Structure and Interpretation of Classical Mechanics as its vehicle. After going through Gelfand and Bell, this is next on my list so far. How best to install scmutils isn’t immediately apparent from the page. What I wound up doing was clicking through the “this directory” link and downloading the most recent version, even though it says “gnu-linux.” Luckily, when I extracted the archive, I discovered that the distribution includes a src directory as well as bin and apparently-minimal MIT Scheme distribution directories, both of which I deleted, leaving only the manual and src directories. I then followed these instructions to compile scmutils and save a “band” to speed reloading later. I confirmed that I could reload the band from the command line with “mit-scheme -band mechanics.band” and everything worked fine.

Plotting

In the scmutils/manual directory, there’s a file, graphing.txt, that documents some plotting functions that scmutils implements. I tried them all out from the command line and they worked fine. I was feeling slightly paranoid about this because in some older versions of Mac OS X, command-line “tools” couldn’t use Aqua, and I wasn’t sure the same wasn’t true for X11. But of course it’s not.

Running scmutils in GNU EMACS

Finally, I followed these instructions on running scmutils in GNU EMACS. Since I’m not using the MIT Scheme that’s included in the scmutils distribution, I had to change the invocation accordingly, but it works fine.
So now I feel like I have the working environment I want: a very nice Scheme implementation; a CAS that should carry me through Gelfand, Bell, and the text it was developed for, SICM; some simple plotting features; and all running in EMACS. Actually, that’s not quite true. The plotting doesn’t work from within EMACS because the terminal type doesn’t support graphics (which apparently means “doesn’t support interaction with X11,” which is true). That’s OK, though; on those occasions upon which I want to plot, I’m willing to either run from the command line or from the X11 application bundle.
Before I begin digging into Gelfand and Shen’s “Algebra,” I want to shout out to another (much better!) math blog, Steven Wittens’ wonderful Acko.net. Run, don’t walk, to check it out.

Of course, you know arithmetic pretty well. However, we shall go through it once more, starting with easy things.

I’m not going to comment a lot before each problem. How much I comment on each problem will depend on the problem. Gelfand doesn’t even offer any problems until chapter 4, using the first few chapters to discuss commutativity of addition and multiplication. OK, here we go!

Problem 1. Several digits “8” are written and some “+” signs are inserted to get the sum 1000. Figure out how it is done. (For example, if we try 88 + 88 + 8 + 8 + 88, we fail because we get only 280 instead of 1000.)

This is a simple problem, and I thought it would make a good first example of using scmutils to solve simple equations, so I set out to do that. All I’ve managed to show, though, is my own naïvete. Solving this problem involves modular arithmetic, which doesn’t seem complicated. But as @cartazio pointed out to me, technically, I’m trying to solve an equation over a finite field, and it turns out this is hard. While OpenAxiom and Maxima both have pretty sophisticated support for finite fields, both systems’ support was more intimidating to me than I was prepared to deal with for Problem 1 in Gelfand, and take me too far afield from scmutils anyway. scmutils itself seems to have some nascent support for modular arithmetic, but if it ties into its equation-solving at all, I wasn’t able to discern how.
So I’ll just have to work it out longhand, but that doesn’t mean I can’t draw stuff to help me out. I know the addition looks like this:
  ...8
+  ...
+ ...8
______
  1000
I apologize for the ASCII art; the version of MathJax I’m using doesn’t yet have \cline in its \(\LaTeX\) support.
So in the beginning I don’t know how many rows I need, nor how many columns. But drawing the problem out reinforces the point that I need to find some number of 8’s that sum, modulo 10, to 0. I remember my multiplication table: \(8 \cdot 5 = 40\), so there’s my 0, carry the 4, and I now have five rows of 8’s, to which I’ll add a “carry row” at the top:
  ..40
+ ...8
+ ...8
+ ...8
+ ...8
+ ...8
______
  1000
Now I can just repeat the pattern: 4 plus how many 8’s has 0 as its last digit?
  .240
+ ...8
+ ...8
+ ...8
+ ..88
+ ..88
______
  1000
And again:
  .240
+ ...8
+ ...8
+ ...8
+ ..88
+ .888
______
  1000
So there it is: \(8 + 8 + 8 + 88 + 888 = 1000\). A fun little exercise in the whole point of algebra: thinking about unknown quantities and how they relate to each other.
On to…
Problem 2. In the addition example
  AAA
+ BBB
____
 AAAC

all A’s denote some digit, all B’s denote another digit and C denotes a third digit. What are these digits?

The key here is to realize that this time I do know the number of rows: 2. The next thing I notice is that I have an A hanging out there all by itself in the sum. That is, all A can possibly be is the carried value from the rest of the sum. With only two rows I can quickly check: what’s the largest value that digit can possibly have? Even if I begin by assuming that A and B are both 9 so their sum is 18, carrying the 1 never gets that above 19, i.e. I still carry the 1. So in fact I carry the 1 all the way out, and A = 1:
  111
+ BBB
____
 111C
I already know I’m carrying 1 all the way out. That means when there’s a 1 in my sum the only possible actual value for that column is 11. 11 minus the carried 1 minus the 1 from the first row leaves 9 as the only possible value of B, which leaves 0 as the only possible value of C:
  111
+ 999
____
 1110
And that’s that. The first four chapters of Gelfand have reinforced commutativity in addition and multiplication and how decimal addition works. These first problems, however, are somewhat deceptively simple: easily expressed, but requiring some thought about all of the constraints at once, and not really lending themselves to solution with a Computer Algebra System (although a constraint solver would make short work of them).
That’s it for this post, because that’s it for chapter 4 of Gelfand. This is definitely basic, but I promised myself to take no shortcuts through the texts. Given that Gelfand eventually gets to the Cauchy-Schwartz inequality, I trust that this initial limbering-up will be worth it. Thanks for reading!

My Curriculum

| Comments

Now that the holidays are well and truly behind us, it’s time to get back in the saddle. Recently, a couple of people have asked me how I arrived at the perspective I have on programming, math, logic and physics, and I suppose we need to add philosophy to the list, because math, logic, physics, and philosophy are all intimately related.
To get the fundamental stuff out of the way: to the extent genes matter, there are some question marks. I’m adopted, and my birth father’s father was a Ph.D. and economist at the World Bank. I’m guessing that helps somewhat. Similarly, my adoptive parents are both teachers. So my advice here is: choose the right parents.
Short of that, there are a couple of propositions that I believe are fundamental:
  • There is such a thing as objective reality.
  • Math/logic describes objective reality and is fun!
Of course, these assertions need some motivation, which I hope to begin to offer here. Going back to the early life theme for a moment, though, I think there’s one formative process I experienced that you can choose to experience, too: games. Game playing is hugely important to a child’s development, and I think we adults often underestimate its enduring power to shape our thought, to say nothing of forming and maintaining family bonds. A few games from my childhood stand out as influential and, I think, developmentally important (disclosure/reminder: I’m an Amazon affiliate; purchasing through these links throws a few cents my way and helps support my reading/writing habit):
When I say “Mouse Trap,” I’m apparently thinking of the revised rules of my childhood in the 1970s. This made Mouse Trap a strategy game, where you first try to build the Rube-Goldberg-inspired trap, then you attempt to maneuver your opponents onto the trap space, a process that can include being willing to sacrifice cheese you’ve earned along the way. It’s silly fun, but with a point about planning ahead, shifting from cooperation to competition, combining luck and skill, and even the concept of opportunity cost. Many modern games could learn from this childhood favorite.
“Clue” is the classic, unabashedly deductive-logic-based board game. The game consists of keeping track of what you know about the murder of Mr. Boddy based on the information you get from cards in your hand and what you learn from the other players over time. This relationship is so direct that there’s literally no point to playing with only two players, because whatever isn’t in your hand is in theirs, and vice-versa. That doesn’t mean the game is trivial, though: in my family of four as a child, one of two of us always wound up winning, and the other two never did, because with four players, the distributed knowledge problem does benefit from a certain predisposition to logical thinking and a gift for keeping who knows what straight.
“Mastermind” is similar to “Clue” in the sense that it’s straight deductive logic with two players. What’s interesting about it is its feedback cycle: the trick is to get new information with each move, minimizing the amount of redundant information you get, since you have a (severely!) limited number of guesses. It’s not surprising that “Mastermind” was designed by an Israeli postmaster and telecommunications expert. My guess is that its designer was quite familiar with Claude Shannon’s Information Theory and Coding Theory.
I can think of no better way to appreciate that math and logic (I dislike continuing to refer to them as if they were separate subjects—they aren’t, and haven’t been since the late 19th century, but even today, at least in America, schools haven’t gotten the memo) are fun than to direct people to Martin Gardner’s work. In the course of writing some 60 titles covering everything from mathematics to philosophy to religion to children’s literature—including his famous The Annotated Alice: The Definitive Edition—and his authorship of the “Mathematical Games” column in Scientific American magazine, he essentially single-handedly created the genre of “recreational mathematics” that became the backdrop of my intellectual childhood. The Colossal Book of Mathematics: Classic Puzzles, Paradoxes, and Problems and The Colossal Book of Short Puzzles and Problems are excellent surveys of Gardner’s recreational mathematical writing, and will keep you occupied for quite some time if you really delve into them.
So with this as background, what self-education curriculum am I laying out for myself to reach my goal of understanding Probability Theory: The Logic of Science? Unfortunately, one thing that’s become clear over the course of my journey so far is that I’m no longer the 18-year-old who had a year of calculus in high school and tested into the “advanced” calculus class in college. Because I haven’t had to manipulate equations every day, my skills have atrophied even more than I realized. So I’m going back a little farther than I had expected to need to, in the interest of achieving real understanding. I also think I want to avoid trying to identify, up front, all of the resources I’m going to rely on, because those may change based on what I learn from what I study first. With those thoughts in mind, here’s what I now expect to tackle first:
  1. Algebra—Recommended by a Twitter follower whose name escapes me now. There’s an informative review here.
  2. A Primer of Infinitesimal Analysis—Calculus isn’t actually complicated. It just seems that way if you base it on the wrong logic. Ironically, insisting on computability (Constructivism) simplifies things.
  3. Structure and Interpretation of Classical Mechanics—Jaynes was a physicist, with an expert grasp of applied mathematics, which he brings to bear on the questions arising in probability. Bell’s calculus text, above, focuses its attention on physics. Here we have an in-depth investigation of classical mechanics and its mathematics from a computational perspective. The book is also freely available as a PDF, and the associated scmutils software is here.
If you’re interested in taking this journey with me, please do at least download and install the scmutils software linked above. You can think of it as MIT Scheme extended to be quite a bit more like Mathematica, allowing you to do things like this:
1 ]=> (define (exp3 x) (expt x 3))
#| exp3 |#

1 ]=> (print-expression (exp3 5))
125
;No return value.

1 ]=> (print-expression ((D exp3) 'a))
( * 3 (expt a 2))
;No return value.

1 ]=> 
In other words, I seem to have found some resources to start acquiring a good computational perspective on applied mathematics with. How all of this relates to “programming is math” and type theory will hopefully be an evolving theme in my writing. For now, though, I’m content to begin with Gelfand’s “Algebra” with scmutils as my computational substrate, which I’ll do in my next post.

Constructive Criticism

| Comments

All unicorns do not have five legs, no matter what Logical Labyrinths says.

I’ve already talked about Coq’s Prop type, the logical connectives ∧ (and), ∨ (or), → (implies), ↔ (bi-implies), and ¬ (not). This gave me what’s called “propositional logic,” and I used it to tackle Problem 1.1 in Logical Labyrinths. Now it’s time to take on ∀ (“for all”) and ∃ (“for some”, or “there exists”). These symbols, called the “quantifiers,” let me move beyond making assertions about concrete things—native A, native B, etc.—into a more abstract realm: all natives, some (arbitrary) native, etc. called “first-order logic.” The quantifiers introduce power and subtlety: power, because I can now talk about possible things logically; subtlety, because the quantifiers introduce infinities, and reasoning in the presence of infinities is error-prone, to put it mildly. This is especially true in the presence of negation. For example, what is Smullyan’s rationale for saying all unicorns have five legs?

The only way the above sentence can be falsified is to exhibit at least one unicorn who doesn’t have five legs, which is not possible, since there are no unicorns. It is also true that all unicorns have six legs! Anything one says about all unicorns is vacuously true.

The problem is that Smullyan is right: this really is how “classical logic” works. But isn’t logic supposed to help keep us from talking nonsense? Particularly obvious nonsense? So why is “All unicorns have five legs” a problem? Let’s look closely at what Smullyan says: “The only way the above sentence can be falsified…” Wait, whoa, hold the phone! I’m not trying to falsify anything. I want to prove something. So it seems to me like I’m already off to a rocky start. “…is to exhibit at least one unicorn who doesn’t have five legs…” That is, as far as classical logic is concerned, \(\forall x. P[x] \leftrightarrow \neg \exists x. \neg P[x]\), “a universally-quantified proposition is true if and only if a counterexample to the proposition doesn’t exist.” But I can’t produce a counterexample for the same reason I can’t produce an example: unicorns don’t exist. Talking about how many legs they have is meaningless.

The fact that Smullyan wants to falsify the sentence, tacitly assuming its truth, should be more shocking than it seems to be in logical circles. Unfortunately, “classical logic” is called “classical” for a reason. Its principles go back to the likes of Euclid and Aristotle. One key such principle is \(P \lor \neg P\), and it’s so key that it has a name: the “Law of the Excluded Middle,” or “Tertium non Datur,” literally “the third is not given.” From the classical perspective, a proof of \(P\) can just as easily—often, more easily, which is a major source of the problem—be a proof of \(\neg\neg P\), “it’s a contradiction for P not to be true.” This all sounds harmless enough, but as I hope I’m demonstrating, it isn’t: it’s very problematic when the set of possible things to talk about is empty. It’s also problematic when the set of possible things to talk about is infinitely large. For example, consider \(\mathbb{N}\), the natural numbers, starting with 0. To prove \(\forall x. P[x]\) where \(x \in \mathbb{N}\), it suffices to prove the proposition for some \(n \in \mathbb{N}\) and then for \(n + 1 \in \mathbb{N}\) given \(P[n]\); this is called “mathematical induction,” and I have no doubt I’ll be using that, and other forms of induction, a lot in the future. To prove \(\exists x. P[x]\) where \(x \in \mathbb{N}\), though, all I can do is start with 0, see if \(P\) holds and, if not, try 1, then 2, then 3… until I either find a value where \(P\) holds or I keep looking forever.

And now you know why first-order logic is only semi-decidable (h/t @marchamann for the excellent suggestion to use \(\mathbb{N}\) as an easy example). Needless to say, I’m not the first person to notice that empty and infinite sets cause problems in classical logic. The school of mathematical logic that rejects the Law of the Excluded Middle and a related axiom from set theory, the Axiom of Choice, is called Constructivism. The philosophical foundation of Constructivism is that there’s no sound reason to trust a conclusion that doesn’t follow from a finite number of discrete steps in finite time. Here’s Section 5.1, “Constructive Reasoning,” from val Dalen’s “Logic and Structure” on the subject:

To adapt Hermann Weyl’s phrasing: we are used to think of infinite sets not merely as defined by a property, but as a set whose elements are so to speak spread out in front of us, so that we can run through them just as an officer in the police office goes through his file. This view of the mathematical universe is an attractive but rather unrealistic idealization. If one takes our limitations in the face of infinite totalities seriously, then one has to read a statement like “there is a prime number greater than \(10^{{10}^{10}}\)” in a stricter way than “it is impossible that the set of primes is exhausted before \(10^{{10}^{10}}\)”. For we cannot inspect the set of natural numbers in a glance and detect a prime. We have to exhibit a prime \(p\) greater than \(10^{{10}^{10}}\).

Paul Taylor’s “Practical Foundations of Mathematics,” which has a thoroughgoing emphasis on the relationship between mathematics and computation, is simultaneously even more stringent and, somehow, more playful. Taylor drives the Constructive stake into the ground in the very first chapter; I find the following on page 18:

It is similarly wrong to introduce putative things such as unicorns and then treat existence as a property of them.

Taylor doesn’t have much patience with the classical crowd:

Some Real Mathematicians use [proof by contradiction, or “reductio ad absurdum”] habitually, starting every proof with “suppose not” (“are you calling me a liar?”) without even attempting a positive approach. Many treatments of logic itself (rather than difficult applications) regrettably do not prove things, but refute their negations instead.

Finally, Taylor again, to bring it back home to my concerns about Chapter 12 of “Logical Labyrinths:”

Theorem 1.8.3: Negation is a duality in classical logic, interchanging ⊤ with ⊥, ∧ with ∨ and ∀ with ∃. That is,

\(\neg (\alpha \lor \beta) \iff (\neg \alpha) \land (\neg \beta)\)\(\neg \exists x.\phi [x] \iff \forall x. \neg \phi [x]\)
\(\neg (\alpha \land \beta) \iff (\neg \alpha) \lor (\neg \beta)\)\(\neg \forall x. \phi [x] \iff \exists x. \neg \phi [x]\)

The equations on the top row are valid without excluded middle. Those in the left column are called de Morgan’s Laws, although they were well known to William of Ockham. Excluded middle also makes any implication \(\alpha \implies \beta\) equivalent to its contrapositive, \(\neg \beta \implies \neg \alpha\).

Hmmm. That bi-implication on the lower right sounds familiar. Sure enough, if I just move the negation from the universal quantification to the existential (or “negate both sides,” if that feels more comfortable), I end up with the equivalence Smullyan uses to justify “All unicorns have five legs.” That justification relies on the Law of the Excluded Middle. “All unicorns have five legs” is nonsense. I want to avoid nonsense. I therefore reject the Law of the Excluded Middle (and the Axiom of Choice; this post is already too long to go into that). As a kind of added bonus (but not really; it’s very much part of the point), Constructive proofs are inherently computable.

I am, therefore, a Constructivist, and this series will be at great pains to be clear about when I adhere to those principals, which Coq also does by default, and when I will feel obliged to violate them, as may happen in particularly thorny areas of Analysis and/or be motivated by the existence of tools in the classical vein, whose utility may prove to be too high to forego.

More on Propositions

| Comments

I had intended to move on from propositional logic in this post, but I still feel bad about doing such a poor job the first time that I want to continue to take advantage of Adam Chlipala’s wonderful formalization and revisit the difference between Adam’s use of the tauto automation tactic and the traditional (for Coq) manual use of tactics. Here’s Adam’s formalization again:

Section Knights_N_Knaves.

Variable native : Set.

Variables A B C : native.

Variable Knight : native -> Prop.

Definition _says (n : native) (P : Prop) := Knight n <-> P.
Infix says” := _says (at level 95).

Hypothesis B_says : B says (A says ~Knight A).
Hypothesis C_says : C says ~Knight B.

Theorem C_is_a_Knight : Knight C.
unfold _says in *; tauto.

Proof completed.
Most of the time, though, I won’t be working in decidable fragments of Coq’s logic, so it’s important to get a handle on how to use Coq as a proof assistant. I haven’t said Qed yet, so I can…

Restart.

1 subgoal
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  B_says : B says (A says ~ Knight A)
  C_says : C says ~ Knight B
  ============================
   Knight C
Whenever I’m sitting there looking at a goal in Coq and feeling completely at sea, I know I need to focus on simplifying my goal and hypotheses as much as possible. I know _says is a Definition with some important logical content, so let’s make that content explicit by unfolding the definition, just like Adam did:

unfold _says in *.

1 subgoal
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  B_says : Knight B <-> (Knight A <-> ~ Knight A)
  C_says : Knight C <-> ~ Knight B
  ============================
   Knight C
The “*” in unfold _says in *. means “both the goal and the hypotheses.” The ~ is just syntactic sugar for not, but not itself is a Definition for -> False, so I want to make that explicit too:

unfold not in *.

1 subgoal
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  B_says : Knight B <-> (Knight A <-> Knight A -> False)
  C_says : Knight C <-> Knight B -> False
  ============================
   Knight C
Realistically, this is simple enough to work with easily, but just to make everything as clear as possible, let me point out again that <-> is also syntactic sugar for a conjunction (/\) of two ->, one in each direction. Anytime there’s a conjunction in a hypothesis, you can split the branches into separate hypotheses with destruct(ure):

destruct B_says.
destruct C_says.

1 subgoal
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  B_says : Knight B <-> (Knight A <-> Knight A -> False)
  C_says : Knight C <-> Knight B -> False
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  ============================
   Knight C
A word about context management: in real developments of any scale, your context can grow enough hypotheses, some of which are just restatements of other hypotheses such as I’ve just created here, that things get pretty unwieldy. Since I’ve just used destruct to destructure B_says and C_says, I don’t need them lying around anymore:

clear B_says.
clear C_says.

1 subgoal
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  ============================
   Knight C
I still have some <-> lying around, but only inside parenthesized terms. In particular, my goal is Knight C and I see that that’s the conclusion of H2. Whenever my goal is the conclusion of one of my hypotheses, I just:

apply H2.

1 subgoal
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  ============================
   Knight B -> False
Hopefully it’s obvious what just happened: I effectively said “I can prove Knight C from hypothesis H2,” and Coq is saying “to do that, you’ll have to prove the premise of H2, namely that Knight B -> False.” Now you see why I like to unfold not.: it reminds me that ~ P is actually an implication. With an implication as a goal, I:

intro.

1 subgoal
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  H3 : Knight B
  ============================
   False
To “prove False” means to show a contradiction. (Knight A <-> Knight A -> False) sure looks like a contradiction to me. Even better, it follows from Knight B, which I just introduced as a hypothesis! The plan of attack is clear. But (Knight A <-> Knight A -> False), while obviously equivalent to False, is not False, so I can’t just apply H. Instead, it seems clear that I can prove False from (Knight A <-> Knight A -> False), and I know I can prove Knight B when Coq tells me I need to. Anytime I can prove my goal from the conclusion of a hypothesis, I:

elim H.

2 subgoals
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  H3 : Knight B
  ============================
   (Knight A -> Knight A -> False) ->
   ((Knight A -> False) -> Knight A) -> False

subgoal 2 is:
 Knight B
Coq has done a few handy things here: first, it made H part of my goal. Second, it destructured the <-> into its two implications for me. Third, it added the fact that I need to prove Knight B as a second subgoal. OK, I once again have implications, two of them, in my goal, so:

intros.

2 subgoals
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  H3 : Knight B
  H4 : Knight A -> Knight A -> False
  H5 : (Knight A -> False) -> Knight A
  ============================
   False

subgoal 2 is:
 Knight B
Nice. Now I have those tasty contradictions where I can work with them. In fact, I now am again in that happy state where my goal is the conclusion of one of my hypotheses:

apply H4.

3 subgoals
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  H3 : Knight B
  H4 : Knight A -> Knight A -> False
  H5 : (Knight A -> False) -> Knight A
  ============================
   Knight A

subgoal 2 is:
 Knight A
subgoal 3 is:
 Knight B
That same situation holds again, so:

apply H5.

3 subgoals
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  H3 : Knight B
  H4 : Knight A -> Knight A -> False
  H5 : (Knight A -> False) -> Knight A
  ============================
   Knight A -> False

subgoal 2 is:
 Knight A
subgoal 3 is:
 Knight B
Another implication:

intro.

3 subgoals
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  H3 : Knight B
  H4 : Knight A -> Knight A -> False
  H5 : (Knight A -> False) -> Knight A
  H6 : Knight A
  ============================
   False

subgoal 2 is:
 Knight A
subgoal 3 is:
 Knight B
By now you should be developing an intution as to why tactic-based theorem-proving with Coq is sometimes said to feel like playing a video game: you keep shooting at targets until the boss shows up for you to annihilate:

apply H4.

4 subgoals
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  H3 : Knight B
  H4 : Knight A -> Knight A -> False
  H5 : (Knight A -> False) -> Knight A
  H6 : Knight A
  ============================
   Knight A

subgoal 2 is:
 Knight A
subgoal 3 is:
 Knight A
subgoal 4 is:
 Knight B
This time, my goal isn’t a conclusion of one of my hypotheses; it’s simply one of my hypotheses. For that, I use:

exact H6.

3 subgoals
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  H3 : Knight B
  H4 : Knight A -> Knight A -> False
  H5 : (Knight A -> False) -> Knight A
  H6 : Knight A
  ============================
   Knight A

subgoal 2 is:
 Knight A
subgoal 3 is:
 Knight B
If you don’t pay close attention, you might miss the fact that, although it doesn’t look like my goal changed, that’s because I used to have 4 subgoals, and now I have 3. It’s just that subgoal 1 and subgoal 2 were the same, so:

exact H6.

2 subgoals
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  H3 : Knight B
  H4 : Knight A -> Knight A -> False
  H5 : (Knight A -> False) -> Knight A
  ============================
   Knight A

subgoal 2 is:
 Knight B
Hey, where’d H6 go? Think of the proof state (hypotheses plus goals) as a stack. Some tactics push stuff onto the stack; some pop stuff off the stack. Now think of the context as a collection of local variables: what’s bound and what’s not depends on where you are in the stack. Those two exact H6 tactics popped two goals from the stack—nice progress—but brought me back to a point in the stack where H6 isn’t bound anymore. Thankfully, I can get there again just like I did before:

apply H5.
intro.
apply H4.
exact H6.
exact H6.

1 subgoal
  
  native : Set
  A : native
  B : native
  C : native
  Knight : native -> Prop
  H : Knight B -> (Knight A <-> Knight A -> False)
  H0 : (Knight A <-> Knight A -> False) -> Knight B
  H1 : Knight C -> Knight B -> False
  H2 : (Knight B -> False) -> Knight C
  H3 : Knight B
  ============================
   Knight B
I’m finally back to the easy bit:

exact H3.

Proof completed.
There. Now I’ve laboriously done what tauto did for me automatically.

But no one should be happy that there’s some backtracking and repetition in this script. Let’s see if I can do just a little better. I’ll group some stuff and then see if I can eliminate the repetition:

Restart.
unfold _says in *; unfold not in *; destruct B_says as [kBkAkAF kAkAFkB]; destruct C_says as [kCkBF kBFkC]; clear B_says; clear C_says;

apply kBFkC.
intro kB.
elim kBkAkAF.
intros kAkAF kAFkA.

apply kAkAF;

apply kAFkA;
intro kA;
apply kAkAF;
exact kA;
exact kA.

exact kB.

Proof completed.
Since I’ve already drawn the analogy between Coq’s context and local variables, I should add that letting Coq automatically come up with names like H, H0… first of all doesn’t help legibility, and secondly makes proof scripts quite fragile. In this script, I’ve used the ability to provide names for the destruct, intro, and intros tactics. I’ve also introduced the ; “tactical,” by analogy with “functional” in its somewhat arcane sense of “higher-order function.” That is, ; turns the tactic it follows into a higher-order tactic: the tactics that follow it are applied to all subgoals in the stack. I found myself in a situation, after apply kAkAF. for the first time, where I had the same goal in the stack twice, with the same context. Turning apply kAkAF; into a tactical with ; made it possible to write the five tactics to handle that goal in that context once instead of twice.

Oh, and my naming convention gave me a “kAFkA” contradiction as an added bonus.

Qed.

End Knights_N_Knaves.

I feel better. Adam’s formalization really expresses the problem crisply (and, more importantly, without begging the question) and serves as what I had intended my propositional logic post to be, namely a nice little puzzle example to use to contrast tactic-based proof with automated proof while having a little fun.

Next time, I’ll finally move on to the quantifiers, and expect to take a somewhat lengthy detour into classical vs. intuitionistic logic.

Errata 1

| Comments

This is just a brief post to point out that Adam Chlipala has done a much nicer formalization of Logical Labyrinths Problem 1.1 than mine. It sticks to Constructive logic; it doesn’t assume “True” for the proposition that anyone is a Knight; it states what B and C say as hypotheses; and it uses Coq’s support for user-defined syntax and precedence to make it all look good. I’m just going to put it out without further ado:

Section Knights_N_Knaves.

Variable native : Set.

Variables A B C : native.

Variable Knight : native -> Prop.

Definition _says (n : native) (P : Prop) := Knight n <-> P.
Infix says” := _says (at level 95).

Hypothesis B_says : B says (A says ~Knight A).
Hypothesis C_says : C says ~Knight B.

Theorem C_is_a_Knight : Knight C.
unfold _says in *; tauto.
Qed.

C_is_a_Knight is defined
This is more properly what a Coq development looks like. Thanks, Adam!

Propositional Logic in Coq

| Comments

As promised, I’m going to talk a little about simple propositional logic in the Coq proof assistant. Remember the first puzzle from Smullyan’s Logical Labyrinths?

PROBLEM 1.1 (A CLASSIC CASE). On the day of his arrival, Abercrombie came across three inhabitants, whom we will call A, B, and C. He asked A: “Are you a knight or a knave?” A answered, but so indistinctly that Abercrombie could not understand what he said. He then asked B: “What did he say?” B replied: “He said that he is a knave.” At this point, C piped up and said: “Don’t believe that; it is a lie!”
Was C a knight or a knave?

How would we formalize this puzzle in Coq?

First of all, Coq’s logic is what’s called “Constructive.” I won’t spend a lot of time on what that means now, but one of the implications is that Coq doesn’t assume all propositions are true or false—that is, \(P \lor \neg P\) isn’t an axiom. Another is that \(\neg \neg P \Rightarrow P\) doesn’t necessarily hold, either. Smullyan works in the classical tradition, so I’ll ask Coq to do the same.

Require Import Classical.

Coq has a number of scoping mechanisms. One of them, “Section,” follows the formal mathematical publishing tradition in its name. It’s an easy way to put bounds around things we want in our context when we start to construct a proof. Since I’m tackling Smullyan’s Logical Labyrinths Problem 1.1, I’ll name my section…

Section Smullyan_1_1.

The problem stipulates that we have natives. We can think of them as a set, mathematically speaking. From that set of natives, we have individual natives A, B, and C. Coq’s syntax for saying “the type of X is Y” is, like Pascal’s, the ML family’s, and Scala’s, X : Y (or X: Y, or…)

Variable Native : Set.

Variable A : Native.
Variable B : Native.
Variable C : Native.

Knights are always True. I won’t bother defining Knave, since Knaves always lie and that’s just ~ Knight. Coq’s syntax for definitions is also very Pascal-like: there’s an argument list with its type, and := to indicate the value of the definition. In this case, it’s just the constant True.

Definition Knight(X : Native) := True.

An important note: “True” is a Prop, not a bool! We’re dealing with logic here, not computation!

Check Knight.

Knight
     : Native -> Prop
OK, yes, I chuckled typing “Check Knight,” for all you chess fans.

Now I have to actually think about the problem. Abercrombie asked A which he is, a Knight or a Knave. A answered, but Abercrombie doesn’t know how, so neither do I. But there are only two ways this can go: since A is a native, he is, in fact, either a Knight or a Knave, and Knights always tell the truth and Knaves always lie. So either A is a Knight and truthfully said he is a Knave (a contradiction), or A is a Knave and dishonestly said he is a Knave (also a contradiction). Let’s capture that in a definition:
<-> reads as “logically equivalent,” “iff (if and only if),” or “bi-implication,” depending on how formal you want to sound. Since the natives state Propositions, I can use <-> to signify “says.” So either A is a Knight who says he’s not a Knight, or he’s a Knave who dishonestly says (<-> ~ or “not logically equivalent”) he’s a Knave.

Intuitively, I’m sure you’re already thinking “we already know the left side of the ‘or’ is false, and we already know the right side of the ‘or’ is false, and false or false = false. Why bother with this?”

Two reasons:

  1. The process of spelling out the logical situation made clear that “A said he’s a Knave” is a contradiction.
  2. I’m not looking for intuition. I’m looking for a proof.
Besides, A isn’t even the native we’re interested in, according to the problem. C is. But then, if “A said he’s a Knave” is a contradiction, that means B is lying, which means he’s not a Knight. C says B is lying, which means C is telling the truth, which means he’s a Knight. So that’s the theorem I want to prove. Putting it all together:
1 subgoal
  
  Native : Set
  A : Native
  B : Native
  C : Native
  ============================
   (A_claims_knavehood <-> ~ Knight B) <-> Knight C
Game on.

Because I’m in a Section, the Variables I declared are part of the context, which may or may not be helpful. The goal, of course, is the theorem I want to prove. For this post, I’m going to start by doing this proof in the traditional way in Coq: trying a series of tactics to iteratively simplify the goal until there’s no longer a goal at all.

Formal logic is all about syntax. Coq doesn’t actually “know” what A_claims_knavehood means, so the first thing I want to try is replacing the name with its definition. That’s done with the “unfold” tactic.

unfold A_claims_knavehood.

1 subgoal
  
  Native : Set
  A : Native
  B : Native
  C : Native
  ============================
   ((Knight A <-> ~ Knight A) \/ (~ Knight A <-> ~ ~ Knight A) <-> ~ Knight B) <->
   Knight C
So this makes the whole logical structure explicit, if you think of “Knight X” as “truth.” But Knight is just a definition, too, so:

unfold Knight.

1 subgoal
  
  Native : Set
  A : Native
  B : Native
  C : Native
  ============================
   ((True <-> ~ True) \/ (~ True <-> ~ ~ True) <-> ~ True) <-> True
Now the logical structure of the problem is very explicit! But believe it or not, it could be still more explicit. Negation in Coq is just syntactic sugar for “implies False.” “~” is just notation for the definition “not,” which I can also unfold:

unfold not.

1 subgoal
  
  Native : Set
  A : Native
  B : Native
  C : Native
  ============================
   ((True <-> True -> False) \/ (True -> False <-> (True -> False) -> False) <->
    True -> False) <-> True
Finally, I’m down to nothing but True, False, implication, bi-implication, and “or.” What to try next takes a bit of parsing. The way the goal is parenthesized, I see that the outermost connective is <->, bi-implication. A <-> B is shorthand for (A -> B) /\ (B -> A). In other words, there’s an “/\” (“and,” “conjunction”) under the surface. To prove a conjunction, we have to prove each side separately. To do that, I need to say:

split.

2 subgoals
  
  Native : Set
  A : Native
  B : Native
  C : Native
  ============================
   ((True <-> True -> False) \/ (True -> False <-> (True -> False) -> False) <->
    True -> False) -> True

subgoal 2 is:
 True ->
 ((True <-> True -> False) \/ (True -> False <-> (True -> False) -> False) <->
  True -> False)
Coq helpfully puts the 2nd case I need to prove on the subgoal stack and leaves me with the first implication to prove. Implications go from one or more hypotheses to a conclusion. I’m allowed to introduce the hypothesis as an assumption, effectively saying “If the hypothesis were true…” which is, after all, one way to interpret “implication.” So I’ll do that.

intro.

2 subgoals
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : (True <-> True -> False) \/
      (True -> False <-> (True -> False) -> False) <-> 
      True -> False
  ============================
   True

subgoal 2 is:
 True ->
 ((True <-> True -> False) \/ (True -> False <-> (True -> False) -> False) <->
  True -> False)
That gives me a pretty hairy hypothesis, which Coq has named “H” by default, but it also leaves me in the enviable position of needing to prove True!

trivial.

1 subgoal
  
  Native : Set
  A : Native
  B : Native
  C : Native
  ============================
   True ->
   ((True <-> True -> False) \/ (True -> False <-> (True -> False) -> False) <->
    True -> False)
Yikes. True implies the hairball. Well, there’s nothing else for it:

intro.

1 subgoal
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  ============================
   (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False) <->
   True -> False
Once again, I see that the outermost connective is a bi-implication that hides a conjunction, so:

split.

2 subgoals
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  ============================
   (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False) ->
   True -> False

subgoal 2 is:
 (True -> False) ->
 (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False)
My subgoal now takes a little closer reading. I have two hypotheses, the compound one and True, that imply False. There’s a variant of “intro” for introducing more than one hypothesis at once. Unsurprisingly, it’s called:

intros.

2 subgoals
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  H0 : (True <-> True -> False) \/
       (True -> False <-> (True -> False) -> False)
  H1 : True
  ============================
   False

subgoal 2 is:
 (True -> False) ->
 (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False)
To prove False, I need a contradiction. I have two of them squirreled away in H0, but I need to pull out at least one of them to use. The outermost connective in H0 is a disjunction. There’s a very handy tactic, “destruct,” that “destructures” terms and adds new variables to the context according to some pattern. Since I want to destructure a disjunction, or “or,” the pattern syntax winds up looking a lot like a regular expression for this-or-that:

destruct H0 as [H2 | H3].

3 subgoals
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  H2 : True <-> True -> False
  H1 : True
  ============================
   False

subgoal 2 is:
 False
subgoal 3 is:
 (True -> False) ->
 (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False)
Now I still don’t have a straightforward contradiction, True -> False, in my context, but I do have something that implies False, which is my goal. Anytime I have something that implies my goal, I can just apply it:

apply H2.

4 subgoals
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  H2 : True <-> True -> False
  H1 : True
  ============================
   True

subgoal 2 is:
 True
subgoal 3 is:
 False
subgoal 4 is:
 (True -> False) ->
 (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False)
Once more, I get to prove True:

trivial.

3 subgoals
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  H2 : True <-> True -> False
  H1 : True
  ============================
   True

subgoal 2 is:
 False
subgoal 3 is:
 (True -> False) ->
 (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False)
Again:

trivial.

2 subgoals
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  H3 : True -> False <-> (True -> False) -> False
  H1 : True
  ============================
   False

subgoal 2 is:
 (True -> False) ->
 (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False)
I have a hypothesis that implies False again, so:

apply H3.

3 subgoals
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  H3 : True -> False <-> (True -> False) -> False
  H1 : True
  ============================
   (True -> False) -> False

subgoal 2 is:
 True
subgoal 3 is:
 (True -> False) ->
 (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False)
Finally, the goal itself is a contradiction, so:

contradiction.

2 subgoals
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  H3 : True -> False <-> (True -> False) -> False
  H1 : True
  ============================
   True

subgoal 2 is:
 (True -> False) ->
 (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False)
I’m back to my favorite subgoal!

trivial.

1 subgoal
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  ============================
   (True -> False) ->
   (True <-> True -> False) \/ (True -> False <-> (True -> False) -> False)
OK, all I have left is a disjunction with a blindingly obvious contradiction on the left-hand side. Since I only have to prove one side of an “or,” I get to pick which one I want to prove.

left.

1 subgoal
  
  Native : Set
  A : Native
  B : Native
  C : Native
  H : True
  H0 : True -> False
  ============================
   True <-> True -> False
True is equivalent to True implies False? I don’t think so.

contradiction.

Proof completed.
And finally, that’s the ball game. C is indeed a Knight.

Boy, was that tedious, though, right? Scroll back through this page. Now keep in mind that unless someone is interacting with Coq, they aren’t going to see the contexts and subgoals. Just reading the theorem and the proof script I wrote looks like this:

Require Import Classical.

Section Smullyan_1_1.

Variable Native : Set.

Variable A : Native.
Variable B : Native.
Variable C : Native.

Definition Knight(X : Native) := True.

Definition A_claims_knavehood := (Knight A <-> ~ Knight A) \/ (~ Knight A <-> ~ ~ Knight A).

Theorem C_is_knight : (A_claims_knavehood <-> ~ Knight B) <-> Knight C.
unfold A_claims_knavehood.
unfold Knight.
unfold not.
split.
intro.
trivial.
intro.
split.
intros.
destruct H0 as [H2 | H3].
apply H2.
trivial.
trivial.
apply H3.
contradiction.
trivial.
left.
contradiction.
Qed.

End Smullyan_1_1.
No wonder people think proving theorems is difficult, tedious, and not worth it. Blech.

In this post, I haven’t actually said “Qed” yet, so Coq hasn’t defined the proof of the theorem. Because of that, I can bail on the whole development:

Restart.

1 subgoal
  
  Native : Set
  A : Native
  B : Native
  C : Native
  ============================
   (A_claims_knavehood <-> ~ Knight B) <-> Knight C
Now I’m literally back at the beginning. What I said about Coq not knowing how to interpret A_claims_knavehood is still true, so:

unfold A_claims_knavehood.

1 subgoal
  
  Native : Set
  A : Native
  B : Native
  C : Native
  ============================
   ((Knight A <-> ~ Knight A) \/ (~ Knight A <-> ~ ~ Knight A) <-> ~ Knight B) <->
   Knight C
The thing to realize now is that all I have is the constant function “Knight,” which always returns True; implications; conjunctions; disjunctions; and negations. In other words, I have what’s called a “propositional tautology.” Computationally, propositional logic is decidable. That means “so simple, even a computer can figure it out.”

tauto.

Give it a few seconds, and:

Proof completed.
Qed.

C_is_knight is defined
Coq’s “tauto” tactic solves propositional tautologies with no further effort on my part. It’s a nice example of the benefit of proof automation in a proof assistant like Coq. For propositional logic, it can do the whole job. As this series delves more into predicate logic, analysis, etc. the limits of automation will show up. I’ll also have to make some hard judgment calls about when to use straight tactics, when to use Coq’s Mathematical Proof Language, and when to use automation. But for now, I hope I’ve whet your appetite with a little puzzle, a little formalization, proving the theorem the hard way, and proving it the easy way, in Coq.

Labyrinth

| Comments

“And you, Sarah. How are you enjoying my labyrinth?” “It’s a piece of cake!”

First, thanks to the readers who have commented on my first post or let me know on Twitter or via e-mail that they’re following this series. That’s enormously gratifying, and makes the writing and the research that goes into it a lot more fun.

One thing I’ve been reminded of in receiving this feedback is that relatively few people have had any exposure to formal logic, so blithely diving right into ε-δ proofs from an analysis text is going to leave an awful lot of people behind. So I have to back up and talk a little bit about logic. For most purposes, I expect to draw from Logical Labyrinths (hence the section title) and Logic and Structure, but might occasionally touch on material from other sources that I’ll call out as necessary.

What is Logic?

To answer that question, I’ll turn to van Dalen:

Traditionally, logic is said to be the art (or study) of reasoning; so in order to describe logic in this tradition, we have to know what “reasoning” is. According to some traditional views reasoning consists of the building of chains of linguistic entities by means of a certain relation “… follows from …”, a view which is good enough for our present purpose. The linguistic entities occurring in this kind of reasoning are taken to be sentences, i.e. entities that express a complete thought, or state of affairs.

These sentences are generally called “propositions.” The defining characteristic of propositions is that they are, at least in principle, either true or false. You might not know whether any given proposition is true or false—if you did, you wouldn’t need either deduction or induction to deal with it—but a truth value can be ascribed to it. To borrow examples from my previous post, “Christopher Columbus discovered America in 1492” is a proposition, as is “It will rain today.” Each can be ascribed a truth value, true or false. How to ascribe truth values to each of them might be rather mysterious in practice. By “discovered,” do we mean “was the first human being to sight or set foot on” or do we only mean “was unaware of the existence of prior to arriving?” By the former definition, it seems a reasonable person would ascribe the truth value “false” to “Christopher Columbus discovered America in 1492” (millenia after the Native Americans did!) whereas by the latter definition the same reasonable person would likely ascribe the truth value “true” to the proposition.

The “It will rain today” case is even trickier. So much so that I won’t talk further about its truth value, but I will talk, as is usual for weather, about its probability in a future post.

So what are the rules that allow us to infer that “… follows from …?” In a stroke of originality, they’re called “inference rules.” They have traditional names, such as the one I mentioned in my first post of the series, modus ponens: Here, the uppercase variables refer to propositions, the funny tent reads as “and,” and the double-shafted right arrow reads as “implies.” So this rule literally says “If you know both that P implies Q and also P, then you can infer Q.” It might seem a little silly to spell this out in such detail, and in some ways it is—Lewis Carroll famously poked fun at this spelling-out in What the Tortoise Said to Achilles—but remember, the point is to be very sure of the soundness of our reasoning. We generally have to start with some information that is sufficiently simple and clear to be accepted as true without proof, called “axioms,” as I also referred to in my first post in this series. A statement we wish to prove is called a “theorem,” and in the process of proving it we may also prove some subsidiary results that can be used in the larger proof, called “lemmas.”

But all this terminology can make logic sound atomistic, dry, and boring, to which Logical Labyrinths is a nice antidote. It begins as a puzzle book, but eventually covers propositional and predicate logic, infinity, etc. formally, so it’s an appealing source for my work here. Let’s begin at the beginning. Smullyan’s protagonist, Abercrombie, is visiting the Island of Knights and Knaves, where knights always tell the truth and knaves always lie:

PROBLEM 1.1 (A CLASSIC CASE). On the day of his arrival, Abercrombie came across three inhabitants, whom we will call A, B, and C. He asked A: “Are you a knight or a knave?” A answered, but so indistinctly that Abercrombie could not understand what he said. He then asked B: “What did he say?” B replied: “He said that he is a knave.” At this point, C piped up and said: “Don’t believe that; it is a lie!”
Was C a knight or a knave?

The first challenge might be believing it’s possible to answer the question! But let me assure you it is. I’m not going to just give the answer, at least not yet. Think about it, discuss it in the comments, buy the book, etc. Eventually I’ll probably work through it. The point I want to make right now is just that it’s a logic puzzle and fun to reason through. But while it’s true that logic is fun, it’s also true that it’s serious business, with some proofs of important theorems coming centuries after the question was first posed, and with proof automation playing a crucial role.

Which brings us to Coq.

I Promise Coq Isn’t Foul

Coq is a “proof assistant,” or “interactive theorem prover.” The problem with automating logic, essentially, is: you can’t, at least not for any interesting logic, at least not completely. What you can do is help the human logician construct a proof of a theorem, provide partial automation support where possible, and—here’s the kicker—check the proof when it’s done. Finding the proof may be undecidable or at least take a long time. Checking it is cheap and relatively fast. Coq is a formidable tool, having been used to construct the proofs of the four-color map theorem and Fermat’s last theorem, but I’m going to try to keep its use simple in this series, striking a balance between spelling proofs out to help build intuition and taking advantage of automation when the details are just too much. Being an old EMACS user, I’ll also use Proof General, an EMACS-based UI, for interacting with Coq.

First, lets’s show off how easy Coq can be.

Theorem one_and_one_is_two : 1 + 1 = 2.

As soon as I type the carriage return after the period (the way I have Proof General configured), I get another window in EMACS that says:

1 subgoal
  
  ============================
   1 + 1 = 2
I’ve told Coq I want to prove that 1 + 1 = 2. Don’t laugh; it took Russell and Whitehead over 400 pages across two volumes to do so. When you tell Coq you want to prove something, it helpfully tells you what your current subgoal is and lets you tell it what to do in order to prove it. The space above the double line is the “context,” and can have facts, hypothesis, etc. relevant to the current goal. For this problem, we don’t have any, just the goal.

Thankfully, for Coq, this problem is

trivial.

In the other window, Coq happily tells me:

Proof completed.
And now I can join professors the world over and say

Qed.

Coq’s reaction to that is:

one_and_one_is_two is defined
This means I could use this important theorem in constructing other proofs, if I so desired.

But what happened? What did it mean when I typed “trivial.”? How did Coq just decide that proves 1 + 1 = 2 when it took Russell and Whitehead those 400+ pages to do the same thing?

“trivial” is called a “tactic,” and Coq belongs to the family of tactics-based proof assistants. The combination of context and subgoal I described is called a “proof state.” Tactics try to simplify a proof state, with the simplest possible proof state, of course, being one with no subgoal. You can think of a tactic as being a little program that tries to change the proof state to be “closer” the goal, and in fact that’s exactly what they are. You can even write your own in Coq’s Ltac tactic language or in OCaml, the language Coq is written in.

Coq has a rather sophisticated notion of computation built into it, which I won’t go into any detail on here. Suffice it to say that Coq interprets “1” and “2” as natural numbers by default and can do computation with them in a way that it knows to be logically sound. When I said “trivial,” one of the things Coq did was to actually add 1 + 1 soundly according to the Peano axioms for the natural numbers. Then it noticed that I had essentially asked for a proof that 2 = 2, which it was able to discharge based on the reflexivity of equality. This ability of Coq’s to blend computational operations and logical ones will come in quite handy later on.

But I’m getting ahead of myself again. Back to the basics! Propositions! I propose to continue discussing propositional logic in Coq in my next post.

Jaynes and Boolean Algebra

| Comments

I simultaneously got behind and ahead of myself with my last post. Thankfully, some folks have responded to it, and it’s become clear that I need to spend a bit more time talking about logic and Coq rather than just diving right into problems in analysis. Before even doing that, thought, I want to address this tweet from Raul Miller. On my urging, Raul’s taken the challenge to pursue Jaynes, and has posted an initial sharply-observed critique of some material from the first chapter:

“Good mathematicians see analogies between theorems; great mathematicians see analogies between analogies”

“To state these ideas more formally, we introduce some notation of the usual symbolic logic, or Boolean algebra, so called because George Bool (1854) introduced a notation similar to the following… These symbols are only a shorthand way of writing propositions, and do not stand for numerical values.”

The sentence which bothers me is that last sentence that I quoted, and it bothers me because of the first sentence that I quoted.

Here’s the thing: in George Boole’s work, these statements represent numerical results. George Boole started out by expressing his concepts in terms of axioms, but those axioms have a well understood correspondence to numerical operations. Boolean multiplication is the operation we know as “Least Common Multiple” and Boolean addition is the operation we know as “Greatest Common Denominator”… So, possibly he was saying that boolean algebra does not represent numbers to avoid confusion between the notation he is going to introduce and the notation he is using here.

Raul’s right: Jaynes is speaking prescriptively, rather than descriptively, here. It’s not that Boole doesn’t intend the symbols to represent numerical values; it’s that Jaynes doesn’t. It becomes clear after a few more paragraphs that Jaynes is quite familiar with Boole’s work—two editions of it, even!

Evidently, then, it must be the most primitive axiom of plausible reasoning that two propositions with the same truth value are equally plausible. This might appear too trivial to mention, were it not for the fact that Boole himself (Boole, 1854, p. 286) fell into error on this point, by mistakenly identifying two propositions which were in fact different—and then failing to see any contradiction in their different plausibilities. Three years later, Boole (1857) gave a revised theory which supersedes that in his earlier book; for further comments on this incident, see Keynes (1921, pp. 167-168); Jaynes (1976, pp. 240-242).

So one problem that Jaynes observes is that Boole 1854, at least, is inconsistent. Since we’re still just talking about logic rather than probability proper, this is unacceptable. You might argue that we should give Boole a break, since it was early days. But Boole antedates Laplace, and in fact, as we’ll see later in Jaynes, Boole’s work on probability was written in reaction to Laplace.

In addition, though, it makes no sense to ascribe numerical values to propositions! Consider: The conjunction, using Boole’s notation, reads “Christopher Columbus discovered America in 1492 and it will rain today.” That’s a perfectly sensible conjunction that we may wish to determine the probability of. Boole’s algebra won’t get us there in either its 1854 or 1857 incarnations, but as we’ll see later, Richard Cox’s sum and product rules will.

The Path to Jaynes

| Comments

The Path to Jaynes: Introduction

Genesis

Hi! My name is Paul Snively. Since I’m relaunching my blog, I figure I might as well start from the beginning, so here goes.

I write software for a living. Like many programmers, I’ve been fascinated by the notion of Artificial Intelligence for a long time. In fact, it’s why I became a programmer: I’m old enough to have played the Zork trilogy on a Model I TRS-80 as a teenager, and I wanted to know how a game could understand some pretty sophisticated English imperatives, such as “Take everything from the table but the brown bag.” I’ve also always had a severe itch to understand things at the most fundamental level possible. I credit this both to being a Lutheran pastor’s grandson and the son of two teachers: the former definitely informs my belief that we live in an ordered universe, and the latter definitely informs my belief in the human ability to learn.

I was in the right place at the right time. I grew up (for some definition of “growing up”) in Columbus, Indiana, which is about 40 miles away from Bloomington, Indiana, home of Indiana University. While still in high school in the early 1980s, I enjoyed studying math and physics as well as working with the math department’s computers, and found myself reading The Little Schemer (well, actually, back then it was the SRI edition of “The Little LISPer”) and Gödel, Escher, Bach: An Eternal Golden Braid. As you might imagine, I was hooked, which was pretty convenient given where I was. I went off to IU as a freshman in 1984, with the intention to take a double major in computer science and physics, having met Douglas Hofstadter while visiting the campus. I was very excited! I was pretty good at math—I’d passed an entrance exam that would have allowed me to take a calculus course that would have covered differential calculus one semester and integral the other, and I’d already published some software while in high school, had two years of physics, and a year of calculus, so I was feeling pretty confident.

I got to IU in the fall of 1984. I hung around Lindley Hall a lot, making a nuisance of myself around Dr. Dan Friedman, another IU hero of mine. But Dr. Hofstadter was on sabbatical that year, which was disappointing. Worse, no one replaced him in teaching Honors Intro Computer Science, so I was stuck writing FORTRAN on punch cards to feed to a CDC Cyber 855. My confidence on entering was misplaced: my physics class had us doing things like triple integrals. My homework assignments were taking me an average of an hour per problem, which might have worked out if it had been my only class. IU wouldn’t let me declare a major, and required undergrad courses like psychology, sociology, and macro- and microeconomics. As difficult as it might be to believe, I was even more immature at 18 than I am now. I didn’t deal with my disappointment and frustration well at all.

Crash

The embarrassing, humiliating, and shameful truth is that I flunked out. This wouldn’t be so bad in and of itself, but my wonderful parents were paying for this and had such high hopes for me. I had wasted thousands and thousands of dollars of their hard-earned money, and in 1984-1985, it wasn’t at all obvious how a geeky kid with a Model I TRS-80 was going to make his way in the world of adults. To make an already-long story at least a little shorter, things obviously worked out pretty well. It turns out that being a largely self-educated programmer is feasible, or at least it was in the mid 1980s. But I threw away a huge opportunity in not figuring out how to conform to the requirements of the liberal arts undergraduate program at IU. One of the things that came to an abrupt end was my education in continuous mathematics, which finally brings me to the point.

The Middle Ages

Even as a self-educated programmer, my exposure to discrete mathematics and formal logic has been, I believe, pretty good. Basic set theory, Boolean algebra, even propositional logic with ∀ and ∃ are reasonably familiar and comfortable. As far as AI technology goes, I spent a lot of time with rule systems, frame systems, decision trees… essentially all of the material that’s covered so well in Peter Norvig’s Paradigms of Artificial Intelligence: Case Studies in Common Lisp, which I was proud to have reviewed in manuscript while being the only MacDTS engineer at Apple supporting Macintosh Common Lisp. But then things changed out from under me, as they so often do.

Fast Forward

Peter’s next text with Stuart Russell, Artificial Intelligence: A Modern Approach, has very different goals from PAIP. PAIP was deliberately historical in nature, showing how iconic approaches to AI could be implemented elegantly in Common Lisp. As a treatise on advanced Common Lisp programming, PAIP still holds up quite well. But AIMA is more properly a survey of modern approaches to AI, particularly Bayesian probability. Although definitely oriented toward practicality, with sample code available in multiple languages, AIMA nevertheless is, in my opinion, best thought of as a very extended bibliography, offering pointers to where a deeper dive might be worthwhile. Looking around the landscape for more information about probability theory eventually led me to Probability Theory: The Logic of Science, by the late E.T. Jaynes.

Like Lightning From a Clear Sky

One thing that immediately struck me upon encountering Bayes’ Theorem: is that it reduces to modus ponens: when the support that evidence B gives to proposition A, P(B|A)/P(B), equals 1.0. To me, this gave Bayes’ Theorem some nice intuitive justification. But it never occurred to me to wonder if the observation could be generalized, and if so, how. This is where Jaynes came in:

…neither the Bayesian nor the frequentist approach is universally applicable, so in the present, more general, work, we take a broader view of things. Our theme is simply: probability theory as extended logic. The ‘new’ perception amounts to the recognition that the mathematical rules of probability theory are not merely rules for calculating frequencies of ‘random variables’; they are also the unique consistent rules for conducting inference (i.e. plausible reasoning) of any kind, and we shall apply them in full generality to that end. It is true that all ‘Bayesian’ calculations are included automatically as particular cases of our rules; but so are all ‘frequentist’ calculations. Nevertheless, our basic rules are broader than either of these, and in many applications our calculations do not fit into either category.

Probability is logic? Sold!

There’s Just One Hitch

The following material is addressed to readers who are already familiar with applied mathematics, at an advanced undergraduate level or preferably higher, and with some field, such as physics, chemistry, biology, geology, medicine, economics, sociology, engineering, operations research, etc., where inference is needed.

“Applied mathematics, at an advanced undergratuate level or preferably higher.” Yikes!

Here’s the thing: I want to understand Jaynes. For all kinds of reasons. Partially so I can write well-conceived machine learning software, partially because if it’s true that probability is an extended form of logic then I find that philosophically important, and partially, yes, for its own sake. To get back to some old loves of mine, mathematics and physics. And hopefully to open new doors in subjects I’m a passionate amateur in, like economics.

What Does It Mean to Understand Something?

I often liked to play tricks on people when I was at MIT. One time, in mechanical drawing class, some joker picked up a French curve (a piece of plastic for drawing smooth curves -a curly, funny-looking thing) and said, “I wonder if the curves on this thing have some special formula?”

I thought for a moment and said, “Sure they do. The curves are very special curves. Lemme show ya,” and I picked up my French curve and began to turn it slowly. “The French curve is made so that at the lowest point on each curve, no matter how you turn it, the tangent is horizontal.”

All the guys in the class were holding their French curve up at different angles, holding their pencil up to it at the lowest point and laying it along, and discovering that, sure enough, the tangent is horizontal. They were all excited by this “discovery” -even though they had already gone through a certain amount of calculus and had already “learned” that the derivative (tangent) of the minimum (lowest point) of ANY curve is zero (horizontal). They didn’t put two and two together. They didn’t even know what they “knew”.

They don’t have intelligence. They have what I call “thintelligence.” They see the immediate situation. They think narrowly and they call it “being focused.” They don’t see the surround. They don’t see the consequences.

I’m no Richard Feynman, but I want to do better than those other students. I want to see the consequences.

OK, don’t panic. Let’s see. E.T. Jaynes was a physicist. So what would a good applied mathematics curriculum look like from that point of view? At the very least, it would include analysis (real, complex, Fourier, functional…), linear algebra, and differential equations, probably in that order. So let’s think about analysis first. The good news for someone like me is that it turns out “analysis” is just calculus through the lens of logic! I already have some background in logic, as you may have noticed from my mentioning the modus ponens inference rule above. So this is already encouraging. If you don’t have any background in formal logic, please let me recommend Logic and Structure as a good starting point.

Some poking around Jaynes and browsing the various analysis texts available for the Kindle convinced me that I want to work through A First Course in Real Analysis. It seems nicely poised for someone with some exposure to logic, and doesn’t limit itself to the one-dimensional case the way many other introductory texts do. It also establishes a connection to differential equations, which I intend to pursue in more depth later. So my intention is to work through this text right here, in plain view, warts and all. I’ll discuss the problems and my attempts to solve them. Since this is about logic, I’ll be working with the Coq proof assistant.

Protter and Morrey begins by discussing the axioms defining a field, because the real numbers form a field. The first problem they pose, then is:

  • Show that in Axiom A-5 it is not necessary to assume there is only one number x such that a + x = 0.
Axiom A-5 is:

  • A-5. Existence of a negative. If a is any number, there is one and only one number x such that a + x = 0. This number is called the negative of a and is denoted by - a.
I immediately ran into a minor problem: Coq’s standard library’s field axioms for real numbers don’t have Axiom A-5! Instead, they assume the existence of the negative in order to say that a real plus its opposite is 0. Still, the existence of the negative is obviously in there, so it seems that instead of taking A-5 as an axiom, I can state it as a lemma and prove it using the axiom Coq does have, Rplus_opp_r. Now comes the kind of cool part: this entire blog post is written in Coq. The HTML you’re viewing was generated by coqdoc with a Makefile that massages things slightly so that I can just copy the resulting file into my blogging software and publish it. So without further ado, here is a very simple lemma proving the existence of the negative of any real number:

Require Import Reals Utf8.
Open Scope R_scope.

Lemma negative_exists : x, y, x + y = 0.
proof.
  let x:R.
  take (-x).
  thus thesis by Rplus_opp_r.
end proof.
Qed.

I import the Reals, obviously, and Utf8 because I want to use the formal symbols for “forall” and “exists,” because I’ll be seeing those in texts a lot. I name my lemma and state it: for all real numbers x, there exists a real number y such that x + y = 0. I then write the proof script in Coq’s Mathematical Proof Language, which deliberately strives to be at least somewhat similar in syntax to proofs done in the texts. In this very simple case, I just introduce the universally-quantified x, take its opposite to witness the existence of y, and then the thesis follows by the Rplus_opp_r axiom from Coq’s standard library.

This is, of course, just setting up the first problem, but this post is already quite long and it’s taken a solid couple of days of working on the infrastructure to get it to the point where I like how it works, so I think I’ll stop here. I hope a few people who want to master probability via Jaynes will join me on this journey. I will, of course, not only be publishing this, but putting the git repository with my Coq developments online as well. In the interest of full disclosure, all book links here go through my Amazon affiliate account. I’ve tried to link to Kindle editions where available both for convenience and to save trees, but sometimes important texts, like Jaynes itself, aren’t available for the Kindle. Finally, I should add that my ultimate goal isn’t just to work through Jaynes. I also want to develop software along the way, exploring the pragmatics of probability in software as well as the theory. This is going to be a long road. But I suspect that from both an intellectual and practical point of view, it may very well be one of the most rewarding ones to travel.

Acknowledgements: I’m inspired by the daily “stupid” questions after her first year of programming on @IrisClasson’s blog. Adam Chlipala graciously helped tutor me at U. Penn’s Coq tutorial after POPL a few years ago and is single-handedly proving you really can write a book entirely with coqdoc. Finally, my wonderful wife has encouraged me so often, but in particular recently, specifically to take up writing again.

This post is dedicated to my parents, who believe in me still.

Yes, Virginia, Scala Is Learnable

| Comments

We’re using Databinder Dispatch a lot in the Cloud Services Engineering group at VMware, and late last week I was discussing it with one of my colleagues, a very senior (not average!) Java developer. I showed him a snippet of Dispatch code and said I wouldn’t expect anyone to understand it on first reading. He seemed surprised by that, unfortunately in the sense that he seemed to believe that it was expected that team members understand Dispatch code on first reading. Then Dave Pollak’s excellent Yes, Virginia, Scala is Hard post appeared, calling me out by name. :-) While it’s extremely flattering that Dave thinks I’m a statistical outlier with respect to programming language expertise, his comment, along with my disappointment to find that a very capable colleague apparently felt pressure to understand something that I expect no one to understand immediately, impels me to try to address the question of Scala’s complexity.

There are at least two obvious forks in the road that I could follow in attempting this. Luckily for me, Michael Fogus followed one fork with a vengeance in his wonderful Scala is for drivers post, which nicely covers a cost:benefit analysis of Scala, acknowledging the initial learning curve but, crucially, pointing out that it abates over time, unlike some of the alternatives. This gives me the opportunity to pursue the other fork: what is likely to be mysterious to the average, or even not-so-average, Java developer, and perhaps to offer some suggestions as to some key facts about Scala that might help shorten the time horizon of the learning curve, or lessen the slope, or if I’m very lucky, both.

Here’s some code I whipped up to demonstrate some points:

First, the purpose of the code: it prints an XML document containing the first 500 artists (people, rather than bands) on MusicBrainz. I had originally intended to go farther, allowing the user to name an artist, and printing a list of that artist’s contemporaries, which is why there’s a dependency on Jorge Ortiz’s scala-time lurking in there, but just getting a list of artists from MusicBrainz was sufficient to demonstrate the points I want to make.

So what hints can I offer even very senior Java developers faced with code like this, and how do the issues they address show up in the code?

1. Scala has no operators.

If I had to pick one thing to get new Scala developers to internalize, this would be it with a bullet. Looking at the Scala code above, we see quite a lot of this: /, <:<, ->, \, <<?, <>, +, and ++ all look like they’re operators in the sense that they’re not Scala syntax such as <-, not things we’ve named ourselves, and they’re surrounded by whitespace. But if they aren’t operators, what are they?

They’re methods. But don’t methods come after .’s? Not necessarily.

2. Scala has dot-, parenthesis-, and semicolon-inference.

Methods-as-operators only work if you can eliminate . and () pairs with relative ease. Scala’s rules for this are actually pretty simple, but only if you know them, which Java developers obviously don’t by definition. Paul Phillips does a great job explaining in this post.

Back to Dispatch’s methods. My colleague made a very helpful comment upon seeing << used to pass POST parameters in some Dispatch code: “Scala works with Java, and in Java << is a bit-shift left operator.” The problem is, he’s right. It took me a couple of days to see what the two issues are that make a method like << confusing to a Java developer. The second is related to the “no operators” observation, so I’ll state it as a…

Corollary to 1: in languages with operators, the operator defines what can appear around it; in Scala, what’s around the method defines what the method can be and what it does.

A good example of this is the expression params + ("offset" -> offset.toString). A Java developer probably thinks of + as an operator that works on the obvious numeric types. params, though, is a Map. To make matters even more confusing, it’s an immutable Map, so even if + is a method on it, what could it possibly mean? A quick look at the ScalaDocs for Map quickly shows that + is indeed a method on Map, which simply takes a new key/value pair (giving us a hint that -> might be used to construct a pair) and returns a new Map with the key/value pair in it. So if params is OK to pass to something expecting a Map[String, String], so is params + ("offset" -> offset.toString).

The first confusing point, though, is the idea that Scala “works with” Java. To narrow in on that a little, Scala “works with” Java in the sense that there’s a high degree of interoperability among Java and Scala classes, interfaces, objects, traits, and methods—things we might think of as “first class” from the point of view of the JVM. But Java’s operators are an artifact of Java syntax, and Scala doesn’t use Java syntax. There’s no reason this should be obvious to a Java developer.

But what is this thing that’s taking params + ("offset" -> offset.toString)? It’s <<?, another method that looks like an operator. Since it’s a method, it must be defined, somehow, on api. api is itself an odd construction, seeming to start with a call to a function :/ on a hostname, followed by / methods with path elements. And that’s exactly what it is.

3. If it looks like a function application, it is one.

I talked about this in Constructive Criticism but it bears repeating, because it’s very important: a token followed by ( followed by some expression followed by ) is a function application, period, the end. That means, yes, in val foo = Map("bread" -> "butter"); foo("bread"), foo(“bread”) is a function application, i.e. a Map is a function. That seems weird to Java developers, who aren’t used to…

4. All functions are objects; objects with an apply() method are functions.

In particular, an object with an apply() method can maintain state that’s important to the apply() method, as is obviously the case for a Map. This is Scala’s answer to Norman Adams’ old observation from the Scheme community that “Objects are a poor man’s closures.” Norm’s right, but because Scala runs on the JVM and CLR and interoperates with other languages on those platforms, its objects are native, and it’s the apply() method that gives these objects their functional nature. To the extent idiomatic Scala code is functional as much as, or more than, object-oriented, this will also pose a learning curve for Java developers.

5. Almost anything can be defined almost anywhere.

Java developers might be surprised to find a class definition inside a def, or an import inside an object. This gives the developer maximal expressive power on one hand, but on the other hand it also helps control the scope within which things like methods-as-operators work. Like all modern languages, Scala is lexically scoped: you can read from the “inside out,” as it were, to find where unfamiliar things in the code come from. In the expression http(api <<? params <> identity), you can trust that <<? is defined relative to api, otherwise the code wouldn’t compile. Similarly, the entire expression api <<? params <> identity must match the type that the http object/function (instantiated by new Http and applied with ( and )) expects. Learning to follow code inside-out is an important skill when you’re faced with code using an unfamiliar API, even if you actually know the Scala language, per se, quite well.

Gotchas

Does this mean that all is peaches and cream in Scala, and all Java developers need to do is learn a few simple rules to understand any and all Scala code? Not at all. Here are a few things that come immediately to mind that are, in my opinion, likely to be ongoing stumbling blocks to Java developers learning Scala.

Implicit resolution

Scala supports implicit objects and implicit defs. They lend amazing power to the language, but mastering their use is one of Scala’s bigger challenges. To fully understand what a sophisticated piece of Scala code does, you’ll need to understand the implicit resolution rules. Like .- and parenthesis-inference, it’s not that the rules are especially complex; it’s just that you either know them or you don’t, and Java developers, again by definition, don’t. For what it’s worth, I don’t either.

Methods bind to the left… except when they don’t.

Methods that don’t end with : are defined on the object to their left. Methods that do end with : are defined on the object to their right. Yeah, I know.

Executive Summary

While it’s true you can, for the most part, write Scala as if it were Java, idiomatic Scala tends not to be very much like Java in the end. Worse, some intuitions brought from Java, such as those regarding operators, can actively lead you astray. I’ve given five basic observations about Scala in an effort to help focus Java developers’ intuitions in a constructive direction, and noted in passing a couple of things that, even after over two years of Scala, continue to be thorns in my side.

My one-sentence summary, though, would be: there’s no substitute for actually learning the language, and yes, Virginia, Scala is learnable.