augment

As useful as comprehensions have turned out to be, it doesn't take long to hear grumblings about the awkwardness of their syntax. In fact it's been an undercurrent in many recent online Scala discussions... This library is a further attempt to have the equivalent of a comprehension in a more "direct style": one that doesn't require nesting, or special syntax, or even macros. Instead it focuses on the simple notation of function application, with behavior driven by different argument types. The idea is to get Scala 3's type machinery to earn its keep by doing the awkward work behind the scenes.

An "augmented" function, if we can call it that, is one that allows this more direct notation for comprehensions. Starting with a plain, vanilla, run-of-the-mill function f0 that takes arguments from sets A and B, the augmented version f replicates f0 for single values, i.e. f(a, b) is the same as f0(a, b), but where you would have

  for
    a <- A
    b <- B
  yield
    f0(a, b)

you can instead use simply

  f(A, B)

Now you may reasonably object that this is only the simplest case, what about the rest? Can you shoehorn everything else into the same notation? Well, on the whole, you can... provided - and this has to be the least surprising twist in all of functional programming - that you use functions.

Let's take the common example of Pythagorean triangles, where we're looking for integer side lengths such that you wind up with a right angle. The comprehension notation would be something like:

  for
    a <- 1 to n
    b <- a to n
    c <- b to n
    if a * a + b * b == c * c
  yield(a, b, c)

Here you could have instead:

  select
    ( 1 to n,
      _ to n,
      _ to n,
      (a, b, c) => a * a + b * b == c * c)

This takes advantage of Scala's underscore notation for functions, since _ to n is a more succinct way of saying x => x to n. Under the hood, select is the augmented version of the tupling function f, such that f(a, b, c) = (a, b, c). In much the same way as a comprehension, this notation is translated into other instructions that do the actual work. But it doesn't always have to be a mess of nested flatMaps.

Different notation, same old comprehension

Why does this notation work? Because you're really talking about the same thing as a comprehension. For now, let's ignore other types of comprehension and start at the beginning, with the set comprehension. This is where David Turner began over forty years ago with his language KRC, which along with its successor Miranda strongly influenced Haskell and then other languages. It implemented a kind of set-builder notation, which as he described it defines "the set of all f x such that x is a member of S".

There are really two separate parts to this set-building, i.e. comprehension: a definition of S, which is a subset of the domain of f, and then the application of f to this subset. Even though the idea of "set building" originally referred only to the first part, we now understand a set comprehension to be both a restriction to a subset and subsequent function application. This is where you have some leeway with notation.

Instead of leaving the application of f as an afterthought, to be "yielded" iteratively after the subset has been constructed, this library's notation puts f front and center, by considering the subset to be its direct argument. Sacrilege!... although in some respects this was the central conceit behind an even older approach than Turner's, that of category theory, which - as anyone who has been browbeaten into reading about monads can attest - told us over eighty years ago that a function is best viewed not as transforming individual elements of a set, but as transforming sets in their entirety. Not f(a) = b, but f(A) = B. An arrow between two objects, i.e., in our case, sets.

Compared with the vanilla version, an augmented function is deemed applicable to additional types of arguments, the main ones being sets, and what we might call "dependent sets". How so? Let's say you're asked to choose three playing cards: it could be from three separate decks, in which case there are just three sets (i.e. decks). Or you could be asked to pick three from the same deck, which is the same as picking one from the full deck, then another from a deck with that card removed, then similarly a third where two cards have been removed. What are the possible outcomes? Here, if you've forgotten, you might want to remind yourself of what a Cartesian product is, because the choices in the first case are the Cartesian product A x A x A, if A is the full set of cards. In the second case your choices are constrained to a subset of that product (where no card is picked twice). But in the right light... you could view this subset as itself a kind of product, where the second and third terms are incomplete decks. Very, very hand-wavingly, we could argue that the second case is the extension of a Cartesian product to "dependent sets".

Even though in functional programming it's sometimes argued that all functions are of one variable (because of currying, etc.), in practice we're dealing with functions of several variables, which is where the Cartesian product comes in: if f is a function of three variables that can range over sets A, B and C, then its domain is A x B x C. The set comprehension lets us define a subset of A x B x C. Only in the simplest cases is this subset itself the Cartesian product of three subsets, which in this library is referred to as the "rectangular" case. As soon as you allow "dependent sets", you can choose any subset whatsoever of A x B x C (referred to as the "irregular" case), which is why comprehensions are so useful. Of course our so-called dependent set is not a set at all, but instead a function that produces a set.

This points to the general form of augmented function application in the rectangular case:

  f(A',  // subset of A
    B',  // subset of B
    C')  // subset of C

and in the more general, irregular case:

  f(A',           // subset of A (ordinary set)
    a => B',      // subset of B that depends on a in A'
    (a, b) => C') // subset of C that depends on a in A' and b in B'

The irregular case just about jibes with the example of Pythagorean triangles, where the arguments are 1 to n, a => a to n and (a, b) => b to n. In fact, the third argument is b => b to n: the convention used here and elsewhere is that where the function has a single argument, it refers to the nearest set (or dependent set). And there is a fourth argument, but this is a predicate which can always be folded into the last dependent set.

There is not really a fundamental difference with Turner's original explanation, more a rearrangement: when he mentions that "generators come into scope from left to right, so later generators can involve earlier ones, but not vice versa", this is reflected in the type signatures of the dependent sets, which are functions of a, then of (a, b), then (a, b, c) etc.

Given that this notation is essentially a comprehension in different clothing, what are the advantages of using it? Well, arguably it can be more concise: there are no on-the-fly variable bindings, and sometimes no variables at all. There is no special backward-arrow syntax, which makes it easier to use in Java and other JVM languages. (Sure, there's the forward arrow, but no one in this day and age is going to argue against lambda functions. We've come too far.) Possibly the main advantage, though, is that this notation inherently distinguishes between the "rectangular" and "irregular" cases.

Result shapes

One somewhat inconvenient aspect of the original list comprehension is that you get a list back, no matter what. This makes it a bit of a halfway-house in terms of the "shape" of the results: sometimes, as in the case of Pythagorean triangles, there is no discernible structure anyway, but in simpler cases a Cartesian product might be associated with a 3D geometric space, in which case... why return a list? If you entrusted your favorite piece of furniture for safekeeping, and got it back as a flatpack, yes it would technically all be there, but you might be a little miffed all the same.

In truth Turner's original paper did not concern itself with questions of associated structure, because it focused on the more complex cases, where the use of recursion meant the comprehension was singlehandedly implementing an algorithm, one that would presumably mangle any structure you started out with. Today though the comprehension is used for all and sundry, and there are advantages to treating the simpler "rectangular" case differently to the bespoke "irregular" case.

For instance retaining a 3D-structure is handy if you want to plot a function:

  val f = augment((x: Int, y: Int) => 100 - x * x - y * y)
  f(-10 to 10, -10 to 10) plot("Sample Function")

By default, this rectangular case will produce a multidimensional array that the plot function can use directly, via plotly. (This is configurable: thanks to type classes, you can choose any other default shape as long as you specify how to "fill it", and the testing code contains an example of how to do this with Guava tables.) There's also an animate function which can illustrate that in geometrical terms, a comprehension implemented with nested mappings is essentially "3D-printing" the result:

  def sq(i: Int) = i * i
  def sphere(r: Int) =
    select(-r to r, -r to r, -r to r, sq(_) + sq(_) + sq(_) <= r * r)
  sphere(5).animate("3D-printed sphere", 3, 0)

It also suggests that the painstaking enumeration involved in 3D-printing might be unnecessary for simpler rectangular blocks. Which brings us to...

Applicatives and monads

Yes, in the context of comprehensions, mentioning monads was only avoidable for so long. But perhaps our preamble has suggested a slightly different view of monadic composition: as a generalization of the Cartesian product. We started with f(a, b, c), then continued with f(A', B', C') and f(A', a => B', (a, b) => C'), in other words f applied to single values, sets and dependent sets. The next extension is via "derived" or "amplified" types which share a fundamental trait with sets: you can "map" over them. Conveniently glossing over the difference between type A and set A, we'll say that T[A] is a mappable type somehow related to A, and that you can apply the augmented function both via

  f(T[A], T[B], T[C])

and

  f(T[A], a => T[B], (a, b) => T[C])

The first case is effectively applicative, and the second, monadic: since they are clearly distinguished by their type signatures, they can be handled differently without further ado, since they are the "rectangular" and "irregular" cases we saw previously.

An overview

At this point the notation of function application has been stretched a fair ways. Using sets as arguments seems reasonably familiar: it recalls function vectorization in languages such as R (even if by default the result here is a Cartesian product, not a vector). Passing functions as arguments to other functions is also quite commonplace (even though it's a practice that dare not always speak its name: sometimes it seems that half the point of OOP is to cunningly rebrand it as "runtime polymorphism"). The function passed as an argument might even "invert control", as with continuation passing style. But it's true that here several of these usages are combined, yielding a number of additional type signatures.

There are a few ways of looking at this: as a mere convenience provided by Scala's apply function and static type inference, or, if you squint long enough, as somehow corresponding to the "natural extension" of a function to a larger domain. Let's studiously ignore any difficulties involved in backing up the latter position, and say that for our purposes, it doesn't really matter which view you take.

Although the notation is essentially just rearranging the terms of the comprehension (which is why it works), you are, in some sense, turning the function inside out, with the arguments seemingly calling the shots. In the monadic case, the "dependent derived type" argument a => T[B] of course recalls flatMap (i.e. bind): instead of applying a higher-order function F to an ordinary function f, in other words Ff, it's as if you're being asked to consider, hold on to your hats... fF.

"Hey, you got your chocolate in my peanut butter!"

"Hey, you got your peanut butter on my chocolate!"

A word on terminology

For any library in this general vicinity, a decision has to be made on whether or not to use CT terms (functor, monad, etc.) There are good arguments on both sides. Those who think the situation is cut-and-dried might wish to consider the experience of mathematician Pierre Cartier, a prominent member of Bourbaki, who for years didn't like using "groupoids" because of their name (The sound is not even required, because the facial expression is meant to leave you in no doubt.) The term "monoid" meets with similar disapproval... If even he can be put off working with useful concepts because of their name, it's probably not something to be discounted. If anything, the tradition continues today with jarring terms like "tagless final"... For now, this libary comes down marginally on the side of using fuzzier terminology (Mappable), which at least emphasizes a salient characteristic involved in a Cartesian product.

Plotting

Away from any considerations of category theory... these examples illustrate augmented function notation as a way to produce 2D and 3D plots, and can be found in the integration tests, both in Java and in Scala.

Fitted curves

String distances

The 8 queens problem

Conway's pulsar