Measures and stuff

This is out of character for me, but… I’ve been reading about analysis lately. It is beginning to catch my interest. In this post, I will highlight just how demonic and twisted the mathematical stuff known as the axiom of choice is, and proceed to ramble incoherently about measure and integration.

The first (perhaps surprising) thing we learn when we try to “measure the size” of subsets of $\mathbb{R}$ (or more generally of $\mathbb{R}^n$) is that carrying out this task for every subset is impossible, assuming we want this idea of “size” to be non-trivial and to satisfy some reasonable properties (say, countable additivity, translation invariance, and giving each interval its length). This is because we get choice-trolled by a ton of lame sets (there are uncountably many of them; Vitali sets are the classic example) whose structure is just too monstrously convoluted (no pun intended) for them to play along with our little game. Our response, of course, is to contemptuously ignore them and anything to do with them; how dare they try to foil our master plan of world integration!

When one looks for “measurable sets”, one is led naturally to the concept of a $\sigma$-algebra, which is basically a nonempty bunch of subsets closed under complements and countable unions (the $\sigma$ indicates the “countable” part). Recall the Borel subsets of $\mathbb{R}$: these things form an aesthetically pleasing hierarchy, and effortlessly come out in the wash when we conjure up the “smallest” $\sigma$-algebra containing all the open sets of $\mathbb{R}$, which seems like a pretty natural thing to do.
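We can’t list the Borel sets, but the idea of “the smallest $\sigma$-algebra containing some given sets” can be played with on a finite universe, where countable unions collapse to finite ones. Here is a minimal Python sketch under that finite assumption; the function name `generated_sigma_algebra` and the example sets are made up for illustration. It just keeps adding complements and unions until nothing new turns up.

```python
from typing import FrozenSet, Iterable, Set

def generated_sigma_algebra(universe: Iterable, generators: Iterable[Iterable]) -> Set[FrozenSet]:
    """Smallest family of subsets of a finite universe that contains the
    generators and is closed under complements and unions.

    Assumes the universe is finite: then every countable union is really a
    finite union, so repeatedly adding complements and pairwise unions until
    nothing new appears yields the generated sigma-algebra.
    """
    whole = frozenset(universe)
    family = {frozenset(), whole} | {frozenset(g) for g in generators}
    while True:
        # Everything reachable in one more round of complement/union.
        new = {whole - s for s in family}
        new |= {s | t for s in family for t in family}
        if new <= family:  # closed under both operations: done
            return family
        family |= new

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(len(sigma))  # 8: all unions of the "atoms" {1}, {2}, {3, 4}
```

Note that $\{3\}$ never shows up: nothing in the generators can tell $3$ apart from $4$. On $\mathbb{R}$ a loop like this would never stop; building the Borel sets out of the open sets by alternating countable unions and complements takes one round for every countable ordinal, which is roughly where the hierarchy mentioned above comes from.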

Roughly speaking, to define the Lebesgue measure $\lambda$, we declare the size of $(a,b)$ to be $b-a$, and then extend this to all Borel sets in the only sane way possible (we can technically go a bit further than this, since not every Lebesgue-measurable set is Borel). Now let’s say we want to integrate (sufficiently nice, i.e. Borel-measurable) functions $f \geq 0$ against this measure. The general strategy begins by chopping up the range of the function into a bunch of pieces. For each of these pieces $S$, we then examine its preimage $f^{-1}(S)$. To determine how much it contributes to the “area under the curve”, we measure this preimage via $\lambda$ and use that measure as the weight for a representative value of $f$ on $S$ (say, the lower endpoint of $S$); refining the chopping and taking the limit gives the integral of such an $f \geq 0$. By splitting a general $f$ into its positive and negative “parts”, $f = f^+ - f^-$, we can subsequently drop the non-negativity requirement, as long as $\int f^+$ and $\int f^-$ are not both infinite. The integral satisfies, well, just about everything you would expect it to.
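Here is a rough numerical sketch of the “chop up the range” recipe in Python, assuming we already know the Lebesgue measure of each preimage in closed form. That is true for the made-up example $f(x) = x^2$ on $[0,1]$, whose preimage of $[a,b)$ is $[\sqrt{a},\sqrt{b})$. Each piece of the range is weighted by its lower endpoint, so this gives a lower approximation; the helper names are hypothetical.

```python
import math
from typing import Callable

def lebesgue_lower_sum(preimage_measure: Callable[[float, float], float],
                       y_max: float, n: int) -> float:
    """Lower approximation of the integral of a measurable f with 0 <= f <= y_max.

    The range [0, y_max] is chopped into n pieces [y_k, y_{k+1}); each piece
    contributes y_k times the Lebesgue measure of its preimage under f,
    which preimage_measure(y_k, y_{k+1}) is assumed to return.
    (The level set {f = y_max} is ignored; it has measure 0 in the example.)
    """
    h = y_max / n
    total = 0.0
    for k in range(n):
        lo, hi = k * h, (k + 1) * h
        total += lo * preimage_measure(lo, hi)
    return total

# Hypothetical example: f(x) = x**2 on [0, 1].
# For 0 <= a < b <= 1, f^{-1}([a, b)) = [sqrt(a), sqrt(b)), of length sqrt(b) - sqrt(a).
def square_preimage_measure(a: float, b: float) -> float:
    return math.sqrt(b) - math.sqrt(a)

approx = lebesgue_lower_sum(square_preimage_measure, 1.0, 1000)
print(approx)  # slightly below 1/3
```

Since each piece is weighted by the smallest value $f$ takes there, the sum undershoots $\int_0^1 x^2\,\mathrm{d}x = 1/3$ by at most the piece width times $\lambda([0,1])$, here $1/1000$.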

Now, onto the classic question asked by high-schoolers worldwide: why, exactly, do we write $\mathrm{d}x$ after the integrand? There is no single answer to this question, since integration itself is only an intuitive concept which admits different realizations and implementations. For example, attempts to generalize “standard” (Riemann) integration lead both to Lebesgue integration (and ultimately to integration against general measures, a method which is completely ignorant of “orientation”) and to integration of differential forms over chains (which “cares” about orientation). In the context of this article, though, I would be tempted to say that $\mathrm{d}x$ is simply notation for integrating against the Lebesgue measure, i.e. $\int f(x)\,\mathrm{d}x = \int f\,\mathrm{d}\lambda$.
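To make the orientation remark concrete, here is a standard pair of identities in this post’s notation, for $a < b$ and $f$ Lebesgue-integrable on $[a,b]$:

$$\int_{[a,b]} f\,\mathrm{d}\lambda = \int_a^b f(x)\,\mathrm{d}x, \qquad \int_b^a f(x)\,\mathrm{d}x = -\int_a^b f(x)\,\mathrm{d}x.$$

The first is just our reading of $\mathrm{d}x$ as $\lambda$. The second is the usual convention from Riemann integration, and it matches the differential-forms picture: reversing the orientation of the chain from $a$ to $b$ flips the sign. The set $[a,b]$ carries no such orientation, so there is nothing for $\int_{[a,b]} f\,\mathrm{d}\lambda$ to flip.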

When one generalizes this to Lebesgue-Stieltjes integration, a general non-decreasing, right-continuous function $\alpha(x)$ takes the place of $x$ in that expression, and we write $\mathrm{d}\alpha(x)$ (indeed, Lebesgue integration is the special case where we take $\alpha(x)=x$). What we are essentially doing is integrating against a “Lebesgue-Stieltjes” measure $\mu_\alpha$, which is constructed much in the same way as the Lebesgue measure except that we now declare $\mu_\alpha((a,b]) = \alpha(b) - \alpha(a)$. The half-open intervals matter once $\alpha$ has jumps: an open interval gets $\mu_\alpha((a,b)) = \alpha(b^-) - \alpha(a)$, where $\alpha(b^-)$ is the left limit at $b$. As an important special case, if we let $\alpha$ be the indicator function of $[0,\infty)$, we note that $\mu_\alpha$ assigns a measure of 1 or 0 to sets, based on whether they contain or do not contain (respectively) the point $0$. This is called the Dirac measure, and integration of a function against it (as you can verify) merely reads off the function’s value at 0.
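To watch the Dirac measure read off a value, here is a small Python sketch. It approximates $\int g\,\mathrm{d}\mu_\alpha$ by a Riemann-Stieltjes sum over half-open cells $(t_{k-1}, t_k]$, weighting $g(t_k)$ by $\mu_\alpha((t_{k-1}, t_k]) = \alpha(t_k) - \alpha(t_{k-1})$. This is a swapped-in numerical approximation (for continuous $g$ on a bounded interval it agrees with the Lebesgue-Stieltjes integral in the limit), not the measure-theoretic construction itself, and the function names are hypothetical.

```python
import math
from typing import Callable

Func = Callable[[float], float]

def mu_alpha(alpha: Func, a: float, b: float) -> float:
    """Lebesgue-Stieltjes measure of the half-open interval (a, b]."""
    return alpha(b) - alpha(a)

def stieltjes_sum(g: Func, alpha: Func, a: float, b: float, n: int) -> float:
    """Riemann-Stieltjes sum for the integral of g against mu_alpha over (a, b].

    The interval is cut into n half-open cells (t_{k-1}, t_k]; each cell
    contributes g(t_k) times its mu_alpha-measure.
    """
    total = 0.0
    for k in range(1, n + 1):
        left = a + (b - a) * (k - 1) / n
        right = a + (b - a) * k / n
        total += g(right) * mu_alpha(alpha, left, right)
    return total

def heaviside(x: float) -> float:
    """Indicator function of [0, infinity); right-continuous at 0."""
    return 1.0 if x >= 0 else 0.0

# alpha(x) = x: ordinary Lebesgue integral of x^2 over (0, 1], roughly 1/3.
plain = stieltjes_sum(lambda x: x * x, lambda x: x, 0.0, 1.0, 1000)
# alpha = indicator of [0, inf): Dirac measure at 0, so we get roughly cos(0) = 1.
dirac = stieltjes_sum(math.cos, heaviside, -1.0, 1.0, 1000)
print(plain, dirac)
```

With $\alpha(x) = x$ every cell is weighted by its length and we recover an ordinary Riemann sum; with the indicator of $[0,\infty)$ only the single cell containing $0$ gets nonzero weight, so the whole sum collapses to one value of $g$ at (essentially) $0$.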

We can also come up with a more screwed-up example: consider the Lebesgue-Stieltjes measure $\mu_f$ when $f$ is the Cantor function. One thing we notice about $f$ is that its derivative exists Lebesgue-almost everywhere, but is also 0 Lebesgue-almost everywhere… yet $f$ is not constant: it is continuous and non-decreasing, climbing from $f(0)=0$ to $f(1)=1$. The measure we obtain is “concentrated” on the Cantor set $C$: for example, if we measure a set $S$ that lies in the complement of $C$, it will have measure 0! This is despite $C$ itself having Lebesgue measure 0, so $\mu_f$ and $\lambda$ live on disjoint sets.
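The claim about the complement of $C$ can be checked on specific gaps. The Python sketch below computes the Cantor function from ternary digits using exact rational arithmetic (`fractions.Fraction`), truncating non-terminating expansions after a hypothetical `depth` parameter, and then measures half-open intervals via $\mu_f((a,b]) = f(b) - f(a)$. The helper names are made up for illustration.

```python
from fractions import Fraction

def cantor(x: Fraction, depth: int = 60) -> Fraction:
    """Cantor function on [0, 1], computed from the ternary digits of x.

    Reads ternary digits of x; stops at the first digit 1 (emitting a binary 1
    there), and otherwise turns each ternary 2 into a binary 1. Expansions that
    never hit a 1 are truncated after `depth` digits, so the result is exact
    when the ternary expansion terminates or hits a 1 within `depth` digits.
    """
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    result, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = x.numerator // x.denominator  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            return result + scale
        result += scale * (digit // 2)
        scale /= 2
    return result

def mu_cantor(a: Fraction, b: Fraction) -> Fraction:
    """mu_f((a, b]) = f(b) - f(a) for the Cantor function f."""
    return cantor(b) - cantor(a)

third = Fraction(1, 3)
print(mu_cantor(third, 2 * third))                # 0: the middle third misses C
print(mu_cantor(Fraction(7, 9), Fraction(8, 9)))  # 0: a second-stage gap
print(mu_cantor(Fraction(0), third))              # 1/2
```

The middle third $(1/3, 2/3]$ has Lebesgue measure $1/3$ but $\mu_f$-measure $0$, while $(0, 1/3]$, which contains a copy of $C$ scaled by $1/3$, carries half of the total mass.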