I have been attending a reading course on stochastic analysis led by Professor Ioannis Karatzas, where the students take turns in presenting a topic of their choice. I recently presented on Choquet's theory of capacities and its applications to measure theory and in the general theory of processes. This blog post is based on this presentation. I have freely copied[1] from many sources, but my primary reference are the (unfortunately unpublished) lecture notes by Prof. Karatzas.
Kolmogorov laid the modern axiomatic foundations of probability theory with the German monograph Grundbegriffe der Wahrscheinlichkeitsrechnung which appeared in Ergebnisse Der Mathematik in 1933. This was a period of intense discussions on the foundations of probability, and a majority of probabilists at the time considered measure theory not only a waste of time, but also an offense to "probabilistic intuition" (Meyer, 2009). But by 1950, with the work of Doob in particular, these discussions of foundations had been settled.
Continuous-time processes, on the other hand, were difficult to tame even with measure theory: if a particle is subject to random evolution, to show that its trajectory is continuous, or bounded, requires that all time values be considered, whereas classical measure theory can only handle a countable infinity of time values. Thus, not only does probability depend on measure theory, but it also requires more of measure theory than the rest of analysis (Meyer, 2009).
The missing pieces of the puzzle, which will be the highlight of this and the next blog post, are the debut, section and projection theorems. These theorems are also indispensable in many applications, for instance in dynamic programming and stochastic control (Karoui and Tan, 2013).
To get a taste of these theorems, let's recall a famous error made by Lebesgue in the paper Sur les fonctions représentables analytiquement published in 1905. Consider the measurable space (R2,B(R2)) and the projection map π given by R2∋(x,y)↦π(x,y)=y∈R. It is easy to see that for any open set O in R2, the set π(O) is also open in R: Recall that the standard topology on R2 is same as the product topology on R2. By the definition of the product topology on R2, an open set O in R2 is of the form O=⋃i∈I⋂j∈JiUij×Vij for open Uij,Vij in R, I arbitrary and Ji finite. A simple argument gives π(O)=⋃i∈I⋂j∈JiUij which is open in R. In fact, more generally, projection from any product space (with product topology) is an open map. Now it seems reasonable to expect that for any Borel set B∈B(R2) its projection is also a Borel set in B(R), and Lebesgue assumed this in his paper. But, in fact, this is FALSE! The error was spotted in around 1917 by Mikhail Suslin, who realised that the projection map need not be Borel, and this lead to his investigation of analytic sets and to begin the study of what is now known as descriptive set theory (Almost Sure blog).
The problem is projection doesn't commute with countable decreasing intersection. For example, consider the decreasing sequence of sets Sn=(0,1/n)×R. Then π(Sn)=R for all n, giving ⋂n∈Nπ(Sn)=R, but ⋂n∈NSn=∅, giving π(⋂n∈NSn)=∅. The measurable projection theorem stated next will be one of the highlights of this post.
Measurable Projection Theorem: Let (Ω,F,P) be a complete probability space, let (K,B(K)) be a locally compact separable metric space endowed with the collection of its Borel sets, and denote by π the projection of K×Ω onto Ω. Then, for every B∈B(K)⊗F, the projection π(B)∈F.
Why is proving such results difficult? As mentioned above it's because projection doesn't behave nicely with intersections. Nevertheless, let us try to see how one might try to prove the above theorem. A standard approach in measure theory is to construct a collection like
E={S⊆K×Ω:π(S)∈F}
which contains the sets satisfying the desired property, and show that it is a σ-algebra containing a simple collection, say A, such that it easy to show that elements of A satisfy the desired property and A generates B(K)⊗F, because then we will have B(K)⊗F⊆E. To this end, let
A={S⊆K×Ω:S=E×F for E∈B(K),F∈F}.
Then it is easy to see that A⊆E, A generates B(K)⊗F and that A is an algebra. If we could show that E is a monotone class, then we would be done on account of monotone class theorem. Increasing sequences are easily handled, in fact if {Sn}n∈N⊆K×Ω is any sequence, then
π(n=1⋃∞Sn)=n=1⋃∞π(Sn).
But if S1⊇S2⊇⋯, then in general we cannot say
π(n=1⋂∞Sn)=n=1⋂∞π(Sn).
Enter Choquet's theory of capacities. It provides the language to prove results like these. We know that every Borel measure μ on Rd (in fact, any Polish space) has the interior regularity (also known as tightness) property
μ(B)=K∈K(Rd)K⊆Bsupμ(K),∀B∈B(Rd),
where K(Rd) is the collection of all compact sets in Rd. The Choquet's theory of capacities generalizes this approximation-from-below property, and distills those properties of measure that allow for such approximation to hold in very general settings. As we will see, monotonicity and continuity from above and below properties are at play here, and notions corresponding to complements or differences will be absent.
The next few sections will be very abstract and it is easy to lose sight of our goal. Some people enjoy this mental gymnastics, but even if you find this dry, the reward at the end will be worth the initial struggle. We start with Choquet's theory of capacities. The highlight of this part will be Choquet's capacitability theorem. To prove this major result we will need to define a lot of new terminology and prove some major results like Sierpiński's theorem and Sion's theorem. Armed with Choquet's capacitability theorem we will prove Measurable Section theorem which in turn will form the backbone of various other results in measure theory. These results in measure theory will then help us prove results in general theory of processes, but we will discuss this part in the next blog post.
Definition 1: Let E be a nonempty set. A collection E of subsets of E is called a paving if it closed under finite unions and finite intersections. The pair (E,E) is then called a paved space.
The concept of paving generalizes the concept of algebra. As an example, if E is a topological space, then the collection of closed subsets of E forms a paving. As another example, if E is a Hausdorff space, then (E,K(E)) is a paved space, where as before K(E) denotes the collection of all compact sets in E.
It is easy to check that an arbitrary intersection of pavings is also a paving, and that the collection P(E) of all subsets of E is a paving. Thus for any collection A of subsets of E, we can define the notion of paving generated byA as the smallest paving of subsets of E that contains A by simply defining it to be the intersection of all pavings of E containing A.
Definition 2: For two paved spaces (E,E) and (F,F), the product paving of E and F, denoted by E⊗pF, is a paving on E×F generated by all rectangles R={A×B:A∈E,B∈F}.
Using the fact that (A1×B1)∩(A2×B2)=(A1∩A2)×(B1∩B2) we see that R is stable under finite intersections. Therefore, any element of E⊗pF is of the form ⋃i=1nAi×Bi, where Ai∈E and Bi∈F for all i=1,…,n.
Definition 3: Let E be a nonempty set. A collection of subsets of E which is closed under countable unions and countable intersections is called a mosaic.
The concept of mosaic generalizes the concept of σ-algebra. Just like paving, it is easy to define the notion of mosaic generated by a collection. They will always occur in the context of a paving E on E. We denote by E the mosaic generated by E.E⊗pF will denote the mosaic generated by the product paving E⊗pF.
Just like with monotone class arguments in measure theory, the paving E will be a simple collection of subsets of E for which it is easy to prove a property P. From here we will show that the elements of the mosaic E also satisfy P. Often E will be a σ-algebra.
Henceforth the notation E will be used for a paving on E.
Just like the results connecting algebra and σ-algebra, we have results connecting pavings and mosaics.
Lemma 1: If A∈E implies Ac=E∖A∈E, then E=σ(E).
Follows immediately from the monotone class theorem.
As an example, if E is a separable, locally compact metric space, and E=K(E) is the collection of compact subsets of E, then this property holds. In fact, E can be a second-countable locally compact Hausdorff space. Then it is σ-compact and every compact subset is closed. This implies that the open set given by the complement of a compact set is a countable union of compact sets. Combine it with the fact that an open subset of a locally compact and σ-compact is itself σ-compact to get that if A∈E, then Ac∈E.
Lemma 2: The mosaic E is the smallest collection of subsets of E that contains E and is closed under countable increasing unions and under countable decreasing intersections. In other words, if M(E) denotes the monotone class generated by E, then E=M(E).
Since a mosaic is a monotone class and since E⊆E, we have M(E)⊆E. For the other side, we will be done if we show that M(E) is a mosaic, since E⊆M(E).
The first property to note is that a monotone paving is a mosaic. To see this, let R be a monotone paving and A1,A2,…∈R. Then since R is a paving, Bn=⋃i=1nAi∈R for all n∈N. But {Bn}n∈N is an increasing sequence and R is a monotone class, and therefore their union ⋃n∈NAn=⋃n∈NBn∈R, and hence R is closed under countable unions. Similarly for countable intersections.
Therefore, we will be done if we show that M(E) is a paving. To this end, for any B⊆E, let
K(B):={A⊆E:A∪B,A∩B∈M(E)}.
Notice that by symmetry, A∈K(B)⟺B∈K(A). If A1⊆A2⊆⋯∈K(B) is an increasing sequence, then
and similarly for decreasing sequences. Therefore, if K(B)=∅, then it is a monotone class. If A,B∈E, then by the definition of a paving, A∈K(B). Since this is true for every A∈E, we have E⊆K(B), and K(B) is a monotone class containing E. Since M(E) is the smallest monotone class containing E, we have
M(E)⊆K(B).
Hence if A∈M(E) and B∈E, then A∈K(B), and therefore B∈K(A). Since this is true for every B∈E, it follows that
M(E)⊆K(A).
The validity of this relation for every A∈M(E) is equivalent to the assertion that M(E) is a paving.
Recall a standard result from topology: a topological space X is compact if and only if for every collection C of closed subsets of X having the finite intersection property (i.e., every finite sub-collection of C has nonempty intersection), the intersection ⋂C∈CC is nonempty. For our use case, we consider countable sub-collections.
Definition 4: Consider an arbitrary collection A of subsets of E. It is called a compact collection if every countable sub-collection of elements in A having the finite intersection property has non-empty intersection. A paving E is a compact paving, if every decreasing sequence of nonempty elements of E has a nonempty intersection.
It is easy to verify that a compact paving is a compact collection. As an example, if E is a separable metric space, the collection, K(E), of all compact subsets of E is a compact paving. This follows immediately from the finite intersection characterization of a compact topological space stated above.
We define
Eδ:={n∈N⋂An:An∈E for all n∈N}
and note that if E is a compact paving then so is Eδ. Indeed, Eδ is a paving because for A=⋂n∈NAn and B=⋂m∈NBm in Eδ,
and Eδ is a compact paving because intersection of a sequence of countable intersections is again a countable intersection.
The next lemma tells us when it acceptable to commute projection with countable intersections.
Lemma 3: Let K and E be two nonempty sets, and denote by π the projection of K×E onto E. Suppose that H is a paving of subsets of K×E with the property that, for every x∈E, the collection H(x):={H(x):H∈H} is a compact paving on K. Here H(x):={y∈K:(y,x)∈H} denotes the x-section of H⊆K×E. Then, for every decreasing sequence {Hn}n∈N of sets in the paving Hδ, we have
π(n∈N⋂Hn)=n∈N⋂π(Hn).
π(⋂n∈NHn)⊆⋂n∈Nπ(Hn) is easy to see because if x∈π(⋂n∈NHn) then there exists (y,x)∈K×E such that (y,x)∈⋂n∈NHn which implies x∈π(Hn) for all n∈N.
For the other side, let x∈⋂n∈Nπ(Hn). Then the sequence {Hn(x)}n∈N is decreasing whose elements are nonempty and they are in H(x)δ. But since H(x)δ is a compact paving by assumption, ⋂nHn(x) must be nonempty, implying x∈π(⋂n∈NHn).
Definition 5: Let (E,E) be a paved space, and fix a subset A⊆E as well as a decreasing sequence {Ak}k∈N⊆P(E). We say that A is an E-envelope of {Ak}k∈N, if there exists a decreasing sequence {Ck}k∈N⊆E∪{E} such that Ak⊆Ck,∀k∈N and k∈N⋂Ck⊆A.
Examples:
Let E be a separable metric space, and E be the paving consisting of all closed subsets of E. Then a subset A of E is an E-envelope of a given decreasing sequence {Ak}k∈N⊆P(E) if, and only if, A contains ⋂k∈NAk, the intersection of the closures of the sets Ak,k∈N in the sequence. Indeed, that A is an E-envelope of {Ak}k∈N (with the existence of a sequence {Ck}k∈N as in Definition 5 above) implies A contains ⋂k∈NAk follows from the observation that Ak⊆Ck for each k∈N, while the other side is easy to see if we let Ck=Ak for each k∈N.
An abstract version of the example above: Let (E,E) be a paved space; for every subset A of E, introduce the collection of sets A:={B∈E∪{E}:A⊆B} and assume that the intersection A:=⋂B∈AB, called the adherent of A in the paving E, belongs to Eδ∪{E}, i.e., A is a countable intersection of sets in E∪{E}. We claim, and show next, that A is an E-envelope of a given decreasing sequence {Ak}k∈N⊆P(E) if, and only if, A contains ⋂k∈NAk.
Lemma 4: In the setting of the last example, a subset A of E is an E-envelope of a given decreasing sequence {Ak}k∈N⊆P(E) if, and only if, A contains ⋂k∈NAk.
The necessity is clear: if there exists a decreasing sequence {Ck}k∈N⊆E∪{E} such that (Env) is satisfied then Ak⊆Ck and therefore ⋂k∈NAk⊆⋂k∈NCk⊆A.
To see the sufficiency, for every k∈N, let {Bnk}n∈N⊆E∪{E} be a decreasing sequence such that Ak=⋂n∈NBnk (such a sequence exists because of the assumption in Example 2). Then
Ck:=Bk1∩Bk2∩⋯∩Bkk,k∈N
defines a decreasing sequence of elements in E∪{E} with Ak⊆Ak⊆Ck and ⋂k∈NAk=⋂k∈NCk. It follows that the set A envelops the sequence {Ak}k∈N, if A contains the countable intersection ⋂k∈NAk, for then the decreasing sequence {Ck}k∈N⊆E∪{E} satisfies the requirements of (1).
The next lemma lists some properties of envelopes which we will be using frequently.
Lemma 5:
If A is an envelope of a given decreasing sequence {An}n∈N⊆P(E), then every subset of E that contains A is also an envelope of {An}n∈N.
Two decreasing sequences of subsets of E that possess a common subsequence, admit the exact same envelopes.
The collection of envelopes of a given decreasing sequence of subsets of E, is closed under countable intersections.
Parts 1. and 2. are trivial, and follow immediately from the definition.
For part 3., let {Ak}k∈N be a sequence of envelopes of a given decreasing sequence {An}n∈N⊆P(E). For each k∈N, let {Bnk}n∈N⊆E∪{E} be a decreasing sequence, such that An⊆Bnk for all n∈N and ⋂n∈NBnk⊆Ak. Then
Cn:=Bn1∩Bn2∩⋯∩Bnn,n∈N
defines a decreasing sequence of elements in E∪{E} that satisfies An⊆Cn and ⋂n∈NCn=⋂(k,n)∈N2Bnk⊆⋂k∈NAk. It follows that ⋂k∈NAk is an E-envelope of {An}n∈N.
Definition 6: Let E be a nonempty set. A collection C of subsets of E is called a capacitance, if
whenever A∈C and A⊆B, then B∈C, and
whenever {An}n∈N is an increasing sequence of subsets of E such that ⋃n∈NAn∈C, there is an integer m such that Am∈C.
Intuitively, a capacitance is a collection of “big” sets: the power set P(E) is a capacitance, and so are the collections of nonempty and of uncountable subsets of E. The notion of pre-capacity, defined next, gives a more useful example.
Definition 7: A function I:P(E)→R is called a pre-capacity, if it is
monotone increasing, i.e., I(A)≤I(B) holds for every A⊆B, and
ascending, i.e., for every increasing sequence {An}n∈N we have
I(n∈N⋃An)=n∈NsupI(An).
If I:P(E)→R is a pre-capacity, then for every given real number t the collection
C={A∈P(E):I(A)>t}
is a capacitance. Conversely, given a capacitance C, one can associate to it a pre-capacity by defining
I(A)={10if A∈Cif A∈/C.
This then leads to the identification C={A∈P(E):I(A)>0}.
Henceforth assume that there is an underlying paved space (E,E) and a capacitance C of subsets of E.
Definition 8: A sequence f={fn}n∈N of mappings fn:(P(E))n→P(E) is called a Sierpiński’s C-scraper, or simply a C-scraper, if
fn(B1,B2,…,Bn)⊆Bn for all n∈N and for all sets B1,…,Bn∈P(E), and
whenever Bn∈C, then fn(B1,B2,…,Bn)∈C.
The first property expresses the intuitive notion that fn(B1,B2,…,Bn) “scrapes” Bn and the second property ensures that “the scraping does not remove too big a chunk” from Bn. In French, scraper is called rabotage which can be translated also as planing. A simple example of a scraper is the identity scraper: fn(B1,…Bn)=Bn for all n∈N and for all sets B1,…,Bn∈P(E).
Definition 9: Given a C-scraper f={fn}n∈N, a (necessarily decreasing) sequence {Bn}n∈N of subsets of E will be called f-scraped, if for all n∈N we have
Bn+1⊆fn(B1,B2,…,Bn)andBn∈C.
Definition 10: For any B∈P(E) and C-scraper f={fn}n∈N, the sequence {Pn}n∈N⊆C defined by P1:=B and Pn+1:=fn(P1,…,Pn) for all n∈N is f-scraped. It is called the f-scraped orbit of B.
(Dellacherie, 1972) calls {Pn}n∈N above as the f-scraped sequence deduced from B.
Definition 11: A C-scraper f={fn}n∈N is called compatible with a given set A∈P(E), if A envelopes every f-scraped sequence {Bn}n∈N with B1⊆A.
A set A∈P(E) is smooth for the capacitance C, if there exists a C-scraper compatible with it.
If A∈/C, then no subset of A can be in C either by the definition of a capacitance. This implies that there does not exist a f-scraped sequence {Bn}n∈N satisfying B1⊂A. On the other hand, with the identity scraper f, A envelopes every f-scraped sequence {Bn}n∈N because B1 must equal A. Thus A is smooth.
If A∈C and is smooth, then there always exists a sequence {Bn}n∈N of which A is an envelope. Indeed, by assumption there exists f, a scraper, compatible with A. Let {Bn}n∈N be the f-scraped orbit of A. Then since A is smooth, A envelops {Bn}n∈N.
Theorem 1 [Sierpiński]: Let (E,E) be a paved space, and C a capacitance. The collection of subsets of E which are smooth for the capacitance, is closed under countable increasing unions and under countable intersections.
We will come back to its proof later. Let's prove some of its useful consequences first.
Theorem 2: Let (E,E) be a paved space, and C a capacitance. The elements of the mosaic E generated by E are smooth.
An easy consequence of Theorem 1, Lemma 2 and the fact that every element of E is smooth because they are compatible with the identity scraper. □
This theorem is very useful in proving some important results. We will discuss two of them. The first one is the metric space version of Choquet's capacitability theorem. The proof of it will be very similar to the general Choquet's capacitability theorem, but we will need Sion's theorem for the general version, which will be the second result.
Definition 12: Let E be a compact metric space, endowed with the paving E=K(E) of its compact sets. I is called a metric capacity on (E,E) if it a pre-capacity that "descends on compacts" in the sense that for every decreasing sequence {Kn}n∈N⊆K(E) it satisfies
I(n∈N⋂Kn)=n∈NinfI(Kn).
Theorem 3 [Metric space version of Choquet's capacitability theorem]: For every Borel subset B of a compact metric space E, and any metric capacity I:P(E)→R, we have
I(B)=K∈K(E)K⊆BsupI(K).
Fix an arbitrary B∈B(E). If I(B)=−∞ then I(K)=−∞ for any K⊆B, and we have our desired equality trivially. Otherwise, we need to show that whenever I(B)>t holds for some given real number t, there exists a compact set K⊆B such that I(K)≥t. Recall that
C={A∈P(E):I(A)>t}
is a capacitance. Also recall that a subset A of E is an E-envelope of a decreasing sequence {An}n∈N⊆P(E) if and only if A contains ⋂n∈NAn.
Lemma 1 gives that the mosaic E generated by E=K(E) coincides with the Borel σ-algebra B(E). Hence by Theorem 2 every Borel set is smooth. Thus, there exists a C-scraper f={fn}n∈N compatible with the set B.
Consider the f-scraped orbit {Pn}n∈N⊆C of B. By construction B is an envelope of {Pn}n∈N, and hence it contains K:=⋂n∈NPn.K is closed and hence also compact on account of being a subset of a compact set E; similarly for Pn for all n∈N. But since {Pn}n∈N⊆C we have I(Pn)>t for all n∈N. Now use the descending on compacts property of I to get
I(K)=I(n∈N⋂Pn)=n∈NinfI(Pn)≥t.
Theorem 4 [Sion's theorem]: Let (E,E) be a paved space, and C a capacitance. For every element B of C∩E, there exists a decreasing sequence {Kn}n∈N⊆C∩E such that ⋂n∈NKn⊆B.
Theorem 2 implies that the set B is smooth, and thus there exists a C-scraper f={fn}n∈N compatible with it. Let {Pn}n∈N⊆C be the f-scraped orbit of B. Then B is an envelope of {Pn}n∈N, so there exists a decreasing sequence {Bn}n∈N of subsets of E∪{E} such that ⋂n∈NBn⊆P(B) and Pn⊆Bn for all n∈N. Notice Bn∈C.
If the sets Bn belong to the paving E from a certain index m onward, we take Kn:=Bm+n,n∈N as our sequence. Otherwise if Bn=E holds for all integers n, the set B=E is the union of an increasing sequence of sets in E because B∈E and by Lemma 2. Therefore, the fact that B∈C implies B contains a set K∈C∩E; it then suffices to take Kn=K for all integers n.
We now come back to the proof of Theorem 1. But first we will need the following clever operation on scrapers, and a couple of results.
Definition 13: Consider a sequence {fk,k∈N}={{fnk}n∈N,k∈N} of scrapers, and a bijection
N2∋(p,q)↦β(p,q)=p⋆q∈N,
which is strictly increasing in each of its arguments. For every integer n∈N and sets P1,P2,…,Pn, if n=p⋆q, let
fn(P1,P2,…,Pn):=fqp(Pp⋆1,Pp⋆2,…,Pp⋆q).
It is easy to see that this defines a new scraper f={fn}n∈N, called the mixing of the scrapers {fk,k∈N} via the bijection β.
Theorem 5: Let {fk,k∈N} be a sequence of scrapers, and denote by f its mixing by a bijection β. In order for a subset A of E to be compatible with f, it suffices that it be compatible with one of the scrapers fk,k∈N.
Let A∈P(E) be compatible with fk for some arbitrary but fixed k. Consider also a sequence of sets {Pn}n∈N, which is f-scraped and whose first term P1 is contained in A. We need to show that the set A envelops {Pn}n∈N.
To do this, we exploit Lemma 5 (ii) and construct a decreasing sequence {Qn}n∈N⊆P(E) which is a subsequence of {Pn}n∈N and show that A envelops {Qn}n∈N. This will then imply A envelops {Pn}n∈N. To this end, define
Qn:=Pk⋆n∀n∈N.
Because Q1=Pk⋆1⊆P1⋆1=P1⊆A and A is compatible with fk, to show that A envelops {Qn}n∈N it suffices to show {Qn}n∈N is fk-scraped.
Now, Qn=Pk⋆n∈C for all n∈N, so all that remains to be shown is that Qn+1⊆fnk(Q1,Q2,…,Qn) holds for all n∈N. Because {Pn}n∈N is f-scraped we have
An immediate corollary of this theorem: If {An}n∈N is a sequence of smooth subsets of E, there exists a scraper f which is compatible with all the sets An,n∈N.
Theorem 1 [Sierpiński]: Let (E,E) be a paved space, and C a capacitance. The collection of subsets of E which are smooth for the capacitance, is closed under countable increasing unions and under countable intersections.
Closure under countable intersections:
Suppose {Ak}k∈N is a sequence of smooth sets, A=⋂k∈NAk, and f is a C-scraper compatible with all of the sets Ak,k∈N. If {Pn}n∈N is an f-scraped sequence of sets such that P1⊆A, then P1⊆Ak for all k∈N. Our construction then implies Ak is an E-envelope of {Pn}n∈N for all k∈N. Lemma 5 (iii) now implies A is also an E-envelope {Pn}n∈N, showing that A is compatible with f, and hence smooth.
Closure under countable increasing unions:
Suppose {Ak}k∈N is an increasing sequence of smooth sets, A=⋃k∈NAk, and f is a C-scraper compatible with all of the sets Ak,k∈N. The scraper f doesn't work for this case and so we create a new one. For any n∈N and sets P1,P2,…,Pn we define
where p is the smallest integer such that Ap∩P1∈C. Such an integer does exist from part 2. of the definition of capacitance with the sequence being Ak∩P1↑A∩P1∈C. It is easy to see that Φ={φn}n∈N is a C-scraper. It is sufficient to show that Φ is compatible with A to finish our proof.
Let {Pn}n∈N be a Φ-scraped sequence of sets such that P1⊆A. By definition P1∈C and A∩P1=P1, and so from our construction φn(P1,P2,…,Pn)=fn(Ap∩P1,P2,…,Pn). All elements of the sequence Ap∩P1,P2,…,Pn,… are in C and for all n∈N
Pn+1⊆φn(P1,P2,…,Pn)=fn(Ap∩P1,P2,…,Pn),
and thus it follows that this sequence is f-scraped. Now since Ap is compatible with f, Ap is an envelope of this sequence, and also of {Pn}n∈N by Lemma 5 (ii). It follows that A is an envelope of {Pn}n∈N by Lemma 5 (i) because Ap⊆A.
Definition 14: A mapping I:P(E)→R is called a Choquet E-capacity, or simply E-capacity, if it is
monotone increasing, i.e., I(A)≤I(B) holds for every A⊆B,
ascending, i.e., for every increasing sequence {An}n∈N⊆P(E) we have
I(n∈N⋃An)=n∈NsupI(An),
descending on pavings, i.e., for every decreasing sequence {En}n∈N⊆E we have
I(n∈N⋂En)=n∈NinfI(En).
Definition 15: A set A∈P(E) is called I-capacitable if
I(A)=K∈EδK⊆AsupI(K).
Examples:
Consider a paved space (E,E), where E is a compact paving, and define I(A)=0 if A=∅, I(A)=1 if A=∅. Then I is a Choquet E-capacity. The property 3. in the definition of a capacity reflects now the assumption that the paving E is compact.
Consider a probability space (Ω,F,P), then the outer measure
P∗(A):=B∈FA⊆BinfP(B),
which is well-known to be a monotone, countably sub-additive set function taking the value 0 on the empty set ∅, is a Choquet F-capacity. Proof of this is a standard exercise in measure theory, albeit with a different terminology of continuity from below.
Consider a locally compact, separable metric space K, and the paving K of its compact subsets. If π denotes the projection of K×Ω onto Ω and
I(A):=P∗(π(A)) for all A∈P(K×Ω),
then I is a Choquet (K⊗pF)-capacity: Monotone increasing property is trivial. To see the ascending property, note that increasing sequence {An}n∈N⊆P(K×Ω) we have
x∈π(n⋃An)⟺∃y∈K such that (y,x)∈n⋃An⟺x∈n⋃π(An),
Finally, for the descending on pavings property, compactness implies that for all ω∈Ω, (K⊗pF)(ω) is a compact paving, and thus we can apply Lemma 3 to get
π(n∈N⋂En)=n∈N⋂π(En)
for every decreasing sequence {En}n∈N⊆K⊗pF. Therefore,
I(n∈N⋂En)=P∗(n∈N⋂π(En)).
Since En is a finite union of rectangles, π(En)∈F and therefore,
P∗(n∈N⋂π(En))=ninfP∗(π(En))=ninfI(En).
Theorem 6 [Choquet's capacitability theorem]: Consider a paved space (E,E), and let I:P(E)→R be a Choquet E-capacity. Then every set A∈E is I-capacitable.
Fix an arbitrary set A∈E. If I(A)=−∞ then I(K)=−∞ for any K⊆A, and we have our desired equality trivially. Otherwise, we need to show that whenever I(A)>t holds for some given real number t, there exists a set K∈Eδ with K⊆A and I(K)≥t.
Recall that
C={B∈P(E):I(B)>t}
is a capacitance. Then A∈C, and from Sion's Theorem (Theorem 4) there exists a decreasing sequence {Kn}n∈N of elements in E∩C such that ⋂n∈NKn⊆A. But then
I(n∈N⋂Kn)=n∈NinfI(Kn)≥t,
and thus we can take K=⋂n∈NKn.
We are now ready to prove some major results in measure theory.
Theorem 7 [Measurable Projection]: Let (Ω,F,P) be a complete probability space, let (K,B(K)) be a locally compact separable metric space endowed with the collection of its Borel sets, and denote by π the projection of K×Ω onto Ω. Then, for every B∈B(K)⊗F, the projection π(B)∈F.
We start by noticing B(K)⊗pF=B(K)⊗F. This follows from the fact that if A∈B(K)⊗pF, then A=⋃i=1nUi×Vi for some Ui∈B(K) and Vi∈F, and thus it can be shown Ac∈B(K)⊗pF, and thus Lemma 1 gives
B(K)⊗pF=σ(B(K)⊗pF)=B(K)⊗F.
Consider the paving K on K consisting of all compact subsets of K, and introduce the (K⊗pF)-capacity I(A)=P∗(π(A)), A∈B(K×Ω) we saw before. Now note
B(K)⊗pF=K⊗pF,
the mosaic generated by the paving K⊗pF. Showing ⊇ is trivial; for ⊆ let A∈B(K)⊗pF and show A∈K⊗pF by recalling B(K)=K.
Choquet's capacitability theorem (Theorem 6) thus guarantees that every set in B(K)⊗F is I-capacitable. In particular, for every integer n∈N, there exists a set Cn∈(K⊗pF)δ contained in B and such that I(Cn)≤I(B)≤I(Cn)+(1/n). Because Cn∈(K⊗pF)δ, Cn is a countable intersection Cn=⋂m∈NGnm, where each Gnm is a finite union of sets of the form U×V with U∈K,V∈F. Letting Hnm=⋂i=1mGni, we see that Hn1⊇Hn2⊇⋯ and Cn=⋂m∈NHnm, where now Hnm is also a finite union of sets of the form U×V with U∈K,V∈F.
The form of Hnm immediately implies π(Hnm)∈F for all (m,n)∈N2. Lemma 3 implies
π(Cn)=π(m∈N⋂Hnm)=m∈N⋂π(Hnm)∈F,∀n∈N,
which further implies
π(n∈N⋃Cn)=n∈N⋃π(Cn)∈F.
On the other hand, Cn⊆B for all n∈N and thus π(⋃n∈NCn)⊆B. But (2) implies that the difference of these two sets is a P-null set, and the completeness of the probability space gives the desired conclusion π(B)∈F.
It is not easy to construct an example of a Borel set in the product space whose projection is not Borel. It requires study of analytic sets. Check out Corollary 8.2.17 in (Cohn, 2013) for more details.
The next theorem is a very visual theorem, especially for the case where K and Ω are R.
Definition 16: For a map f:X→Y, we define its graph[[f]] to be the product set
[[f]]:={(y,x)∈Y×X:f(x)=y}.
We call a set G∈B(K)⊗F a measurable graph, if for every ω∈Ω its section G(ω)={y∈K:(y,ω)∈G} contains at most one point.
Theorem 8 [Measurable Graph]: A subset G of K×Ω is a measurable graph, if and only if, there exists a set Ξ∈F and a measurable mapping g:Ξ→K, such that G=[[g]].
Sufficiency: If Ξ and g are as stated, the set G={(y,ω)∈K×Ξ:y=g(ω)} equals the pre-image φ−1(Δ) of the diagonal Δ={(y,y)∈K×K:y∈K} under the mapping
K×Ξ∋(y,ω)↦φ(y,ω):=(y,g(ω))∈K×K.
This mapping φ is (B(K)⊗F)-measurable because of the facts that B(K×K)=B(K)⊗B(K), on account of K being separable, and that g is measurable. Since Δ is a closed set (because K is Hausdorff), Δ∈B(K×K) and thus G is a measurable graph.
Necessity: Suppose that G is a measurable graph, and let Ξ:=π(G). Then Ξ∈F by the Measurable Projection theorem. For every ω∈Ξ, define g(ω) to be the unique element of the set G(ω). We want to show g:Ξ→K is measurable. Indeed for any H∈B(K), it is easy to see that
g−1(H)=π(G∩(H×Ω))∈F,
where the inclusion follows from Measurable Projection theorem.
Now let K=[0,∞), the case important in stochastic processes.
Definition 17: Let A⊆[0,∞)×Ω. The debut of A is the nonnegative function DA:Ω→[0,∞] defined as
DA(ω)=inf{t∈[0,∞):(t,ω)∈A}.
Recall the convention inf∅=∞. It is easy to see that {DA<∞}=π(A).
Theorem 9 [Measurable Debut]: Let (Ω,F,P) be a complete probability space, and consider a measurable set A∈B([0,∞))⊗F. Then the debut DA of this set is a random variable.
For any given real number t>0, the set DA−1([0,t))={ω∈Ω:DA(ω)<t} is the projection onto Ω of the measurable subset A∩([0,t)×Ω)∈B([0,∞))⊗F. To see this, note
ω∈DA−1([0,t))⟺0≤DA(ω)<t⟺∃s∈[0,t) such that (s,ω)∈A⟺ω∈π(A∩([0,t)×Ω)).
Measurable Projection theorem shows that this set is in F.
Let (Ω,F,P) be a complete probability space, and consider a set A⊆[0,∞)×Ω. Then for every ω∈π(A), there exists a t∈[0,∞) such that (t,ω)∈A. In other words, we can define a mapping Z:π(A)→[0,∞). It is convenient to extend Z to the whole of Ω by setting Z=∞ on Ω∖π(A). When is it possible to choose Z such that it is measurable? The measurable section theorem (also known as measurable selection theorem) says that it is possible to define Z to be measurable if A∈B([0,∞))⊗F.
Recall that for Z:Ω→[0,∞] its graph is the product set
[[Z]]={(t,ω)∈[0,∞)×Ω:Z(ω)=t}.
The condition (Z(ω),ω)∈A whenever Z<∞ can then be expressed by saying [[Z]]⊆A.
Other than stochastic processes, measurable section theorems have applications in optimal control and game theory.
Theorem 10 [Measurable Section]: Let (Ω,F,P) be a complete probability space, and consider a measurable set A∈B([0,∞))⊗F. Then there exists a random variable Z:Ω→[0,∞] with [[Z]]⊆A and {Z<∞}=π(A).
We divide our analysis into two parts. We shall first show that for every ε>0 there exists a random variable Zε:Ω→[0,∞] with [[Zε]]⊆A and
P(π(A))≤P(Zε<∞)+ε.
To this end, recall that if K=[0,∞) and K is the paving of all compact subsets of K, then we have the K⊗pF-capacity I(A)=P∗(π(A)),A∈P(K×Ω). Therefore, every A∈B(K)⊗F is I-capacitable, and fixing A, for every ε>0, there exists a set
Cε∈(K⊗pF)δ such that Cε⊆A,I(Cε)≤I(A)≤I(Cε)+ε.
Let
Zε:=DCε
be the debut of this set Cε. Then the Measurable Debut theorem (Theorem 9) implies Zε is a random variable.
For every ω∈Ω, the section Cε(ω) is a compact subset of K (use the facts that compact ⟺ closed and bounded here, and Cε∈(K⊗pF)δ). Notice that if (t,ω)∈[[DCε]] then DCε(ω)=t which is same as saying t=inf{s∈[0,∞):(s,ω)∈Cε}, but since Cε(ω) is closed this implies (t,ω)∈Cε. Therefore, [[Zε]]⊆Cε, showing the first requirement.
The second requirement P(π(A))≤P(Zε<∞)+ε is true because π(A)∈F by the Measurable Projection theorem (Theorem 7) and {Zε<∞}=π(Cε)∈F, and now use (2).
Let us now construct a random variable Z to satisfy the properties claimed in the theorem. Define
A1:=A.
Then from above there exists a random variable
Z1:Ω→[0,∞] with [[Z1]]⊆A1 and P(Z1<∞)≥21P(π(A1)),
by taking ε=P(Z1<∞)>0; if it happens that P(Z1<∞)=0, then P(π(A1))=0 and the inequality is still true. Now define
A2:=A1∖([0,∞)×{Z1<∞}),
and, reasoning as before, construct a random variable
Z2:Ω→[0,∞] with [[Z2]]⊆A2 and P(Z2<∞)≥21P(π(A2))=21(P(π(A1))−P(Z1<∞)).
Continuing this way, we obtain a sequence {Zn}n∈N such that [[Zn]]⊆A, the projections π([[Zn]]) are disjoint, and we have k=1∑nP(Zk<∞)≥(21+221+⋯+2n1)P(π(A1))=(1−2−n)P(π(A)),∀n∈N. Therefore, the random variable Z defined as
Z(ω)={Zk(ω)∞if ω∈{Zk<∞} for some k∈Notherwise,
satisfies [[Z]]⊆A, thus also {Z<∞}⊆π(A).
On the other hand, letting n→∞ in (3), we see that {Z<∞} and π(A) have the same probability. Therefore, the completeness of the probability space implies these two sets can be made equal.