I have been attending a reading course on stochastic analysis led by Professor Ioannis Karatzas, where the students take turns presenting a topic of their choice. I recently presented on Choquet's theory of capacities and its applications to measure theory and to the general theory of processes. This blog post is based on that presentation. I have freely copied[1] from many sources, but my primary reference is the (unfortunately unpublished) lecture notes by Prof. Karatzas.
Kolmogorov laid the modern axiomatic foundations of probability theory with the German monograph Grundbegriffe der Wahrscheinlichkeitsrechnung which appeared in Ergebnisse Der Mathematik in 1933. This was a period of intense discussions on the foundations of probability, and a majority of probabilists at the time considered measure theory not only a waste of time, but also an offense to "probabilistic intuition" (Meyer, 2009). But by 1950, with the work of Doob in particular, these discussions of foundations had been settled.
Continuous-time processes, on the other hand, were difficult to tame even with measure theory: if a particle is subject to random evolution, to show that its trajectory is continuous, or bounded, requires that all time values be considered, whereas classical measure theory can only handle a countable infinity of time values. Thus, not only does probability depend on measure theory, but it also requires more of measure theory than the rest of analysis (Meyer, 2009).
The missing pieces of the puzzle, which will be the highlight of this and the next blog post, are the debut, section and projection theorems. These theorems are also indispensable in many applications, for instance in dynamic programming and stochastic control (Karoui and Tan, 2013).
To get a taste of these theorems, let's recall a famous error made by Lebesgue in the paper Sur les fonctions représentables analytiquement published in 1905. Consider the measurable space \((\mathbb R^2, \mathcal{B}(\mathbb R^2))\) and the projection map \(\pi\) given by \(\mathbb R^2 \ni (x,y) \mapsto \pi(x,y) = y \in \mathbb R.\) It is easy to see that for any open set \(O\) in \(\mathbb R^2\), the set \(\pi(O)\) is also open in \(\mathbb R\): Recall that the standard topology on \(\mathbb R^2\) is the same as the product topology on \(\mathbb R^2.\) By the definition of the product topology, the rectangles \(U \times V\) with \(U, V\) open in \(\mathbb R\) form a basis, so an open set \(O\) in \(\mathbb R^2\) is of the form \(O = \bigcup_{i \in I} U_i \times V_i\) for nonempty open \(U_i, V_i\) in \(\mathbb R\) and an arbitrary index set \(I.\) Since \(\pi(U_i \times V_i) = V_i\) and projection commutes with unions, we get \(\pi(O) = \bigcup_{i \in I} V_i\), which is open in \(\mathbb R.\) In fact, more generally, projection from any product space (with product topology) is an open map. Now it seems reasonable to expect that for any Borel set \(B \in \mathcal{B}(\mathbb R^2)\) its projection is also a Borel set in \(\mathcal{B}(\mathbb R)\), and Lebesgue assumed this in his paper. But, in fact, this is FALSE! The error was spotted around 1917 by Mikhail Suslin, who realised that the projection of a Borel set need not be Borel. This led to his investigation of analytic sets and to the beginning of what is now known as descriptive set theory (Almost Sure blog).
The problem is projection doesn't commute with countable decreasing intersection. For example, consider the decreasing sequence of sets \(S_n = (0, 1/n) \times \mathbb R.\) Then \(\pi(S_n) = \mathbb R\) for all \(n\), giving \(\bigcap_{n \in \mathbb N} \pi(S_n) = \mathbb R\), but \(\bigcap_{n \in \mathbb N} S_n = \varnothing\), giving \(\pi \left( \bigcap_{n \in \mathbb N} S_n \right) = \varnothing.\) The measurable projection theorem stated next will be one of the highlights of this post.
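A finite toy version makes the failure concrete. The sketch below is only an illustration under our own naming (`project` and `first_index_excluding` are hypothetical helpers, not taken from any reference). It first shows two disjoint finite sets whose projections overlap, and then checks the key fact behind the example \(S_n = (0, 1/n) \times \mathbb R\): every positive \(x\) drops out of \((0, 1/n)\) for some finite \(n\), so the infinite intersection is empty even though each \(S_n\) projects onto all of \(\mathbb R.\)

```python
from fractions import Fraction
from math import ceil


def project(points):
    """Project a finite set of (x, y) pairs onto the second coordinate."""
    return {y for (_, y) in points}


# Two disjoint sets lying over the same point "y" of the second axis.
A = {(0, "y")}
B = {(1, "y")}

proj_of_intersection = project(A & B)           # A & B is empty
intersection_of_proj = project(A) & project(B)  # both projections are {"y"}


def first_index_excluding(x):
    """Smallest n >= 1 such that x is NOT in the open interval (0, 1/n).

    For x > 0 this is the smallest n with 1/n <= x, i.e. n >= 1/x, so no
    positive x belongs to every interval (0, 1/n).
    """
    x = Fraction(x)
    if x <= 0:
        raise ValueError("x must be positive")
    return max(1, ceil(1 / x))
```

Here `proj_of_intersection` is empty while `intersection_of_proj` is not, which is the same mismatch as in the example above, only on a two-point set.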
Why is proving such results difficult? As mentioned above it's because projection doesn't behave nicely with intersections. Nevertheless, let us try to see how one might try to prove the above theorem. A standard approach in measure theory is to construct a collection like
The next few sections will be very abstract and it is easy to lose sight of our goal. Some people enjoy this kind of mental gymnastics, but even if you find it dry, the reward at the end will be worth the initial struggle. We start with Choquet's theory of capacities. The highlight of this part will be Choquet's capacitability theorem. To prove this major result we will need to define a lot of new terminology and prove some major results like Sierpiński's theorem and Sion's theorem. Armed with Choquet's capacitability theorem we will prove the Measurable Section theorem, which in turn will form the backbone of various other results in measure theory. These results in measure theory will then help us prove results in the general theory of processes, but we will discuss this part in the next blog post.
The concept of paving generalizes the concept of algebra. As an example, if \(E\) is a topological space, then the collection of closed subsets of \(E\) forms a paving. As another example, if \(E\) is a Hausdorff space, then \((E, \mathscr K(E))\) is a paved space, where as before \(\mathscr{K}(E)\) denotes the collection of all compact sets in \(E.\)
It is easy to check that an arbitrary intersection of pavings is also a paving, and that the collection \(\mathfrak{P}(E)\) of all subsets of \(E\) is a paving. Thus for any collection \(\mathcal{A}\) of subsets of \(E\), we can define the notion of paving generated by \(\mathcal{A}\) as the smallest paving of subsets of \(E\) that contains \(\mathcal{A}\) by simply defining it to be the intersection of all pavings of \(E\) containing \(\mathcal{A}.\)
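On a finite set the generated paving can be computed by brute force rather than as an intersection of all pavings. The sketch below assumes that a paving is a collection containing the empty set and closed under finite unions and finite intersections, which is how pavings are used in the arguments of this post; both this reading of the definition and the function name `generated_paving` are assumptions made for the illustration.

```python
from itertools import combinations


def generated_paving(collection):
    """Close a finite collection of finite sets under pairwise unions and
    intersections, adding the empty set (ASSUMED definition of a paving)."""
    paving = {frozenset()} | {frozenset(s) for s in collection}
    while True:
        new_sets = set()
        for a, b in combinations(paving, 2):
            new_sets.add(a | b)
            new_sets.add(a & b)
        if new_sets <= paving:  # nothing new was produced: closure reached
            return paving
        paving |= new_sets


paving = generated_paving([{1, 2}, {2, 3}])
```

Starting from \(\{1,2\}\) and \(\{2,3\}\), the loop adds \(\{2\}\) and \(\{1,2,3\}\) and then stops, since the ground set is finite.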
Using the fact that \((A_1 \times B_1) \cap (A_2 \times B_2) = (A_1 \cap A_2) \times (B_1 \cap B_2)\) we see that \(\mathcal{R}\) is stable under finite intersections. Therefore, any element of \(\mathcal{E} \otimes_p \mathcal{F}\) is of the form \(\bigcup_{i=1}^n A_i \times B_i\), where \(A_i \in \mathcal{E}\) and \(B_i \in \mathcal{F}\) for all \(i=1,\ldots,n.\)
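The rectangle identity is easy to check on small finite sets. The following sketch (with a hypothetical helper `rectangle`) verifies it for one choice of sets, and also shows that a union of two rectangles need not be a rectangle, which is why elements of the product paving are finite unions of rectangles rather than single rectangles.

```python
from itertools import product


def rectangle(a, b):
    """The rectangle a x b as a set of pairs."""
    return set(product(a, b))


A1, B1 = {1, 2, 3}, {"u", "v"}
A2, B2 = {2, 3, 4}, {"v", "w"}

# (A1 x B1) & (A2 x B2) versus (A1 & A2) x (B1 & B2)
lhs = rectangle(A1, B1) & rectangle(A2, B2)
rhs = rectangle(A1 & A2, B1 & B2)

# The union is strictly smaller than the rectangle spanned by its projections.
union = rectangle(A1, B1) | rectangle(A2, B2)
spanned = rectangle({x for (x, _) in union}, {y for (_, y) in union})
```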
The concept of mosaic generalizes the concept of \(\sigma\)-algebra. Just like paving, it is easy to define the notion of mosaic generated by a collection. They will always occur in the context of a paving \(\mathcal{E}\) on \(E.\) We denote by \(\widehat{\mathcal{E}}\) the mosaic generated by \(\mathcal{E}.\) \(\mathcal{E}\ \widehat{\otimes}_p \mathcal{F}\) will denote the mosaic generated by the product paving \(\mathcal{E} \otimes_p \mathcal{F}.\)
Just like with monotone class arguments in measure theory, the paving \(\mathcal E\) will be a simple collection of subsets of \(E\) for which it is easy to prove a property \(P.\) From here we will show that the elements of the mosaic \(\widehat{\mathcal E}\) also satisfy \(P.\) Often \(\widehat{\mathcal E}\) will be a \(\sigma\)-algebra.
Henceforth the notation \(\mathcal{E}\) will be used for a paving on \(E.\)
Just like the results connecting algebra and \(\sigma\)-algebra, we have results connecting pavings and mosaics.
As an example, if \(E\) is a separable, locally compact metric space, and \(\mathcal{E} = \mathscr{K}(E)\) is the collection of compact subsets of \(E\), then this property holds. In fact, \(E\) can be any second-countable locally compact Hausdorff space. Such a space is \(\sigma\)-compact and every compact subset of it is closed, so the complement of a compact set is open. Since an open subset of a second-countable locally compact Hausdorff space is itself \(\sigma\)-compact, this complement is a countable union of compact sets. Hence if \(A \in \mathcal E\), then \(A^\mathsf{c} \in \widehat{\mathcal E}.\)
Since a mosaic is a monotone class and since \(\mathcal{E} \subseteq \widehat{\mathcal{E}}\), we have \(\mathcal M(\mathcal{E}) \subseteq \widehat{\mathcal{E}}.\) For the other side, we will be done if we show that \(\mathcal M(\mathcal{E})\) is a mosaic, since \(\mathcal{E} \subseteq \mathcal M(\mathcal{E}).\)
The first property to note is that a monotone paving is a mosaic. To see this, let \(\mathcal{R}\) be a monotone paving and \(A_1, A_2, \ldots \in \mathcal{R}.\) Then since \(\mathcal{R}\) is a paving, \(B_n = \bigcup_{i=1}^n A_i \in \mathcal{R}\) for all \({n \in \mathbb N}.\) But \(\{B_n\}_{n \in \mathbb N}\) is an increasing sequence and \(\mathcal{R}\) is a monotone class, and therefore their union \(\bigcup_{n \in \mathbb N} A_n = \bigcup_{n \in \mathbb N} B_n \in \mathcal{R}\), and hence \(\mathcal{R}\) is closed under countable unions. Similarly for countable intersections.
Therefore, we will be done if we show that \(\mathcal M(\mathcal{E})\) is a paving. To this end, for any \(B \subseteq E\), let
Recall a standard result from topology: a topological space \(X\) is compact if and only if for every collection \(\mathcal C\) of closed subsets of \(X\) having the finite intersection property (i.e., every finite sub-collection of \(\mathcal C\) has nonempty intersection), the intersection \(\bigcap_{C \in \mathcal C} C\) is nonempty. For our use case, we consider countable sub-collections.
It is easy to verify that a compact paving is a compact collection. As an example, if \(E\) is a separable metric space, the collection, \(\mathscr{K}(E)\), of all compact subsets of \(E\) is a compact paving. This follows immediately from the finite intersection characterization of a compact topological space stated above.
We define
The next lemma tells us when it is acceptable to commute projection with countable intersections.
\(\pi\left(\bigcap_{n \in \mathbb N} H_n\right) \subseteq \bigcap_{n \in \mathbb N} \pi(H_n)\) is easy to see because if \(x \in \pi\left(\bigcap_{n \in \mathbb N} H_n\right)\) then there exists \((y,x) \in K \times E\) such that \((y,x) \in \bigcap_{n \in \mathbb N} H_n\) which implies \(x \in \pi(H_n)\) for all \(n \in \mathbb N.\)
For the other side, let \(x \in \bigcap_{n \in \mathbb N} \pi(H_n).\) Then \(\{H_n(x)\}_{n \in \mathbb N}\) is a decreasing sequence of nonempty sets in \(\mathcal{H}(x)_\delta.\) Since \(\mathcal{H}(x)_\delta\) is a compact paving by assumption, \(\bigcap_n H_n(x)\) must be nonempty, implying \(x \in \pi\left(\bigcap_{n \in \mathbb N} H_n\right).\)
Examples:
Let \(E\) be a separable metric space, and \(\mathcal{E}\) be the paving consisting of all closed subsets of \(E.\) Then a subset \(A\) of \(E\) is an \(\mathcal{E}\)-envelope of a given decreasing sequence \(\{A_k\} _ {k \in \mathbb N} \subseteq \mathfrak{P}(E)\) if, and only if, \(A\) contains \(\bigcap_{k \in \mathbb N} \overline{A}_k\), the intersection of the closures of the sets \(A_k, k \in \mathbb N\) in the sequence. Indeed, if \(A\) is an \(\mathcal{E}\)-envelope of \(\{A_k\} _ {k \in \mathbb N}\), with a sequence \(\{C_k\}_{k \in \mathbb N}\) as in Definition 5 above, then \(\overline{A}_k \subseteq C_k\) for each \(k \in \mathbb N\) because \(C_k\) is closed and contains \(A_k\), and therefore \(\bigcap_{k \in \mathbb N} \overline{A}_k \subseteq \bigcap_{k \in \mathbb N} C_k \subseteq A.\) The other direction follows by taking \(C_k = \overline{A}_k\) for each \(k \in \mathbb N.\)
An abstract version of the example above: Let \((E, \mathcal{E})\) be a paved space; for every subset \(A\) of \(E\), introduce the collection of sets \(\mathcal{A} := \{B \in \mathcal{E} \cup \{E\} \,:\, A \subseteq B\}\) and assume that the intersection \(\overline{A} := \bigcap_{B \in \mathcal{A}} B\), called the adherent of \(A\) in the paving \(\mathcal{E}\), belongs to \(\mathcal{E}_\delta \cup \{E\}\), i.e., \(\overline{A}\) is a countable intersection of sets in \(\mathcal{E} \cup \{E\}.\) We claim, and show next, that \(A\) is an \(\mathcal{E}\)-envelope of a given decreasing sequence \(\{A_k\}_{k \in \mathbb N} \subseteq \mathfrak{P}(E)\) if, and only if, \(A\) contains \(\bigcap_{k \in \mathbb N} \overline{A}_k.\)
The necessity is clear: if there exists a decreasing sequence \(\{C_k\}_{k \in \mathbb N} \subseteq \mathcal{E} \cup \{E\}\) such that \((\text{Env})\) is satisfied then \(\overline{A}_k \subseteq C_k\) and therefore \(\bigcap_{k \in \mathbb N} \overline{A}_k \subseteq \bigcap_{k \in \mathbb N} C_k \subseteq A.\)
To see the sufficiency, for every \(k \in \mathbb N\), let \(\left\{B_n^k\right\}_{n \in \mathbb N} \subseteq \mathcal{E} \cup \{E\}\) be a decreasing sequence such that \(\overline{A}_k = \bigcap_{n \in \mathbb N} B_n^k\) (such a sequence exists because of the assumption in Example 2). Then
The next lemma lists some properties of envelopes which we will be using frequently.
Lemma 5:
1. If \(A\) is an envelope of a given decreasing sequence \(\{A_n\} _ {n \in \mathbb N} \subseteq \mathfrak{P}(E)\), then every subset of \(E\) that contains \(A\) is also an envelope of \(\{A_n\} _ {n \in \mathbb N}.\)
2. Two decreasing sequences of subsets of \(E\) that possess a common subsequence admit exactly the same envelopes.
3. The collection of envelopes of a given decreasing sequence of subsets of \(E\) is closed under countable intersections.
Parts 1. and 2. are trivial, and follow immediately from the definition.
For part 3., let \(\{A^k\} _ {k \in \mathbb N}\) be a sequence of envelopes of a given decreasing sequence \(\{A_n\} _ {n \in \mathbb N} \subseteq \mathfrak{P}(E).\) For each \(k \in \mathbb N\), let \(\{B_n^k\} _ {n \in \mathbb N} \subseteq \mathcal{E} \cup \{E\}\) be a decreasing sequence, such that \(A_n \subseteq B_n^k\) for all \(n \in \mathbb N\) and \(\bigcap_{n \in \mathbb N} B_n^k \subseteq A^k.\) Then
Definition 6: Let \(E\) be a nonempty set. A collection \(\mathcal C\) of subsets of \(E\) is called a capacitance, if
whenever \(A \in \mathcal{C}\) and \(A \subseteq B\), then \(B \in \mathcal{C}\), and
whenever \(\{A_n\} _ {n \in \mathbb N}\) is an increasing sequence of subsets of \(E\) such that \(\bigcup_{n \in \mathbb N} A_n \in \mathcal{C}\), there is an integer \(m\) such that \(A_m \in \mathcal{C}.\)
Intuitively, a capacitance is a collection of “big” sets: the power set \(\mathfrak{P}(E)\) is a capacitance, and so are the collections of nonempty and of uncountable subsets of \(E.\) The notion of pre-capacity, defined next, gives a more useful example.
Definition 7: A function \(I \colon \mathfrak{P}(E) \to \overline{\mathbb R}\) is called a pre-capacity, if it is
monotone increasing, i.e., \(I(A) \le I(B)\) holds for every \(A \subseteq B\), and
ascending, i.e., for every increasing sequence \(\{A_n\} _ {n \in \mathbb N}\) we have \(I\left(\bigcup_{n \in \mathbb N} A_n\right) = \sup_{n \in \mathbb N} I(A_n).\)
If \(I \colon \mathfrak{P}(E) \to \overline{\mathbb R}\) is a pre-capacity, then for every given real number \(t\) the collection \(\{A \in \mathfrak{P}(E) \,:\, I(A) > t\}\) is a capacitance.
Henceforth assume that there is an underlying paved space \((E, \mathcal{E})\) and a capacitance \(\mathcal{C}\) of subsets of \(E.\)
Definition 8: A sequence \(\mathfrak{f} = \{f_n\} _ {n \in \mathbb N}\) of mappings \(f_n \colon \left(\mathfrak{P}(E)\right)^n \to \mathfrak{P}(E)\) is called a Sierpiński \(\mathcal{C}\)-scraper, or simply a \(\mathcal{C}\)-scraper, if
\(f_n(B_1, B_2, \ldots, B_n) \subseteq B_n\) for all \(n \in \mathbb N\) and for all sets \(B_1, \ldots, B_n \in \mathfrak{P}(E)\), and
whenever \(B_n \in \mathcal{C}\), then \(f_n(B_1, B_2, \ldots, B_n) \in \mathcal{C}.\)
The first property expresses the intuitive notion that \(f_n(B_1, B_2, \ldots, B_n)\) “scrapes” \(B_n\), and the second property ensures that “the scraping does not remove too big a chunk” from \(B_n.\) The French term for scraping is rabotage, which can also be translated as planing. A simple example of a scraper is the identity scraper: \(f_n(B_1, \ldots, B_n) = B_n\) for all \(n \in \mathbb N\) and for all sets \(B_1, \ldots, B_n \in \mathfrak{P}(E).\)
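To make the two scraper properties tangible, here is a small finite sketch. Everything in it is an assumption chosen for illustration: the toy capacitance consists of the sets with at least two points, `trim_scraper` is a made-up scraper, and `scraped_orbit` assumes that an orbit is obtained by iterating \(P_1 = B\), \(P_{n+1} = f_n(P_1, \ldots, P_n).\)

```python
def in_capacitance(s, threshold=2):
    """Toy capacitance: the 'big' sets are those with at least `threshold` points."""
    return len(s) >= threshold


def identity_scraper(*sets):
    """The identity scraper: f_n(B_1, ..., B_n) = B_n."""
    return sets[-1]


def trim_scraper(*sets):
    """Remove the largest point of B_n, unless that would leave the capacitance.

    The result is always a subset of B_n, and it stays in the capacitance
    whenever B_n is in it, so both scraper properties hold.
    """
    last = sets[-1]
    if not last:
        return last
    trimmed = last - {max(last)}
    return trimmed if in_capacitance(trimmed) else last


def scraped_orbit(scraper, start, length):
    """ASSUMED construction: P_1 = start, P_{n+1} = f_n(P_1, ..., P_n)."""
    orbit = [frozenset(start)]
    while len(orbit) < length:
        orbit.append(frozenset(scraper(*orbit)))
    return orbit


orbit = scraped_orbit(trim_scraper, {0, 1, 2, 3}, 4)
```

With this choice the orbit shrinks one point at a time and then stalls at a two-point set, the smallest size the toy capacitance allows.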
(Dellacherie, 1972) calls \(\{P_n\} _ {n \in \mathbb N}\) above the \(\mathfrak f\)-scraped sequence deduced from \(B.\)
Definition 11: A \(\mathcal{C}\)-scraper \(\mathfrak{f} = \{f_n\}_{n \in \mathbb N}\) is called compatible with a given set \(A \in \mathfrak{P}(E)\), if \(A\) envelops every \(\mathfrak{f}\)-scraped sequence \(\{B_n\} _ {n \in \mathbb N}\) with \(B_1 \subseteq A.\)
A set \(A \in \mathfrak{P}(E)\) is smooth for the capacitance \(\mathcal{C}\), if there exists a \(\mathcal{C}\)-scraper compatible with it.
If \(A \notin \mathcal C\), then no subset of \(A\) can be in \(\mathcal C\) either, by the definition of a capacitance. This implies that there does not exist an \(\mathfrak f\)-scraped sequence \(\{B_n\}_{n \in \mathbb N}\) satisfying \(B_1 \subset A.\) On the other hand, with the identity scraper \(\mathfrak f\), \(A\) envelops every \(\mathfrak f\)-scraped sequence \(\{B_n\}_{n \in \mathbb N}\) because \(B_1\) must equal \(A.\) Thus \(A\) is smooth.
If \(A \in \mathcal C\) and is smooth, then there always exists a sequence \(\{B_n\}_{n \in \mathbb N}\) of which \(A\) is an envelope. Indeed, by assumption there exists \(\mathfrak f\), a scraper, compatible with \(A.\) Let \(\{B_n\}_{n \in \mathbb N}\) be the \(\mathfrak f\)-scraped orbit of \(A.\) Then since \(A\) is smooth, \(A\) envelops \(\{B_n\}_{n \in \mathbb N}.\)
The next result is central in this theory.
We will come back to its proof later. Let's prove some of its useful consequences first.
This theorem is very useful in proving some important results. We will discuss two of them. The first one is the metric space version of Choquet's capacitability theorem. Its proof will be very similar to that of the general Choquet capacitability theorem, but we will need Sion's theorem for the general version, which will be the second result.
Fix an arbitrary \(B \in \mathcal{B}(E).\) If \(I(B) = -\infty\) then \(I(K) = -\infty\) for any \(K \subseteq B\), and we have our desired equality trivially. Otherwise, we need to show that whenever \(I(B) > t\) holds for some given real number \(t\), there exists a compact set \(K \subseteq B\) such that \(I(K) \ge t.\) Recall that
Lemma 1 gives that the mosaic \(\widehat{\mathcal{E}}\) generated by \(\mathcal{E} = \mathscr{K}(E)\) coincides with the Borel \(\sigma\)-algebra \(\mathcal{B}(E).\) Hence by Theorem 2 every Borel set is smooth. Thus, there exists a \(\mathcal{C}\)-scraper \(\mathfrak{f} = \{f_n\}_{n \in \mathbb N}\) compatible with the set \(B.\)
Consider the \(\mathfrak{f}\)-scraped orbit \(\{P_n\} _ {n \in \mathbb N} \subseteq \mathcal{C}\) of \(B.\) By construction \(B\) is an envelope of \(\{P_n\} _ {n \in \mathbb N}\), and hence it contains \(K := \bigcap _ {n \in \mathbb N} \overline{P}_n.\) The set \(K\) is closed, and hence compact, on account of being a closed subset of the compact space \(E\); similarly for \(\overline{P}_n\) for all \(n \in \mathbb N.\) But since \(\{P_n\} _ {n \in \mathbb N} \subseteq \mathcal{C}\) we have \(I(\overline{P}_n) > t\) for all \(n \in \mathbb N.\) Now use the descending on compacts property of \(I\) to get
Theorem 2 implies that the set \(B\) is smooth, and thus there exists a \(\mathcal{C}\)-scraper \(\mathfrak{f} = \{f_n\} _ {n \in \mathbb N}\) compatible with it. Let \(\{P_n\} _ {n \in \mathbb N} \subseteq \mathcal{C}\) be the \(\mathfrak{f}\)-scraped orbit of \(B.\) Then \(B\) is an envelope of \(\{P_n\} _ {n \in \mathbb N}\), so there exists a decreasing sequence \(\{B_n\} _ {n \in \mathbb N}\) of elements of \(\mathcal{E} \cup \{E\}\) such that \(\bigcap _ {n \in \mathbb N} B_n \subseteq B\) and \(P_n \subseteq B_n\) for all \({n \in \mathbb N}.\) Notice that \(B_n \in \mathcal{C}\), since \(P_n \in \mathcal{C}\) and a capacitance contains every superset of each of its elements.
If the sets \(B_n\) belong to the paving \(\mathcal{E}\) from a certain index \(m\) onward, we take \(K_n := B_{m+n}, n \in \mathbb N\) as our sequence. Otherwise if \(B_n = E\) holds for all integers \(n\), the set \(B=E\) is the union of an increasing sequence of sets in \(\mathcal{E}\) because \(B \in \widehat{\mathcal{E}}\) and by Lemma 2. Therefore, the fact that \(B \in \mathcal{C}\) implies \(B\) contains a set \(K \in \mathcal{C} \cap \mathcal{E}\); it then suffices to take \(K_n = K\) for all integers \(n.\)
We now come back to the proof of Theorem 1. But first we will need the following clever operation on scrapers, and a couple of results.
Let \(A \in \mathfrak{P}(E)\) be compatible with \(\mathfrak{f}^k\) for some arbitrary but fixed \(k.\) Consider also a sequence of sets \(\{P_n\} _ {n \in \mathbb N}\), which is \(\mathfrak{f}\)-scraped and whose first term \(P_1\) is contained in \(A.\) We need to show that the set \(A\) envelops \(\{P_n\} _ {n \in \mathbb N}.\)
To do this, we exploit Lemma 5 (ii) and construct a decreasing sequence \(\{Q_n\} _ {n \in \mathbb N} \subseteq \mathfrak{P}(E)\) which is a subsequence of \(\{P_n\} _ {n \in \mathbb N}\) and show that \(A\) envelops \(\{Q_n\} _ {n \in \mathbb N}.\) This will then imply \(A\) envelops \(\{P_n\} _ {n \in \mathbb N}.\) To this end, define
Now, \(Q_n = P_{k \star n} \in \mathcal{C}\) for all \({n \in \mathbb N}\), so all that remains to be shown is that \(Q_{n+1} \subseteq f^k_n(Q_1, Q_2, \ldots, Q_n)\) holds for all \({n \in \mathbb N}.\) Because \(\{P_n\} _ {n \in \mathbb N}\) is \(\mathfrak{f}\)-scraped we have
An immediate corollary of this theorem: If \(\{A_n\}_{n \in \mathbb N}\) is a sequence of smooth subsets of \(E\), there exists a scraper \(\mathfrak{f}\) which is compatible with all the sets \(A_n, {n \in \mathbb N}.\)
We state Theorem 1 again:
Closure under countable intersections:
Suppose \(\left\{A^k\right\} _ {k \in \mathbb N}\) is a sequence of smooth sets, \(A = \bigcap _ {k \in \mathbb N} A^k\), and \(\mathfrak{f}\) is a \(\mathcal{C}\)-scraper compatible with all of the sets \(A^k, k \in \mathbb N.\) If \(\{P_n\} _ {n \in \mathbb N}\) is an \(\mathfrak{f}\)-scraped sequence of sets such that \(P_1 \subseteq A\), then \(P_1 \subseteq A^k\) for all \(k \in \mathbb N.\) Our construction then implies \(A^k\) is an \(\mathcal{E}\)-envelope of \(\{P_n\} _ {n \in \mathbb N}\) for all \(k \in \mathbb N.\) Lemma 5 (iii) now implies \(A\) is also an \(\mathcal{E}\)-envelope of \(\{P_n\} _ {n \in \mathbb N}\), showing that \(\mathfrak{f}\) is compatible with \(A\), and hence that \(A\) is smooth.
Closure under countable increasing unions:
Suppose \(\left\{A^k\right\}_{k \in \mathbb N}\) is an increasing sequence of smooth sets, \(A = \bigcup _ {k \in \mathbb N} A^k\), and \(\mathfrak{f}\) is a \(\mathcal{C}\)-scraper compatible with all of the sets \(A^k, k \in \mathbb N.\) The scraper \(\mathfrak{f}\) doesn't work for this case and so we create a new one. For any \(n \in \mathbb N\) and sets \(P_1, P_2, \ldots, P_n\) we define
Let \(\{P_n\} _ {n \in \mathbb N}\) be a \(\Phi\)-scraped sequence of sets such that \(P_1 \subseteq A.\) By definition \(P_1 \in \mathcal{C}\) and \(A \cap P_1 = P_1\), and so from our construction \(\varphi_n(P_1, P_2, \ldots, P_n) = f_n(A^p \cap P_1, P_2, \ldots, P_n).\) All elements of the sequence \(A^p \cap P_1, P_2, \ldots, P_n, \ldots\) are in \(\mathcal{C}\) and for all \(n \in \mathbb N\)
Definition 14: A mapping \(I \colon \mathfrak{P}(E) \to \overline{\mathbb R}\) is called a Choquet \(\mathcal{E}\)-capacity, or simply \(\mathcal{E}\)-capacity, if it is
monotone increasing, i.e., \(I(A) \le I(B)\) holds for every \(A \subseteq B\),
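ascending, i.e., for every increasing sequence \(\{A_n\} _ {n \in \mathbb N} \subseteq \mathfrak{P}(E)\) we have \(I\left(\bigcup_{n \in \mathbb N} A_n\right) = \sup_{n \in \mathbb N} I(A_n)\), and
descending on pavings, i.e., for every decreasing sequence \(\{E_n\} _ {n \in \mathbb N} \subseteq \mathcal{E}\) we have \(I\left(\bigcap_{n \in \mathbb N} E_n\right) = \inf_{n \in \mathbb N} I(E_n).\)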
Examples:
Consider a paved space \((E, \mathcal{E})\), where \(\mathcal E\) is a compact paving, and define \(I(A) = 0\) if \(A = \varnothing\), \(I(A) = 1\) if \(A \neq \varnothing.\) Then \(I\) is a Choquet \(\mathcal{E}\)-capacity. Property 3 in the definition of a capacity now reflects the assumption that the paving \(\mathcal{E}\) is compact.
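The first example can be checked mechanically on a small finite set. The sketch below (helper names are ours) verifies only monotonicity: on a finite set every monotone sequence of subsets is eventually constant, so the ascending and descending conditions then follow from monotonicity. The real content of the descending condition appears only for infinite \(E\), where a decreasing sequence of nonempty sets can have empty intersection unless the paving is compact.

```python
from itertools import chain, combinations


def powerset(ground):
    """All subsets of a finite set, as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def indicator_capacity(a):
    """I(A) = 0 if A is empty, 1 otherwise."""
    return 0 if not a else 1


subsets = powerset({1, 2, 3})

# Monotonicity: A subset of B implies I(A) <= I(B).
is_monotone = all(indicator_capacity(a) <= indicator_capacity(b)
                  for a in subsets for b in subsets if a <= b)

# The sets with I > 0 are exactly the nonempty sets.
big_sets = [s for s in subsets if indicator_capacity(s) > 0]
```

The collection `big_sets` is the family of nonempty subsets, which was mentioned earlier as an example of a capacitance.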
Consider a probability space \((\Omega, \mathcal{F}, \mathbb{P})\), then the outer measure
Consider a locally compact, separable metric space \(K\), and the paving \(\mathscr{K}\) of its compact subsets. If \(\pi\) denotes the projection of \(K \times \Omega\) onto \(\Omega\) and
Fix an arbitrary set \(A \in \widehat{\mathcal{E}}.\) If \(I(A) = -\infty\) then \(I(K) = -\infty\) for any \(K \subseteq A\), and we have our desired equality trivially. Otherwise, we need to show that whenever \(I(A) > t\) holds for some given real number \(t\), there exists a set \(K \in \mathcal{E}_\delta\) with \(K \subseteq A\) and \(I(K) \ge t.\)
Recall that
We are now ready to prove some major results in measure theory.
We start by noticing \(\mathcal{B}(K) \widehat{\otimes}_p \mathcal{F} = \mathcal{B}(K) \otimes \mathcal{F}.\) This follows from the fact that if \(A \in \mathcal{B}(K) \otimes_p \mathcal{F}\), then \(A = \bigcup _ {i=1}^n U_i \times V_i\) for some \(U_i \in \mathcal{B}(K)\) and \(V_i \in \mathcal{F}\), and thus it can be shown \(A^\mathsf{c} \in \mathcal{B}(K) \widehat{\otimes}_p \mathcal{F}\), and thus Lemma 1 gives
Consider the paving \(\mathscr{K}\) on \(K\) consisting of all compact subsets of \(K\), and introduce the \((\mathscr{K} \otimes_p \mathcal{F})\)-capacity \(I(A) = \mathbb{P}^*(\pi(A))\), \(A \in \mathcal{B}(K \times \Omega)\) we saw before. Now note
Choquet's capacitability theorem (Theorem 6) thus guarantees that every set in \(\mathcal{B}(K) \otimes \mathcal{F}\) is \(I\)-capacitable. In particular, for every integer \(n \in \mathbb N\), there exists a set \(C_n \in (\mathscr{K} \otimes_p \mathcal{F})_\delta\) contained in \(B\) and such that \[\begin{aligned} I(C_n) \le I(B) \le I(C_n) + (1/n).\end{aligned}\] Because \(C_n \in (\mathscr{K} \otimes_p \mathcal{F})_\delta\), \(C_n\) is a countable intersection \(C_n = \bigcap _ {m \in \mathbb N} G_n^m\), where each \(G_n^m\) is a finite union of sets of the form \(U \times V\) with \(U \in \mathscr{K}, V \in \mathcal{F}.\) Letting \(H_n^m = \bigcap _ {i=1}^m G_n^i\), we see that \(H_n^1 \supseteq H_n^2 \supseteq \cdots\) and \(C_n = \bigcap _ {m \in \mathbb N} H_n^m\), where now \(H_n^m\) is also a finite union of sets of the form \(U \times V\) with \(U \in \mathscr{K}, V \in \mathcal{F}.\)
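The passage from \(G_n^m\) to \(H_n^m\) is the usual trick of replacing a sequence of sets by its running intersections: the new sequence is decreasing and has the same overall intersection. A minimal finite sketch, with made-up sets and a hypothetical helper `running_intersections`:

```python
from itertools import accumulate


def running_intersections(sets):
    """Return H^m = G^1 & ... & G^m for m = 1, ..., len(sets)."""
    return list(accumulate((frozenset(g) for g in sets), lambda h, g: h & g))


G = [{1, 2, 3, 4}, {2, 3, 4, 5}, {1, 3, 4}, {3, 4, 6}]
H = running_intersections(G)

# Intersection of the original, not necessarily decreasing, sequence.
total = frozenset.intersection(*(frozenset(g) for g in G))
```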
The form of \(H_n^m\) immediately implies \(\pi(H_n^m) \in \mathcal{F}\) for all \((m,n) \in \mathbb N^2.\) Lemma 3 implies
It is not easy to construct an example of a Borel set in the product space whose projection is not Borel. It requires the study of analytic sets. Check out Corollary 8.2.17 in (Cohn, 2013) for more details.
The next theorem is a very visual theorem, especially for the case where \(K\) and \(\Omega\) are \(\mathbb R.\)
Definition 16: For a map \(f \colon X \to Y\), we define its graph \(\llbracket f \rrbracket\) to be the product set
We call a set \(G \in \mathcal{B}(K) \otimes \mathcal{F}\) a measurable graph, if for every \(\omega \in \Omega\) its section \(G(\omega) = \{y \in K\,:\,(y,\omega) \in G\}\) contains at most one point.
Sufficiency: If \(\Xi\) and \(g\) are as stated, the set \(G = \{(y,\omega) \in K \times \Xi\,:\,y = g(\omega)\}\) equals the pre-image \(\varphi^{-1}(\Delta)\) of the diagonal \(\Delta = \{(y,y) \in K \times K\,:\,y \in K\}\) under the mapping
Necessity: Suppose that \(G\) is a measurable graph, and let \(\Xi := \pi(G).\) Then \(\Xi \in \mathcal{F}\) by the Measurable Projection theorem. For every \(\omega \in \Xi\), define \(g(\omega)\) to be the unique element of the set \(G(\omega).\) We want to show \(g \colon \Xi \to K\) is measurable. Indeed for any \(H \in \mathcal{B}(K)\), it is easy to see that
Now let \(K = [0, \infty)\), the case important in stochastic processes.
Recall the convention \(\inf \varnothing = \infty.\) It is easy to see that \(\{D_A < \infty\} = \pi(A).\)
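For a finite set \(A\) of pairs \((t, \omega)\) the debut can be computed directly as \(D_A(\omega) = \inf\{t \,:\, (t, \omega) \in A\}\), and the identity \(\{D_A < \infty\} = \pi(A)\) can be checked by hand. The sketch below uses made-up data and helper names. In this finite setting the infimum is always attained, which is not true for general sets.

```python
import math


def section(a, omega):
    """A(omega) = {t : (t, omega) in A} for a finite set A of (t, omega) pairs."""
    return {t for (t, w) in a if w == omega}


def debut(a, omega):
    """D_A(omega) = inf A(omega), with the convention inf(empty set) = +infinity."""
    return min(section(a, omega), default=math.inf)


def projection(a):
    """pi(A) = {omega : (t, omega) in A for some t}."""
    return {w for (_, w) in a}


Omega = {"w1", "w2", "w3"}
A = {(2.0, "w1"), (0.5, "w1"), (1.5, "w3")}

debuts = {w: debut(A, w) for w in Omega}
finite_debut = {w for w in Omega if debuts[w] < math.inf}
```

Here `finite_debut` coincides with `projection(A)`, and each pair `(debuts[w], w)` with finite debut lies in `A`.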
Let \((\Omega, \mathcal{F}, \mathbb{P})\) be a complete probability space, and consider a set \(A \subseteq [0, \infty) \times \Omega.\) Then for every \(\omega \in \pi(A),\) there exists a \(t \in [0, \infty)\) such that \((t, \omega) \in A.\) In other words, we can define a mapping \(Z \colon \pi(A) \to [0, \infty)\) such that \((Z(\omega), \omega) \in A\) for every \(\omega \in \pi(A).\) It is convenient to extend \(Z\) to the whole of \(\Omega\) by setting \(Z = \infty\) on \(\Omega \setminus \pi(A).\) When is it possible to choose \(Z\) such that it is measurable? The measurable section theorem (also known as the measurable selection theorem) says that it is possible to define \(Z\) to be measurable if \(A \in \mathcal{B}([0, \infty)) \otimes \mathcal{F}.\)
Recall that for \(Z \colon \Omega \to [0, \infty]\) its graph is the product set
The condition \((Z(\omega), \omega) \in A\) whenever \(Z < \infty\) can then be expressed by saying \(\llbracket Z \rrbracket \subseteq A.\)
Other than stochastic processes, measurable section theorems have applications in optimal control and game theory.
We divide our analysis into two parts. We shall first show that for every \(\varepsilon > 0\) there exists a random variable \(Z_\varepsilon \colon \Omega \to [0, \infty]\) with \(\llbracket Z_\varepsilon \rrbracket \subseteq A\) and
For every \(\omega \in \Omega\), the section \(C_\varepsilon(\omega)\) is a compact subset of \(K\) (use the facts that compact \(\iff\) closed and bounded here, and \(C_\varepsilon \in (\mathscr{K} \otimes_p \mathcal{F})_\delta\)). Notice that if \((t, \omega) \in \llbracket D_{C_\varepsilon} \rrbracket\) then \(D_{C_\varepsilon}(\omega) = t\), which is the same as saying \(t = \inf \{s \in [0, \infty)\,:\,(s,\omega) \in C_\varepsilon\}\), but since \(C_\varepsilon(\omega)\) is closed this implies \((t, \omega) \in C_\varepsilon.\) Therefore, \(\llbracket Z_\varepsilon \rrbracket \subseteq C_\varepsilon\), showing the first requirement.
The second requirement \(\mathbb{P}\left(\pi(A)\right) \le \mathbb{P}\left(Z_\varepsilon < \infty\right) + \varepsilon\) is true because \(\pi(A) \in \mathcal{F}\) by the Measurable Projection theorem (Theorem 7) and \(\{Z_\varepsilon < \infty\} = \pi(C_\varepsilon) \in \mathcal{F}\), and now use (2).
Let us now construct a random variable \(Z\) to satisfy the properties claimed in the theorem. Define
On the other hand, letting \(n \to \infty\) in (3), we see that \(\{Z < \infty\}\) and \(\pi(A)\) have the same probability. Therefore, the completeness of the probability space implies these two sets can be made equal.
With this we are done laying the foundations. In the next blog post, we will discuss applications of these results in the general theory of processes.
Cohn, Donald L. Measure Theory, second edition, Birkhäuser, 2013.
Dellacherie, Claude. Capacités et processus stochastiques, Springer-Verlag, 1972.
Dellacherie, Claude and Meyer, Paul-André. Probabilities and Potential, North-Holland Publishing Company, 1978.
Dellacherie, Claude. Capacities and analytic sets, Part of the Lecture Notes in Mathematics book series (LNM, volume 839), Springer-Verlag, 1981.
Karoui, Nicole El and Tan, Xiaolu. Capacities, Measurable Selection and Dynamic Programming Part I: Abstract Framework, 2013.
Lowther, George. Almost Sure blog.
Meyer, Paul-André. Stochastic Processes from 1950 to the Present (Translated from French by Jeanine Sedjro), 2009.
[1] If you steal from one author, it’s plagiarism; if you steal from many, it’s research.