1+ # ' Clr and inverse clr transformation
2+ # '
3+ # ' \code{clr} computes the clr or inverse clr transformation of a vector \code{f}
4+ # ' with respect to integration weights \code{w}, corresponding to a Bayes Hilbert space
5+ # ' \eqn{B^2(\mu) = B^2(\mathcal{T}, \mathcal{A}, \mu)}{B^2(\mu) = B^2(T, A, \mu)}.
6+ # '
7+ # ' @param f a vector containing the function values (evaluated on a grid) of the
8+ # ' function \eqn{f} to transform. If \code{inverse = TRUE}, \code{f} must be a density,
9+ # ' i.e., all entries must be positive and usually \code{f} integrates to one.
10+ # ' If \code{inverse = FALSE}, \code{f} should integrate to zero, see Details.
11+ # ' @param w a vector of length one or of the same length as \code{f} containing
12+ # ' positive integration weights. If \code{w} has length one, this
13+ # ' weight is used for all function values. The integral of \eqn{f} is approximated
14+ # ' via \eqn{\int_{\mathcal{T}} f \, \mathrm{d}\mu \approx
15+ # ' \sum_{j=1}^m}{\sum_{j=1}^m} \code{w}\eqn{_j} \code{f}\eqn{_j},
16+ # ' where \eqn{m} equals the length of \code{f}.
17+ # ' @param inverse if \code{TRUE}, the inverse clr transformation is computed.
18+ # '
19+ # ' @details The clr transformation maps a density \eqn{f} from \eqn{B^2(\mu)} to
20+ # ' \eqn{L^2_0(\mu) := \{ f \in L^2(\mu) ~|~ \int_{\mathcal{T}} f \, \mathrm{d}\mu = 0\}}{L^2_0(\mu) := {f \in L^2(\mu) | \int_T f d\mu = 0}}
21+ # ' via
22+ # ' \deqn{\mathrm{clr}(f) := \log f - \frac{1}{\mu (\mathcal{T})} \int_{\mathcal{T}} \log f \, \mathrm{d}\mu.}{clr(f) := log f - 1/\mu(T) * \int_T log f d\mu.}
23+ # ' The inverse clr transformation maps a function \eqn{f} from
24+ # ' \eqn{L^2_0(\mu)} to \eqn{B^2(\mu)} via
25+ # ' \deqn{\mathrm{clr}^{-1}(f) := \frac{\exp f}{\int_{\mathcal{T}} \exp f \, \mathrm{d}\mu}.}{clr^{-1}(f) := (exp f) / (\int_T \exp f d\mu).}
26+ # ' Note that in contrast to Maier et al. (2021), this definition of the inverse
27+ # ' clr transformation includes normalization, yielding the respective probability
28+ # ' density function (representative of the equivalence class of proportional
29+ # ' functions in \eqn{B^2(\mu)}).
30+ # '
31+ # ' The (inverse) clr transformation depends not only on \eqn{f}, but also on the
32+ # ' underlying measure space \eqn{\left( \mathcal{T}, \mathcal{A}, \mu\right)}{(T, A, \mu)},
33+ # ' which determines the integral. In \code{clr} this is specified via the
34+ # ' integration weights \code{w}. E.g., for a discrete set \eqn{\mathcal{T}}{T}
35+ # ' with \eqn{\mathcal{A} = \mathcal{P}(\mathcal{T})}{A = P(T)} the power set of
36+ # ' \eqn{\mathcal{T}}{T} and \eqn{\mu = \sum_{t \in T} \delta_t} the sum of dirac
37+ # ' measures at \eqn{t \in \mathcal{T}}{t \in T}, the default \code{w = 1} is
38+ # ' the correct choice. In this case, integrals are indeed computed exactly, not
39+ # ' only approximately.
40+ # ' For an interval \eqn{\mathcal{T} = [a, b]}{T = [a, b]}
41+ # ' with \eqn{\mathcal{A} = \mathcal{B}}{A = B} the Borel \eqn{\sigma}-algebra
42+ # ' restricted to \eqn{\mathcal{T}}{T} and \eqn{\mu = \lambda} the Lebesgue measure,
43+ # ' the choice of \code{w} depends on the grid on which the function was evaluated:
44+ # ' \code{w}\eqn{_j} must correspond to the length of the subinterval of \eqn{[a, b]}, which
45+ # ' \code{f}\eqn{_j} represents.
46+ # ' E.g., for a grid with equidistant distance \eqn{d}, where the boundary grid
47+ # ' values are \eqn{a + \frac{d}{2}}{a + d/2} and \eqn{b - \frac{d}{2}}{b - d/2}
48+ # ' (i.e., the grid points are centers of intervals of size \eqn{d}),
49+ # ' equal weights \eqn{d} should be chosen for \code{w}.
50+ # '
51+ # ' The clr transformation is crucial for density-on-scalar regression
52+ # ' since estimating the clr transformed model in \eqn{L^2_0(\mu)} is equivalent
53+ # ' to estimating the original model in \eqn{B^2(\mu)} (as the clr transformation
54+ # ' is an isometric isomorphism), see also the vignette "FDboost_density-on-scalar_births"
55+ # ' and Maier et al. (2021).
56+ # '
57+ # ' @return A vector of the same length as \code{f} containing the (inverse) clr
58+ # ' transformation of \code{f}.
59+ # '
60+ # ' @author Eva-Maria Maier
61+ # '
62+ # ' @references
63+ # ' Maier, E.-M., Stoecker, A., Fitzenberger, B., Greven, S. (2021):
64+ # ' Additive Density-on-Scalar Regression in Bayes Hilbert Spaces with an Application to Gender Economics.
65+ # ' arXiv preprint arXiv:2110.11771.
66+ # '
67+ # ' @examples
68+ # ' ### Continuous case (T = [0, 1] with Lebesgue measure):
69+ # ' # evaluate density of a Beta distribution on an equidistant grid
70+ # ' g <- seq(from = 0.005, to = 0.995, by = 0.01)
71+ # ' f <- dbeta(g, 2, 5)
72+ # ' # compute clr transformation with distance of two grid points as integration weight
73+ # ' f_clr <- clr(f, w = 0.01)
74+ # ' # visualize result
75+ # ' plot(g, f_clr , type = "l")
76+ # ' abline(h = 0, col = "grey")
77+ # ' # compute inverse clr transformation (w as above)
78+ # ' f_clr_inv <- clr(f_clr, w = 0.01, inverse = TRUE)
79+ # ' # visualize result
80+ # ' plot(g, f, type = "l")
81+ # ' lines(g, f_clr_inv, lty = 2, col = "red")
82+ # '
83+ # ' ### Discrete case (T = {1, ..., 12} with sum of dirac measures at t in T):
84+ # ' data("birthDistribution", package = "FDboost")
85+ # ' # fit density-on-scalar model with effects for sex and year
86+ # ' model <- FDboost(birth_densities_clr ~ 1 + bolsc(sex, df = 1) +
87+ # ' bbsc(year, df = 1, differences = 1),
88+ # ' # use bbsc() in timeformula to ensure integrate-to-zero constraint
89+ # ' timeformula = ~bbsc(month, df = 4,
90+ # ' # December is followed by January of subsequent year
91+ # ' cyclic = TRUE,
92+ # ' # knots = {1, ..., 12} with additional boundary knot
93+ # ' # 0 (coinciding with 12) due to cyclic = TRUE
94+ # ' knots = 1:11, boundary.knots = c(0, 12),
95+ # ' # degree = 1 with these knots yields identity matrix
96+ # ' # as design matrix
97+ # ' degree = 1),
98+ # ' data = birthDistribution, offset = 0,
99+ # ' control = boost_control(mstop = 1000))
100+ # ' # Extract predictions (clr-transformed!) and transform them to Bayes Hilbert space
101+ # ' predictions_clr <- predict(model)
102+ # ' predictions <- t(apply(predictions_clr, 1, clr, inverse = TRUE))
103+ # '
104+ # ' @export
105+ clr <- function (f , w = 1 , inverse = FALSE ) {
106+ stopifnot(" inverse must be TRUE or FALSE." = inverse %in% c(TRUE , FALSE ))
107+ stopifnot(" f must be numeric." = is.numeric(f ))
108+ stopifnot(" f contains missing values." = ! (anyNA(f )))
109+ n <- length(f )
110+ if (length(w ) == 1 ) {
111+ w <- rep(w , n )
112+ }
113+ stopifnot(" w must be contain positive weights of length 1 or the same length as f." =
114+ length(w ) == n & is.numeric(w ) & all(w > 0 ))
115+ int_f <- sum(f * w )
116+ if (! inverse ) {
117+ stopifnot(" As a density, f must be positive." = all(f > 0 ))
118+ if (! isTRUE(all.equal(int_f , 1 , tolerance = 0.01 ))) {
119+ warning(paste0(" f is not a probability density with respect to w. Its integral is " ,
120+ round(int_f , 2 ), " ." ))
121+ }
122+ return (log(f ) - 1 / sum(w ) * sum(log(f ) * w ))
123+ } else {
124+ if (! isTRUE(all.equal(int_f , 0 , tolerance = 0.01 ))) {
125+ warning(paste0(" f does not integrate to zero with respect to w. Its integral is " ,
126+ round(int_f , 2 ), " ." ))
127+ }
128+ return (exp(f ) / sum(exp(f ) * w ))
129+ }
130+ }
131+
132+
133+ # ' Densities of live births in Germany
134+ # '
135+ # ' \code{birthDistribution} contains densities of live births in Germany over the
136+ # ' months per year (1950 to 2019) and sex (male and female), resulting in 140
137+ # ' densities.
138+ # '
139+ # ' @docType data
140+ # '
141+ # ' @usage data(birthDistribution, package = "FDboost")
142+ # '
143+ # ' @format A list in the correct format to be passed to \code{\link{FDboost}} for
144+ # ' density-on-scalar regression:
145+ # ' \describe{
146+ # ' \item{\code{birth_densities}}{A 140 x 12 matrix containing the birth densities
147+ # ' in its rows. The first 70 rows correspond to male newborns, the second 70 rows
148+ # ' to female ones. Within both of these, the years are ordered increasingly
149+ # ' (1950-2019), see also \code{sex} and \code{year}.}
150+ # ' \item{\code{birth_densities_clr}}{A 140 x 12 matrix containing the clr
151+ # ' transformed densities in its rows. Same structure as \code{birth_densities}.}
152+ # ' \item{\code{sex}}{A factor vector of length 140 with levels \code{"m"} (male)
153+ # ' and \code{"f"} (female), corresponding to the sex of the newborns for the rows of
154+ # ' \code{birth_densities} and \code{birth_densities_clr}. The first 70 elements
155+ # ' are \code{"m"}, the second 70 \code{"f"}.}
156+ # ' \item{\code{year}}{A vector of length 140 containing the integers from 1950
157+ # ' to 2019 two times (\code{c(1950:2019, 1950:2019)}), corresponding to the years
158+ # ' for the rows of \code{birth_densities} and \code{birth_densities_clr}.}
159+ # ' \item{\code{month}}{A vector containing the integers from 1 to 12, corresponding
160+ # ' to the months for the columns of \code{birth_densities} and \code{birth_densities_clr}
161+ # ' (domain \eqn{\mathcal{T}}{T} of the (clr-)densities).}
162+ # ' }
163+ # ' Note that for estimating a density-on-scalar model with \code{FDboost}, the
164+ # ' clr transformed densities (\code{birth_densities_clr}) serve as response, see
165+ # ' also the vignette "FDboost_density-on-scalar_births".
166+ # ' The original densities (\code{birth_densities}) are not needed for estimation,
167+ # ' but still included for the sake of completeness.
168+ # '
169+ # ' @details To compensate for the different lengths of the months, the average
170+ # ' number of births per day for each month (by sex and year) was used to compute
171+ # ' the birth shares from the absolute birth counts. The 12 shares corresponding
172+ # ' to one year and sex form one density in the Bayes Hilbert space
173+ # ' \eqn{B^2(\delta) = B^2\left( \mathcal{T}, \mathcal{A}, \delta\right)}{B^2(\delta) = B^2(T, A, \delta)},
174+ # ' where \eqn{\mathcal{T} = \{1, \ldots, 12\}}{T = {1, \ldots, 12}} corresponds
175+ # ' to the set of the 12 months, \eqn{\mathcal{A} := \mathcal{P}(\mathcal{T})}{A := P(T)}
176+ # ' corresponds to the power set of \eqn{\mathcal{T}}{T}, and the reference measure
177+ # ' \eqn{\delta := \sum_{t = 1}^{12} \delta_t} corresponds to the sum of dirac
178+ # ' measures at \eqn{t \in \mathcal{T}}{t \in T}.
179+ # '
180+ # ' @seealso \code{\link{clr}} for the (inverse) clr transformation.
181+ # '
182+ # ' @source Statistisches Bundesamt (Destatis), Genesis-Online, data set
183+ # ' \href{https://www-genesis.destatis.de/genesis//online?operation=table&code=12612-0002&bypass=true&levelindex=0&levelid=1610983595176#abreadcrumb}{12612-0002}
184+ # ' (01/18/2021); \href{https://www.govdata.de/dl-de/by-2-0}{dl-de/by-2-0};
185+ # ' processed by Eva-Maria Maier
186+ # '
187+ # ' @references
188+ # ' Maier, E.-M., Stoecker, A., Fitzenberger, B., Greven, S. (2021):
189+ # ' Additive Density-on-Scalar Regression in Bayes Hilbert Spaces with an Application to Gender Economics.
190+ # ' arXiv preprint arXiv:2110.11771.
191+ # '
192+ # ' @examples
193+ # ' data("birthDistribution", package = "FDboost")
194+ # '
195+ # ' # Plot densities
196+ # ' year_col <- rainbow(70, start = 0.5, end = 1)
197+ # ' year_lty <- c(1, 2, 4, 5)
198+ # ' par(mfrow = c(1, 2))
199+ # ' funplot(1:12, birthDistribution$birth_densities[1:70, ], ylab = "densities", xlab = "month",
200+ # ' xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Male")
201+ # ' funplot(1:12, birthDistribution$birth_densities[71:140, ], ylab = "densities", xlab = "month",
202+ # ' xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Female")
203+ # ' par(mfrow = c(1, 1))
204+ # '
205+ # ' # fit density-on-scalar model with effects for sex and year
206+ # ' model <- FDboost(birth_densities_clr ~ 1 + bolsc(sex, df = 1) +
207+ # ' bbsc(year, df = 1, differences = 1),
208+ # ' # use bbsc() in timeformula to ensure integrate-to-zero constraint
209+ # ' timeformula = ~bbsc(month, df = 4,
210+ # ' # December is followed by January of subsequent year
211+ # ' cyclic = TRUE,
212+ # ' # knots = {1, ..., 12} with additional boundary knot
213+ # ' # 0 (coinciding with 12) due to cyclic = TRUE
214+ # ' knots = 1:11, boundary.knots = c(0, 12),
215+ # ' # degree = 1 with these knots yields identity matrix
216+ # ' # as design matrix
217+ # ' degree = 1),
218+ # ' data = birthDistribution, offset = 0,
219+ # ' control = boost_control(mstop = 1000))
220+ # '
221+ # ' # Plotting 'model' yields the clr-transformed effects
222+ # ' par(mfrow = c(1, 3))
223+ # ' plot(model, n1 = 12, n2 = 12)
224+ # '
225+ # ' # Use inverse clr transformation to get effects in Bayes Hilbert space, e.g. for intercept
226+ # ' intercept_clr <- predict(model, which = 1)[1, ]
227+ # ' intercept <- clr(intercept_clr, w = 1, inverse = TRUE)
228+ # ' funplot(1:12, intercept, xlab = "month", xaxp = c(1, 12, 11), pch = 20,
229+ # ' main = "Intercept", ylab = expression(hat(beta)[0]), id = rep(1, 12))
230+ # '
231+ # ' # Same with predictions
232+ # ' predictions_clr <- predict(model)
233+ # ' predictions <- t(apply(predictions_clr, 1, clr, inverse = TRUE))
234+ # ' pred_ylim <- range(birthDistribution$birth_densities)
235+ # ' par(mfrow = c(1, 2))
236+ # ' funplot(1:12, predictions[1:70, ], ylab = "predictions", xlab = "month", ylim = pred_ylim,
237+ # ' xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Male")
238+ # ' funplot(1:12, predictions[71:140, ], ylab = "predictions", xlab = "month", ylim = pred_ylim,
239+ # ' xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Female")
240+ # ' par(mfrow = c(1, 1))
241+ " birthDistribution"
0 commit comments