Skip to content

Commit cd59e78

Browse files
authored
Merge pull request #18 from Eva2703/BayesSpace
Bayes space utility tools
2 parents e2f401c + fd5ebf4 commit cd59e78

11 files changed

Lines changed: 793 additions & 4 deletions

DESCRIPTION

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@ Collate:
4949
'baselearners.R'
5050
'baselearnersX.R'
5151
'bootstrapCIs.R'
52+
'clr_functions.R'
5253
'constrainedX.R'
5354
'crossvalidation.R'
5455
'FDboostLSS.R'

NAMESPACE

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,7 @@ export(bolsc)
4141
export(bootstrapCI)
4242
export(brandomc)
4343
export(bsignal)
44+
export(clr)
4445
export(cvLong)
4546
export(cvMa)
4647
export(funMRD)

R/clr_functions.R

Lines changed: 241 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,241 @@
1+
#' Clr and inverse clr transformation
2+
#'
3+
#' \code{clr} computes the clr or inverse clr transformation of a vector \code{f}
4+
#' with respect to integration weights \code{w}, corresponding to a Bayes Hilbert space
5+
#' \eqn{B^2(\mu) = B^2(\mathcal{T}, \mathcal{A}, \mu)}{B^2(\mu) = B^2(T, A, \mu)}.
6+
#'
7+
#' @param f a vector containing the function values (evaluated on a grid) of the
8+
#' function \eqn{f} to transform. If \code{inverse = TRUE}, \code{f} must be a density,
9+
#' i.e., all entries must be positive and usually \code{f} integrates to one.
10+
#' If \code{inverse = FALSE}, \code{f} should integrate to zero, see Details.
11+
#' @param w a vector of length one or of the same length as \code{f} containing
12+
#' positive integration weights. If \code{w} has length one, this
13+
#' weight is used for all function values. The integral of \eqn{f} is approximated
14+
#' via \eqn{\int_{\mathcal{T}} f \, \mathrm{d}\mu \approx
15+
#' \sum_{j=1}^m}{\sum_{j=1}^m} \code{w}\eqn{_j} \code{f}\eqn{_j},
16+
#' where \eqn{m} equals the length of \code{f}.
17+
#' @param inverse if \code{TRUE}, the inverse clr transformation is computed.
18+
#'
19+
#' @details The clr transformation maps a density \eqn{f} from \eqn{B^2(\mu)} to
20+
#' \eqn{L^2_0(\mu) := \{ f \in L^2(\mu) ~|~ \int_{\mathcal{T}} f \, \mathrm{d}\mu = 0\}}{L^2_0(\mu) := {f \in L^2(\mu) | \int_T f d\mu = 0}}
21+
#' via
22+
#' \deqn{\mathrm{clr}(f) := \log f - \frac{1}{\mu (\mathcal{T})} \int_{\mathcal{T}} \log f \, \mathrm{d}\mu.}{clr(f) := log f - 1/\mu(T) * \int_T log f d\mu.}
23+
#' The inverse clr transformation maps a function \eqn{f} from
24+
#' \eqn{L^2_0(\mu)} to \eqn{B^2(\mu)} via
25+
#' \deqn{\mathrm{clr}^{-1}(f) := \frac{\exp f}{\int_{\mathcal{T}} \exp f \, \mathrm{d}\mu}.}{clr^{-1}(f) := (exp f) / (\int_T \exp f d\mu).}
26+
#' Note that in contrast to Maier et al. (2021), this definition of the inverse
27+
#' clr transformation includes normalization, yielding the respective probability
28+
#' density function (representative of the equivalence class of proportional
29+
#' functions in \eqn{B^2(\mu)}).
30+
#'
31+
#' The (inverse) clr transformation depends not only on \eqn{f}, but also on the
32+
#' underlying measure space \eqn{\left( \mathcal{T}, \mathcal{A}, \mu\right)}{(T, A, \mu)},
33+
#' which determines the integral. In \code{clr} this is specified via the
34+
#' integration weights \code{w}. E.g., for a discrete set \eqn{\mathcal{T}}{T}
35+
#' with \eqn{\mathcal{A} = \mathcal{P}(\mathcal{T})}{A = P(T)} the power set of
36+
#' \eqn{\mathcal{T}}{T} and \eqn{\mu = \sum_{t \in T} \delta_t} the sum of dirac
37+
#' measures at \eqn{t \in \mathcal{T}}{t \in T}, the default \code{w = 1} is
38+
#' the correct choice. In this case, integrals are indeed computed exactly, not
39+
#' only approximately.
40+
#' For an interval \eqn{\mathcal{T} = [a, b]}{T = [a, b]}
41+
#' with \eqn{\mathcal{A} = \mathcal{B}}{A = B} the Borel \eqn{\sigma}-algebra
42+
#' restricted to \eqn{\mathcal{T}}{T} and \eqn{\mu = \lambda} the Lebesgue measure,
43+
#' the choice of \code{w} depends on the grid on which the function was evaluated:
44+
#' \code{w}\eqn{_j} must correspond to the length of the subinterval of \eqn{[a, b]}, which
45+
#' \code{f}\eqn{_j} represents.
46+
#' E.g., for a grid with equidistant distance \eqn{d}, where the boundary grid
47+
#' values are \eqn{a + \frac{d}{2}}{a + d/2} and \eqn{b - \frac{d}{2}}{b - d/2}
48+
#' (i.e., the grid points are centers of intervals of size \eqn{d}),
49+
#' equal weights \eqn{d} should be chosen for \code{w}.
50+
#'
51+
#' The clr transformation is crucial for density-on-scalar regression
52+
#' since estimating the clr transformed model in \eqn{L^2_0(\mu)} is equivalent
53+
#' to estimating the original model in \eqn{B^2(\mu)} (as the clr transformation
54+
#' is an isometric isomorphism), see also the vignette "FDboost_density-on-scalar_births"
55+
#' and Maier et al. (2021).
56+
#'
57+
#' @return A vector of the same length as \code{f} containing the (inverse) clr
58+
#' transformation of \code{f}.
59+
#'
60+
#' @author Eva-Maria Maier
61+
#'
62+
#' @references
63+
#' Maier, E.-M., Stoecker, A., Fitzenberger, B., Greven, S. (2021):
64+
#' Additive Density-on-Scalar Regression in Bayes Hilbert Spaces With an Application to Gender Economics.
65+
#' arXiv preprint arXiv:.
66+
#'
67+
#' @examples
68+
#' ### Continuous case (T = [0, 1] with Lebesgue measure):
69+
#' # evaluate density of a Beta distribution on an equidistant grid
70+
#' g <- seq(from = 0.005, to = 0.995, by = 0.01)
71+
#' f <- dbeta(g, 2, 5)
72+
#' # compute clr transformation with distance of two grid points as integration weight
73+
#' f_clr <- clr(f, w = 0.01)
74+
#' # visualize result
75+
#' plot(g, f_clr , type = "l")
76+
#' abline(h = 0, col = "grey")
77+
#' # compute inverse clr transformation (w as above)
78+
#' f_clr_inv <- clr(f_clr, w = 0.01, inverse = TRUE)
79+
#' # visualize result
80+
#' plot(g, f, type = "l")
81+
#' lines(g, f_clr_inv, lty = 2, col = "red")
82+
#'
83+
#' ### Discrete case (T = {1, ..., 12} with sum of dirac measures at t in T):
84+
#' data("birthDistribution", package = "FDboost")
85+
#' # fit density-on-scalar model with effects for sex and year
86+
#' model <- FDboost(birth_densities_clr ~ 1 + bolsc(sex, df = 1) +
87+
#' bbsc(year, df = 1, differences = 1),
88+
#' # use bbsc() in timeformula to ensure integrate-to-zero constraint
89+
#' timeformula = ~bbsc(month, df = 4,
90+
#' # December is followed by January of subsequent year
91+
#' cyclic = TRUE,
92+
#' # knots = {1, ..., 12} with additional boundary knot
93+
#' # 0 (coinciding with 12) due to cyclic = TRUE
94+
#' knots = 1:11, boundary.knots = c(0, 12),
95+
#' # degree = 1 with these knots yields identity matrix
96+
#' # as design matrix
97+
#' degree = 1),
98+
#' data = birthDistribution, offset = 0,
99+
#' control = boost_control(mstop = 1000))
100+
#' # Extract predictions (clr-transformed!) and transform them to Bayes Hilbert space
101+
#' predictions_clr <- predict(model)
102+
#' predictions <- t(apply(predictions_clr, 1, clr, inverse = TRUE))
103+
#'
104+
#' @export
105+
clr <- function(f, w = 1, inverse = FALSE) {
106+
stopifnot("inverse must be TRUE or FALSE." = inverse %in% c(TRUE, FALSE))
107+
stopifnot("f must be numeric." = is.numeric(f))
108+
stopifnot("f contains missing values." = !(anyNA(f)))
109+
n <- length(f)
110+
if (length(w) == 1) {
111+
w <- rep(w, n)
112+
}
113+
stopifnot("w must be contain positive weights of length 1 or the same length as f." =
114+
length(w) == n & is.numeric(w) & all(w > 0))
115+
int_f <- sum(f * w)
116+
if (!inverse) {
117+
stopifnot("As a density, f must be positive." = all(f > 0))
118+
if (!isTRUE(all.equal(int_f, 1, tolerance = 0.01))) {
119+
warning(paste0("f is not a probability density with respect to w. Its integral is ",
120+
round(int_f, 2), "."))
121+
}
122+
return(log(f) - 1 / sum(w) * sum(log(f) * w))
123+
} else {
124+
if (!isTRUE(all.equal(int_f, 0, tolerance = 0.01))) {
125+
warning(paste0("f does not integrate to zero with respect to w. Its integral is ",
126+
round(int_f, 2), "."))
127+
}
128+
return(exp(f) / sum(exp(f) * w))
129+
}
130+
}
131+
132+
133+
#' Densities of live births in Germany
134+
#'
135+
#' \code{birthDistribution} contains densities of live births in Germany over the
136+
#' months per year (1950 to 2019) and sex (male and female), resulting in 140
137+
#' densities.
138+
#'
139+
#' @docType data
140+
#'
141+
#' @usage data(birthDistribution, package = "FDboost")
142+
#'
143+
#' @format A list in the correct format to be passed to \code{\link{FDboost}} for
144+
#' density-on-scalar regression:
145+
#' \describe{
146+
#' \item{\code{birth_densities}}{A 140 x 12 matrix containing the birth densities
147+
#' in its rows. The first 70 rows correspond to male newborns, the second 70 rows
148+
#' to female ones. Within both of these, the years are ordered increasingly
149+
#' (1950-2019), see also \code{sex} and \code{year}.}
150+
#' \item{\code{birth_densities_clr}}{A 140 x 12 matrix containing the clr
151+
#' transformed densities in its rows. Same structure as \code{birth_densities}.}
152+
#' \item{\code{sex}}{A factor vector of length 140 with levels \code{"m"} (male)
153+
#' and \code{"f"} (female), corresponding to the sex of the newborns for the rows of
154+
#' \code{birth_densities} and \code{birth_densities_clr}. The first 70 elements
155+
#' are \code{"m"}, the second 70 \code{"f"}.}
156+
#' \item{\code{year}}{A vector of length 140 containing the integers from 1950
157+
#' to 2019 two times (\code{c(1950:2019, 1950:2019)}), corresponding to the years
158+
#' for the rows of \code{birth_densities} and \code{birth_densities_clr}.}
159+
#' \item{\code{month}}{A vector containing the integers from 1 to 12, corresponding
160+
#' to the months for the columns of \code{birth_densities} and \code{birth_densities_clr}
161+
#' (domain \eqn{\mathcal{T}}{T} of the (clr-)densities).}
162+
#' }
163+
#' Note that for estimating a density-on-scalar model with \code{FDboost}, the
164+
#' clr transformed densities (\code{birth_densities_clr}) serve as response, see
165+
#' also the vignette "FDboost_density-on-scalar_births".
166+
#' The original densities (\code{birth_densities}) are not needed for estimation,
167+
#' but still included for the sake of completeness.
168+
#'
169+
#' @details To compensate for the different lengths of the months, the average
170+
#' number of births per day for each month (by sex and year) was used to compute
171+
#' the birth shares from the absolute birth counts. The 12 shares corresponding
172+
#' to one year and sex form one density in the Bayes Hilbert space
173+
#' \eqn{B^2(\delta) = B^2\left( \mathcal{T}, \mathcal{A}, \delta\right)}{B^2(\delta) = B^2(T, A, \delta)},
174+
#' where \eqn{\mathcal{T} = \{1, \ldots, 12\}}{T = {1, \ldots, 12}} corresponds
175+
#' to the set of the 12 months, \eqn{\mathcal{A} := \mathcal{P}(\mathcal{T})}{A := P(T)}
176+
#' corresponds to the power set of \eqn{\mathcal{T}}{T}, and the reference measure
177+
#' \eqn{\delta := \sum_{t = 1}^{12} \delta_t} corresponds to the sum of dirac
178+
#' measures at \eqn{t \in \mathcal{T}}{t \in T}.
179+
#'
180+
#' @seealso \code{\link{clr}} for the (inverse) clr transformation.
181+
#'
182+
#' @source Statistisches Bundesamt (Destatis), Genesis-Online, data set
183+
#' \href{https://www-genesis.destatis.de/genesis//online?operation=table&code=12612-0002&bypass=true&levelindex=0&levelid=1610983595176#abreadcrumb}{12612-0002}
184+
#' (01/18/2021); \href{https://www.govdata.de/dl-de/by-2-0}{dl-de/by-2-0};
185+
#' processed by Eva-Maria Maier
186+
#'
187+
#' @references
188+
#' Maier, E.-M., Stoecker, A., Fitzenberger, B., Greven, S. (2021):
189+
#' Additive Density-on-Scalar Regression in Bayes Hilbert Spaces With an Application to Gender Economics.
190+
#' Journal, vol(number), pages.
191+
#'
192+
#' @examples
193+
#' data("birthDistribution", package = "FDboost")
194+
#'
195+
#' # Plot densities
196+
#' year_col <- rainbow(70, start = 0.5, end = 1)
197+
#' year_lty <- c(1, 2, 4, 5)
198+
#' par(mfrow = c(1, 2))
199+
#' funplot(1:12, birthDistribution$birth_densities[1:70, ], ylab = "densities", xlab = "month",
200+
#' xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Male")
201+
#' funplot(1:12, birthDistribution$birth_densities[71:140, ], ylab = "densities", xlab = "month",
202+
#' xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Female")
203+
#' par(mfrow = c(1, 1))
204+
#'
205+
#' # fit density-on-scalar model with effects for sex and year
206+
#' model <- FDboost(birth_densities_clr ~ 1 + bolsc(sex, df = 1) +
207+
#' bbsc(year, df = 1, differences = 1),
208+
#' # use bbsc() in timeformula to ensure integrate-to-zero constraint
209+
#' timeformula = ~bbsc(month, df = 4,
210+
#' # December is followed by January of subsequent year
211+
#' cyclic = TRUE,
212+
#' # knots = {1, ..., 12} with additional boundary knot
213+
#' # 0 (coinciding with 12) due to cyclic = TRUE
214+
#' knots = 1:11, boundary.knots = c(0, 12),
215+
#' # degree = 1 with these knots yields identity matrix
216+
#' # as design matrix
217+
#' degree = 1),
218+
#' data = birthDistribution, offset = 0,
219+
#' control = boost_control(mstop = 1000))
220+
#'
221+
#' # Plotting 'model' yields the clr-transformed effects
222+
#' par(mfrow = c(1, 3))
223+
#' plot(model, n1 = 12, n2 = 12)
224+
#'
225+
#' # Use inverse clr transformation to get effects in Bayes Hilbert space, e.g. for intercept
226+
#' intercept_clr <- predict(model, which = 1)[1, ]
227+
#' intercept <- clr(intercept_clr, w = 1, inverse = TRUE)
228+
#' funplot(1:12, intercept, xlab = "month", xaxp = c(1, 12, 11), pch = 20,
229+
#' main = "Intercept", ylab = expression(hat(beta)[0]), id = rep(1, 12))
230+
#'
231+
#' # Same with predictions
232+
#' predictions_clr <- predict(model)
233+
#' predictions <- t(apply(predictions_clr, 1, clr, inverse = TRUE))
234+
#' pred_ylim <- range(birthDistribution$birth_densities)
235+
#' par(mfrow = c(1, 2))
236+
#' funplot(1:12, predictions[1:70, ], ylab = "predictions", xlab = "month", ylim = pred_ylim,
237+
#' xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Male")
238+
#' funplot(1:12, predictions[71:140, ], ylab = "predictions", xlab = "month", ylim = pred_ylim,
239+
#' xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Female")
240+
#' par(mfrow = c(1, 1))
241+
"birthDistribution"

R/utilityFunctions.R

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -87,8 +87,9 @@ funplot <- function(x, y, id=NULL, rug=TRUE, ...){
8787
### Get further arguments passed to the matplot-functions
8888
dots <- list(...)
8989

90-
oldpar <- par(no.readonly = TRUE)
91-
on.exit(par(oldpar))
90+
## AS: caused bug, couldn't see necessity here
91+
# oldpar <- par(no.readonly = TRUE)
92+
# on.exit(par(oldpar))
9293

9394
getArguments <- function(x, dots=dots){
9495
if(any(names(dots) %in% names(x))){

README.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,8 +10,9 @@ FDboost
1010
`FDboost` Boosting Functional Regression Models.
1111

1212
The package FDboost fits regression models for functional data, i.e.,
13-
scalar-on-function, function-on-scalar and function-on-function regression models,
13+
scalar-on-function, function-on-scalar, and function-on-function regression models,
1414
by a component-wise gradient boosting algorithm.
15+
Furthermore, it can be used to fit density-on-scalar regression models.
1516

1617
## Using FDboost
1718

@@ -24,6 +25,7 @@ Instructions on how to use `FDboost` can be found in various places:
2425
- [function-on-function regression](https://cran.r-project.org/web/packages/FDboost/vignettes/FLAM_canada.pdf)
2526
- [scalar-on-function regression](https://cran.r-project.org/web/packages/FDboost/vignettes/FLAM_fuel.pdf)
2627
- [function-on-scalar regression](https://cran.r-project.org/web/packages/FDboost/vignettes/FLAM_viscosity.pdf)
28+
- [density-on-scalar regression](https://github.com/Eva2703/FDboost/blob/BayesSpace/vignettes/density-on-scalar_birth.pdf)
2729

2830
## Issues & Feature Requests
2931

data/birthDistribution.RData

25.2 KB
Binary file not shown.

man/FDboost.Rd

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)