boost-R
diff --git a/DESCRIPTION b/DESCRIPTION
Lines changed: 1 addition & 0 deletions
diff --git a/NAMESPACE b/NAMESPACE
Lines changed: 1 addition & 0 deletions
diff --git a/R/FDboost-package.R b/R/FDboost-package.R
Lines changed: 6 additions & 0 deletions
diff --git a/R/clr_functions.R b/R/clr_functions.R
Lines changed: 241 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 3 additions & 1 deletion
diff --git a/data/birthDistribution.RData b/data/birthDistribution.RData
25.2 KB
diff --git a/inst/NEWS.Rd b/inst/NEWS.Rd
Lines changed: 11 additions & 1 deletion
diff --git a/man/FDboost-package.Rd b/man/FDboost-package.Rd
Lines changed: 6 additions & 0 deletions
diff --git a/man/FDboost.Rd b/man/FDboost.Rd
Lines changed: 1 addition & 1 deletion
@@ -49,6 +49,7 @@ Collate:
     'baselearners.R'
     'baselearnersX.R'
     'bootstrapCIs.R'
+    'clr_functions.R'
     'constrainedX.R'
     'crossvalidation.R'
     'FDboostLSS.R'
 
@@ -41,6 +41,7 @@ export(bolsc)
 export(bootstrapCI)
 export(brandomc)
 export(bsignal)
+export(clr)
 export(cvLong)
 export(cvMa)
 export(funMRD)
 
@@ -10,10 +10,12 @@
 #' This package is intended to fit regression models with functional variables. 
 #' It is possible to fit models with functional response and/or functional covariates, 
 #' resulting in scalar-on-function, function-on-scalar and function-on-function regression. 
+#' Furthermore, the package can be used to fit density-on-scalar regression models.
 #' Details on the functional regression models that can be fitted with \pkg{FDboost}
 #' can be found in Brockhaus et al. (2015, 2017, 2018) and Ruegamer et al. (2018).  
 #' A hands-on tutorial for the package can be found 
 #' in Brockhaus, Ruegamer and Greven (2017), see \url{https://arxiv.org/abs/1705.10662}.
+#' For density-on-scalar regression models, see Maier et al. (2021).
 #' 
 #' Using component-wise gradient boosting as fitting procedure, \pkg{FDboost} relies on 
 #' the R package \pkg{mboost} (Hothorn et al., 2017). 
@@ -50,6 +52,10 @@
 #' A Hands-on Tutorial Using the R Package mboost. Computational Statistics, 29, 3-35. 
 #' \url{https://cran.r-project.org/package=mboost/vignettes/mboost_tutorial.pdf}
 #' 
+#' Maier, E.-M., Stoecker, A., Fitzenberger, B., Greven, S. (2021):
+#' Additive Density-on-Scalar Regression in Bayes Hilbert Spaces with an Application to Gender Economics.
+#' arXiv preprint arXiv:2110.11771.
+#' 
 #' Ruegamer D., Brockhaus, S., Gentsch K., Scherer, K., Greven, S. (2018). 
 #' Boosting factor-specific functional historical models for the detection of synchronization in bioelectrical signals. 
 #' Journal of the Royal Statistical Society: Series C (Applied Statistics), 67, 621-642.
 
@@ -0,0 +1,241 @@
+#' Clr and inverse clr transformation
+#' 
+#' \code{clr} computes the clr or inverse clr transformation of a vector \code{f}
+#' with respect to integration weights \code{w}, corresponding to a Bayes Hilbert space
+#' \eqn{B^2(\mu) = B^2(\mathcal{T}, \mathcal{A}, \mu)}{B^2(\mu) = B^2(T, A, \mu)}.
+#' 
+#' @param f a vector containing the function values (evaluated on a grid) of the 
+#' function \eqn{f} to transform. If \code{inverse = FALSE}, \code{f} must be a density,
+#' i.e., all entries must be positive and usually \code{f} integrates to one. 
+#' If \code{inverse = TRUE}, \code{f} should integrate to zero, see Details.
+#' @param w a vector of length one or of the same length as \code{f} containing 
+#' positive integration weights. If \code{w} has length one, this
+#' weight is used for all function values. The integral of \eqn{f} is approximated
+#' via \eqn{\int_{\mathcal{T}} f \, \mathrm{d}\mu \approx 
+#' \sum_{j=1}^m}{\sum_{j=1}^m} \code{w}\eqn{_j} \code{f}\eqn{_j},
+#' where \eqn{m} equals the length of \code{f}.
+#' @param inverse if \code{TRUE}, the inverse clr transformation is computed.
+#' 
+#' @details The clr transformation maps a density \eqn{f} from \eqn{B^2(\mu)} to
+#' \eqn{L^2_0(\mu) := \{ f \in L^2(\mu) ~|~ \int_{\mathcal{T}} f \, \mathrm{d}\mu = 0\}}{L^2_0(\mu) := {f \in L^2(\mu) | \int_T f d\mu = 0}}
+#' via
+#' \deqn{\mathrm{clr}(f) := \log f - \frac{1}{\mu (\mathcal{T})} \int_{\mathcal{T}} \log f \, \mathrm{d}\mu.}{clr(f) := log f - 1/\mu(T) * \int_T log f d\mu.}
+#' The inverse clr transformation maps a function \eqn{f} from
+#' \eqn{L^2_0(\mu)} to \eqn{B^2(\mu)} via
+#' \deqn{\mathrm{clr}^{-1}(f) := \frac{\exp f}{\int_{\mathcal{T}} \exp f \, \mathrm{d}\mu}.}{clr^{-1}(f) := (exp f) / (\int_T exp f d\mu).}
+#' Note that in contrast to Maier et al. (2021), this definition of the inverse
+#' clr transformation includes normalization, yielding the respective probability 
+#' density function (representative of the equivalence class of proportional
+#' functions in \eqn{B^2(\mu)}). 
+#' 
+#' The (inverse) clr transformation depends not only on \eqn{f}, but also on the
+#' underlying measure space \eqn{\left( \mathcal{T}, \mathcal{A}, \mu\right)}{(T, A, \mu)}, 
+#' which determines the integral. In \code{clr} this is specified via the 
+#' integration weights \code{w}. E.g., for a discrete set \eqn{\mathcal{T}}{T}
+#' with \eqn{\mathcal{A} = \mathcal{P}(\mathcal{T})}{A = P(T)} the power set of 
+#' \eqn{\mathcal{T}}{T} and \eqn{\mu = \sum_{t \in \mathcal{T}} \delta_t}{\mu = \sum_{t \in T} \delta_t} the sum of Dirac
+#' measures at \eqn{t \in \mathcal{T}}{t \in T}, the default \code{w = 1} is
+#' the correct choice. In this case, integrals are indeed computed exactly, not
+#' only approximately. 
+#' For an interval \eqn{\mathcal{T} = [a, b]}{T = [a, b]}
+#' with \eqn{\mathcal{A} = \mathcal{B}}{A = B} the Borel \eqn{\sigma}-algebra 
+#' restricted to \eqn{\mathcal{T}}{T} and \eqn{\mu = \lambda} the Lebesgue measure,
+#' the choice of \code{w} depends on the grid on which the function was evaluated:
+#' \code{w}\eqn{_j} must correspond to the length of the subinterval of \eqn{[a, b]} that 
+#' \code{f}\eqn{_j} represents.
+#' E.g., for an equidistant grid with spacing \eqn{d}, where the boundary grid 
+#' values are \eqn{a + \frac{d}{2}}{a + d/2} and \eqn{b - \frac{d}{2}}{b - d/2}
+#' (i.e., the grid points are centers of intervals of length \eqn{d}),
+#' equal weights \eqn{d} should be chosen for \code{w}. 
+#' 
+#' The clr transformation is crucial for density-on-scalar regression 
+#' since estimating the clr transformed model in \eqn{L^2_0(\mu)} is equivalent
+#' to estimating the original model in \eqn{B^2(\mu)} (as the clr transformation
+#' is an isometric isomorphism), see also the vignette "FDboost_density-on-scalar_births"
+#' and Maier et al. (2021).
+#' 
+#' @return A vector of the same length as \code{f} containing the (inverse) clr 
+#' transformation of \code{f}.
+#' 
+#' @author Eva-Maria Maier
+#' 
+#' @references 
+#' Maier, E.-M., Stoecker, A., Fitzenberger, B., Greven, S. (2021):
+#' Additive Density-on-Scalar Regression in Bayes Hilbert Spaces with an Application to Gender Economics.
+#' arXiv preprint arXiv:2110.11771.
+#' 
+#' @examples
+#' ### Continuous case (T = [0, 1] with Lebesgue measure):
+#' # evaluate density of a Beta distribution on an equidistant grid
+#' g <- seq(from = 0.005, to = 0.995, by = 0.01)
+#' f <- dbeta(g, 2, 5)
+#' # compute clr transformation with the distance between adjacent grid points as integration weight
+#' f_clr <- clr(f, w = 0.01)
+#' # visualize result
+#' plot(g, f_clr, type = "l")
+#' abline(h = 0, col = "grey")
+#' # compute inverse clr transformation (w as above)
+#' f_clr_inv <- clr(f_clr, w = 0.01, inverse = TRUE)
+#' # visualize result
+#' plot(g, f, type = "l")
+#' lines(g, f_clr_inv, lty = 2, col = "red")
+#' 
+#' ### Discrete case (T = {1, ..., 12} with sum of Dirac measures at t in T):
+#' data("birthDistribution", package = "FDboost")
+#' # fit density-on-scalar model with effects for sex and year
+#' model <- FDboost(birth_densities_clr ~ 1 + bolsc(sex, df = 1) + 
+#'                    bbsc(year, df = 1, differences = 1),
+#'                  # use bbsc() in timeformula to ensure integrate-to-zero constraint
+#'                  timeformula = ~bbsc(month, df = 4, 
+#'                                      # December is followed by January of subsequent year
+#'                                      cyclic = TRUE, 
+#'                                      # knots = {1, ..., 12} with additional boundary knot
+#'                                      # 0 (coinciding with 12) due to cyclic = TRUE
+#'                                      knots = 1:11, boundary.knots = c(0, 12), 
+#'                                      # degree = 1 with these knots yields identity matrix 
+#'                                      # as design matrix
+#'                                      degree = 1),
+#'                  data = birthDistribution, offset = 0, 
+#'                  control = boost_control(mstop = 1000))
+#' # Extract predictions (clr-transformed!) and transform them to Bayes Hilbert space
+#' predictions_clr <- predict(model)
+#' predictions <- t(apply(predictions_clr, 1, clr, inverse = TRUE))
+#' 
+#' @export
+clr <- function(f, w = 1, inverse = FALSE) {
+  stopifnot("inverse must be TRUE or FALSE." = inverse %in% c(TRUE, FALSE))
+  stopifnot("f must be numeric." = is.numeric(f))
+  stopifnot("f contains missing values." = !(anyNA(f)))
+  n <- length(f)
+  if (length(w) == 1) {
+    w <- rep(w, n)
+  }
+  stopifnot("w must contain positive weights and have length 1 or the same length as f." = 
+              length(w) == n & is.numeric(w) & all(w > 0))
+  int_f <- sum(f * w)
+  if (!inverse) {
+    stopifnot("As a density, f must be positive." = all(f > 0))
+    if (!isTRUE(all.equal(int_f, 1, tolerance = 0.01))) {
+      warning(paste0("f is not a probability density with respect to w. Its integral is ",
+                     round(int_f, 2), "."))
+    }
+    return(log(f) - 1 / sum(w) * sum(log(f) * w))
+  } else {
+    if (!isTRUE(all.equal(int_f, 0, tolerance = 0.01))) {
+      warning(paste0("f does not integrate to zero with respect to w. Its integral is ",
+                     round(int_f, 2), "."))
+    }
+    return(exp(f) / sum(exp(f) * w))
+  }
+}
+
+
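The weight choice described in the Details for an interval with Lebesgue measure can be illustrated on a non-equidistant grid. The sketch below is only an illustration under stated assumptions: `clr_sketch` and `cell_weights` are hypothetical helper names, not part of the package. `clr_sketch` restates the two documented formulas without the input checks of `clr`, and `cell_weights` assumes that cell borders lie halfway between neighbouring grid points.

```r
# Hypothetical helper: restates the clr / inverse clr formulas from the
# documentation, without the input checks and warnings of clr().
clr_sketch <- function(f, w, inverse = FALSE) {
  if (!inverse) {
    log(f) - sum(log(f) * w) / sum(w)
  } else {
    exp(f) / sum(exp(f) * w)
  }
}

# Hypothetical helper: for increasing grid points g in [a, b], cell borders
# are placed halfway between neighbouring grid points (an assumption);
# each weight is the length of the cell its grid point represents.
cell_weights <- function(g, a, b) {
  borders <- c(a, (head(g, -1) + tail(g, -1)) / 2, b)
  diff(borders)
}

# Non-equidistant grid on [0, 1], denser near 0
g <- seq(0.01, 0.99, length.out = 50)^2
w <- cell_weights(g, a = 0, b = 1)
f <- dbeta(g, 2, 5)

f_clr  <- clr_sketch(f, w)                   # element of L^2_0 (discretized)
f_back <- clr_sketch(f_clr, w, inverse = TRUE)

sum(w)          # equals b - a up to floating point error
sum(f_clr * w)  # numerically zero: integrate-to-zero constraint
sum(f_back * w) # numerically one: the inverse includes normalization
```

Because the inverse normalizes, `f_back` agrees with `f / sum(f * w)` rather than with `f` itself whenever the discretized `f` does not integrate exactly to one.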
+#' Densities of live births in Germany 
+#'
+#' \code{birthDistribution} contains densities of live births in Germany over the
+#' months per year (1950 to 2019) and sex (male and female), resulting in 140
+#' densities.
+#'
+#' @docType data
+#'
+#' @usage data(birthDistribution, package = "FDboost")
+#'
+#' @format A list in the correct format to be passed to \code{\link{FDboost}} for 
+#' density-on-scalar regression:
+#' \describe{
+#' \item{\code{birth_densities}}{A 140 x 12 matrix containing the birth densities
+#' in its rows. The first 70 rows correspond to male newborns, the second 70 rows 
+#' to female ones. Within both of these, the years are ordered increasingly 
+#' (1950-2019), see also \code{sex} and \code{year}.}
+#' \item{\code{birth_densities_clr}}{A 140 x 12 matrix containing the clr
+#' transformed densities in its rows. Same structure as \code{birth_densities}.}
+#' \item{\code{sex}}{A factor vector of length 140 with levels \code{"m"} (male)
+#' and \code{"f"} (female), corresponding to the sex of the newborns for the rows of
+#' \code{birth_densities} and \code{birth_densities_clr}. The first 70 elements 
+#' are \code{"m"}, the second 70 \code{"f"}.}
+#' \item{\code{year}}{A vector of length 140 containing the integers from 1950 
+#' to 2019 two times (\code{c(1950:2019, 1950:2019)}), corresponding to the years
+#' for the rows of \code{birth_densities} and \code{birth_densities_clr}.}
+#' \item{\code{month}}{A vector containing the integers from 1 to 12, corresponding
+#' to the months for the columns of \code{birth_densities} and \code{birth_densities_clr}
+#' (domain \eqn{\mathcal{T}}{T} of the (clr-)densities).}
+#' }
+#' Note that for estimating a density-on-scalar model with \code{FDboost}, the
+#' clr transformed densities (\code{birth_densities_clr}) serve as response, see
+#' also the vignette "FDboost_density-on-scalar_births".
+#' The original densities (\code{birth_densities}) are not needed for estimation,
+#' but still included for the sake of completeness.
+#' 
+#' @details To compensate for the different lengths of the months, the average 
+#' number of births per day for each month (by sex and year) was used to compute
+#' the birth shares from the absolute birth counts. The 12 shares corresponding
+#' to one year and sex form one density in the Bayes Hilbert space 
+#' \eqn{B^2(\delta) = B^2\left( \mathcal{T}, \mathcal{A}, \delta\right)}{B^2(\delta) = B^2(T, A, \delta)},
+#' where \eqn{\mathcal{T} = \{1, \ldots, 12\}}{T = {1, \ldots, 12}} corresponds
+#' to the set of the 12 months, \eqn{\mathcal{A} := \mathcal{P}(\mathcal{T})}{A := P(T)}
+#' corresponds to the power set of \eqn{\mathcal{T}}{T}, and the reference measure
+#' \eqn{\delta := \sum_{t = 1}^{12} \delta_t} corresponds to the sum of Dirac 
+#' measures at \eqn{t \in \mathcal{T}}{t \in T}.
+#' 
+#' @seealso \code{\link{clr}} for the (inverse) clr transformation.
+#'
+#' @source Statistisches Bundesamt (Destatis), Genesis-Online, data set 
+#' \href{https://www-genesis.destatis.de/genesis//online?operation=table&code=12612-0002&bypass=true&levelindex=0&levelid=1610983595176#abreadcrumb}{12612-0002} 
+#' (01/18/2021); \href{https://www.govdata.de/dl-de/by-2-0}{dl-de/by-2-0}; 
+#' processed by Eva-Maria Maier
+#' 
+#' @references 
+#' Maier, E.-M., Stoecker, A., Fitzenberger, B., Greven, S. (2021):
+#' Additive Density-on-Scalar Regression in Bayes Hilbert Spaces with an Application to Gender Economics.
+#' arXiv preprint arXiv:2110.11771.
+#' 
+#' @examples
+#' data("birthDistribution", package = "FDboost")
+#' 
+#' # Plot densities
+#' year_col <- rainbow(70, start = 0.5, end = 1)
+#' year_lty <- c(1, 2, 4, 5)
+#' par(mfrow = c(1, 2))
+#' funplot(1:12, birthDistribution$birth_densities[1:70, ], ylab = "densities", xlab = "month", 
+#'         xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Male")
+#' funplot(1:12, birthDistribution$birth_densities[71:140, ], ylab = "densities", xlab = "month", 
+#'         xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Female")
+#' par(mfrow = c(1, 1))
+#' 
+#' # fit density-on-scalar model with effects for sex and year
+#' model <- FDboost(birth_densities_clr ~ 1 + bolsc(sex, df = 1) + 
+#'                    bbsc(year, df = 1, differences = 1),
+#'                  # use bbsc() in timeformula to ensure integrate-to-zero constraint
+#'                  timeformula = ~bbsc(month, df = 4, 
+#'                                      # December is followed by January of subsequent year
+#'                                      cyclic = TRUE, 
+#'                                      # knots = {1, ..., 12} with additional boundary knot
+#'                                      # 0 (coinciding with 12) due to cyclic = TRUE
+#'                                      knots = 1:11, boundary.knots = c(0, 12), 
+#'                                      # degree = 1 with these knots yields identity matrix 
+#'                                      # as design matrix
+#'                                      degree = 1),
+#'                  data = birthDistribution, offset = 0, 
+#'                  control = boost_control(mstop = 1000))
+#' 
+#' # Plotting 'model' yields the clr-transformed effects
+#' par(mfrow = c(1, 3))
+#' plot(model, n1 = 12, n2 = 12)
+#' 
+#' # Use inverse clr transformation to get effects in Bayes Hilbert space, e.g. for intercept
+#' intercept_clr <- predict(model, which = 1)[1, ]
+#' intercept <- clr(intercept_clr, w = 1, inverse = TRUE)
+#' funplot(1:12, intercept, xlab = "month", xaxp = c(1, 12, 11), pch = 20,
+#'         main = "Intercept", ylab = expression(hat(beta)[0]), id = rep(1, 12))
+#' 
+#' # Same with predictions
+#' predictions_clr <- predict(model)
+#' predictions <- t(apply(predictions_clr, 1, clr, inverse = TRUE))
+#' pred_ylim <- range(birthDistribution$birth_densities)
+#' par(mfrow = c(1, 2))
+#' funplot(1:12, predictions[1:70, ], ylab = "predictions", xlab = "month", ylim = pred_ylim,
+#'         xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Male")
+#' funplot(1:12, predictions[71:140, ], ylab = "predictions", xlab = "month", ylim = pred_ylim,
+#'         xaxp = c(1, 12, 11), pch = 20, col = year_col, lty = year_lty, main = "Female")
+#' par(mfrow = c(1, 1))
+"birthDistribution"
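The preprocessing described in the Details of `birthDistribution` (daily averages per month, then shares) might look roughly like the sketch below. The monthly counts are made-up placeholder numbers, not Destatis data. The fixed month lengths assume a non-leap year, since the treatment of leap years is not stated in the documentation. `clr_discrete` is a hypothetical stand-in for `clr(f, w = 1)`.

```r
# Made-up monthly birth counts for one sex and year (NOT real data)
counts <- c(31000, 28500, 31500, 30000, 31200, 30900,
            32500, 32000, 31000, 30500, 28800, 29500)

# Month lengths of a non-leap year (assumption; February has 29 days in
# leap years)
days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Average births per day compensate for the different month lengths
births_per_day <- counts / days

# Shares form a density w.r.t. the sum of Dirac measures (weights w = 1)
shares <- births_per_day / sum(births_per_day)

# Hypothetical stand-in for clr(f, w = 1): with unit weights, the integral
# mean of log(f) is the arithmetic mean
clr_discrete <- function(f) log(f) - mean(log(f))
shares_clr <- clr_discrete(shares)

# Inverse clr transformation with normalization recovers the shares
shares_back <- exp(shares_clr) / sum(exp(shares_clr))
```

In this discrete setting the sums are exact integrals, so `shares_clr` sums to zero and `shares_back` reproduces `shares` up to floating point error.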
@@ -10,8 +10,9 @@ FDboost
 `FDboost` Boosting Functional Regression Models.
 
 The package FDboost fits regression models for functional data, i.e., 
-scalar-on-function, function-on-scalar and function-on-function regression models, 
+scalar-on-function, function-on-scalar, and function-on-function regression models,
 by a component-wise gradient boosting algorithm. 
+Furthermore, it can be used to fit density-on-scalar regression models.
 
 ## Using FDboost
 
@@ -24,6 +25,7 @@ Instructions on how to use `FDboost` can be found in various places:
   - [function-on-function regression](https://cran.r-project.org/web/packages/FDboost/vignettes/FLAM_canada.pdf)
   - [scalar-on-function regression](https://cran.r-project.org/web/packages/FDboost/vignettes/FLAM_fuel.pdf)
   - [function-on-scalar regression](https://cran.r-project.org/web/packages/FDboost/vignettes/FLAM_viscosity.pdf)
+  - [density-on-scalar regression](https://github.com/Eva2703/FDboost/blob/BayesSpace/vignettes/density-on-scalar_birth.pdf)
 
 ## Issues & Feature Requests
 
 
@@ -1,14 +1,24 @@
 \name{NEWS}
 \title{News for Package 'FDboost'}
 
-\section{Changes in FDboost version 1.0-2 (2021-06-29)}{
+
+\section{Changes in FDboost version 1.1-0 (2022-07-12)}{
   \subsection{Miscellaneous}{
   \itemize{
   \item Anisotropic tensor-product operators \code{b1 \%A0\% b2} and \code{b1 \%Xa0\% b2} now 
   also working when \code{lambda} is specified for \code{b1} and \code{df} is specified for \code{b2}
   (or vice versa).
   }
   }
+  \subsection{New features}{
+  \itemize{
+  \item New function \code{clr} to compute the centered log-ratio (clr) transformation and its
+  inverse for density-on-scalar regression in Bayes Hilbert spaces.
+  \item New dataset \code{birthDistribution}.
+  \item New vignette illustrating density-on-scalar regression on 
+  the \code{birthDistribution} data.
+  }
+  }
 }
 
 \section{Changes in FDboost version 0.3-4 (2020-08-31)}{