- PMM returned predicted values instead of observed values (C++): The `pmm` model returned predicted $\hat{y}$ for missing rows instead of the nearest observed $y$ values. Now it follows Little and Rubin (2002).
- PMM with character/factor variables (R): `fill_NA_N()` with `model = "pmm"` and a character dependent variable failed because it attempted `as.numeric()` on non-numeric strings, producing all NAs.
- Character dependent variable with lm models: `fill_NA()` and `fill_NA_N()` with `model = "lm_pred"`, `"lm_bayes"`, or `"lm_noise"` silently returned all NAs when the dependent variable was character with non-numeric labels (e.g., `"apple"`, `"banana"`).
- README: added sequential-chain MI examples (dplyr and data.table) showing how to impute multiple variables and pool with Rubin's rules.
- Introduction vignette: added full imputation workflow with sequential ordering (impute variables whose predictors are complete first), FCS (chained equations) section with data.table example, and PMM note for the OOP interface.
- MI vignette: expanded Rubin's rules derivations, added PMM MI example using the OOP interface, expanded "Important caveat" section with OOP and data.table FCS code snippets for non-monotone patterns.
- Documented PMM as a proper MI method throughout vignettes and README.
- Improved prose throughout vignettes and README.
- Added 20 PMM-specific tests (`test-pmm.R`): observed-value returns, factor/character support, weighted PMM, grouped data.table, reproducibility, stochasticity.
- Added 31 FCS tests (`test-fcs.R`): data.table, data.frame, and OOP FCS helpers; joint-missingness handling; MI+pool workflow; comparison with `mice` (pooled estimates and imputed means).
- Added tests for character dependent variables with non-numeric labels across all models and data types.
- Test suite expanded from 243 to 311 tests.
Kota Hattori, thank you for your feedback and for motivating me to make this deep update.
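
To illustrate the PMM fix described in the entries above, here is a minimal sketch of predictive mean matching that returns observed donor values rather than predictions. It is written in Python for illustration only and is not the package's C++ code; the function name `pmm_impute`, its arguments, and the default `k` are hypothetical. It assumes predicted values are already available for both the observed and the missing rows, and that at least one donor exists.

```python
import bisect
import random


def pmm_impute(yhat_obs, y_obs, yhat_mis, k=3, rng=None):
    """Return one observed donor value per missing row (hypothetical helper)."""
    rng = rng or random.Random()
    # Presort donors by predicted value so neighbours can be found by binary search.
    donors = sorted(zip(yhat_obs, y_obs), key=lambda d: d[0])
    preds = [p for p, _ in donors]
    imputed = []
    for target in yhat_mis:
        pos = bisect.bisect_left(preds, target)
        lo, hi = pos - 1, pos
        nearest = []
        # Walk outwards from the insertion point, collecting the k closest donors.
        while len(nearest) < k and (lo >= 0 or hi < len(preds)):
            take_left = hi >= len(preds) or (
                lo >= 0 and target - preds[lo] <= preds[hi] - target
            )
            if take_left:
                nearest.append(donors[lo][1])
                lo -= 1
            else:
                nearest.append(donors[hi][1])
                hi += 1
        # The imputed value is an observed y, never the prediction itself.
        imputed.append(rng.choice(nearest))
    return imputed


y_obs = [10.0, 20.0, 30.0, 40.0]
yhat_obs = [11.0, 19.0, 31.0, 39.0]
print(pmm_impute(yhat_obs, y_obs, [18.5, 35.2], k=1))  # -> [20.0, 40.0]
```

Because the donors' observed values are carried through untouched, the same matching step works for non-numeric labels such as `"apple"` or `"banana"`, as long as the predictions used for matching are numeric.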
- `pool()` function for combining results from multiply imputed datasets (Rubin's rules, Barnard-Rubin df adjustment). Works with `lm`, `glm`, and other models that support `coef()` and `vcov()`. Validated against `mice`.
- `print` and `summary` methods for pooled results.
- fixed residual variance estimator in `lm_noise` and `lm_bayes` stochastic models: divisor changed from `n-p-1` to `n-p`, where `p` already counts the intercept column supplied by the user. The previous formula over-corrected by one degree of freedom.
- new vignette on missing data mechanisms (MCAR/MAR/MNAR) and MI workflows.
- refactored introduction vignette with `pool()` examples.
- improved README with MI section and benchmark table.
- test suite for `pool()`, including comparison against `mice::pool()`.
- new weighted regression validation test against `lm.wfit()`.
- refactored C++ source code for clarity.
- fixed typos in error messages and documentation.
- regenerated performance benchmarks on R 4.4.3, macOS M3 Pro.
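
The pooling step mentioned in the entries above (Rubin's rules with the Barnard-Rubin degrees-of-freedom adjustment) can be sketched for a single coefficient as follows. This is an illustrative Python sketch, not the package's `pool()` implementation; the function name `pool_scalar`, its arguments, and the returned keys are hypothetical. It assumes at least two imputed datasets and takes the per-dataset estimates and their squared standard errors as input.

```python
import math
from statistics import mean, variance


def pool_scalar(estimates, variances, df_complete=None):
    """Pool one coefficient across m imputed datasets (hypothetical helper)."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance
    lam = (1 + 1 / m) * b / t      # share of variance due to missingness
    # Classic Rubin degrees of freedom; infinite when the estimates do not vary.
    df_old = (m - 1) / lam**2 if lam > 0 else math.inf
    if df_complete is None:
        df = df_old
    else:
        # Barnard-Rubin adjustment for a finite complete-data df.
        df_obs = (df_complete + 1) / (df_complete + 3) * df_complete * (1 - lam)
        df = df_obs if math.isinf(df_old) else df_old * df_obs / (df_old + df_obs)
    return {"estimate": q_bar, "total_var": t, "se": math.sqrt(t), "df": df}


result = pool_scalar([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], df_complete=100)
print(result["estimate"], result["se"], result["df"])
```

One simplification to note: this sketch handles the zero between-imputation variance case by an explicit branch, whereas other implementations may bound `lam` away from zero instead.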
- CRAN-related update: `OMP_THREAD_LIMIT`.
- fixed CRAN Notes.
- style the cpp code.
- VIF() should be more stable.
- simplified `naive_fill_NA`; it is a regular sampling imputation now.
- fixed `dontrun` examples.
- replace `ggplot2::aes_string` with `ggplot2::aes`, as the former is deprecated.
- regenerate performance benchmarks on R 4.2.1.
- styler over the code.
- improve documentation.
- `tinyverse` world, fewer dependencies.
- fixed imputations for character variables under linear models.
- speed up the `pmm` model.
- more tests, higher `covr`.
- rerun performance tests.
- update URL inside README.
- improve coverage.
- use drop = FALSE when subsetting the data.frame
- healthy DESCRIPTION file, fix spaces.
- more input validation.
- update broken vignette links
- solve broken UpSetR::upset reference links
- upset_NA based on UpSetR::upset plot function
- compare_imp plot function
- new logo
- remove times argument
- R CRAN r-oldrel-windows-ix86+x86_64 problems
- lifecycle problems
- fill_NA_N has a new model which is pmm - predictive mean matching
- fast PMM - presorting and binary search
- naive_fill_NA - auto function for data.frames - bayes mean and lda
- ridge argument for lm models - adding small disturbance to diag of X'X
- lm_bayes provide more disturbance
- new tests
- codecov
- remove old urls from vignettes
- providing a more comfortable environment for data.table/dplyr users
- expand vignette and documentation
- updated performance benchmarks
- fix a glitch - e.g. lack of correct warning for a lda model with zero variance variables
- data.table problem - jump to R 3.5.0
- valgrind - a lot of optimizations - problem with arma::exp and arma::randn
- optimize a lot of code
- methods/functions resistant to glitches
- fix imputations with a grouping variable - error if there is precisely one NA in any group
- add data.table to benchmarks - model with a grouping variable
- add R functions (`fill_NA_N`, `fill_NA`, `VIF`) which could be used by a data.table user
- add `impute_N` method - optimized multiple imputations
- add `vif` method - Variance inflation factors
- vignette, readme, description, todo
- adjust to solaris
- reference - set a grouping variable by reference, but as a numeric vector - integer vectors do not work (randomly lost pointer)