Skip to content

Commit bae92ec

Browse files
committed
ENH scale regression coefficients with sqrt(R^2)
1 parent 9b57785 commit bae92ec

3 files changed

Lines changed: 46 additions & 46 deletions

File tree

tutorials/notebooks/shortclips/03_plot_wordnet_model.ipynb

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -443,7 +443,7 @@
443443
"cell_type": "markdown",
444444
"metadata": {},
445445
"source": [
446-
"Here, we are only interested in the voxels with good generalization\nperformances. We select an arbitrary threshold of 0.05 (R^2 score).\n\n"
446+
"Because the ridge model allows a different regularization per voxel, the\nregression coefficients may have very different scales. In turn, these\ndifferent scales can introduce a bias in the interpretation, focusing the\nattention disproportionately on voxels fitted with the lowest alpha. To\naddress this issue, we rescale the regression coefficient to have a norm\nequal to the square-root of the $R^2$ scores. We found empirically that\nthis rescaling best matches results obtained with a regularization shared\naccross voxels. This rescaling also removes the need to select only best\nperforming voxels, because voxels with low prediction accuracies are rescaled\nto have a low norm.\n\n"
447447
]
448448
},
449449
{
@@ -454,7 +454,7 @@
454454
},
455455
"outputs": [],
456456
"source": [
457-
"primal_coef_selection = primal_coef[:, scores > 0.05]"
457+
"primal_coef /= np.linalg.norm(primal_coef, axis=0)[None]\nprimal_coef *= np.sqrt(np.maximum(0, scores))[None]"
458458
]
459459
},
460460
{
@@ -472,7 +472,7 @@
472472
},
473473
"outputs": [],
474474
"source": [
475-
"# split the ridge coefficients per delays\ndelayer = pipeline.named_steps['delayer']\nprimal_coef_per_delay = delayer.reshape_by_delays(primal_coef_selection,\n axis=0)\nprint(\"(n_delays, n_features, n_voxels) =\", primal_coef_per_delay.shape)\n\n# average over delays\naverage_coef = np.mean(primal_coef_per_delay, axis=0)\nprint(\"(n_features, n_voxels) =\", average_coef.shape)"
475+
"# split the ridge coefficients per delays\ndelayer = pipeline.named_steps['delayer']\nprimal_coef_per_delay = delayer.reshape_by_delays(primal_coef, axis=0)\nprint(\"(n_delays, n_features, n_voxels) =\", primal_coef_per_delay.shape)\ndel primal_coef\n\n# average over delays\naverage_coef = np.mean(primal_coef_per_delay, axis=0)\nprint(\"(n_features, n_voxels) =\", average_coef.shape)\ndel primal_coef_per_delay"
476476
]
477477
},
478478
{
@@ -551,7 +551,7 @@
551551
"cell_type": "markdown",
552552
"metadata": {},
553553
"source": [
554-
"According to the authors of [1]_, \"this principal component distinguishes\nbetween categories with high stimulus energy (e.g. moving objects like\n`person` and `vehicle`) and those with low stimulus energy (e.g. stationary\nobjects like `sky` and `city`)\".\n\nIn this example, because we use only a single subject and we perform a\ndifferent voxel selection, our result is slightly different than in [1]_. We\nalso use a different regularization parameter in each voxel, while in [1]_\nall voxels had the same regularization parameter. We do not aim at\nreproducing exactly the results in [1]_, but we rather describe the general\napproach.\n\n"
554+
"According to the authors of [1]_, \"this principal component distinguishes\nbetween categories with high stimulus energy (e.g. moving objects like\n`person` and `vehicle`) and those with low stimulus energy (e.g. stationary\nobjects like `sky` and `city`)\".\n\nIn this example, because we use only a single subject and we perform a\ndifferent voxel selection, our result is slightly different than in the\noriginal publication. We also use a different regularization parameter in\neach voxel, while in [1]_ all voxels had the same regularization parameter.\nHowever, we do not aim at reproducing exactly the results of the original\npublication, but we rather describe the general approach.\n\n"
555555
]
556556
},
557557
{
@@ -569,7 +569,7 @@
569569
},
570570
"outputs": [],
571571
"source": [
572-
"# split the ridge coefficients per delays\nprimal_coef_per_delay = delayer.reshape_by_delays(primal_coef, axis=0)\nprint(\"(n_delays, n_features, n_voxels) =\", primal_coef_per_delay.shape)\ndel primal_coef\n\n# average over delays\naverage_coef = np.mean(primal_coef_per_delay, axis=0)\nprint(\"(n_features, n_voxels) =\", average_coef.shape)\ndel primal_coef_per_delay\n\n# transform with the fitted PCA\naverage_coef_transformed = pca.transform(average_coef.T).T\nprint(\"(n_components, n_voxels) =\", average_coef_transformed.shape)\ndel average_coef\n\n# We make sure vmin = -vmax, so that the colormap is centered on 0.\nvmax = np.percentile(np.abs(average_coef_transformed), 99.9)\n\n# plot the primal weights projected on the first principal component.\nax = plot_flatmap_from_mapper(average_coef_transformed[0], mapper_file,\n vmin=-vmax, vmax=vmax, cmap='coolwarm')\nplt.show()"
572+
"# transform with the fitted PCA\naverage_coef_transformed = pca.transform(average_coef.T).T\nprint(\"(n_components, n_voxels) =\", average_coef_transformed.shape)\ndel average_coef\n\n# We make sure vmin = -vmax, so that the colormap is centered on 0.\nvmax = np.percentile(np.abs(average_coef_transformed), 99.9)\n\n# plot the primal weights projected on the first principal component.\nax = plot_flatmap_from_mapper(average_coef_transformed[0], mapper_file,\n vmin=-vmax, vmax=vmax, cmap='coolwarm')\nplt.show()"
573573
]
574574
},
575575
{

tutorials/notebooks/shortclips/merged_for_colab.ipynb

Lines changed: 21 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -2005,8 +2005,16 @@
20052005
"cell_type": "markdown",
20062006
"metadata": {},
20072007
"source": [
2008-
"Here, we are only interested in the voxels with good generalization\n",
2009-
"performances. We select an arbitrary threshold of 0.05 (R^2 score).\n",
2008+
"Because the ridge model allows a different regularization per voxel, the\n",
2009+
"regression coefficients may have very different scales. In turn, these\n",
2010+
"different scales can introduce a bias in the interpretation, focusing the\n",
2011+
"attention disproportionately on voxels fitted with the lowest alpha. To\n",
2012+
"address this issue, we rescale the regression coefficient to have a norm\n",
2013+
"equal to the square-root of the $R^2$ scores. We found empirically that\n",
2014+
"this rescaling best matches results obtained with a regularization shared\n",
2015+
"accross voxels. This rescaling also removes the need to select only best\n",
2016+
"performing voxels, because voxels with low prediction accuracies are rescaled\n",
2017+
"to have a low norm.\n",
20102018
"\n"
20112019
]
20122020
},
@@ -2018,7 +2026,8 @@
20182026
},
20192027
"outputs": [],
20202028
"source": [
2021-
"primal_coef_selection = primal_coef[:, scores > 0.05]"
2029+
"primal_coef /= np.linalg.norm(primal_coef, axis=0)[None]\n",
2030+
"primal_coef *= np.sqrt(np.maximum(0, scores))[None]"
20222031
]
20232032
},
20242033
{
@@ -2039,13 +2048,14 @@
20392048
"source": [
20402049
"# split the ridge coefficients per delays\n",
20412050
"delayer = pipeline.named_steps['delayer']\n",
2042-
"primal_coef_per_delay = delayer.reshape_by_delays(primal_coef_selection,\n",
2043-
" axis=0)\n",
2051+
"primal_coef_per_delay = delayer.reshape_by_delays(primal_coef, axis=0)\n",
20442052
"print(\"(n_delays, n_features, n_voxels) =\", primal_coef_per_delay.shape)\n",
2053+
"del primal_coef\n",
20452054
"\n",
20462055
"# average over delays\n",
20472056
"average_coef = np.mean(primal_coef_per_delay, axis=0)\n",
2048-
"print(\"(n_features, n_voxels) =\", average_coef.shape)"
2057+
"print(\"(n_features, n_voxels) =\", average_coef.shape)\n",
2058+
"del primal_coef_per_delay"
20492059
]
20502060
},
20512061
{
@@ -2166,11 +2176,11 @@
21662176
"objects like `sky` and `city`)\".\n",
21672177
"\n",
21682178
"In this example, because we use only a single subject and we perform a\n",
2169-
"different voxel selection, our result is slightly different than in [1]_. We\n",
2170-
"also use a different regularization parameter in each voxel, while in [1]_\n",
2171-
"all voxels had the same regularization parameter. We do not aim at\n",
2172-
"reproducing exactly the results in [1]_, but we rather describe the general\n",
2173-
"approach.\n",
2179+
"different voxel selection, our result is slightly different than in the\n",
2180+
"original publication. We also use a different regularization parameter in\n",
2181+
"each voxel, while in [1]_ all voxels had the same regularization parameter.\n",
2182+
"However, we do not aim at reproducing exactly the results of the original\n",
2183+
"publication, but we rather describe the general approach.\n",
21742184
"\n"
21752185
]
21762186
},
@@ -2191,16 +2201,6 @@
21912201
},
21922202
"outputs": [],
21932203
"source": [
2194-
"# split the ridge coefficients per delays\n",
2195-
"primal_coef_per_delay = delayer.reshape_by_delays(primal_coef, axis=0)\n",
2196-
"print(\"(n_delays, n_features, n_voxels) =\", primal_coef_per_delay.shape)\n",
2197-
"del primal_coef\n",
2198-
"\n",
2199-
"# average over delays\n",
2200-
"average_coef = np.mean(primal_coef_per_delay, axis=0)\n",
2201-
"print(\"(n_features, n_voxels) =\", average_coef.shape)\n",
2202-
"del primal_coef_per_delay\n",
2203-
"\n",
22042204
"# transform with the fitted PCA\n",
22052205
"average_coef_transformed = pca.transform(average_coef.T).T\n",
22062206
"print(\"(n_components, n_voxels) =\", average_coef_transformed.shape)\n",

tutorials/shortclips/03_plot_wordnet_model.py

Lines changed: 20 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -331,22 +331,32 @@
331331
print("(n_delays * n_features, n_voxels) =", primal_coef.shape)
332332

333333
###############################################################################
334-
# Here, we are only interested in the voxels with good generalization
335-
# performances. We select an arbitrary threshold of 0.05 (R^2 score).
336-
primal_coef_selection = primal_coef[:, scores > 0.05]
334+
# Because the ridge model allows a different regularization per voxel, the
335+
# regression coefficients may have very different scales. In turn, these
336+
# different scales can introduce a bias in the interpretation, focusing the
337+
# attention disproportionately on voxels fitted with the lowest alpha. To
338+
# address this issue, we rescale the regression coefficient to have a norm
339+
# equal to the square-root of the :math:`R^2` scores. We found empirically that
340+
# this rescaling best matches results obtained with a regularization shared
341+
# accross voxels. This rescaling also removes the need to select only best
342+
# performing voxels, because voxels with low prediction accuracies are rescaled
343+
# to have a low norm.
344+
primal_coef /= np.linalg.norm(primal_coef, axis=0)[None]
345+
primal_coef *= np.sqrt(np.maximum(0, scores))[None]
337346

338347
###############################################################################
339348
# Then, we aggregate the coefficients across the different delays.
340349

341350
# split the ridge coefficients per delays
342351
delayer = pipeline.named_steps['delayer']
343-
primal_coef_per_delay = delayer.reshape_by_delays(primal_coef_selection,
344-
axis=0)
352+
primal_coef_per_delay = delayer.reshape_by_delays(primal_coef, axis=0)
345353
print("(n_delays, n_features, n_voxels) =", primal_coef_per_delay.shape)
354+
del primal_coef
346355

347356
# average over delays
348357
average_coef = np.mean(primal_coef_per_delay, axis=0)
349358
print("(n_features, n_voxels) =", average_coef.shape)
359+
del primal_coef_per_delay
350360

351361
###############################################################################
352362
# Even after averaging over delays, the coefficient matrix is still too large
@@ -405,26 +415,16 @@
405415
# objects like `sky` and `city`)".
406416
#
407417
# In this example, because we use only a single subject and we perform a
408-
# different voxel selection, our result is slightly different than in [1]_. We
409-
# also use a different regularization parameter in each voxel, while in [1]_
410-
# all voxels had the same regularization parameter. We do not aim at
411-
# reproducing exactly the results in [1]_, but we rather describe the general
412-
# approach.
418+
# different voxel selection, our result is slightly different than in the
419+
# original publication. We also use a different regularization parameter in
420+
# each voxel, while in [1]_ all voxels had the same regularization parameter.
421+
# However, we do not aim at reproducing exactly the results of the original
422+
# publication, but we rather describe the general approach.
413423

414424
###############################################################################
415425
# To project the principal component on the cortical surface, we first need to
416426
# use the fitted PCA to transform the primal weights of all voxels.
417427

418-
# split the ridge coefficients per delays
419-
primal_coef_per_delay = delayer.reshape_by_delays(primal_coef, axis=0)
420-
print("(n_delays, n_features, n_voxels) =", primal_coef_per_delay.shape)
421-
del primal_coef
422-
423-
# average over delays
424-
average_coef = np.mean(primal_coef_per_delay, axis=0)
425-
print("(n_features, n_voxels) =", average_coef.shape)
426-
del primal_coef_per_delay
427-
428428
# transform with the fitted PCA
429429
average_coef_transformed = pca.transform(average_coef.T).T
430430
print("(n_components, n_voxels) =", average_coef_transformed.shape)

0 commit comments

Comments
 (0)