|
relationship based on the `Wordnet <https://wordnet.princeton.edu/>`_ dataset.

*Summary:* We first concatenate the features with multiple temporal delays to
account for the slow hemodynamic response. We then use linear regression to fit
a predictive model of brain activity. The linear regression is regularized to
improve robustness to correlated features and to improve generalization
performance. The optimal regularization hyperparameter is selected over a
grid-search with cross-validation. Finally, the model generalization
performance is evaluated on a held-out test set, comparing the model
|
###############################################################################
# If we repeat an experiment multiple times, part of the fMRI responses might
# change. However, the modeling features do not change over the repeats, so the
# voxelwise encoding model will predict the same signal for each repeat. To
# have an upper bound of the model prediction accuracy, we keep only the
# repeatable part of the signal by averaging the test repeats.
Y_test = Y_test.mean(0)

print("(n_samples_test, n_voxels) =", Y_test.shape)
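As a toy illustration of why averaging the repeats isolates the repeatable part of the signal, here is a small numpy sketch (the shapes and noise model are made up for the example):

```python
import numpy as np

rng = np.random.RandomState(0)
n_repeats, n_samples, n_voxels = 10, 100, 3

# repeatable signal, identical across repeats
signal = rng.randn(n_samples, n_voxels)
# each repeat adds independent measurement noise on top of the signal
repeats = signal[None] + rng.randn(n_repeats, n_samples, n_voxels)

# compare a single repeat and the average of repeats against the true signal
single = repeats[0, :, 0]
averaged = repeats.mean(0)[:, 0]
corr_single = np.corrcoef(single, signal[:, 0])[0, 1]
corr_averaged = np.corrcoef(averaged, signal[:, 0])[0, 1]
print(corr_single < corr_averaged)  # averaging attenuates the noise
```

Averaging over 10 repeats divides the noise standard deviation by about sqrt(10), so the averaged responses correlate much better with the repeatable signal.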
|
#
# Now, let's define the model pipeline.
#
# We first center the features, since we will not use an intercept. The mean
# value in fMRI recordings is non-informative, so each run is detrended and
# demeaned independently, and we do not need to predict an intercept value in
# the linear model.
#
# However, we prefer to avoid normalizing by the standard deviation of each
# feature. If the features are extracted in a consistent way from the stimulus,
# their relative scale is meaningful. Normalizing them independently from each
# other would remove this information. Moreover, the wordnet features are
# one-hot-encoded, which means that each feature is either present (1) or not
# present (0) in each sample. Normalizing one-hot-encoded features is not
# recommended, since it would disproportionately scale the infrequent features.
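One way to implement this centering step is scikit-learn's ``StandardScaler`` with ``with_std=False``; a minimal sketch on made-up features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# toy features with very different scales (made-up values)
X = np.array([[0., 1., 10.],
              [1., 0., 20.],
              [1., 1., 30.]])

# center each feature, but do not divide by its standard deviation
scaler = StandardScaler(with_mean=True, with_std=False)
X_centered = scaler.fit_transform(X)

print(np.allclose(X_centered.mean(0), 0.))       # True: features are centered
print(np.allclose(X_centered.std(0), X.std(0)))  # True: relative scales kept
```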
|

###############################################################################
# Then we concatenate the features with multiple delays to account for the
# hemodynamic response. Due to neurovascular coupling, the recorded BOLD signal
# is delayed in time with respect to the stimulus onset. With different delayed
# versions of the features, the linear regression model will assign a different
# weight to each delayed feature to maximize the predictions. With a sample
# every 2 seconds, we typically use 4 delays [1, 2, 3, 4] to cover the
# hemodynamic response peak. In the next example, we further describe this
# hemodynamic response estimation.
from voxelwise_tutorials.delayer import Delayer
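A minimal sketch of what such a delayed concatenation does, using a hypothetical ``make_delays`` helper (not the actual ``Delayer`` implementation), zero-padding the start of the run:

```python
import numpy as np

def make_delays(X, delays):
    """Concatenate copies of X shifted by each delay (in samples),
    padding the first samples of each shifted copy with zeros."""
    n_samples, n_features = X.shape
    delayed = []
    for delay in delays:
        shifted = np.zeros_like(X)
        shifted[delay:] = X[:n_samples - delay]
        delayed.append(shifted)
    return np.hstack(delayed)

X = np.arange(10, dtype=float).reshape(5, 2)  # (n_samples, n_features)
X_delayed = make_delays(X, delays=[1, 2, 3, 4])
print(X_delayed.shape)  # (5, 8): n_features * n_delays columns
```

Each block of columns is the feature matrix shifted by one delay, so the regression can fit a separate weight per delay.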
|
###############################################################################
# Finally, we use a ridge regression model. Ridge regression is a linear
# regression with L2 regularization. The L2 regularization improves robustness
# to correlated features and improves generalization performance. However, the
# L2 regularization is controlled by a hyperparameter ``alpha`` that needs to
# be tuned for each dataset. This regularization hyperparameter is usually
# selected over a grid search with cross-validation, selecting the
# hyperparameter that maximizes the predictive performance on the validation
# set. More details about cross-validation can be found in the `scikit-learn
# documentation
# <https://scikit-learn.org/stable/modules/cross_validation.html>`_.
#
# For computational reasons, when the number of features is larger than the
|
# mean score over targets. Here, we want to find a different optimal
# hyperparameter per target/voxel, so we use the package `himalaya
# <https://github.com/gallantlab/himalaya>`_ which implements a
# ``scikit-learn`` compatible estimator ``KernelRidgeCV``, with hyperparameter
# selection independently on each target.
from himalaya.kernel_ridge import KernelRidgeCV

###############################################################################
|
# Plot the model prediction accuracy
# ----------------------------------
#
# To visualize the model prediction accuracy, we can plot it for each voxel on
# a flattened surface of the brain. To do so, we use a mapper that is specific
# to each subject's brain. (Check the previous example to see how to use the
# mapper to the Freesurfer average surface.)
import matplotlib.pyplot as plt
from voxelwise_tutorials.viz import plot_flatmap_from_mapper

|
|

###############################################################################
# We can see that the "wordnet" features successfully predict part of the
# measured brain activity, with :math:`R^2` scores as high as 0.4. Note that
# these scores are generalization scores, since they are computed on a test set
# that was not used during model fitting. Since we fitted a model independently
# in each voxel, we can inspect the generalization performance at the best
# available spatial resolution: individual voxels.
#
# The best-predicted voxels are located in visual semantic areas like EBA or
# FFA. This is expected since the wordnet features encode semantic information
# about the visual stimulus. For more discussion about these results, we refer
# the reader to the original publication [1]_.

###############################################################################
# Plot the selected hyperparameters
|
# that have more predictive power.
#
# Since we know the meaning of each feature, we can interpret the large
# regression coefficients. In the case of wordnet features, we can even build a
# graph that represents the features that are linked by a semantic
# relationship.

###############################################################################
# We first get the (primal) ridge regression coefficients from the fitted
|

###############################################################################
# Similarly to [1]_, we correct the coefficients of features linked by a
# semantic relationship. When building the wordnet features, if a frame was
# labeled with `wolf`, the authors automatically added the semantically linked
# categories `canine`, `carnivore`, `placental mammal`, `mammal`, `vertebrate`,
# `chordate`, `organism`, and `whole`. The authors thus argue that the same
# correction needs to be done on the coefficients.

from voxelwise_tutorials.wordnet import load_wordnet
from voxelwise_tutorials.wordnet import correct_coefficients
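The gist of this correction can be sketched on a made-up toy hierarchy. Here we assume the correction sums each category's coefficient with those of all its hypernyms, since labeling a frame with a category also activates all its hypernym features; this is an assumed simplification, and the actual implementation lives in ``correct_coefficients``:

```python
# toy hierarchy: child -> parent (None for the root); made-up for illustration
parents = {"wolf": "canine", "canine": "carnivore", "carnivore": None}
coef = {"wolf": 0.5, "canine": 0.2, "carnivore": -0.1}

def corrected(category):
    """Sum a category's coefficient with those of all its hypernyms
    (an assumed reading of the correction, not the actual implementation)."""
    total = 0.0
    while category is not None:
        total += coef[category]
        category = parents[category]
    return total

print(round(corrected("wolf"), 6))  # 0.6, i.e. 0.5 + 0.2 - 0.1
```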
|
#
# In this example, because we use only a single subject and we perform a
# different voxel selection, our result is slightly different from the one in
# [1]_. We also use a different regularization parameter in each voxel, while
# in [1]_ all voxels had the same regularization parameter. We do not aim to
# reproduce exactly the results in [1]_, but rather to describe the general
# approach.

###############################################################################
# To project the principal component on the cortical surface, we first need to
|
#
# .. [2] Saunders, C., Gammerman, A., & Vovk, V. (1998).
#        Ridge regression learning algorithm in dual variables.