Commit dfeda06

committed
DOC more narrative improvements
1 parent abef728 commit dfeda06

6 files changed

Lines changed: 388 additions & 170 deletions

File tree

tutorials/movies_3T/02_plot_wordnet_model.py

Lines changed: 37 additions & 16 deletions
@@ -5,22 +5,30 @@
 
 In this example, we model the fMRI responses with semantic "wordnet" features,
 manually annotated on each frame of the movie stimulus. The model is a
-regularized linear regression model, also known as ridge regression. Since this
+regularized linear regression model, known as ridge regression. Since this
 model is used to predict brain activity from the stimulus, it is called a
 (voxelwise) encoding model.
 
 This example reproduces part of the analysis described in Huth et al (2012)
 [1]_. See this publication for more details about the experiment, the wordnet
 features, along with more results and more discussions.
 
+*Wordnet features:* The features used in this example are semantic labels
+manually annotated on each frame of the movie stimulus. The semantic labels
+include nouns (such as "woman", "car", or "building") and verbs (such as
+"talking", "touching", or "walking"), for a total of 1705 distinct category
+labels. To interpret our model, labels can be organized in a graph of semantic
+relationships based on the `Wordnet <https://wordnet.princeton.edu/>`_ dataset.
+
 *Summary:* We first concatenate the features with multiple delays, to account
-for the hemodynamic response. We then fit a predictive model of BOLD activity,
-using a linear regression that weights differently each delayed feature. The
-linear regression is regularized to improve robustness to correlated features
-and to improve generalization. The optimal regularization hyperparameter is
-selected over a grid-search with cross-validation. Finally, the model
-generalization performance is evaluated on a held-out test set, comparing the
-model predictions with the corresponding ground-truth fMRI responses.
+for the slow hemodynamic response. We then fit a predictive model of BOLD
+activity, using a linear regression that weights differently each delayed
+feature. The linear regression is regularized to improve robustness to
+correlated features and to improve generalization. The optimal regularization
+hyperparameter is selected over a grid-search with cross-validation. Finally,
+the model generalization performance is evaluated on a held-out test set,
+comparing the model predictions with the corresponding ground-truth fMRI
+responses.
 """
 ###############################################################################
 # Path of the data directory
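The summary above describes concatenating feature copies at several delays so that a linear model can absorb the slow hemodynamic response. The tutorial uses its own ``Delayer`` transformer for this; the following is only a minimal numpy sketch of the same idea (the helper name ``make_delayed`` is ours, not the package's).

```python
import numpy as np

def make_delayed(X, delays):
    """Concatenate copies of X, each shifted forward in time by one delay.

    Samples shifted past the start are zero-padded. A linear model can then
    assign a different weight to each feature at each delay, and so absorb
    the slow hemodynamic response.
    """
    delayed = []
    for delay in delays:
        X_delay = np.zeros_like(X)
        if delay == 0:
            X_delay[:] = X
        else:
            X_delay[delay:] = X[:-delay]
        delayed.append(X_delay)
    return np.hstack(delayed)

X = np.arange(10.0).reshape(5, 2)  # 5 samples, 2 features
X_delayed = make_delayed(X, delays=[1, 2, 3, 4])
print(X_delayed.shape)  # (5, 8): n_features * n_delays
```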
@@ -59,6 +67,8 @@
 # example).
 Y_test = Y_test.mean(0)
 
+print("(n_samples_test, n_voxels) =", Y_test.shape)
+
 ###############################################################################
 # We fill potential NaN (not-a-number) values with zeros.
 Y_train = np.nan_to_num(Y_train)
@@ -125,13 +135,22 @@
 delayer = Delayer(delays=[1, 2, 3, 4])
 
 ###############################################################################
-# Finally, we use a ridge regression model. For computational reasons, when the
-# number of features is larger than the number of samples, it is more efficient
-# to solve a ridge regression using the (equivalent) dual formulation [2]_.
-# This dual formulation is equivalent to kernel ridge regression with a linear
-# kernel. Here, we have 3600 training samples, and 1705 * 4 = 6820 features (we
-# multiply by 4 since we use 4 time delays), therefore it is more efficient to
-# use kernel ridge regression.
+# Finally, we use a ridge regression model. Ridge regression is a linear
+# regression with an L2 regularization. The L2 regularization improves
+# robustness to correlated features and improves generalization. However, the
+# L2 regularization is controlled by a hyperparameter ``alpha`` that needs to
+# be tuned. This regularization hyperparameter is usually selected over a grid
+# search with cross-validation, selecting the hyperparameter that maximizes the
+# predictive performances on the validation set. More details about
+# cross-validation can be found in the `scikit-learn documentation
+# <https://scikit-learn.org/stable/modules/cross_validation.html>`_.
+#
+# For computational reasons, when the number of features is larger than the
+# number of samples, it is more efficient to solve a ridge regression using the
+# (equivalent) dual formulation [2]_. This dual formulation is equivalent to
+# kernel ridge regression with a linear kernel. Here, we have 3600 training
+# samples, and 1705 * 4 = 6820 features (we multiply by 4 since we use 4 time
+# delays), therefore it is more efficient to use kernel ridge regression.
 #
 # With one target, we could directly use the pipeline in ``scikit-learn``'s
 # ``GridSearchCV``, to select the optimal regularization hyperparameter
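The hunk above argues that with more features (6820) than samples (3600), the dual formulation is cheaper: the primal solves an n_features x n_features system, the dual an n_samples x n_samples one, with identical predictions. The equivalence is easy to check numerically; a sketch using scikit-learn's ``Ridge`` and ``KernelRidge`` (assumed stand-ins here, since the tutorial itself uses himalaya; ``KernelRidge`` fits no intercept, so we disable it in ``Ridge`` as well).

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
n_samples, n_features = 50, 200  # more features than samples
X = rng.randn(n_samples, n_features)
Y = rng.randn(n_samples, 3)  # 3 targets
X_new = rng.randn(10, n_features)

alpha = 10.0
# primal: solves an (n_features x n_features) linear system
primal = Ridge(alpha=alpha, fit_intercept=False).fit(X, Y)
# dual: kernel ridge with a linear kernel, an (n_samples x n_samples) system
dual = KernelRidge(alpha=alpha, kernel="linear").fit(X, Y)

# the two formulations give numerically identical predictions
print(np.allclose(primal.predict(X_new), dual.predict(X_new)))
```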
@@ -140,7 +159,8 @@
 # only optimize (for example) the mean score over targets. Here, we want to
 # find a different optimal hyperparameter per target/voxel, so we use the
 # package `himalaya <https://github.com/gallantlab/himalaya>`_ which implements
-# a ``scikit-learn`` compatible estimator ``KernelRidgeCV``.
+# a ``scikit-learn`` compatible estimator ``KernelRidgeCV``, with
+# hyperparameter selection independently on each target.
 from himalaya.kernel_ridge import KernelRidgeCV
 
 ###############################################################################
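The added text explains that ``KernelRidgeCV`` selects one regularization strength per target, while ``GridSearchCV`` can only optimize a single score over all targets. A toy sketch of per-target selection with a plain validation split; the loop, helper names, and the use of scikit-learn's ``Ridge`` (instead of himalaya's kernel solver) are ours, for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
X = rng.randn(120, 20)
W = rng.randn(20, 4)
noise_scale = np.array([0.1, 1.0, 5.0, 20.0])  # one noise level per target
Y = X @ W + noise_scale * rng.randn(120, 4)

X_train, X_val = X[:80], X[80:]
Y_train, Y_val = Y[:80], Y[80:]

alphas = np.logspace(-2, 4, 7)
val_scores = np.empty((len(alphas), Y.shape[1]))
for ii, alpha in enumerate(alphas):
    model = Ridge(alpha=alpha).fit(X_train, Y_train)
    # one validation R^2 per target, not averaged
    val_scores[ii] = r2_score(Y_val, model.predict(X_val),
                              multioutput="raw_values")

# keep the best alpha independently for each target, as KernelRidgeCV
# does for each voxel
best_alphas = alphas[np.argmax(val_scores, axis=0)]
print(best_alphas)
```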
@@ -156,6 +176,7 @@
 from himalaya.backend import set_backend
 backend = set_backend("torch_cuda", on_error="warn")
 print(backend)
+
 ###############################################################################
 # To speed up model fitting on GPU, we use single precision float numbers.
 # (This step probably does not change significantly the performances on non-GPU

tutorials/movies_3T/03_plot_motion_energy_model.py

Lines changed: 63 additions & 56 deletions
@@ -3,40 +3,37 @@
 Fit a ridge model with motion energy features
 =============================================
 
-In this second example, we model the fMRI responses with motion-energy features
-extracted from the movie stimulus. The model is still a regularized linear
-regression model.
+In this example, we model the fMRI responses with motion-energy features
+extracted from the movie stimulus. The model is a regularized linear regression
+model.
 
 This tutorial reproduces part of the analysis described in Nishimoto et al
 (2011) [1]_. See this publication for more details about the experiment, the
 motion-energy features, along with more results and more discussions.
 
-Motion-energy features result from filtering a video stimulus with
-spatio-temporal Gabor filters. A pyramid of filters is used to compute the
-motion-energy features at multiple spatial and temporal scales.
-
-As in the previous example, we first concatenate the features with multiple
-delays, to account for the hemodynamic response. A linear regression model
-then weights each delayed feature with a different weight, to build a
-predictive model of BOLD activity.
-Again, the linear regression is regularized to improve robustness to correlated
-features and to improve generalization. The optimal regularization
-hyperparameter is selected over a grid-search with cross-validation.
-Finally, the model generalization performance is evaluated on a held-out test
-set, comparing the model predictions with the corresponding ground-truth fMRI
-responses.
-
-The ridge model is fitted with the package
-`himalaya <https://github.com/gallantlab/himalaya>`_.
+*Motion-energy features:* Motion-energy features result from filtering a video
+stimulus with spatio-temporal Gabor filters. A pyramid of filters is used to
+compute the motion-energy features at multiple spatial and temporal scales.
+
+*Summary:* As in the previous example, we first concatenate the features with
+multiple delays, to account for the slow hemodynamic response. A linear
+regression model then weights each delayed feature with a different weight, to
+build a predictive model of BOLD activity. Again, the linear regression is
+regularized to improve robustness to correlated features and to improve
+generalization. The optimal regularization hyperparameter is selected
+independently on each voxel over a grid-search with cross-validation. Finally,
+the model generalization performance is evaluated on a held-out test set,
+comparing the model predictions with the ground-truth fMRI responses.
 """
 ###############################################################################
-
-# path of the data directory
+# Path of the data directory
 import os
 from voxelwise_tutorials.io import get_data_home
 directory = os.path.join(get_data_home(), "vim-4")
 print(directory)
 
+###############################################################################
+
 # modify to use another subject
 subject = "S01"
 
@@ -52,29 +49,32 @@
 file_name = os.path.join(directory, "responses", f"{subject}_responses.hdf")
 Y_train = load_hdf5_array(file_name, key="Y_train")
 Y_test = load_hdf5_array(file_name, key="Y_test")
-run_onsets = load_hdf5_array(file_name, key="run_onsets")
 
-# We average the test repeats, since we cannot model the non-repeatable part of
-# fMRI responses. It means that the prediction :math:`R^2` scores will be
-# relative to the explainable variance.
+print("(n_samples_train, n_voxels) =", Y_train.shape)
+print("(n_repeats, n_samples_test, n_voxels) =", Y_test.shape)
+
+###############################################################################
+# We average the test repeats, to remove the non-repeatable part of fMRI
+# responses.
 Y_test = Y_test.mean(0)
 
-# We remove NaN values present on non-cortical voxels.
+print("(n_samples_test, n_voxels) =", Y_test.shape)
+
+###############################################################################
+# We fill potential NaN (not-a-number) values with zeros.
 Y_train = np.nan_to_num(Y_train)
 Y_test = np.nan_to_num(Y_test)
 
 ###############################################################################
-# Then we load the "motion-energy" features, that we will
-# use for the linear regression model.
+# Then we load the precomputed "motion-energy" features.
 
 feature_space = "motion_energy"
 file_name = os.path.join(directory, "features", f"{feature_space}.hdf")
 X_train = load_hdf5_array(file_name, key="X_train")
 X_test = load_hdf5_array(file_name, key="X_test")
 
-# We use single precision float to speed up model fitting on GPU.
-X_train = X_train.astype("float32")
-X_test = X_test.astype("float32")
+print("(n_samples_train, n_features) =", X_train.shape)
+print("(n_samples_test, n_features) =", X_test.shape)
 
 ###############################################################################
 # Define the cross-validation scheme
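The removed comment above notes that after averaging the test repeats, prediction :math:`R^2` scores are relative to the explainable variance. A minimal numpy sketch of a voxelwise :math:`R^2`, computed independently per column; the helper name is ours, himalaya provides its own scoring.

```python
import numpy as np

def voxelwise_r2(Y_true, Y_pred):
    """R^2 score computed independently for each voxel (column)."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

rng = np.random.RandomState(0)
Y_true = rng.randn(100, 5)                 # 100 samples, 5 voxels
Y_pred = Y_true + 0.5 * rng.randn(100, 5)  # imperfect predictions
print(voxelwise_r2(Y_true, Y_pred).shape)  # (5,): one score per voxel
```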
@@ -88,8 +88,10 @@
 
 # indice of first sample of each run
 run_onsets = load_hdf5_array(file_name, key="run_onsets")
+print(run_onsets)
 
-# define a cross-validation splitter, compatible with ``scikit-learn``` API
+###############################################################################
+# We define a cross-validation splitter, compatible with ``scikit-learn`` API.
 n_samples_train = X_train.shape[0]
 cv = generate_leave_one_run_out(n_samples_train, run_onsets)
 cv = check_cv(cv)  # copy the cross-validation splitter into a reusable list
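``generate_leave_one_run_out`` builds splits that each hold out one full run for validation, which respects the temporal structure of the fMRI time series. The following is only a hypothetical minimal re-implementation of that idea, not the tutorial's actual helper.

```python
import numpy as np

def leave_one_run_out(n_samples, run_onsets):
    """Yield (train, val) index arrays, holding out one full run per split.

    A hypothetical minimal version of the tutorial's
    ``generate_leave_one_run_out`` helper: validating on entire runs avoids
    splitting temporally autocorrelated samples across train and validation.
    """
    bounds = list(run_onsets) + [n_samples]
    runs = [np.arange(start, stop)
            for start, stop in zip(bounds[:-1], bounds[1:])]
    for ii, val in enumerate(runs):
        train = np.concatenate(runs[:ii] + runs[ii + 1:])
        yield train, val

splits = list(leave_one_run_out(400, run_onsets=[0, 100, 200, 300]))
print(len(splits))  # 4 splits, one per run
```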
@@ -108,6 +110,9 @@
 from himalaya.backend import set_backend
 backend = set_backend("torch_cuda", on_error="warn")
 
+X_train = X_train.astype("float32")
+X_test = X_test.astype("float32")
+
 alphas = np.logspace(1, 20, 20)
 
 pipeline_motion_energy = make_pipeline(
@@ -135,6 +140,8 @@
 scores_motion_energy = pipeline_motion_energy.score(X_test, Y_test)
 scores_motion_energy = backend.to_numpy(scores_motion_energy)
 
+print("(n_voxels,) =", scores_motion_energy.shape)
+
 ###############################################################################
 # Plot the model performances
 # ---------------------------
@@ -150,27 +157,27 @@
 
 ###############################################################################
 # The motion-energy features lead to large generalization scores in the
-# early visual cortex (V1, V2? V3, ...). For more discussions about these
+# early visual cortex (V1, V2, V3, ...). For more discussions about these
 # results, we refer the reader to the original publication [1]_.
 
 ###############################################################################
 # Compare with the wordnet model
 # ------------------------------
 #
-# It is interesting to compare the performances of this motion-energy model
-# to the performances of the wordnet model fitted in the previous example.
-# To compare them, we first need to fit again the semantic wordnet model.
+# Interestingly, the motion-energy model performs well in different brain
+# regions than the semantic "wordnet" model fitted in the previous example. To
+# compare the two models, we first need to fit again the wordnet model.
 
 feature_space = "wordnet"
 file_name = os.path.join(directory, "features", f"{feature_space}.hdf")
 X_train = load_hdf5_array(file_name, key="X_train")
 X_test = load_hdf5_array(file_name, key="X_test")
 
-# We use single precision float to speed up model fitting on GPU.
 X_train = X_train.astype("float32")
 X_test = X_test.astype("float32")
 
-# We can create an unfitted copy of the pipeline with the `clone` function.
+###############################################################################
+# We can create an unfitted copy of the pipeline with the ``clone`` function.
 from sklearn.base import clone
 pipeline_wordnet = clone(pipeline_motion_energy)
 pipeline_wordnet
@@ -202,7 +209,7 @@
 ###############################################################################
 # Interestingly, the well predicted voxels are different in the two models.
 # To further describe these differences, we can plot both performances on the
-# same flatmap.
+# same flatmap, using a 2D colormap.
 
 from voxelwise_tutorials.viz import plot_2d_flatmap_from_mapper
 
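The 2D colormap mentioned above mixes the two score maps into one color per voxel, so voxels predicted by only one model turn blue or orange, and voxels predicted by both turn white. A toy sketch of such a mixing (``scores_to_rgb`` is our hypothetical helper; the tutorial's actual figure is drawn by ``plot_2d_flatmap_from_mapper``).

```python
import numpy as np

def scores_to_rgb(scores_a, scores_b, vmax=None):
    """Mix two score maps into one color per voxel.

    Only A high gives orange, only B high gives blue, both high gives white.
    A toy stand-in for a 2D colormap, not the tutorial's implementation.
    """
    if vmax is None:
        vmax = max(scores_a.max(), scores_b.max())
    a = np.clip(scores_a / vmax, 0, 1)  # drives the red channel
    b = np.clip(scores_b / vmax, 0, 1)  # drives the blue channel
    green = 0.5 * (a + b)  # shared green channel mixes toward white
    return np.stack([a, green, b], axis=-1)

scores_wordnet = np.array([0.0, 0.4, 0.0, 0.4])
scores_motion = np.array([0.0, 0.0, 0.4, 0.4])
rgb = scores_to_rgb(scores_wordnet, scores_motion)
print(rgb)  # rows: black, orange, blue, white
```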
@@ -214,27 +221,27 @@
 plt.show()
 
 ###############################################################################
-# The blue regions are well predicted by the motion-energy features,
-# the orange regions are well predicted by the wordnet features,
-# and the white regions are well predicted by both feature spaces.
+# The blue regions are well predicted by the motion-energy features, the orange
+# regions are well predicted by the wordnet features, and the white regions are
+# well predicted by both feature spaces.
 #
 # Interestingly, a large part of the visual semantic areas are not only well
-# predicted by the wordnet features, but also by the motion-energy features,
-# as indicated by the white color. Since these two features spaces encode
-# quite different information, two interpretations are possible.
-# In the first one, the two feature spaces encode complementary information,
-# and could be used jointly to further increase the generalization
-# performances. In the second interpretation, both feature spaces encode the
-# same information, because of spurious correlation in the stimulus. For
-# example, all faces in the stimulus might be located in the same part of the
-# visual field, thus a motion-energy feature at this location might contain
-# all the necessary information to predict the presence of a face, without
-# specifically encoding for the semantic of faces.
+# predicted by the wordnet features, but also by the motion-energy features, as
+# indicated by the white color. Since these two feature spaces encode quite
+# different information, two interpretations are possible. In the first
+# interpretation, the two feature spaces encode complementary information, and
+# could be used jointly to further increase the generalization performances. In
+# the second interpretation, both feature spaces encode the same information,
+# because of spurious correlations in the stimulus. For example, all faces in
+# the stimulus might be located in the same part of the visual field, thus a
+# motion-energy feature at this location might contain all the necessary
+# information to predict the presence of a face, without specifically encoding
+# for the semantics of faces.
 #
 # To better disentangle the two feature spaces, we developed a joint model
 # called `banded ridge regression` [2]_, which fits multiple feature spaces
-# simultaneously with optimal regularization for each feature space.
-# This model is described in the next example.
+# simultaneously with optimal regularization for each feature space. This model
+# is described in the next example.
 
 ###############################################################################
 # References
