Commit ece4dc1: ENH clarify the text
1 parent e6f2d22
12 files changed, 42 additions & 33 deletions

tutorials/movies_3T/01_plot_explainable_variance.py (8 additions, 8 deletions)

@@ -75,7 +75,8 @@
 # repeats. We also see that some voxels reach an explainable variance of 0.7,
 # which is quite high. It means that these voxels consistently record the same
 # activity across a repeated stimulus, and thus are good targets for encoding
-# models.
+# models. Of course, this set of explainable voxels changes from task to
+# task, depending on what you are trying to model.

 ###############################################################################
 # Map to subject flatmap
@@ -85,7 +86,7 @@
 # values to the subject brain. This can be done with `pycortex
 # <https://gallantlab.github.io/pycortex/>`_, which can create interactive 3D
 # viewers to be displayed in any modern browser. ``pycortex`` can also display
-# flatten maps of the cortical surface, to visualize the entire cortical
+# flattened maps of the cortical surface, to visualize the entire cortical
 # surface at once.
 #
 # Here, we do not share the anatomical information of the subjects for privacy
@@ -96,9 +97,8 @@
 #
 # The first mapper is 2D matrix of shape (n_pixels, n_voxels), that map each
 # voxel to a set of pixel in a flatmap. The matrix is efficient stored using a
-# ``scipy`` sparse CSR matrix format. To ease the use of this mapper, we
-# provide an example function ``plot_flatmap_from_mapper``. This function mimic
-# the behavior of ``pycortex.quickshow``.
+# ``scipy`` sparse CSR matrix format. The function ``plot_flatmap_from_mapper``
+# provides an example of how to use the mapper and visualize the flatmap.

 from voxelwise_tutorials.viz import plot_flatmap_from_mapper

@@ -140,9 +140,9 @@

 ###############################################################################
 # Then, we load the "fsaverage" mapper. The mapper is a matrix of shape
-# (n_vertices, n_voxels), which map each voxel to some vertices in the
-# fsaverage surface. It is also stored with a sparse CSR matrix format. The
-# mapper is applied with a dot product ``@`` (equivalent to ``np.dot``).
+# (n_vertices, n_voxels), which maps each voxel to some vertices in the
+# fsaverage surface. It is stored as a sparse CSR matrix. The mapper is applied
+# with a dot product ``@`` (equivalent to ``np.dot``).
 from voxelwise_tutorials.io import load_hdf5_sparse_array
 voxel_to_fsaverage = load_hdf5_sparse_array(mapper_file,
                                             key='voxel_to_fsaverage')
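The dot-product mapping clarified in this hunk can be sketched with synthetic shapes (a minimal sketch: the sizes and the random stand-in for the mapper are made up; the real mapper is loaded with ``load_hdf5_sparse_array``):

```python
import numpy as np
from scipy import sparse

# Hypothetical sizes; the real mapper has one row per fsaverage vertex.
n_vertices, n_voxels = 10, 4
rng = np.random.RandomState(0)
# A random sparse CSR matrix of shape (n_vertices, n_voxels) stands in
# for the voxel-to-fsaverage mapper.
voxel_to_fsaverage = sparse.random(n_vertices, n_voxels, density=0.3,
                                   format="csr", random_state=rng)
voxel_values = rng.randn(n_voxels)  # e.g. explainable variance per voxel
# Applying the mapper is just a dot product, written with ``@``.
vertex_values = voxel_to_fsaverage @ voxel_values
print(vertex_values.shape)  # (10,)
```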

tutorials/movies_3T/02_plot_wordnet_model.py (17 additions, 8 deletions)

@@ -20,10 +20,10 @@
 labels. To interpret our model, labels can be organized in a graph of semantic
 relashionship based on the `Wordnet <https://wordnet.princeton.edu/>`_ dataset.

-*Summary:* We first concatenate the features with multiple delays, to account
-for the slow hemodynamic response. We then fit a predictive model of BOLD
-activity, using a linear regression that weights differently each delayed
-feature. The linear regression is regularized to improve robustness to
+*Summary:* We first concatenate the features with multiple temporal delays, to
+account for the slow hemodynamic response. We then fit a predictive model of
+BOLD activity, using a linear regression that weighs each delayed feature
+differently. The linear regression is regularized to improve robustness to
 correlated features and to improve generalization. The optimal regularization
 hyperparameter is selected over a grid-search with cross-validation. Finally,
 the model generalization performance is evaluated on a held-out test set,
@@ -46,7 +46,13 @@
 # Load the data
 # -------------
 #
-# We first load the fMRI responses.
+# We first load the fMRI responses. These responses have been preprocessed as
+# decribed in [1]_. The data is separated into a training set ``Y_train`` and a
+# testing set ``Y_test``. The training set is used for fitting models, and
+# selecting the best models and hyperparameters. The testing set is later used
+# to estimate the generalization performances of the selected model. The
+# testing set contains multiple repetitions of the same experiment, to estimate
+# an upper bound of the model performances (cf. previous example).
 import numpy as np
 from voxelwise_tutorials.io import load_hdf5_array

@@ -75,7 +81,10 @@
 Y_test = np.nan_to_num(Y_test)

 ###############################################################################
-# Then, we load the semantic "wordnet" features.
+# Then, we load the semantic "wordnet" features, extracted from the stimulus at
+# each time point. The features corresponding to the training set are noted
+# ``X_train``, and the features corresponding to the testing set are noted
+# ``X_test``.
 feature_space = "wordnet"

 file_name = os.path.join(directory, "features", f"{feature_space}.hdf")
@@ -123,7 +132,7 @@
 #
 # However, we prefer not to normalize by the standard deviation of each
 # feature. Indeed, if the features are extracted in a consistent way from the
-# stimulus, there relative scale is meaningful. Normalizing them independently
+# stimulus, their relative scale is meaningful. Normalizing them independently
 # from each other would remove this meaning. Moreover, the wordnet features are
 # one-hot-encoded, which means that each feature is either present (1) or not
 # present (0) in each sample. Normalizing one-hot-encoded features is not
@@ -225,7 +234,7 @@
 ###############################################################################
 # We can display the ``scikit-learn`` pipeline with an HTML diagram.
 from sklearn import set_config
-set_config(display='diagram')
+set_config(display='diagram') # requires scikit-learn 0.23
 pipeline

 ###############################################################################
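The summary's "multiple temporal delays" step can be sketched in plain NumPy (a minimal sketch only: the tutorial pipelines use a dedicated scikit-learn transformer for this, and the delay values below are made up):

```python
import numpy as np

def make_delayed(X, delays):
    """Concatenate copies of X, each shifted down by `delay` samples and
    zero-padded, so the regression can weigh each delay differently."""
    n_samples, n_features = X.shape
    X_delayed = np.zeros((n_samples, n_features * len(delays)))
    for ii, delay in enumerate(delays):
        beg, end = ii * n_features, (ii + 1) * n_features
        if delay == 0:
            X_delayed[:, beg:end] = X
        else:
            X_delayed[delay:, beg:end] = X[:-delay]
    return X_delayed

X = np.random.randn(100, 5)
X_delayed = make_delayed(X, delays=[1, 2, 3, 4])
print(X_delayed.shape)  # (100, 20)
```

The shifted copies let a linear model approximate the slow hemodynamic response with a separate weight per delay.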

tutorials/movies_3T/03_plot_hemodynamic_response.py (1 addition, 1 deletion)

@@ -119,7 +119,7 @@

 ###############################################################################
 from sklearn import set_config
-set_config(display='diagram')
+set_config(display='diagram') # requires scikit-learn 0.23
 pipeline

 ###############################################################################

tutorials/movies_3T/04_plot_motion_energy_model.py (1 addition, 1 deletion)

@@ -127,7 +127,7 @@

 ###############################################################################
 from sklearn import set_config
-set_config(display='diagram')
+set_config(display='diagram') # requires scikit-learn 0.23
 pipeline_motion_energy

 ###############################################################################

tutorials/movies_3T/05_plot_banded_ridge_model.py (1 addition, 1 deletion)

@@ -201,7 +201,7 @@
 # ``Kernelizer``.
 from himalaya.kernel_ridge import Kernelizer
 from sklearn import set_config
-set_config(display='diagram')
+set_config(display='diagram') # requires scikit-learn 0.23

 preprocess_pipeline = make_pipeline(
     StandardScaler(with_mean=True, with_std=False),

tutorials/movies_4T/02_plot_ridge_model.py (2 additions, 2 deletions)

@@ -131,9 +131,9 @@
 from sklearn.pipeline import make_pipeline
 from sklearn.preprocessing import StandardScaler

-# display the scikit-learn pipeline with an HTML diagram
+# Display the scikit-learn pipeline with an HTML diagram.
 from sklearn import set_config
-set_config(display='diagram')
+set_config(display='diagram') # requires scikit-learn 0.23

 ###############################################################################
 # With one target, we could directly use the pipeline in scikit-learn's
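The one-line change repeated across these files enables scikit-learn's HTML pipeline diagram, a feature added in scikit-learn 0.23. A minimal sketch of what it does (the pipeline below is made up for illustration, and ``_repr_html_`` is the private hook notebooks render through):

```python
from sklearn import set_config
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# With display='diagram', estimators gain an HTML representation that
# notebook front-ends pick up automatically (scikit-learn >= 0.23).
set_config(display='diagram')
pipeline = make_pipeline(
    StandardScaler(with_mean=True, with_std=False),
    RidgeCV(alphas=[0.1, 1.0, 10.0]),
)
html = pipeline._repr_html_()  # the HTML fragment a notebook would render
print(type(html).__name__)  # str
```

With the default ``display='text'``, evaluating ``pipeline`` in a notebook falls back to the plain ``repr`` instead.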

tutorials/notebooks/movies_3T/01_plot_explainable_variance.ipynb (3 additions, 3 deletions)

@@ -134,14 +134,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We see that most voxels have a rather low explainable variance, around 0.1\n(when not using the bias correction). This is expected, since most voxels are\nnot directly driven by a visual stimulus, and their activity change over\nrepeats. We also see that some voxels reach an explainable variance of 0.7,\nwhich is quite high. It means that these voxels consistently record the same\nactivity across a repeated stimulus, and thus are good targets for encoding\nmodels.\n\n"
+"We see that most voxels have a rather low explainable variance, around 0.1\n(when not using the bias correction). This is expected, since most voxels are\nnot directly driven by a visual stimulus, and their activity change over\nrepeats. We also see that some voxels reach an explainable variance of 0.7,\nwhich is quite high. It means that these voxels consistently record the same\nactivity across a repeated stimulus, and thus are good targets for encoding\nmodels. Of course, this set of explainable voxels changes from task to\ntask, depending on what you are trying to model.\n\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Map to subject flatmap\n----------------------\n\nTo better understand the distribution of explainable variance, we map the\nvalues to the subject brain. This can be done with `pycortex\n<https://gallantlab.github.io/pycortex/>`_, which can create interactive 3D\nviewers to be displayed in any modern browser. ``pycortex`` can also display\nflatten maps of the cortical surface, to visualize the entire cortical\nsurface at once.\n\nHere, we do not share the anatomical information of the subjects for privacy\nconcerns. Instead, we provide two mappers:\n\n- to map the voxels to a (subject-specific) flatmap\n- to map the voxels to the Freesurfer average cortical surface (\"fsaverage\")\n\nThe first mapper is 2D matrix of shape (n_pixels, n_voxels), that map each\nvoxel to a set of pixel in a flatmap. The matrix is efficient stored using a\n``scipy`` sparse CSR matrix format. To ease the use of this mapper, we\nprovide an example function ``plot_flatmap_from_mapper``. This function mimic\nthe behavior of ``pycortex.quickshow``.\n\n"
+"Map to subject flatmap\n----------------------\n\nTo better understand the distribution of explainable variance, we map the\nvalues to the subject brain. This can be done with `pycortex\n<https://gallantlab.github.io/pycortex/>`_, which can create interactive 3D\nviewers to be displayed in any modern browser. ``pycortex`` can also display\nflattened maps of the cortical surface, to visualize the entire cortical\nsurface at once.\n\nHere, we do not share the anatomical information of the subjects for privacy\nconcerns. Instead, we provide two mappers:\n\n- to map the voxels to a (subject-specific) flatmap\n- to map the voxels to the Freesurfer average cortical surface (\"fsaverage\")\n\nThe first mapper is 2D matrix of shape (n_pixels, n_voxels), that map each\nvoxel to a set of pixel in a flatmap. The matrix is efficient stored using a\n``scipy`` sparse CSR matrix format. The function ``plot_flatmap_from_mapper``\nprovides an example of how to use the mapper and visualize the flatmap.\n\n"
 ]
 },
 {
@@ -191,7 +191,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Then, we load the \"fsaverage\" mapper. The mapper is a matrix of shape\n(n_vertices, n_voxels), which map each voxel to some vertices in the\nfsaverage surface. It is also stored with a sparse CSR matrix format. The\nmapper is applied with a dot product ``@`` (equivalent to ``np.dot``).\n\n"
+"Then, we load the \"fsaverage\" mapper. The mapper is a matrix of shape\n(n_vertices, n_voxels), which maps each voxel to some vertices in the\nfsaverage surface. It is stored as a sparse CSR matrix. The mapper is applied\nwith a dot product ``@`` (equivalent to ``np.dot``).\n\n"
 ]
 },
 {
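The explainable-variance idea edited in this notebook can be illustrated with synthetic data (a sketch of the concept only, with made-up signal and noise levels; the tutorial's actual estimator also offers a bias correction, which this skips):

```python
import numpy as np

rng = np.random.RandomState(0)
n_repeats, n_samples = 10, 200
signal = rng.randn(n_samples)  # stimulus-driven response, same on every repeat
# Each repeat = shared signal + independent measurement noise.
repeats = signal + 0.5 * rng.randn(n_repeats, n_samples)

# Explainable variance: the fraction of response variance that is
# consistent across repeats of the same stimulus.
residual = repeats - repeats.mean(axis=0)
ev = 1.0 - residual.var() / repeats.var()
print(0.5 < ev < 1.0)  # True: most variance here is repeatable
```

A voxel with ``ev`` near 0 responds differently on every repeat and is a poor target for an encoding model; a voxel with ``ev`` near 0.7, as in the tutorial, is a good one.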

tutorials/notebooks/movies_3T/02_plot_wordnet_model.ipynb (5 additions, 5 deletions)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Fit a ridge model with wordnet features\n\n\nIn this example, we model the fMRI responses with semantic \"wordnet\" features,\nmanually annotated on each frame of the movie stimulus. The model is a\nregularized linear regression model, known as ridge regression. Since this\nmodel is used to predict brain activity from the stimulus, it is called a\n(voxelwise) encoding model.\n\nThis example reproduces part of the analysis described in Huth et al (2012)\n[1]_. See this publication for more details about the experiment, the wordnet\nfeatures, along with more results and more discussions.\n\n*Wordnet features:* The features used in this example are semantic labels\nmanually annotated on each frame of the movie stimulus. The semantic labels\ninclude nouns (such as \"woman\", \"car\", or \"building\") and verbs (such as\n\"talking\", \"touching\", or \"walking\"), for a total of 1705 distinct category\nlabels. To interpret our model, labels can be organized in a graph of semantic\nrelashionship based on the `Wordnet <https://wordnet.princeton.edu/>`_ dataset.\n\n*Summary:* We first concatenate the features with multiple delays, to account\nfor the slow hemodynamic response. We then fit a predictive model of BOLD\nactivity, using a linear regression that weights differently each delayed\nfeature. The linear regression is regularized to improve robustness to\ncorrelated features and to improve generalization. The optimal regularization\nhyperparameter is selected over a grid-search with cross-validation. Finally,\nthe model generalization performance is evaluated on a held-out test set,\ncomparing the model predictions with the corresponding ground-truth fMRI\nresponses.\n"
+"\n# Fit a ridge model with wordnet features\n\n\nIn this example, we model the fMRI responses with semantic \"wordnet\" features,\nmanually annotated on each frame of the movie stimulus. The model is a\nregularized linear regression model, known as ridge regression. Since this\nmodel is used to predict brain activity from the stimulus, it is called a\n(voxelwise) encoding model.\n\nThis example reproduces part of the analysis described in Huth et al (2012)\n[1]_. See this publication for more details about the experiment, the wordnet\nfeatures, along with more results and more discussions.\n\n*Wordnet features:* The features used in this example are semantic labels\nmanually annotated on each frame of the movie stimulus. The semantic labels\ninclude nouns (such as \"woman\", \"car\", or \"building\") and verbs (such as\n\"talking\", \"touching\", or \"walking\"), for a total of 1705 distinct category\nlabels. To interpret our model, labels can be organized in a graph of semantic\nrelashionship based on the `Wordnet <https://wordnet.princeton.edu/>`_ dataset.\n\n*Summary:* We first concatenate the features with multiple temporal delays, to\naccount for the slow hemodynamic response. We then fit a predictive model of\nBOLD activity, using a linear regression that weighs each delayed feature\ndifferently. The linear regression is regularized to improve robustness to\ncorrelated features and to improve generalization. The optimal regularization\nhyperparameter is selected over a grid-search with cross-validation. Finally,\nthe model generalization performance is evaluated on a held-out test set,\ncomparing the model predictions with the corresponding ground-truth fMRI\nresponses.\n"
 ]
 },
 {
@@ -51,7 +51,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Load the data\n-------------\n\nWe first load the fMRI responses.\n\n"
+"Load the data\n-------------\n\nWe first load the fMRI responses. These responses have been preprocessed as\ndecribed in [1]_. The data is separated into a training set ``Y_train`` and a\ntesting set ``Y_test``. The training set is used for fitting models, and\nselecting the best models and hyperparameters. The testing set is later used\nto estimate the generalization performances of the selected model. The\ntesting set contains multiple repetitions of the same experiment, to estimate\nan upper bound of the model performances (cf. previous example).\n\n"
 ]
 },
 {
@@ -105,7 +105,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Then, we load the semantic \"wordnet\" features.\n\n"
+"Then, we load the semantic \"wordnet\" features, extracted from the stimulus at\neach time point. The features corresponding to the training set are noted\n``X_train``, and the features corresponding to the testing set are noted\n``X_test``.\n\n"
 ]
 },
 {
@@ -159,7 +159,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Define the model\n----------------\n\nNow, let's define the model pipeline.\n\nWe first center the features, since we will not use an intercept. Indeed, the\nmean value in fMRI recording is non-informative, so each run is detrended and\ndemeaned independently, and we do not need to predict an intercept value in\nthe linear model.\n\nHowever, we prefer not to normalize by the standard deviation of each\nfeature. Indeed, if the features are extracted in a consistent way from the\nstimulus, there relative scale is meaningful. Normalizing them independently\nfrom each other would remove this meaning. Moreover, the wordnet features are\none-hot-encoded, which means that each feature is either present (1) or not\npresent (0) in each sample. Normalizing one-hot-encoded features is not\nrecommended, since it would scale disproportionately the infrequent features.\n\n"
+"Define the model\n----------------\n\nNow, let's define the model pipeline.\n\nWe first center the features, since we will not use an intercept. Indeed, the\nmean value in fMRI recording is non-informative, so each run is detrended and\ndemeaned independently, and we do not need to predict an intercept value in\nthe linear model.\n\nHowever, we prefer not to normalize by the standard deviation of each\nfeature. Indeed, if the features are extracted in a consistent way from the\nstimulus, their relative scale is meaningful. Normalizing them independently\nfrom each other would remove this meaning. Moreover, the wordnet features are\none-hot-encoded, which means that each feature is either present (1) or not\npresent (0) in each sample. Normalizing one-hot-encoded features is not\nrecommended, since it would scale disproportionately the infrequent features.\n\n"
 ]
 },
 {
@@ -314,7 +314,7 @@
 },
 "outputs": [],
 "source": [
-"from sklearn import set_config\nset_config(display='diagram')\npipeline"
+"from sklearn import set_config\nset_config(display='diagram') # requires scikit-learn 0.23\npipeline"
 ]
 },
 {
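The notebook's point that standardizing one-hot features "would scale disproportionately the infrequent features" can be checked numerically (synthetic data, not the tutorial's features):

```python
import numpy as np

X = np.zeros((1000, 2))
X[:500, 0] = 1.0   # frequent label, present in 50% of samples
X[:10, 1] = 1.0    # rare label, present in 1% of samples

# Standardizing divides by the per-feature std, which is tiny for the rare
# feature, so its nonzero entries blow up relative to the frequent one.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled[:, 0].max().round(2))  # 1.0
print(X_scaled[:, 1].max().round(2))  # 9.95
```

After scaling, the rare label's nonzero entries are roughly ten times larger than the frequent label's, which is why the tutorial centers the features but leaves their scale alone.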

tutorials/notebooks/movies_3T/03_plot_hemodynamic_response.ipynb (1 addition, 1 deletion)

@@ -192,7 +192,7 @@
 },
 "outputs": [],
 "source": [
-"from sklearn import set_config\nset_config(display='diagram')\npipeline"
+"from sklearn import set_config\nset_config(display='diagram') # requires scikit-learn 0.23\npipeline"
 ]
 },
 {

tutorials/notebooks/movies_3T/04_plot_motion_energy_model.ipynb (1 addition, 1 deletion)

@@ -181,7 +181,7 @@
 },
 "outputs": [],
 "source": [
-"from sklearn import set_config\nset_config(display='diagram')\npipeline_motion_energy"
+"from sklearn import set_config\nset_config(display='diagram') # requires scikit-learn 0.23\npipeline_motion_energy"
 ]
 },
 {
