Commit 8594144

MNT use markdown in notebooks
1 parent 9184a5e

14 files changed: 28 additions & 33 deletions
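
The change is mechanical throughout: reST-style links such as `text <url>`_ become markdown links [text](url), and the notebook metadata is pinned to a single Python version (3.8.3). A rewrite like this is easy to script. A minimal Python sketch, purely illustrative (the conversion tool itself is not part of this commit), assuming links never nest and link text contains no backticks:

    import re

    # reST inline link: `link text <https://example.com>`_
    # In notebook JSON the text and URL are often split across a "\n".
    RST_LINK = re.compile(r"`([^`<]+?)\s*<([^>]+)>`_{1,2}")

    def rst_links_to_markdown(source: str) -> str:
        """Rewrite reST-style links as markdown links: [text](url)."""
        return RST_LINK.sub(lambda m: f"[{m.group(1).strip()}]({m.group(2)})", source)

    before = ("Learn more on the `scikit-learn documentation\n"
              "<https://scikit-learn.org/stable/modules/cross_validation.html>`_.")
    print(rst_links_to_markdown(before))
    # Learn more on the [scikit-learn documentation](https://scikit-learn.org/stable/modules/cross_validation.html).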

tutorials/notebooks/shortclips/00_download_shortclips.ipynb

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.9"
+"version": "3.8.3"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/00_setup_colab.ipynb

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.9"
+"version": "3.8.3"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/01_plot_explainable_variance.ipynb

Lines changed: 1 addition & 1 deletion
@@ -337,7 +337,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.9"
+"version": "3.8.3"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/02_plot_ridge_regression.ipynb

Lines changed: 2 additions & 2 deletions
@@ -386,7 +386,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"To summarize, to select the best hyperparameter $\\alpha$, the standard\nmethod is to perform a grid search:\n\n - Split the training set into two subsets: one subset used to fit the\n models, and one subset to estimate the prediction accuracy (*validation\n set*)\n - Define a number of hyperparameter candidates, for example [0.1, 1, 10,\n 100].\n - Fit a separate ridge model with each hyperparameter candidate\n $\\alpha$.\n - Compute the prediction accuracy on the validation set.\n - Select the hyperparameter candidate leading to the best validation\n accuracy.\n\nTo make the grid search less sensitive to the choice of how the training data\nwas split, the process can be repeated for multiple splits. Then, the\ndifferent prediction accuracies can be averaged over splits before the\nhyperparameter selection. Thus, the process is called a *cross-validation*.\n\nLearn more about hyperparameter selection and cross-validation on the\n`scikit-learn documentation\n<https://scikit-learn.org/stable/modules/cross_validation.html>`_.\n\n"
+"To summarize, to select the best hyperparameter $\\alpha$, the standard\nmethod is to perform a grid search:\n\n - Split the training set into two subsets: one subset used to fit the\n models, and one subset to estimate the prediction accuracy (*validation\n set*)\n - Define a number of hyperparameter candidates, for example [0.1, 1, 10,\n 100].\n - Fit a separate ridge model with each hyperparameter candidate\n $\\alpha$.\n - Compute the prediction accuracy on the validation set.\n - Select the hyperparameter candidate leading to the best validation\n accuracy.\n\nTo make the grid search less sensitive to the choice of how the training data\nwas split, the process can be repeated for multiple splits. Then, the\ndifferent prediction accuracies can be averaged over splits before the\nhyperparameter selection. Thus, the process is called a *cross-validation*.\n\nLearn more about hyperparameter selection and cross-validation on the\n[scikit-learn documentation](https://scikit-learn.org/stable/modules/cross_validation.html).\n\n"
 ]
 }
 ],
@@ -406,7 +406,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.12"
+"version": "3.8.3"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/03_plot_wordnet_model.ipynb

Lines changed: 4 additions & 4 deletions
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Fit a ridge model with wordnet features\n\nIn this example, we model the fMRI responses with semantic \"wordnet\" features,\nmanually annotated on each frame of the movie stimulus. The model is a\nregularized linear regression model, known as ridge regression. Since this\nmodel is used to predict brain activity from the stimulus, it is called a\n(voxelwise) encoding model.\n\nThis example reproduces part of the analysis described in Huth et al (2012)\n[1]_. See this publication for more details about the experiment, the wordnet\nfeatures, along with more results and more discussions.\n\n*Wordnet features:* The features used in this example are semantic labels\nmanually annotated on each frame of the movie stimulus. The semantic labels\ninclude nouns (such as \"woman\", \"car\", or \"building\") and verbs (such as\n\"talking\", \"touching\", or \"walking\"), for a total of 1705 distinct category\nlabels. To interpret our model, labels can be organized in a graph of semantic\nrelashionship based on the `Wordnet <https://wordnet.princeton.edu/>`_ dataset.\n\n*Summary:* We first concatenate the features with multiple temporal delays to\naccount for the slow hemodynamic response. We then use linear regression to fit\na predictive model of brain activity. The linear regression is regularized to\nimprove robustness to correlated features and to improve generalization\nperformance. The optimal regularization hyperparameter is selected over a\ngrid-search with cross-validation. Finally, the model generalization\nperformance is evaluated on a held-out test set, comparing the model\npredictions to the corresponding ground-truth fMRI responses.\n"
+"\n# Fit a ridge model with wordnet features\n\nIn this example, we model the fMRI responses with semantic \"wordnet\" features,\nmanually annotated on each frame of the movie stimulus. The model is a\nregularized linear regression model, known as ridge regression. Since this\nmodel is used to predict brain activity from the stimulus, it is called a\n(voxelwise) encoding model.\n\nThis example reproduces part of the analysis described in Huth et al (2012)\n[1]_. See this publication for more details about the experiment, the wordnet\nfeatures, along with more results and more discussions.\n\n*Wordnet features:* The features used in this example are semantic labels\nmanually annotated on each frame of the movie stimulus. The semantic labels\ninclude nouns (such as \"woman\", \"car\", or \"building\") and verbs (such as\n\"talking\", \"touching\", or \"walking\"), for a total of 1705 distinct category\nlabels. To interpret our model, labels can be organized in a graph of semantic\nrelashionship based on the [Wordnet](https://wordnet.princeton.edu/) dataset.\n\n*Summary:* We first concatenate the features with multiple temporal delays to\naccount for the slow hemodynamic response. We then use linear regression to fit\na predictive model of brain activity. The linear regression is regularized to\nimprove robustness to correlated features and to improve generalization\nperformance. The optimal regularization hyperparameter is selected over a\ngrid-search with cross-validation. Finally, the model generalization\nperformance is evaluated on a held-out test set, comparing the model\npredictions to the corresponding ground-truth fMRI responses.\n"
 ]
 },
 {
@@ -195,7 +195,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Finally, we use a ridge regression model. Ridge regression is a linear\nregression with L2 regularization. The L2 regularization improves robustness\nto correlated features and improves generalization performance. The L2\nregularization is controlled by a hyperparameter ``alpha`` that needs to be\ntuned for each dataset. This regularization hyperparameter is usually\nselected over a grid search with cross-validation, selecting the\nhyperparameter that maximizes the predictive performances on the validation\nset. See the previous example for more details about ridge regression and\nhyperparameter selection.\n\nFor computational reasons, when the number of features is larger than the\nnumber of samples, it is more efficient to solve ridge regression using the\n(equivalent) dual formulation [2]_. This dual formulation is equivalent to\nkernel ridge regression with a linear kernel. Here, we have 3600 training\nsamples, and 1705 * 4 = 6820 features (we multiply by 4 since we use 4 time\ndelays), therefore it is more efficient to use kernel ridge regression.\n\nWith one target, we could directly use the pipeline in ``scikit-learn``'s\n``GridSearchCV``, to select the optimal regularization hyperparameter\n(``alpha``) over cross-validation. However, ``GridSearchCV`` can only\noptimize a single score across all voxels (targets). Thus, in the\nmultiple-target case, ``GridSearchCV`` can only optimize (for example) the\nmean score over targets. Here, we want to find a different optimal\nhyperparameter per target/voxel, so we use the package `himalaya\n<https://github.com/gallantlab/himalaya>`_ which implements a\n``scikit-learn`` compatible estimator ``KernelRidgeCV``, with hyperparameter\nselection independently on each target.\n\n"
+"Finally, we use a ridge regression model. Ridge regression is a linear\nregression with L2 regularization. The L2 regularization improves robustness\nto correlated features and improves generalization performance. The L2\nregularization is controlled by a hyperparameter ``alpha`` that needs to be\ntuned for each dataset. This regularization hyperparameter is usually\nselected over a grid search with cross-validation, selecting the\nhyperparameter that maximizes the predictive performances on the validation\nset. See the previous example for more details about ridge regression and\nhyperparameter selection.\n\nFor computational reasons, when the number of features is larger than the\nnumber of samples, it is more efficient to solve ridge regression using the\n(equivalent) dual formulation [2]_. This dual formulation is equivalent to\nkernel ridge regression with a linear kernel. Here, we have 3600 training\nsamples, and 1705 * 4 = 6820 features (we multiply by 4 since we use 4 time\ndelays), therefore it is more efficient to use kernel ridge regression.\n\nWith one target, we could directly use the pipeline in ``scikit-learn``'s\n``GridSearchCV``, to select the optimal regularization hyperparameter\n(``alpha``) over cross-validation. However, ``GridSearchCV`` can only\noptimize a single score across all voxels (targets). Thus, in the\nmultiple-target case, ``GridSearchCV`` can only optimize (for example) the\nmean score over targets. Here, we want to find a different optimal\nhyperparameter per target/voxel, so we use the package [himalaya](https://github.com/gallantlab/himalaya) which implements a\n``scikit-learn`` compatible estimator ``KernelRidgeCV``, with hyperparameter\nselection independently on each target.\n\n"
 ]
 },
 {
@@ -285,7 +285,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Finally, we use a ``scikit-learn`` ``Pipeline`` to link the different steps\ntogether. A ``Pipeline`` can be used as a regular estimator, calling\n``pipeline.fit``, ``pipeline.predict``, etc. Using a ``Pipeline`` can be\nuseful to clarify the different steps, avoid cross-validation mistakes, or\nautomatically cache intermediate results. See the ``scikit-learn``\n`documentation <https://scikit-learn.org/stable/modules/compose.html>`_ for\nmore information.\n\n"
+"Finally, we use a ``scikit-learn`` ``Pipeline`` to link the different steps\ntogether. A ``Pipeline`` can be used as a regular estimator, calling\n``pipeline.fit``, ``pipeline.predict``, etc. Using a ``Pipeline`` can be\nuseful to clarify the different steps, avoid cross-validation mistakes, or\nautomatically cache intermediate results. See the ``scikit-learn``\n[documentation](https://scikit-learn.org/stable/modules/compose.html) for\nmore information.\n\n"
 ]
 },
 {
@@ -653,7 +653,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.12"
+"version": "3.8.3"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/04_plot_hemodynamic_response.ipynb

Lines changed: 1 addition & 1 deletion
@@ -352,7 +352,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.9"
+"version": "3.8.3"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/05_plot_motion_energy_model.ipynb

Lines changed: 1 addition & 1 deletion
@@ -341,7 +341,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.12"
+"version": "3.8.3"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/06_plot_banded_ridge_model.ipynb

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Fit a banded ridge model with both wordnet and motion energy features\n\nIn this example, we model the fMRI responses with a `banded ridge regression`,\nwith two different feature spaces: motion energy and wordnet categories.\n\n*Banded ridge regression:* Since the relative scaling of both feature spaces is\nunknown, we use two regularization hyperparameters (one per feature space) in a\nmodel called banded ridge regression [1]_. Just like with ridge regression, we\noptimize the hyperparameters over cross-validation. An efficient implementation\nof this model is available in the `himalaya\n<https://github.com/gallantlab/himalaya>`_ package.\n\n*Running time:* This example is more computationally intensive than the\nprevious examples. With a GPU backend, model fitting takes around 6 minutes.\nWith a CPU backend, it can last 10 times more.\n"
+"\n# Fit a banded ridge model with both wordnet and motion energy features\n\nIn this example, we model the fMRI responses with a `banded ridge regression`,\nwith two different feature spaces: motion energy and wordnet categories.\n\n*Banded ridge regression:* Since the relative scaling of both feature spaces is\nunknown, we use two regularization hyperparameters (one per feature space) in a\nmodel called banded ridge regression [1]_. Just like with ridge regression, we\noptimize the hyperparameters over cross-validation. An efficient implementation\nof this model is available in the [himalaya](https://github.com/gallantlab/himalaya) package.\n\n*Running time:* This example is more computationally intensive than the\nprevious examples. With a GPU backend, model fitting takes around 6 minutes.\nWith a CPU backend, it can last 10 times more.\n"
 ]
 },
 {
@@ -510,7 +510,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.12"
+"version": "3.8.3"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/07_extract_motion_energy.ipynb

Lines changed: 1 addition & 1 deletion
@@ -136,7 +136,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.9"
+"version": "3.8.3"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/merged_for_colab.ipynb

Lines changed: 6 additions & 9 deletions
@@ -1333,8 +1333,7 @@
 "hyperparameter selection. Thus, the process is called a *cross-validation*.\n",
 "\n",
 "Learn more about hyperparameter selection and cross-validation on the\n",
-"`scikit-learn documentation\n",
-"<https://scikit-learn.org/stable/modules/cross_validation.html>`_.\n",
+"[scikit-learn documentation](https://scikit-learn.org/stable/modules/cross_validation.html).\n",
 "\n"
 ]
 },
@@ -1390,7 +1389,7 @@
 "include nouns (such as \"woman\", \"car\", or \"building\") and verbs (such as\n",
 "\"talking\", \"touching\", or \"walking\"), for a total of 1705 distinct category\n",
 "labels. To interpret our model, labels can be organized in a graph of semantic\n",
-"relashionship based on the `Wordnet <https://wordnet.princeton.edu/>`_ dataset.\n",
+"relashionship based on the [Wordnet](https://wordnet.princeton.edu/) dataset.\n",
 "\n",
 "*Summary:* We first concatenate the features with multiple temporal delays to\n",
 "account for the slow hemodynamic response. We then use linear regression to fit\n",
@@ -1688,8 +1687,7 @@
 "optimize a single score across all voxels (targets). Thus, in the\n",
 "multiple-target case, ``GridSearchCV`` can only optimize (for example) the\n",
 "mean score over targets. Here, we want to find a different optimal\n",
-"hyperparameter per target/voxel, so we use the package `himalaya\n",
-"<https://github.com/gallantlab/himalaya>`_ which implements a\n",
+"hyperparameter per target/voxel, so we use the package [himalaya](https://github.com/gallantlab/himalaya) which implements a\n",
 "``scikit-learn`` compatible estimator ``KernelRidgeCV``, with hyperparameter\n",
 "selection independently on each target.\n",
 "\n"
@@ -1809,7 +1807,7 @@
 "``pipeline.fit``, ``pipeline.predict``, etc. Using a ``Pipeline`` can be\n",
 "useful to clarify the different steps, avoid cross-validation mistakes, or\n",
 "automatically cache intermediate results. See the ``scikit-learn``\n",
-"`documentation <https://scikit-learn.org/stable/modules/compose.html>`_ for\n",
+"[documentation](https://scikit-learn.org/stable/modules/compose.html) for\n",
 "more information.\n",
 "\n"
 ]
@@ -3518,8 +3516,7 @@
 "unknown, we use two regularization hyperparameters (one per feature space) in a\n",
 "model called banded ridge regression [1]_. Just like with ridge regression, we\n",
 "optimize the hyperparameters over cross-validation. An efficient implementation\n",
-"of this model is available in the `himalaya\n",
-"<https://github.com/gallantlab/himalaya>`_ package.\n",
+"of this model is available in the [himalaya](https://github.com/gallantlab/himalaya) package.\n",
 "\n",
 "*Running time:* This example is more computationally intensive than the\n",
 "previous examples. With a GPU backend, model fitting takes around 6 minutes.\n",
@@ -4287,7 +4284,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.9"
+"version": "3.8.3"
 },
 "name": "_merged"
 },
