
Commit 1ceffdc

Fix references and add merged notebooks

1 parent cda4e57

3 files changed: 176 additions & 117 deletions

tutorials/notebooks/shortclips/vem_tutorials_merged_for_colab.ipynb (95 additions & 70 deletions)
@@ -1422,6 +1422,36 @@
 "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Before fitting an encoding model, the fMRI responses are typically z-scored over time. This normalization step is performed for two reasons.\n",
+"First, the regularized regression methods used to estimate encoding models generally assume the data to be normalized {cite:t}`Hastie2009`.\n",
+"Second, the temporal mean and standard deviation of a voxel are typically considered uninformative in fMRI because they can vary due to factors unrelated to the task, such as differences in signal-to-noise ratio (SNR).\n",
+"\n",
+"To keep each run independent of the others, we z-score each run separately."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"from scipy.stats import zscore\n",
+"\n",
+"# index of the first sample of each run\n",
+"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
+"print(run_onsets)\n",
+"\n",
+"# zscore each training run separately\n",
+"Y_train = np.split(Y_train, run_onsets[1:])\n",
+"Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n",
+"# zscore each test repeat separately\n",
+"Y_test = zscore(Y_test, axis=1)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -1443,6 +1473,9 @@
 "outputs": [],
 "source": [
 "Y_test = Y_test.mean(0)\n",
+"# We need to zscore the test data again, because we took the mean across repetitions.\n",
+"# This averaging step makes the standard deviation approximately equal to 1/sqrt(n_repeats).\n",
+"Y_test = zscore(Y_test, axis=0)\n",
 "\n",
 "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
@@ -1510,7 +1543,8 @@
 "following time sample in the validation set. Thus, we define here a\n",
 "leave-one-run-out cross-validation split that keeps each recording run\n",
 "intact.\n",
-"\n"
+"\n",
+"We define a cross-validation splitter, compatible with the ``scikit-learn`` API."
 ]
 },
 {
@@ -1524,27 +1558,6 @@
 "from sklearn.model_selection import check_cv\n",
 "from voxelwise_tutorials.utils import generate_leave_one_run_out\n",
 "\n",
-"# indice of first sample of each run\n",
-"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
-"print(run_onsets)"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"We define a cross-validation splitter, compatible with ``scikit-learn`` API.\n",
-"\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
 "n_samples_train = X_train.shape[0]\n",
 "cv = generate_leave_one_run_out(n_samples_train, run_onsets)\n",
 "cv = check_cv(cv) # copy the cross-validation splitter into a reusable list"
@@ -1558,19 +1571,24 @@
 "\n",
 "Now, let's define the model pipeline.\n",
 "\n",
+"With regularized linear regression models, it is generally recommended to normalize\n",
+"(z-score) both the responses and the features before fitting the model {cite:t}`Hastie2009`.\n",
+"Z-scoring corresponds to removing the temporal mean and dividing by the temporal standard deviation.\n",
+"We already z-scored the fMRI responses after loading them, so now we need to specify\n",
+"in the model how to deal with the features.\n",
+"\n",
 "We first center the features, since we will not use an intercept. The mean\n",
 "value in fMRI recording is non-informative, so each run is detrended and\n",
 "demeaned independently, and we do not need to predict an intercept value in\n",
 "the linear model.\n",
 "\n",
-"However, we prefer to avoid normalizing by the standard deviation of each\n",
-"feature. If the features are extracted in a consistent way from the stimulus,\n",
+"For this particular dataset and example, we do not normalize by the standard deviation\n",
+"of each feature. If the features are extracted in a consistent way from the stimulus,\n",
 "their relative scale is meaningful. Normalizing them independently from each\n",
 "other would remove this information. Moreover, the wordnet features are\n",
 "one-hot-encoded, which means that each feature is either present (1) or not\n",
 "present (0) in each sample. Normalizing one-hot-encoded features is not\n",
-"recommended, since it would scale disproportionately the infrequent features.\n",
-"\n"
+"recommended, since it would disproportionately scale the infrequent features."
 ]
 },
 {
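The caveat about one-hot features is easy to verify: z-scoring divides by the feature's standard deviation, which is small for columns where the 1s are rare, so infrequent features end up with disproportionately large values. A small synthetic illustration:

import numpy as np
from scipy.stats import zscore

frequent = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
rare = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)

# mean 0.5, std 0.5 -> the 1s map to a moderate value
print(zscore(frequent).max())  # 1.0
# mean 0.1, std 0.3 -> the single 1 maps to a much larger value
print(zscore(rare).max())      # 3.0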
@@ -2096,7 +2114,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Similarly to [1]_, we correct the coefficients of features linked by a\n",
+"Similarly to {cite:t}`huth2012`, we correct the coefficients of features linked by a\n",
 "semantic relationship. When building the wordnet features, if a frame was\n",
 "labeled with `wolf`, the authors automatically added the semantically linked\n",
 "categories `canine`, `carnivore`, `placental mammal`, `mamma`, `vertebrate`,\n",
@@ -2272,10 +2290,11 @@
 "voxel_colors = scale_to_rgb_cube(average_coef_transformed[1:4].T, clip=3).T\n",
 "print(\"(n_channels, n_voxels) =\", voxel_colors.shape)\n",
 "\n",
-"ax = plot_3d_flatmap_from_mapper(voxel_colors[0], voxel_colors[1],\n",
-"                                 voxel_colors[2], mapper_file=mapper_file,\n",
-"                                 vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0,\n",
-"                                 vmax3=1)\n",
+"ax = plot_3d_flatmap_from_mapper(\n",
+"    voxel_colors[0], voxel_colors[1], voxel_colors[2],\n",
+"    mapper_file=mapper_file,\n",
+"    vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0, vmax3=1\n",
+")\n",
 "plt.show()"
 ]
 },
@@ -2379,8 +2398,7 @@
 "source": [
 "## Load the data\n",
 "\n",
-"We first load the fMRI responses.\n",
-"\n"
+"We first load and normalize the fMRI responses."
 ]
 },
 {
@@ -2393,23 +2411,32 @@
 "source": [
 "import os\n",
 "import numpy as np\n",
+"from scipy.stats import zscore\n",
 "from voxelwise_tutorials.io import load_hdf5_array\n",
 "\n",
 "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n",
 "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n",
 "Y_test = load_hdf5_array(file_name, key=\"Y_test\")\n",
 "\n",
 "print(\"(n_samples_train, n_voxels) =\", Y_train.shape)\n",
-"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)"
+"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)\n",
+"\n",
+"# index of the first sample of each run\n",
+"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
+"\n",
+"# zscore each training run separately\n",
+"Y_train = np.split(Y_train, run_onsets[1:])\n",
+"Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n",
+"# zscore each test repeat separately\n",
+"Y_test = zscore(Y_test, axis=1)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "We average the test repeats, to remove the non-repeatable part of fMRI\n",
-"responses.\n",
-"\n"
+"responses, and normalize the average across repeats."
 ]
 },
 {
@@ -2421,6 +2448,7 @@
 "outputs": [],
 "source": [
 "Y_test = Y_test.mean(0)\n",
+"Y_test = zscore(Y_test, axis=0)\n",
 "\n",
 "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
@@ -2479,7 +2507,8 @@
 "\n",
 "We define the same leave-one-run-out cross-validation split as in the\n",
 "previous example.\n",
-"\n"
+"\n",
+"We define a cross-validation splitter, compatible with the ``scikit-learn`` API."
 ]
 },
 {
@@ -2493,27 +2522,6 @@
 "from sklearn.model_selection import check_cv\n",
 "from voxelwise_tutorials.utils import generate_leave_one_run_out\n",
 "\n",
-"# indice of first sample of each run\n",
-"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
-"print(run_onsets)"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"We define a cross-validation splitter, compatible with ``scikit-learn`` API.\n",
-"\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
 "n_samples_train = X_train.shape[0]\n",
 "cv = generate_leave_one_run_out(n_samples_train, run_onsets)\n",
 "cv = check_cv(cv) # copy the cross-validation splitter into a reusable list"
@@ -2964,7 +2972,7 @@
 "source": [
 "## Load the data\n",
 "\n",
-"We first load the fMRI responses.\n",
+"We first load and normalize the fMRI responses.\n",
 "\n"
 ]
 },
@@ -2978,23 +2986,32 @@
 "source": [
 "import os\n",
 "import numpy as np\n",
+"from scipy.stats import zscore\n",
 "from voxelwise_tutorials.io import load_hdf5_array\n",
 "\n",
 "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n",
 "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n",
 "Y_test = load_hdf5_array(file_name, key=\"Y_test\")\n",
 "\n",
 "print(\"(n_samples_train, n_voxels) =\", Y_train.shape)\n",
-"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)"
+"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)\n",
+"\n",
+"# index of the first sample of each run\n",
+"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
+"\n",
+"# zscore each training run separately\n",
+"Y_train = np.split(Y_train, run_onsets[1:])\n",
+"Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n",
+"# zscore each test repeat separately\n",
+"Y_test = zscore(Y_test, axis=1)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "We average the test repeats, to remove the non-repeatable part of fMRI\n",
-"responses.\n",
-"\n"
+"responses, and normalize the average across repeats."
 ]
 },
 {
@@ -3006,6 +3023,7 @@
 "outputs": [],
 "source": [
 "Y_test = Y_test.mean(0)\n",
+"Y_test = zscore(Y_test, axis=0)\n",
 "\n",
 "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
@@ -3457,28 +3475,35 @@
 "## Load the data\n",
 "\n",
 "As in the previous examples, we first load the fMRI responses, which are our\n",
-"regression targets.\n",
-"\n"
+"regression targets. We then normalize the data independently for each run."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "import os\n",
 "import numpy as np\n",
+"from scipy.stats import zscore\n",
 "from voxelwise_tutorials.io import load_hdf5_array\n",
 "\n",
 "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n",
 "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n",
 "Y_test = load_hdf5_array(file_name, key=\"Y_test\")\n",
 "\n",
 "print(\"(n_samples_train, n_voxels) =\", Y_train.shape)\n",
-"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)"
+"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)\n",
+"\n",
+"# index of the first sample of each run\n",
+"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
+"\n",
+"# zscore each training run separately\n",
+"Y_train = np.split(Y_train, run_onsets[1:])\n",
+"Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n",
+"# zscore each test repeat separately\n",
+"Y_test = zscore(Y_test, axis=1)"
 ]
 },
 {
@@ -3511,8 +3536,7 @@
 "metadata": {},
 "source": [
 "We average the test repeats, to remove the non-repeatable part of fMRI\n",
-"responses.\n",
-"\n"
+"responses, and normalize the averaged data."
 ]
 },
 {
@@ -3524,6 +3548,7 @@
 "outputs": [],
 "source": [
 "Y_test = Y_test.mean(0)\n",
+"Y_test = zscore(Y_test, axis=0)\n",
 "\n",
 "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
