|
1422 | 1422 | "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)" |
1423 | 1423 | ] |
1424 | 1424 | }, |
| 1425 | + { |
| 1426 | + "cell_type": "markdown", |
| 1427 | + "metadata": {}, |
| 1428 | + "source": [ |
| 1429 | + "Before fitting an encoding model, the fMRI responses are typically z-scored over time. This normalization step is performed for two reasons.\n", |
| 1430 | + "First, the regularized regression methods used to estimate encoding models generally assume the data to be normalized {cite:t}`Hastie2009`. \n", |
| 1431 | + "Second, the temporal mean and standard deviation of a voxel are typically considered uninformative in fMRI because they can vary due to factors unrelated to the task, such as differences in signal-to-noise ratio (SNR).\n", |
| 1432 | + "\n", |
| 1433 | + "To keep each run independent from the others, we z-score each run separately." |
| 1434 | + ] |
| 1435 | + }, |
| 1436 | + { |
| 1437 | + "cell_type": "code", |
| 1438 | + "execution_count": null, |
| 1439 | + "metadata": {}, |
| 1440 | + "outputs": [], |
| 1441 | + "source": [ |
| 1442 | + "from scipy.stats import zscore\n", |
| 1443 | + "\n", |
| 1444 | + "# index of the first sample of each run\n", |
| 1445 | + "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
| 1446 | + "print(run_onsets)\n", |
| 1447 | + "\n", |
| 1448 | + "# zscore each training run separately\n", |
| 1449 | + "Y_train = np.split(Y_train, run_onsets[1:])\n", |
| 1450 | + "Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n", |
| 1451 | + "# zscore each test run separately\n", |
| 1452 | + "Y_test = zscore(Y_test, axis=1)" |
| 1453 | + ] |
| 1454 | + }, |
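The per-run z-scoring pattern added in this cell can be sketched on synthetic data (the shapes, run boundaries, and random data below are made up for illustration):

```python
import numpy as np
from scipy.stats import zscore

# Toy data: 10 time samples over two runs (6 + 4 samples), 3 voxels.
rng = np.random.default_rng(0)
Y = rng.normal(loc=5.0, scale=2.0, size=(10, 3))
run_onsets = np.array([0, 6])  # index of the first sample of each run

# Split at every onset except the first, z-score each run over time,
# then re-concatenate along the time axis.
runs = np.split(Y, run_onsets[1:])
Y_normalized = np.concatenate([zscore(run, axis=0) for run in runs], axis=0)

# Each run now has zero mean and unit standard deviation in every voxel.
print(Y_normalized[:6].mean(axis=0))  # ~ [0, 0, 0]
print(Y_normalized[:6].std(axis=0))   # ~ [1, 1, 1]
```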
1425 | 1455 | { |
1426 | 1456 | "cell_type": "markdown", |
1427 | 1457 | "metadata": {}, |
|
1443 | 1473 | "outputs": [], |
1444 | 1474 | "source": [ |
1445 | 1475 | "Y_test = Y_test.mean(0)\n", |
| 1476 | + "# We need to zscore the test data again, because we took the mean across repetitions.\n", |
| 1477 | + "# This averaging step reduces the standard deviation below one (about 1/sqrt(n_repeats) if repeats were independent).\n", |
| 1478 | + "Y_test = zscore(Y_test, axis=0)\n", |
1446 | 1479 | "\n", |
1447 | 1480 | "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)" |
1448 | 1481 | ] |
|
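The standard-deviation comment above can be checked numerically. Averaging n_repeats perfectly independent z-scored signals shrinks the standard deviation to about 1/sqrt(n_repeats); real fMRI repeats share a repeatable signal, so the true value lies between that and 1. A minimal check with independent noise (sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n_repeats, n_samples = 10, 100_000

# Independent standard-normal "repeats": each has mean 0 and std 1.
repeats = rng.standard_normal((n_repeats, n_samples))

# The average across repeats has std close to 1/sqrt(10) ~ 0.316.
print(repeats.mean(axis=0).std())
```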
1510 | 1543 | "following time sample in the validation set. Thus, we define here a\n", |
1511 | 1544 | "leave-one-run-out cross-validation split that keeps each recording run\n", |
1512 | 1545 | "intact.\n", |
1513 | | - "\n" |
| 1546 | + "\n", |
| 1547 | + "We define a cross-validation splitter, compatible with the ``scikit-learn`` API." |
1514 | 1548 | ] |
1515 | 1549 | }, |
1516 | 1550 | { |
|
1524 | 1558 | "from sklearn.model_selection import check_cv\n", |
1525 | 1559 | "from voxelwise_tutorials.utils import generate_leave_one_run_out\n", |
1526 | 1560 | "\n", |
1527 | | - "# indice of first sample of each run\n", |
1528 | | - "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
1529 | | - "print(run_onsets)" |
1530 | | - ] |
1531 | | - }, |
1532 | | - { |
1533 | | - "cell_type": "markdown", |
1534 | | - "metadata": {}, |
1535 | | - "source": [ |
1536 | | - "We define a cross-validation splitter, compatible with ``scikit-learn`` API.\n", |
1537 | | - "\n" |
1538 | | - ] |
1539 | | - }, |
1540 | | - { |
1541 | | - "cell_type": "code", |
1542 | | - "execution_count": null, |
1543 | | - "metadata": { |
1544 | | - "collapsed": false |
1545 | | - }, |
1546 | | - "outputs": [], |
1547 | | - "source": [ |
1548 | 1561 | "n_samples_train = X_train.shape[0]\n", |
1549 | 1562 | "cv = generate_leave_one_run_out(n_samples_train, run_onsets)\n", |
1550 | 1563 | "cv = check_cv(cv) # copy the cross-validation splitter into a reusable list" |
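A leave-one-run-out splitter can be sketched as a plain generator of (train, validation) index pairs, which ``check_cv`` then copies into a reusable list; this is the role ``generate_leave_one_run_out`` plays above. The toy run layout below is made up for illustration:

```python
import numpy as np
from sklearn.model_selection import check_cv

# Toy layout: 9 samples over 3 runs starting at samples 0, 3, and 6.
run_onsets = [0, 3, 6]
runs = np.split(np.arange(9), run_onsets[1:])

def leave_one_run_out(runs):
    # Hold out each run once; train on the concatenation of the others.
    for ii, val in enumerate(runs):
        train = np.concatenate([r for jj, r in enumerate(runs) if jj != ii])
        yield train, val

cv = check_cv(leave_one_run_out(runs))  # copied into a reusable list
for train, val in cv.split(np.zeros((9, 1))):
    print(train, val)
```

Because ``check_cv`` materializes the generator, the splitter can be iterated multiple times, e.g. once per hyperparameter candidate.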
|
1558 | 1571 | "\n", |
1559 | 1572 | "Now, let's define the model pipeline.\n", |
1560 | 1573 | "\n", |
| 1574 | + "With regularized linear regression models, it is generally recommended to normalize \n", |
| 1575 | + "(z-score) both the responses and the features before fitting the model {cite:t}`Hastie2009`. \n", |
| 1576 | + "Z-scoring corresponds to removing the temporal mean and dividing by the temporal standard deviation.\n", |
| 1577 | + "We already z-scored the fMRI responses after loading them, so now we need to specify\n", |
| 1578 | + "in the model how to handle the features. \n", |
| 1579 | + "\n", |
1561 | 1580 | "We first center the features, since we will not use an intercept. The mean\n", |
1562 | 1581 | "value in fMRI recordings is non-informative, so each run is detrended and\n", |
1563 | 1582 | "demeaned independently, and we do not need to predict an intercept value in\n", |
1564 | 1583 | "the linear model.\n", |
1565 | 1584 | "\n", |
1566 | | - "However, we prefer to avoid normalizing by the standard deviation of each\n", |
1567 | | - "feature. If the features are extracted in a consistent way from the stimulus,\n", |
| 1585 | + "For this particular dataset and example, we do not normalize by the standard deviation \n", |
| 1586 | + "of each feature. If the features are extracted in a consistent way from the stimulus,\n", |
1568 | 1587 | "their relative scale is meaningful. Normalizing them independently from each\n", |
1569 | 1588 | "other would remove this information. Moreover, the wordnet features are\n", |
1570 | 1589 | "one-hot-encoded, which means that each feature is either present (1) or not\n", |
1571 | 1590 | "present (0) in each sample. Normalizing one-hot-encoded features is not\n", |
1572 | | - "recommended, since it would scale disproportionately the infrequent features.\n", |
1573 | | - "\n" |
| 1591 | + "recommended, since it would disproportionately scale the infrequent features." |
1574 | 1592 | ] |
1575 | 1593 | }, |
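The point about infrequent one-hot features can be illustrated directly: z-scoring a binary feature with presence probability p rescales its "present" value to sqrt((1-p)/p), so rare features get inflated. A small sketch, with feature frequencies chosen arbitrarily:

```python
import numpy as np
from scipy.stats import zscore

n_samples = 100
frequent = np.zeros(n_samples)
frequent[:50] = 1.0  # present in 50% of samples
rare = np.zeros(n_samples)
rare[:2] = 1.0       # present in 2% of samples

# After z-scoring, the rare feature's "present" value is 7x larger,
# so it would carry disproportionate weight in a regularized regression.
print(zscore(frequent).max())  # 1.0
print(zscore(rare).max())      # 7.0
```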
1576 | 1594 | { |
|
2096 | 2114 | "cell_type": "markdown", |
2097 | 2115 | "metadata": {}, |
2098 | 2116 | "source": [ |
2099 | | - "Similarly to [1]_, we correct the coefficients of features linked by a\n", |
| 2117 | + "Similarly to {cite:t}`huth2012`, we correct the coefficients of features linked by a\n", |
2100 | 2118 | "semantic relationship. When building the wordnet features, if a frame was\n", |
2101 | 2119 | "labeled with `wolf`, the authors automatically added the semantically linked\n", |
2102 | 2120 | "categories `canine`, `carnivore`, `placental mammal`, `mammal`, `vertebrate`,\n", |
|
2272 | 2290 | "voxel_colors = scale_to_rgb_cube(average_coef_transformed[1:4].T, clip=3).T\n", |
2273 | 2291 | "print(\"(n_channels, n_voxels) =\", voxel_colors.shape)\n", |
2274 | 2292 | "\n", |
2275 | | - "ax = plot_3d_flatmap_from_mapper(voxel_colors[0], voxel_colors[1],\n", |
2276 | | - " voxel_colors[2], mapper_file=mapper_file,\n", |
2277 | | - " vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0,\n", |
2278 | | - " vmax3=1)\n", |
| 2293 | + "ax = plot_3d_flatmap_from_mapper(\n", |
| 2294 | + " voxel_colors[0], voxel_colors[1], voxel_colors[2], \n", |
| 2295 | + " mapper_file=mapper_file, \n", |
| 2296 | + " vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0, vmax3=1\n", |
| 2297 | + ")\n", |
2279 | 2298 | "plt.show()" |
2280 | 2299 | ] |
2281 | 2300 | }, |
|
2379 | 2398 | "source": [ |
2380 | 2399 | "## Load the data\n", |
2381 | 2400 | "\n", |
2382 | | - "We first load the fMRI responses.\n", |
2383 | | - "\n" |
| 2401 | + "We first load and normalize the fMRI responses." |
2384 | 2402 | ] |
2385 | 2403 | }, |
2386 | 2404 | { |
|
2393 | 2411 | "source": [ |
2394 | 2412 | "import os\n", |
2395 | 2413 | "import numpy as np\n", |
| 2414 | + "from scipy.stats import zscore\n", |
2396 | 2415 | "from voxelwise_tutorials.io import load_hdf5_array\n", |
2397 | 2416 | "\n", |
2398 | 2417 | "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n", |
2399 | 2418 | "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n", |
2400 | 2419 | "Y_test = load_hdf5_array(file_name, key=\"Y_test\")\n", |
2401 | 2420 | "\n", |
2402 | 2421 | "print(\"(n_samples_train, n_voxels) =\", Y_train.shape)\n", |
2403 | | - "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)" |
| 2422 | + "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)\n", |
| 2423 | + "\n", |
| 2424 | + "# index of the first sample of each run\n", |
| 2425 | + "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
| 2426 | + "\n", |
| 2427 | + "# zscore each training run separately\n", |
| 2428 | + "Y_train = np.split(Y_train, run_onsets[1:])\n", |
| 2429 | + "Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n", |
| 2430 | + "# zscore each test run separately\n", |
| 2431 | + "Y_test = zscore(Y_test, axis=1)" |
2404 | 2432 | ] |
2405 | 2433 | }, |
2406 | 2434 | { |
2407 | 2435 | "cell_type": "markdown", |
2408 | 2436 | "metadata": {}, |
2409 | 2437 | "source": [ |
2410 | 2438 | "We average the test repeats to remove the non-repeatable part of fMRI\n", |
2411 | | - "responses.\n", |
2412 | | - "\n" |
| 2439 | + "responses, and normalize the average across repeats." |
2413 | 2440 | ] |
2414 | 2441 | }, |
2415 | 2442 | { |
|
2421 | 2448 | "outputs": [], |
2422 | 2449 | "source": [ |
2423 | 2450 | "Y_test = Y_test.mean(0)\n", |
| 2451 | + "Y_test = zscore(Y_test, axis=0)\n", |
2424 | 2452 | "\n", |
2425 | 2453 | "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)" |
2426 | 2454 | ] |
|
2479 | 2507 | "\n", |
2480 | 2508 | "We define the same leave-one-run-out cross-validation split as in the\n", |
2481 | 2509 | "previous example.\n", |
2482 | | - "\n" |
| 2510 | + "\n", |
| 2511 | + "We define a cross-validation splitter, compatible with the ``scikit-learn`` API." |
2483 | 2512 | ] |
2484 | 2513 | }, |
2485 | 2514 | { |
|
2493 | 2522 | "from sklearn.model_selection import check_cv\n", |
2494 | 2523 | "from voxelwise_tutorials.utils import generate_leave_one_run_out\n", |
2495 | 2524 | "\n", |
2496 | | - "# indice of first sample of each run\n", |
2497 | | - "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
2498 | | - "print(run_onsets)" |
2499 | | - ] |
2500 | | - }, |
2501 | | - { |
2502 | | - "cell_type": "markdown", |
2503 | | - "metadata": {}, |
2504 | | - "source": [ |
2505 | | - "We define a cross-validation splitter, compatible with ``scikit-learn`` API.\n", |
2506 | | - "\n" |
2507 | | - ] |
2508 | | - }, |
2509 | | - { |
2510 | | - "cell_type": "code", |
2511 | | - "execution_count": null, |
2512 | | - "metadata": { |
2513 | | - "collapsed": false |
2514 | | - }, |
2515 | | - "outputs": [], |
2516 | | - "source": [ |
2517 | 2525 | "n_samples_train = X_train.shape[0]\n", |
2518 | 2526 | "cv = generate_leave_one_run_out(n_samples_train, run_onsets)\n", |
2519 | 2527 | "cv = check_cv(cv) # copy the cross-validation splitter into a reusable list" |
|
2964 | 2972 | "source": [ |
2965 | 2973 | "## Load the data\n", |
2966 | 2974 | "\n", |
2967 | | - "We first load the fMRI responses.\n", |
| 2975 | + "We first load and normalize the fMRI responses.\n", |
2968 | 2976 | "\n" |
2969 | 2977 | ] |
2970 | 2978 | }, |
|
2978 | 2986 | "source": [ |
2979 | 2987 | "import os\n", |
2980 | 2988 | "import numpy as np\n", |
| 2989 | + "from scipy.stats import zscore\n", |
2981 | 2990 | "from voxelwise_tutorials.io import load_hdf5_array\n", |
2982 | 2991 | "\n", |
2983 | 2992 | "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n", |
2984 | 2993 | "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n", |
2985 | 2994 | "Y_test = load_hdf5_array(file_name, key=\"Y_test\")\n", |
2986 | 2995 | "\n", |
2987 | 2996 | "print(\"(n_samples_train, n_voxels) =\", Y_train.shape)\n", |
2988 | | - "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)" |
| 2997 | + "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)\n", |
| 2998 | + "\n", |
| 2999 | + "# index of the first sample of each run\n", |
| 3000 | + "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
| 3001 | + "\n", |
| 3002 | + "# zscore each training run separately\n", |
| 3003 | + "Y_train = np.split(Y_train, run_onsets[1:])\n", |
| 3004 | + "Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n", |
| 3005 | + "# zscore each test run separately\n", |
| 3006 | + "Y_test = zscore(Y_test, axis=1)" |
2989 | 3007 | ] |
2990 | 3008 | }, |
2991 | 3009 | { |
2992 | 3010 | "cell_type": "markdown", |
2993 | 3011 | "metadata": {}, |
2994 | 3012 | "source": [ |
2995 | 3013 | "We average the test repeats to remove the non-repeatable part of fMRI\n", |
2996 | | - "responses.\n", |
2997 | | - "\n" |
| 3014 | + "responses, and normalize the average across repeats." |
2998 | 3015 | ] |
2999 | 3016 | }, |
3000 | 3017 | { |
|
3006 | 3023 | "outputs": [], |
3007 | 3024 | "source": [ |
3008 | 3025 | "Y_test = Y_test.mean(0)\n", |
| 3026 | + "Y_test = zscore(Y_test, axis=0)\n", |
3009 | 3027 | "\n", |
3010 | 3028 | "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)" |
3011 | 3029 | ] |
|
3457 | 3475 | "## Load the data\n", |
3458 | 3476 | "\n", |
3459 | 3477 | "As in the previous examples, we first load the fMRI responses, which are our\n", |
3460 | | - "regression targets.\n", |
3461 | | - "\n" |
| 3478 | + "regression targets. We then normalize the data independently for each run." |
3462 | 3479 | ] |
3463 | 3480 | }, |
3464 | 3481 | { |
3465 | 3482 | "cell_type": "code", |
3466 | 3483 | "execution_count": null, |
3467 | | - "metadata": { |
3468 | | - "collapsed": false |
3469 | | - }, |
| 3484 | + "metadata": {}, |
3470 | 3485 | "outputs": [], |
3471 | 3486 | "source": [ |
3472 | 3487 | "import os\n", |
3473 | 3488 | "import numpy as np\n", |
| 3489 | + "from scipy.stats import zscore\n", |
3474 | 3490 | "from voxelwise_tutorials.io import load_hdf5_array\n", |
3475 | 3491 | "\n", |
3476 | 3492 | "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n", |
3477 | 3493 | "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n", |
3478 | 3494 | "Y_test = load_hdf5_array(file_name, key=\"Y_test\")\n", |
3479 | 3495 | "\n", |
3480 | 3496 | "print(\"(n_samples_train, n_voxels) =\", Y_train.shape)\n", |
3481 | | - "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)" |
| 3497 | + "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)\n", |
| 3498 | + "\n", |
| 3499 | + "# index of the first sample of each run\n", |
| 3500 | + "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
| 3501 | + "\n", |
| 3502 | + "# zscore each training run separately\n", |
| 3503 | + "Y_train = np.split(Y_train, run_onsets[1:])\n", |
| 3504 | + "Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n", |
| 3505 | + "# zscore each test run separately\n", |
| 3506 | + "Y_test = zscore(Y_test, axis=1)" |
3482 | 3507 | ] |
3483 | 3508 | }, |
3484 | 3509 | { |
|
3511 | 3536 | "metadata": {}, |
3512 | 3537 | "source": [ |
3513 | 3538 | "We average the test repeats to remove the non-repeatable part of fMRI\n", |
3514 | | - "responses.\n", |
3515 | | - "\n" |
| 3539 | + "responses, and normalize the averaged data." |
3516 | 3540 | ] |
3517 | 3541 | }, |
3518 | 3542 | { |
|
3524 | 3548 | "outputs": [], |
3525 | 3549 | "source": [ |
3526 | 3550 | "Y_test = Y_test.mean(0)\n", |
| 3551 | + "Y_test = zscore(Y_test, axis=0)\n", |
3527 | 3552 | "\n", |
3528 | 3553 | "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)" |
3529 | 3554 | ] |
|