|
7 | 7 | "source": [ |
8 | 8 | "# Basics\n", |
9 | 9 | "\n", |
10 | | - "> An end-to-end tutorial on how to use the dabest.\n", |
| 10 | + "> An end-to-end tutorial on how to use the dabest library.\n", |
11 | 11 | "\n", |
12 | 12 | "- order: 1" |
13 | 13 | ] |
|
17 | 17 | "id": "c964abcb", |
18 | 18 | "metadata": {}, |
19 | 19 | "source": [ |
20 | | - "## Load Libraries" |
| 20 | + "## Load libraries" |
21 | 21 | ] |
22 | 22 | }, |
23 | 23 | { |
|
55 | 55 | "id": "c45f63cd", |
56 | 56 | "metadata": {}, |
57 | 57 | "source": [ |
58 | | - "Here, we create a dataset to illustrate how ``dabest`` functions. In\n", |
| 58 | + "Here, we create a dataset to illustrate how ``dabest`` works. In\n", |
59 | 59 | "this dataset, each column corresponds to a group of observations." |
60 | 60 | ] |
61 | 61 | }, |
|
68 | 68 | "source": [ |
69 | 69 | "from scipy.stats import norm # Used in generation of populations.\n", |
70 | 70 | "\n", |
71 | | - "np.random.seed(9999) # Fix the seed so the results are replicable.\n", |
72 | | - "# pop_size = 10000 # Size of each population.\n", |
| 71 | + "np.random.seed(9999) # Fix the seed to ensure reproducibility of results.\n", |
| 72 | + "\n", |
73 | 73 | "Ns = 20 # The number of samples taken from each population\n", |
74 | 74 | "\n", |
75 | 75 | "# Create samples\n", |
|
102 | 102 | " })" |
103 | 103 | ] |
104 | 104 | }, |
| 105 | + { |
| 106 | + "cell_type": "code", |
| 107 | + "execution_count": null, |
| 108 | + "id": "142607a1", |
| 109 | + "metadata": {}, |
| 110 | + "outputs": [], |
| 111 | + "source": [] |
| 112 | + }, |
105 | 113 | { |
106 | 114 | "cell_type": "markdown", |
107 | 115 | "id": "51097f12", |
108 | 116 | "metadata": {}, |
109 | 117 | "source": [ |
110 | 118 | "Note that we have 9 groups (3 Control samples and 6 Test samples). Our\n", |
111 | | - "dataset also has a non\\-numerical column indicating gender, and another\n", |
| 119 | + "dataset also has a non\\-numerical column indicating gender, and another\n", |
112 | 120 | "column indicating the identity of each observation." |
113 | 121 | ] |
114 | 122 | }, |
|
117 | 125 | "id": "e975d14a", |
118 | 126 | "metadata": {}, |
119 | 127 | "source": [ |
120 | | - "This is known as a 'wide' dataset. See this \n", |
| 128 | + "This is known as a *wide* dataset. See this \n", |
121 | 129 | "[writeup](https://sejdemyr.github.io/r-tutorials/basics/wide-and-long/) \n", |
122 | 130 | "for more details." |
123 | 131 | ] |
|
267 | 275 | "id": "7dd2c3f4", |
268 | 276 | "metadata": {}, |
269 | 277 | "source": [ |
270 | | - "## Loading Data" |
| 278 | + "## Loading data" |
271 | 279 | ] |
272 | 280 | }, |
273 | 281 | { |
274 | 282 | "cell_type": "markdown", |
275 | 283 | "id": "eda4a39f", |
276 | 284 | "metadata": {}, |
277 | 285 | "source": [ |
278 | | - "Before we create estimation plots and obtain confidence intervals for\n", |
279 | | - "our effect sizes, we need to load the data and the relevant groups.\n", |
| 286 | + "Before creating estimation plots and obtaining confidence intervals for our effect sizes, we need to load the data and specify the relevant groups.\n", |
280 | 287 | "\n", |
281 | | - "We simply supply the DataFrame to ``dabest.load()``. We also must supply\n", |
282 | | - "the two groups you want to compare in the ``idx`` argument as a tuple or\n", |
283 | | - "list." |
| 288 | + "We can achieve this by supplying the dataframe to ``dabest.load()``. Additionally, we must provide the two groups to be compared in the ``idx`` argument as a tuple or list." |
284 | 289 | ] |
285 | 290 | }, |
286 | 291 | { |
|
345 | 350 | "id": "f71a2c3d", |
346 | 351 | "metadata": {}, |
347 | 352 | "source": [ |
348 | | - "You can change the width of the confidence interval that will be\n", |
349 | | - "produced by manipulating the ``ci`` argument." |
| 353 | + "You can change the width of the confidence interval by manipulating the ``ci`` argument." |
350 | 354 | ] |
351 | 355 | }, |
352 | 356 | { |
|
402 | 406 | "id": "837ffe5c", |
403 | 407 | "metadata": {}, |
404 | 408 | "source": [ |
405 | | - "``dabest`` now features a range of effect sizes:\n", |
| 409 | + "The **dabest** library now features a range of effect sizes:\n", |
| 410 | + "\n", |
406 | 411 | " - the mean difference (``mean_diff``)\n", |
407 | 412 | " - the median difference (``median_diff``)\n", |
408 | 413 | " - [Cohen's d](https://en.wikipedia.org/wiki/Effect_size#Cohen's_d) (``cohens_d``)\n", |
|
457 | 462 | "\"unpaired mean difference\"). The confidence interval is reported as:\n", |
458 | 463 | "[*confidenceIntervalWidth* *LowerBound*, *UpperBound*]\n", |
459 | 464 | "\n", |
460 | | - "This confidence interval is generated through bootstrap resampling. See\n", |
461 | | - ":doc:`bootstraps` for more details.\n", |
| 465 | + "This confidence interval is generated through bootstrap resampling. See :doc:`bootstraps` for more details.\n", |
462 | 466 | "\n", |
463 | | - "Since v0.3.0, DABEST will report the p-value of the [non-parametric two-sided approximate permutation t-test](https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests). This is also known as the Monte Carlo permutation test.\n", |
| 467 | + "Since v0.3.0, DABEST will report the p-value of the [non-parametric two-sided approximate permutation t-test](https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests). This is also known as *the Monte Carlo permutation test*.\n", |
464 | 468 | "\n", |
465 | 469 | "For unpaired comparisons, the p-values and test statistics of [Welch's t test](https://en.wikipedia.org/wiki/Welch%27s_t-test), \n", |
466 | 470 | "[Student's t test](https://en.wikipedia.org/wiki/Student%27s_t-test), \n", |
467 | | - "and [Mann-Whitney U test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test) can be found in addition. For paired comparisons, the p-values and test statistics of the \n", |
| 471 | + "and [Mann-Whitney U test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test) are also reported. For paired comparisons, the p-values and test statistics of the \n", |
468 | 472 | "[paired Student's t](https://en.wikipedia.org/wiki/Student%27s_t-test#Paired_samples)\n", |
469 | 473 | "and [Wilcoxon](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) tests are presented.\n" |
470 | 474 | ] |
|
695 | 699 | "id": "2548d82c", |
696 | 700 | "metadata": {}, |
697 | 701 | "source": [ |
698 | | - "Let's compute the Hedges' *g* for our comparison." |
| 702 | + "Let's compute Hedges' *g* for our comparison." |
699 | 703 | ] |
700 | 704 | }, |
701 | 705 | { |
|
869 | 873 | "id": "b451ab38", |
870 | 874 | "metadata": {}, |
871 | 875 | "source": [ |
872 | | - "To produce a **Gardner-Altman estimation plot**, simply use the\n", |
873 | | - "``.plot()`` method. You can read more about its genesis and design\n", |
| 876 | + "To generate a **Gardner-Altman estimation plot**, simply use the\n", |
| 877 | + "``.plot()`` method. You can learn more about its genesis and design\n", |
874 | 878 | "inspiration at :doc:`robust-beautiful`.\n", |
875 | 879 | "\n", |
876 | | - "Every effect size instance has access to the ``.plot()`` method. This\n", |
877 | | - "means you can quickly create plots for different effect sizes easily." |
| 880 | + "Each instance of an effect size has access to the ``.plot()`` method. This allows you to quickly create plots for different effect sizes with ease." |
878 | 881 | ] |
879 | 882 | }, |
880 | 883 | { |
|
924 | 927 | "id": "5b566185", |
925 | 928 | "metadata": {}, |
926 | 929 | "source": [ |
927 | | - "Instead of a Gardner-Altman plot, you can produce a **Cumming estimation\n", |
| 930 | + "Instead of a Gardner-Altman plot, you can generate a **Cumming estimation\n", |
928 | 931 | "plot** by setting ``float_contrast=False`` in the ``plot()`` method.\n", |
929 | 932 | "This will plot the bootstrap effect sizes below the raw data, and also\n", |
930 | 933 | "display the mean (gap) and ± standard deviation of each group\n", |
|
966 | 969 | "``dabest.load()`` is first invoked.\n", |
967 | 970 | "\n", |
968 | 971 | "Thus, the lower axes in the Cumming plot is effectively a [forest\n", |
969 | | - "plot](https://en.wikipedia.org/wiki/Forest_plot), used in\n", |
970 | | - "meta-analyses to aggregate and compare data from different experiments." |
| 972 | + "plot](https://en.wikipedia.org/wiki/Forest_plot), commonly used in\n", |
| 973 | + "meta-analyses to aggregate and compare data from different experiments." |
971 | 974 | ] |
972 | 975 | }, |
973 | 976 | { |
|
1132 | 1135 | "id": "0848f20b", |
1133 | 1136 | "metadata": {}, |
1134 | 1137 | "source": [ |
1135 | | - "``dabest`` thus empowers you to robustly perform and elegantly present\n", |
1136 | | - "complex visualizations and statistics." |
| 1138 | + "Thus ``dabest`` empowers you to perform robust analyses and present complex visualizations of your statistics elegantly." |
1137 | 1139 | ] |
1138 | 1140 | }, |
1139 | 1141 | { |
|
1268 | 1270 | "id": "1f532032", |
1269 | 1271 | "metadata": {}, |
1270 | 1272 | "source": [ |
1271 | | - "``dabest`` can also work with 'melted' or 'long' data. This term is so\n", |
1272 | | - "used because each row will now correspond to a single datapoint, with\n", |
1273 | | - "one column carrying the value and other columns carrying 'metadata'\n", |
1274 | | - "describing that datapoint.\n", |
| 1273 | + "``dabest`` can also handle 'melted' or 'long' data. This term is used because each row now corresponds to a single data point, with one column carrying the value and other columns containing 'metadata'\n", |
| 1274 | + "describing that data point.\n", |
1275 | 1275 | "\n", |
1276 | | - "More details on wide vs long or 'melted' data can be found in this\n", |
| 1276 | + "For more details on wide vs long or 'melted' data, refer to this\n", |
1277 | 1277 | "[Wikipedia article](https://en.wikipedia.org/wiki/Wide_and_narrow_data). The\n", |
1278 | 1278 | "[pandas documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html)\n", |
1279 | | - "gives recipes for melting dataframes.\n" |
| 1279 | + "provides recipes for melting dataframes.\n" |
1280 | 1280 | ] |
1281 | 1281 | }, |
1282 | 1282 | { |
|
1386 | 1386 | "id": "1ffb38fa", |
1387 | 1387 | "metadata": {}, |
1388 | 1388 | "source": [ |
1389 | | - "When your data is in this format, you will need to specify the ``x`` and\n", |
| 1389 | + "When your data is in this format, you need to specify the ``x`` and\n", |
1390 | 1390 | "``y`` columns in ``dabest.load()``.\n" |
1391 | 1391 | ] |
1392 | 1392 | }, |
|
1443 | 1443 | "source": [ |
1444 | 1444 | "analysis_of_long_df.mean_diff.plot();" |
1445 | 1445 | ] |
1446 | | - }, |
1447 | | - { |
1448 | | - "cell_type": "code", |
1449 | | - "execution_count": null, |
1450 | | - "id": "ec5c9c8b", |
1451 | | - "metadata": {}, |
1452 | | - "outputs": [], |
1453 | | - "source": [] |
1454 | 1446 | } |
1455 | 1447 | ], |
1456 | 1448 | "metadata": { |
|