Commit 7057e18

Documentation changes
- Thanks @sangyu for making changes in delta-delta documentation and corresponding API signatures.
- Add line breaks for visual effect on tutorial pages.
- Add notes for median difference in the documentation
1 parent f0eb6c0 commit 7057e18

4 files changed

Lines changed: 95 additions & 47 deletions

File tree

dabest/_classes.py

Lines changed: 39 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -451,6 +451,7 @@ def mean_diff(self):
451451
\\text{Mean difference} = \\overline{x}_{Test} - \\overline{x}_{Control}
452452
453453
where :math:`\\overline{x}` is the mean for the group :math:`x`.
454+
454455
"""
455456
return self.__mean_diff
456457

@@ -459,7 +460,8 @@ def mean_diff(self):
459460
def median_diff(self):
460461
"""
461462
Returns an :py:class:`EffectSizeDataFrame` for the median difference, its confidence interval, and relevant statistics, for all comparisons as indicated via the `idx` and `paired` argument in `dabest.load()`.
462-
463+
464+
463465
Example
464466
-------
465467
>>> from scipy.stats import norm
@@ -471,7 +473,8 @@ def median_diff(self):
471473
"test": test})
472474
>>> my_dabest_object = dabest.load(my_df, idx=("control", "test"))
473475
>>> my_dabest_object.median_diff
474-
476+
477+
475478
Notes
476479
-----
477480
This is the median difference between the control group and the test group.
@@ -487,6 +490,15 @@ def median_diff(self):
487490
488491
.. math::
489492
\\text{Median difference} = \\widetilde{x}_{Test - Control}
493+
494+
495+
Things to note
496+
--------------
497+
Using median difference as the statistic in bootstrapping may result in a biased estimate and cause problems with BCa confidence intervals. Consider using mean difference instead.
498+
499+
When plotting, consider using percentile confidence intervals instead of BCa confidence intervals by specifying `ci_type = 'percentile'` in .plot().
500+
501+
For detailed information, please refer to `Issue 129 <https://github.com/ACCLAB/DABEST-python/issues/129>`_.
490502
491503
"""
492504
return self.__median_diff
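
The note above recommends percentile confidence intervals when the statistic is a median difference. As a rough illustration of what a percentile bootstrap interval is, here is a minimal standard-library sketch. It is not DABEST's implementation; the function name `percentile_ci_median_diff` and its parameters are hypothetical and chosen only for this example.

```python
import random
import statistics


def percentile_ci_median_diff(control, test, n_boot=2000, alpha=0.05, seed=12345):
    """Percentile bootstrap CI for median(test) - median(control).

    Hypothetical helper for illustration only; not part of DABEST.
    """
    rng = random.Random(seed)
    boots = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        boots.append(statistics.median(t) - statistics.median(c))
    boots.sort()
    # Read the interval directly off the sorted bootstrap distribution.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return boots[lo_idx], boots[hi_idx]


# Simulated data: two normal samples whose true medians differ by 2.
data_rng = random.Random(0)
control = [data_rng.gauss(3.0, 0.5) for _ in range(50)]
test = [data_rng.gauss(5.0, 0.5) for _ in range(50)]

low, high = percentile_ci_median_diff(control, test)
print("95% percentile CI for the median difference:", low, high)
```

Unlike BCa, the percentile interval applies no bias or acceleration correction; it simply reads the quantiles off the bootstrap distribution.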
@@ -549,6 +561,7 @@ def cohens_d(self):
549561
https://en.wikipedia.org/wiki/Effect_size#Cohen's_d
550562
https://en.wikipedia.org/wiki/Bessel%27s_correction
551563
https://en.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation
564+
552565
"""
553566
return self.__cohens_d
554567

@@ -588,6 +601,7 @@ def cohens_h(self):
588601
589602
References:
590603
https://en.wikipedia.org/wiki/Cohen%27s_h
604+
591605
"""
592606
return self.__cohens_h
593607

@@ -630,6 +644,7 @@ def hedges_g(self):
630644
References:
631645
https://en.wikipedia.org/wiki/Effect_size#Hedges'_g
632646
https://journals.sagepub.com/doi/10.3102/10769986006002107
647+
633648
"""
634649
return self.__hedges_g
635650

@@ -669,6 +684,7 @@ def cliffs_delta(self):
669684
References:
670685
https://en.wikipedia.org/wiki/Effect_size#Effect_size_for_ordinal_data
671686
https://psycnet.apa.org/record/1994-08169-001
687+
672688
"""
673689
return self.__cliffs_delta
674690

@@ -857,28 +873,37 @@ def _all_plot_groups(self):
857873

858874
class DeltaDelta(object):
859875
"""
860-
A class to compute and store the delta-delta statistics. In a 2-by-2 arrangement where two independent variables, A and B, each have two categorical values, two primary deltas are first calculated with one independent variable and a delta-delta effect size is calculated as a difference between the two primary deltas.
876+
A class to compute and store the delta-delta statistics for experiments with a 2-by-2 arrangement where two independent variables, A and B, each have two categorical values, 1 and 2. The data is divided into two pairs of two groups, and a primary delta is first calculated as the mean difference between each of the pairs:
861877
862878
.. math::
863879
864-
\\hat{\\theta}_{B1} = \\overline{X}_{A2, B1} - \\overline{X}_{A1, B1}
880+
\\Delta_{1} = \\overline{X}_{A_{2}, B_{1}} - \\overline{X}_{A_{1}, B_{1}}
865881
866-
\\hat{\\theta}_{B2} = \\overline{X}_{A2, B2} - \\overline{X}_{A1, B2}
882+
\\Delta_{2} = \\overline{X}_{A_{2}, B_{2}} - \\overline{X}_{A_{1}, B_{2}}
867883
884+
where :math:`\\overline{X}_{A_{i}, B_{j}}` is the mean of the sample with A = i and B = j, and :math:`\\Delta` is the mean difference between two samples.
885+
886+
A delta-delta value is then calculated as the difference between the two primary deltas:
887+
868888
.. math::
869889
870-
\\hat{\\theta}_{\\theta} = \\hat{\\theta}_{B2} - \\hat{\\theta}_{B1}
890+
\\Delta_{\\Delta} = \\Delta_{2} - \\Delta_{1}
871891
872892
and:
873893
894+
and the standard deviation of the delta-delta value is calculated from a pooled variance of the 4 samples:
895+
874896
.. math::
875897
876-
s_{\\theta} = \\frac{(n_{A2, B1}-1)s_{A2, B1}^2+(n_{A1, B1}-1)s_{A1, B1}^2+(n_{A2, B2}-1)s_{A2, B2}^2+(n_{A1, B2}-1)s_{A1, B2}^2}{(n_{A2, B1} - 1) + (n_{A1, B1} - 1) + (n_{A2, B2} - 1) + (n_{A1, B2} - 1)}
898+
s_{\\Delta_{\\Delta}} = \\sqrt{\\frac{(n_{A_{2}, B_{1}}-1)s_{A_{2}, B_{1}}^2+(n_{A_{1}, B_{1}}-1)s_{A_{1}, B_{1}}^2+(n_{A_{2}, B_{2}}-1)s_{A_{2}, B_{2}}^2+(n_{A_{1}, B_{2}}-1)s_{A_{1}, B_{2}}^2}{(n_{A_{2}, B_{1}} - 1) + (n_{A_{1}, B_{1}} - 1) + (n_{A_{2}, B_{2}} - 1) + (n_{A_{1}, B_{2}} - 1)}}
899+
900+
where :math:`s` is the standard deviation and :math:`n` is the sample size.
877901
878902
Example
879903
-------
880904
>>> import numpy as np
881905
>>> import pandas as pd
906+
>>> import dabest
882907
>>> from scipy.stats import norm # Used in generation of populations.
883908
>>> np.random.seed(9999) # Fix the seed so the results are replicable.
884909
>>> from scipy.stats import norm # Used in generation of populations.
@@ -887,16 +912,16 @@ class DeltaDelta(object):
887912
>>> y = norm.rvs(loc=3, scale=0.4, size=N*4)
888913
>>> y[N:2*N] = y[N:2*N]+1
889914
>>> y[2*N:3*N] = y[2*N:3*N]-0.5
890-
>>> # Add drug column
915+
>>> # Add a `Treatment` column
891916
>>> t1 = np.repeat('Placebo', N*2).tolist()
892917
>>> t2 = np.repeat('Drug', N*2).tolist()
893918
>>> treatment = t1 + t2
894-
>>> # Add a `rep` column as the first variable for the 2 replicates of experiments done
919+
>>> # Add a `Rep` column as the first variable for the 2 replicates of experiments done
895920
>>> rep = []
896921
>>> for i in range(N*2):
897922
>>>     rep.append('Rep1')
898923
>>>     rep.append('Rep2')
899-
>>> # Add a `genotype` column as the second variable
924+
>>> # Add a `Genotype` column as the second variable
900925
>>> wt = np.repeat('W', N).tolist()
901926
>>> mt = np.repeat('M', N).tolist()
902927
>>> wt2 = np.repeat('W', N).tolist()
@@ -909,10 +934,12 @@ class DeltaDelta(object):
909934
>>> df_delta2 = pd.DataFrame({'ID' : id_col,
910935
>>> 'Rep' : rep,
911936
>>> 'Genotype' : genotype,
912-
>>> 'Drug': treatment,
937+
>>> 'Treatment': treatment,
913938
>>> 'Y' : y
914939
>>> })
915-
940+
>>> unpaired_delta2 = dabest.load(data = df_delta2, x = ["Genotype", "Genotype"], y = "Y", delta2 = True, experiment = "Treatment")
941+
>>> unpaired_delta2.mean_diff.plot()
942+
916943
917944
918945
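
The formulas in the `DeltaDelta` docstring can be checked by hand on a small dataset. The sketch below uses only the standard library and follows the docstring's definitions of the two primary deltas, the delta-delta, and the pooled standard deviation. The function `delta_delta` is a hypothetical helper written for this illustration, not the class's actual code.

```python
import math
import statistics


def delta_delta(a1b1, a2b1, a1b2, a2b2):
    """Return (delta_1, delta_2, delta_delta, pooled_sd) for four samples.

    Hypothetical helper mirroring the docstring formulas; not DABEST code.
    """
    # Primary deltas: difference of means within each level of B.
    delta_1 = statistics.mean(a2b1) - statistics.mean(a1b1)
    delta_2 = statistics.mean(a2b2) - statistics.mean(a1b2)

    # Pooled variance: each sample variance weighted by its degrees of freedom.
    groups = (a2b1, a1b1, a2b2, a1b2)
    dof = sum(len(g) - 1 for g in groups)
    pooled_var = sum((len(g) - 1) * statistics.variance(g) for g in groups) / dof

    return delta_1, delta_2, delta_2 - delta_1, math.sqrt(pooled_var)


d1, d2, dd, sd = delta_delta(
    a1b1=[1, 2, 3],  # mean 2, variance 1
    a2b1=[2, 3, 4],  # mean 3, variance 1
    a1b2=[1, 3, 5],  # mean 3, variance 4
    a2b2=[4, 6, 8],  # mean 6, variance 4
)
print(d1, d2, dd, sd)
```

Here the primary deltas are 1 and 3, so the delta-delta is 2, and the pooled variance is (2·1 + 2·1 + 2·4 + 2·4) / 8 = 2.5.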

dabest/_stats_tools/effsize.py

Lines changed: 6 additions & 6 deletions
@@ -78,12 +78,12 @@ def two_group_difference(control, test, is_paired=None,
7878
return func_difference(control, test, np.mean, is_paired)
7979

8080
elif effect_size == "median_diff":
81-
mes1 = "Using median as the statistic in bootstrapping may \
82-
result in a biased estimate and cause problems with \
83-
BCa confidence intervals. Consider using a different statistic, such as the mean.\n"
84-
mes2 = "When plotting, please consider using percetile confidence intervals\
85-
by specifying `ci_type='percentile'`. For detailed information, \
86-
refer to https://github.com/ACCLAB/DABEST-python/issues/129"
81+
mes1 = "Using median as the statistic in bootstrapping may " + \
82+
"result in a biased estimate and cause problems with " + \
83+
"BCa confidence intervals. Consider using a different statistic, such as the mean.\n"
84+
mes2 = "When plotting, please consider using percentile confidence intervals " + \
85+
"by specifying `ci_type='percentile'`. For detailed information, " + \
86+
"refer to https://github.com/ACCLAB/DABEST-python/issues/129 \n"
8787
warnings.warn(message=mes1+mes2, category=UserWarning)
8888
return func_difference(control, test, np.median, is_paired)
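
The change to `mes1` and `mes2` above matters because a backslash continuation inside a string literal drops the newline but keeps the next line's leading indentation in the message. The following sketch uses shortened, made-up messages to show the difference; the function names are hypothetical.

```python
def message_with_inline_continuation():
    # The backslash sits inside the string, so the indentation of the
    # following source line becomes part of the message text.
    msg = "Using median as the statistic may \
           result in a biased estimate."
    return msg


def message_with_concatenation():
    # Each piece is its own literal, so no indentation leaks in.
    msg = "Using median as the statistic may " + \
          "result in a biased estimate."
    return msg


old_msg = message_with_inline_continuation()
new_msg = message_with_concatenation()
print(repr(old_msg))
print(repr(new_msg))
```

Wrapping adjacent string literals in parentheses would give the same result as the `+ \` form without needing any backslashes.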
8989

docs/source/deltadelta.rst

Lines changed: 49 additions & 28 deletions
@@ -35,7 +35,7 @@ Effectively, we have 4 groups of subjects for comparison.
3535
<thead>
3636
<tr style="text-align: right;">
3737
<th></th>
38-
<th>Wildtype</th>
38+
<th>Wild type</th>
3939
<th>Mutant</th>
4040
</tr>
4141
</thead>
@@ -60,17 +60,33 @@ Effectively, we have 4 groups of subjects for comparison.
6060
</div>
6161

6262

63-
There are 2 ``Treatment`` conditions, ``Placebo`` (control group) and ``Drug`` (test group). There are 2 ``Genotype`` s: ``W`` (wildtype population) and ``M`` (mutant population). In addition, each experiment was done twice (``Rep1`` and ``Rep2``). We shall do a few analyses to visualise these differences in a simulated dataset.
63+
There are 2 ``Treatment`` conditions, ``Placebo`` (control group) and ``Drug`` (test group). There are 2 ``Genotype``\s: ``W`` (wild type population) and ``M`` (mutant population). In addition, each experiment was done twice (``Rep1`` and ``Rep2``). We shall do a few analyses to visualise these differences in a simulated dataset.
6464

65-
Simulate a dataset
66-
------------------
65+
Load Libraries
66+
--------------
6767

6868
.. code-block:: python3
6969
:linenos:
7070
7171
7272
import numpy as np
7373
import pandas as pd
74+
import dabest
75+
76+
print("We're using DABEST v{}".format(dabest.__version__))
77+
78+
79+
.. parsed-literal::
80+
81+
We're using DABEST v2023.02.14
82+
83+
84+
Simulate a dataset
85+
------------------
86+
87+
.. code-block:: python3
88+
:linenos:
89+
7490
from scipy.stats import norm # Used in generation of populations.
7591
7692
np.random.seed(seed) # Fix the seed so the results are replicable.
@@ -83,18 +99,18 @@ Simulate a dataset
8399
y[N:2*N] = y[N:2*N]+1
84100
y[2*N:3*N] = y[2*N:3*N]-0.5
85101
86-
# Add drug column
102+
# Add a `Treatment` column
87103
t1 = np.repeat('Placebo', N*2).tolist()
88104
t2 = np.repeat('Drug', N*2).tolist()
89105
treatment = t1 + t2
90106
91-
# Add a `rep` column as the first variable for the 2 replicates of experiments done
107+
# Add a `Rep` column as the first variable for the 2 replicates of experiments done
92108
rep = []
93109
for i in range(N*2):
94110
    rep.append('Rep1')
95111
    rep.append('Rep2')
96112
97-
# Add a `genotype` column as the second variable
113+
# Add a `Genotype` column as the second variable
98114
wt = np.repeat('W', N).tolist()
99115
mt = np.repeat('M', N).tolist()
100116
wt2 = np.repeat('W', N).tolist()
@@ -112,7 +128,7 @@ Simulate a dataset
112128
df_delta2 = pd.DataFrame({'ID' : id_col,
113129
'Rep' : rep,
114130
'Genotype' : genotype,
115-
'Drug': treatment,
131+
'Treatment': treatment,
116132
'Y' : y
117133
})
118134
@@ -206,8 +222,7 @@ for slopegraphs. We use the ``experiment`` input to specify grouping of the data
206222
.. code-block:: python3
207223
:linenos:
208224
209-
unpaired_delta2 = dabest.load(data = df_delta2, x = ["Genotype", "Genotype"], y = "Y", delta2 = True,
210-
experiment = "Drug")
225+
unpaired_delta2 = dabest.load(data = df_delta2, x = ["Genotype", "Genotype"], y = "Y", delta2 = True, experiment = "Treatment")
211226
212227
The above function creates the following object:
213228

@@ -279,26 +294,31 @@ administered, the mutant phenotype is around 1.23 [95%CI 0.948, 1.52]. This diff
279294
and ``Drug`` group are plotted at the right bottom with a separate y-axis from other bootstrap plots.
280295
This effect size, at about -0.903 [95%CI -1.26, -0.535], is the net effect size of the drug treatment. That is to say that treatment with drug A reduced disease phenotype by 0.903.
281296

297+
Mean difference between mutants and wild types given the placebo treatment is:
298+
282299
.. math::
283300
284-
\hat{\theta}_{P} = \overline{X}_{P, M} - \overline{X}_{P, W}
301+
\Delta_{1} = \overline{X}_{P, M} - \overline{X}_{P, W}
302+
303+
Mean difference between mutants and wild types given the drug treatment is:
285304

286-
\hat{\theta}_{D} = \overline{X}_{D, M} - \overline{X}_{D, W}
287-
288305
.. math::
289306
307+
\Delta_{2} = \overline{X}_{D, M} - \overline{X}_{D, W}
290308
291-
\hat{\theta}_{\theta} = \hat{\theta}_{D} - \hat{\theta}_{P}
309+
The net effect of the drug on mutants is:
292310

293-
and:
294-
295311
.. math::
296312
297-
s_{\theta} = \frac{(n_{P, M}-1)s_{P, M}^2+(n_{P, W}-1)s_{P, W}^2+(n_{D, M}-1)s_{D, M}^2+(n_{D, M}-1)s_{D, M}^2}{(n_{P, M} - 1) + (n_{P, W} - 1) + (n_{D, M} - 1) + (n_{D, M} - 1)}
298313
314+
\Delta_{\Delta} = \Delta_{2} - \Delta_{1}
315+
316+
317+
where :math:`\overline{X}` is the sample mean and :math:`\Delta` is the mean difference.
299318

300319

301-
where :math:`\overline{X}` is the sample mean, :math:`\hat{\theta}` is the mean difference, :math:`s` is the variance and :math:`n` is the sample size.
320+
Specifying Grouping for Comparisons
321+
-----------------------------------
302322

303323

304324
In the example above, we used the convention of "test - control", but you can change the order of the experiment groups as well as the horizontal axis variable by setting ``experiment_label`` and ``x1_level``.
@@ -334,28 +354,29 @@ We produce the following plot:
334354

335355
.. image:: _images/tutorial_108_0.png
336356

337-
We see that the drug had a non-specific effect of -0.321 [95%CI -0.498, -0.131] on wildtype subjects even when they were not sick, and it had a bigger effect of -1.22 [95%CI -1.52, -0.906] in mutant subjects. In this visualisation, we can see the delta-delta value of -0.903 [95%CI -1.21, -0.587] as the net effect of the drug accounting for non-specific actions in healthy individuals.
338-
339-
.. math::
357+
We see that the drug had a non-specific effect of -0.321 [95%CI -0.498, -0.131] on wild type subjects even when they were not sick, and it had a bigger effect of -1.22 [95%CI -1.52, -0.906] in mutant subjects. In this visualisation, we can see the delta-delta value of -0.903 [95%CI -1.21, -0.587] as the net effect of the drug accounting for non-specific actions in healthy individuals.
340358

341-
\hat{\theta}_{W} = \overline{X}_{D, W} - \overline{X}_{P, W}
342359

343-
\hat{\theta}_{W} = \overline{X}_{D, M} - \overline{X}_{P, M}
360+
Mean difference between drug and placebo treatments in wild type subjects is:
344361

345362
.. math::
346363
347-
\hat{\theta}_{\theta} = \hat{\theta}_{M} - \hat{\theta}_{W}
348-
349-
and:
364+
\Delta_{1} = \overline{X}_{D, W} - \overline{X}_{P, W}
365+
366+
Mean difference between drug and placebo treatments in mutant subjects is:
350367

351368
.. math::
352369
353-
s_{\theta} = \frac{(n_{D, W}-1)s_{D, W}^2+(n_{P, W}-1)s_{P, W}^2+(n_{D, M}-1)s_{D, M}^2+(n_{P, M}-1)s_{P, M}^2}{(n_{D, W} - 1) + (n_{P, W} - 1) + (n_{D, M} - 1) + (n_{P, M} - 1)}
370+
\Delta_{2} = \overline{X}_{D, M} - \overline{X}_{P, M}
354371
355372
373+
The net effect of the drug on mutants is:
356374

357-
where :math:`\overline{X}` is the sample mean, :math:`\hat{\theta}` is the mean difference, :math:`s` is the variance and :math:`n` is the sample size.
375+
.. math::
358376
377+
\Delta_{\Delta} = \Delta_{2} - \Delta_{1}
378+
379+
where :math:`\overline{X}` is the sample mean and :math:`\Delta` is the mean difference.
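
Both groupings in this tutorial report the same delta-delta point estimate of -0.903. A short derivation shows why, using the tutorial's subscripts (P and D for placebo and drug, W and M for wild type and mutant). Rearranging the four means turns the "mutant minus wild type, per treatment" form into the "drug minus placebo, per genotype" form:

```latex
\begin{aligned}
\Delta_{\Delta}
  &= (\overline{X}_{D, M} - \overline{X}_{D, W}) - (\overline{X}_{P, M} - \overline{X}_{P, W}) \\
  &= \overline{X}_{D, M} - \overline{X}_{D, W} - \overline{X}_{P, M} + \overline{X}_{P, W} \\
  &= (\overline{X}_{D, M} - \overline{X}_{P, M}) - (\overline{X}_{D, W} - \overline{X}_{P, W})
\end{aligned}
```

The point estimate therefore does not depend on which variable is placed on the horizontal axis. The reported confidence intervals differ slightly between the two plots, presumably because each comes from its own bootstrap resampling run.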
359380

360381

361382
Connection to ANOVA

docs/source/proportion-plot.rst

Lines changed: 1 addition & 1 deletion
@@ -411,7 +411,7 @@ Repeated measures is also supported in paired proportional plot, by changing the
411411
412412
.. image:: _images/sankey_4.png
413413

414-
From the above two images, we can see that the on both the observed value plot and delta plot, the pairs compared are different in terms of the paired settings.
414+
From the above two images, we can see that in both the observed value plot and the delta plot, the pairs compared differ according to the paired setting. For detailed information about repeated measures, please refer to :doc:`repeatedmeasures`.
415415

416416
If you want to specify the order of the groups, you can use the ``idx`` parameter in the ``.load()`` method.
417417
