iqtree
diff --git a/doc/AliSim.md b/doc/AliSim.md
Lines changed: 12 additions & 3 deletions
diff --git a/doc/Command-Reference.md b/doc/Command-Reference.md
Lines changed: 47 additions & 5 deletions
diff --git a/doc/Compilation-Guide.md b/doc/Compilation-Guide.md
Lines changed: 25 additions & 8 deletions
diff --git a/doc/Complex-Models.md b/doc/Complex-Models.md
Lines changed: 3 additions & 35 deletions
diff --git a/doc/Concordance-Factor.md b/doc/Concordance-Factor.md
Lines changed: 3 additions & 3 deletions
@@ -55,11 +55,20 @@ Sequence simulators play an important role in phylogenetics. Simulated data has
 
 To use AliSim please make sure that you download the IQ-TREE version 2.2.0 or later.
 
-If you use AliSim please cite the following paper(s):
+If you use AliSim please cite:
 
-- Nhan Ly-Trong, Suha Naser-Khdour, Robert Lanfear, Bui Quang Minh, AliSim: A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era, Molecular Biology and Evolution, Volume 39, Issue 5, May 2022, msac092, <https://doi.org/10.1093/molbev/msac092>
+- Nhan Ly-Trong, Giuseppe M.J. Barca, Bui Quang Minh (2023)
+  AliSim-HPC: parallel sequence simulator for phylogenetics.
+  _Bioinformatics_, Volume 39, Issue 9, btad540.
+  <https://doi.org/10.1093/bioinformatics/btad540>
+
+For the original algorithms of AliSim please cite:
+
+- Nhan Ly-Trong, Suha Naser-Khdour, Robert Lanfear, Bui Quang Minh (2022)
+  AliSim: A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era.
+  _Molecular Biology and Evolution_, Volume 39, Issue 5, msac092.
+  <https://doi.org/10.1093/molbev/msac092>
 
-- Nhan Ly-Trong, Giuseppe M.J. Barca, Bui Quang Minh, AliSim-HPC: parallel sequence simulator for phylogenetics, Bioinformatics, Volume 39, Issue 9, Sep 2023, btad540, <https://doi.org/10.1093/bioinformatics/btad540> (*for the parallel version*)
 
 
 Simulating an alignment from a tree and model
 
@@ -28,6 +28,8 @@ sections:
     url: site-specific-frequency-model-options
   - name: Tree search parameters
     url: tree-search-parameters
+  - name: Tree search for pathogen data
+    url: tree-search-for-pathogen-data
   - name: Ultrafast bootstrap parameters
     url: ultrafast-bootstrap-parameters
   - name: Nonparametric bootstrap
@@ -328,17 +330,16 @@ Further options:
 
 | Option | Usage and meaning |
 |----------|------------------------------------------------------------------------------|
-| `--link-exchange-rates` | Turn on linked exchangeability estimation for a profile mixture model. Note that the model must have specified `GTR20` exchangeabilities for eg.`GTR20+C20+G`. |
-| `--gtr20-model` | Specify the initial exchangeabilities for linked exchangeability estimation. Note that this must be used with `--link-exchange-rates.` |
-| `--rates-file` | Produces a nexus file with the exchangeability matrix obtained from the optimization. This file can be later used for phylogenetic inference with the use of the `-mdef` flag  |
+| `--link-exchange` | Turn on linked exchangeability estimation for a profile mixture model. Note that the model must specify `GTR20` exchangeabilities, e.g. `GTR20+C20+G`. This option also produces a NEXUS file `GTRPMIX.nex` with the exchangeability matrix obtained from the optimization. This file can later be used for phylogenetic inference via the `-mdef` flag. |
+| `--init-exchange` | Specify the initial exchangeabilities for linked exchangeability estimation. Note that this must be used with `--link-exchange`. |
 
 ### Example usages:
 
 * Estimate linked exchangeabilities for a protein alignment `prot.phy` under C60+G model and a guide tree `guide.treefile`, where optimization is initialized from LG exchangeabilities
 
-        iqtree -s prot.phy -m GTR20+C60+G --link-exchange-rates --gtr20-model LG -te guide.treefile
+        iqtree -s prot.phy -m GTR20+C60+G --link-exchange --init-exchange LG -te guide.treefile
 
->**NOTE**: For better and faster performance, read the [recommendations](Complex-Models#linked-gtr-exchangeabilities-models) provided in the Complex Models section.
+>**NOTE**: For better and faster performance, read the [recommendations](Estimating-amino-acid-substitution-models#estimating-linked-exchangeabilities) provided in the Estimating amino acid substitution models section.
 
 
 Rate heterogeneity
@@ -432,6 +433,46 @@ The new IQ-TREE search algorithm ([Nguyen et al., 2015]) has several parameters
 
         iqtree -s data.phy -m TEST -g constraint.tree
 
+Tree search for pathogen data
+-----------------------------
+<div class="hline"></div>
+
+For pathogen data such as SARS-CoV-2 virus alignments, version 2.3.4.cmaple implements
+the MAPLE algorithm ([De Maio et al., 2023]), which performs tree search very quickly by
+exploiting the low divergence of the sequences (i.e., sequences in the alignment
+are very similar to each other).
+
+| Option | Usage and meaning |
+|----------|------------------------------------------------------------------------------|
+| `--pathogen` | Apply the CMAPLE tree search algorithm if sequence divergence is low; otherwise, apply the IQ-TREE algorithm. |
+| `--pathogen-force` | Apply the CMAPLE tree search algorithm regardless of sequence divergence. |
+| `-alrt`   | Specify the number of replicates (>=1000) to perform the SH-like approximate likelihood ratio test (SH-aLRT) ([Guindon et al., 2010]). |
+| `-T` | Specify the number of CPU cores to use only for the SH-aLRT test. If `-T AUTO` is specified, IQ-TREE will use all available cores. NOTE: this option has no effect on tree search, which is still single-threaded. |
+
+### Example usages:
+
+* Infer a maximum-likelihood tree for an alignment, automatically switching to the CMAPLE algorithm
+  if sequence divergence is low:
+
+        iqtree2 -s data.phy --pathogen --prefix pathogen
+        
+It will print two output files:
+
+* `pathogen.treefile`: The best approximate maximum-likelihood tree in NEWICK format.
+* `pathogen.log`: The log file.
+
+
+If you want to run other analyses on this tree and thus save tree search time,
+add `-te pathogen.treefile` to the command line of a subsequent IQ-TREE run to fix this tree topology,
+and remove the `--pathogen` option to invoke the default IQ-TREE machinery.
+
+* Infer a tree as above and additionally assign branch supports using the SH-aLRT test
+  with 1000 replicates and 4 CPU cores:
+
+        iqtree2 -s data.phy --pathogen --alrt 1000 -T 4 --prefix pathogen
+
+The tree `pathogen.treefile` will contain branch supports for all internal branches.
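
The two-step workflow described above (a fast `--pathogen` search, then a follow-up run that fixes the topology with `-te`) could be scripted roughly as follows. This is only a sketch: `data.phy` and `pathogen` are the placeholder names from the examples above, the prefix `followup` is a hypothetical name chosen here, and any further analysis options for the second run would still need to be added.

```shell
#!/bin/sh
# Sketch: run the fast pathogen search once, then reuse its topology.
ALN=data.phy        # alignment (placeholder name from the examples above)
PREFIX=pathogen     # output prefix of the first run

# Step 1: tree search with --pathogen.
step1="iqtree2 -s $ALN --pathogen --prefix $PREFIX"

# Step 2: fix the topology with -te and drop --pathogen.
# "followup" is a hypothetical prefix; append further analysis options here.
step2="iqtree2 -s $ALN -te $PREFIX.treefile --prefix followup"

# Print both command lines so they can be inspected first.
echo "$step1"
echo "$step2"

# Execute only if iqtree2 is on the PATH and the alignment exists.
if command -v iqtree2 >/dev/null 2>&1 && [ -f "$ALN" ]; then
    $step1 && $step2
fi
```

Using a different prefix for the second run keeps it from overwriting the output files of the first run.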
+
 Ultrafast bootstrap parameters
 ------------------------------
 <div class="hline"></div>
@@ -730,6 +771,7 @@ The first few lines of the output file example.phy.sitelh (printed by `-wslr` op
 [Adachi and Hasegawa, 1996b]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.476.8552
 [Anisimova and Gascuel 2006]: https://doi.org/10.1080/10635150600755453
 [Anisimova et al., 2011]: https://doi.org/10.1093/sysbio/syr041
+[De Maio et al., 2023]: https://doi.org/10.1038/s41588-023-01368-0
 [Felsenstein, 1985]: https://doi.org/10.2307/2408678
 [Flouri et al., 2015]: https://doi.org/10.1093/sysbio/syu084
 [Gadagkar et al., 2005]: https://doi.org/10.1002/jez.b.21026
 
@@ -66,7 +66,7 @@ For IQ-TREE version 1 please use:
 
 Alternatively, if you have `git` installed, you can also clone the source code from GitHub with:
 
-    git clone https://github.com/iqtree/iqtree2.git
+    git clone --recursive https://github.com/iqtree/iqtree2.git
 
 For IQ-TREE version 1 please clone:
 
@@ -108,16 +108,15 @@ Compiling under Linux
 
 This creates an executable `iqtree2` (`iqtree` for version 1). It can be copied to your system search path so that IQ-TREE can be called from the Terminal simply with the command line `iqtree2`.
 
+To compile IQ-TREE under Linux with an ARM processor, use either GCC 10 (but not later versions) or Clang 14 or above.
+
 >**TIP**: The above guide typically compiles IQ-TREE with `gcc`. If you have Clang installed and want to compile with Clang, the compilation will be similar to Mac OS X like below.
 {: .tip}
 
 Compiling under Mac OS X
 ------------------------
 <div class="hline"></div>
 
->**TIP**: A ready made IQ-TREE package is provided by * [Homebrew](https://github.com/brewsci/homebrew-science/blob/master/Formula/iqtree.rb) by simply running `brew install homebrew/science/iqtree2`.
-{: .tip}
-
 * Make sure that Clang compiler is installed, which is typically the case if you installed Xcode and the associated command line tools.
 
 * If you installed cmake with Homebrew 
@@ -130,13 +129,18 @@ The steps to compile IQ-TREE are similar to Linux (see above), except that you n
 
 (please change `cmake` to absolute path like `/Applications/CMake.app/Contents/bin/cmake`).
 
-To compile the multicore version, the default installed Clang unfortunately does not support OpenMP (which might change in the near future). However, the latest Clang 3.7 supports OpenMP, which can be downloaded from <http://clang.llvm.org>. After that you can run CMake with:
+* To compile IQ-TREE under Mac with an ARM processor, use Clang 17 or above.
+
+* If the OpenMP include or lib files cannot be found, then you can specify the location of OpenMP include or lib files, for example:
 
-    cmake -DIQTREE_FLAGS=omp -DCMAKE_C_COMPILER=clang-3.7 -DCMAKE_CXX_COMPILER=clang++-3.7 ..
+        export LDFLAGS="-L/opt/homebrew/opt/libomp/lib"
 
-(assuming that `clang-3.7` and `clang++-3.7` points to the installed Clang 3.7).
+        export CPPFLAGS="-I/opt/homebrew/opt/libomp/include"
 
+        cmake -DCMAKE_CXX_FLAGS="$LDFLAGS $CPPFLAGS" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ..
 
+(please change the path to the installed location of your OpenMP library)
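
If OpenMP was installed through Homebrew, the location can be looked up rather than typed by hand. The sketch below assumes the Homebrew formula is named `libomp` (consistent with the path shown above) and falls back to that path when `brew` is not available. It only prints the resulting `cmake` command for review.

```shell
#!/bin/sh
# Sketch: resolve the OpenMP prefix via Homebrew, if available.
# Assumption: OpenMP was installed with "brew install libomp".
if command -v brew >/dev/null 2>&1; then
    OMP_PREFIX="$(brew --prefix libomp)"
else
    # Fallback: the example path used in this guide.
    OMP_PREFIX="/opt/homebrew/opt/libomp"
fi

export LDFLAGS="-L${OMP_PREFIX}/lib"
export CPPFLAGS="-I${OMP_PREFIX}/include"

# Print the cmake invocation from the guide with the resolved flags.
echo "cmake -DCMAKE_CXX_FLAGS=\"$LDFLAGS $CPPFLAGS\" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ .."
```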
+    
 Compiling under Windows
 -----------------------
 <div class="hline"></div>
@@ -257,6 +261,19 @@ The compiled `iqtree` binary will automatically choose the proper computational
     IQ-TREE multicore Xeon Phi KNL version 1.6.beta for Linux 64-bit built May  7 2017
 
 
+Compiling IQ-TREE2 lib file
+---------------------------
+<div class="hline"></div>
+
+Starting with version 2.3.3, you can compile IQ-TREE2 as a library (lib) file.
+
+To build the IQ-TREE2 lib file, run:
+
+    cmake -DBUILD_LIB=ON ..
+    make -j4
+
+
+<!--
 Compiling with deep learning kernel for ModelFinder 2
 --------------------------------------------------
 
@@ -280,7 +297,7 @@ where 1.11.0 is the version of onnxruntime at the time of writing this document.
 Now you will need to run cmake by additional options:
 
 	cmake -Donnxruntime_INCLUDE_DIRS=/usr/local/Cellar//onnxruntime/1.11.0/include/onnxruntime/core/session/ -Donnxruntime_LIBRARIES=/usr/local/Cellar//onnxruntime/1.11.0/lib/libonnxruntime.dylib ..
-
+-->
 
 About precompiled binaries
 --------------------------
 
@@ -14,8 +14,6 @@ sections:
   url: partition-models
 - name: Mixture models
   url: mixture-models
-- name: Linked GTR exchangeabilities models
-  url: linked-gtr-exchangeabilities-models
 - name: Site-specific frequency models
   url: site-specific-frequency-models
 - name: Heterotachy models
@@ -177,14 +175,15 @@ Options for ModelFinder also work for MixtureFinder, e.g.:
 The `-mset HKY,GTR` option restricts the substitution model type to `HKY` and `GTR` in each iteration of adding one more class. The `-mrate E,I,G,I+G` option restricts the rate heterogeneity across sites models to `+E`, `+I`, `+G` and `+I+G`.
 
 Other options for MixtureFinder:
+
 | Model option   | Description                                                                                                                          |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
 | `-qmax`        | Maximum number of Q-mixture classes (default: 10). Specify a number after the option (e.g., `-qmax 5`).                              |
 | `-mrate-twice` | Whether to re-estimate the rate heterogeneity across sites model after selecting the best Q-mixture model. 1: yes, 0: no (default: 0). |
 
 If you use MixtureFinder in a publication please cite:
 
-> __H. Ren, T.K.F. Wong, B.Q. Minh, R. Lanfear__ (2024) MixtureFinder: Estimating DNA mixture models for phylogenetic analyses. _BioRxiv_. https://doi.org/10.1101/2024.03.20.586035
+> __H. Ren, T.K.F. Wong, B.Q. Minh, R. Lanfear__ (2024) MixtureFinder: Estimating DNA mixture models for phylogenetic analyses. _BioRxiv_. <https://doi.org/10.1101/2024.03.20.586035>
 
 
 
@@ -204,36 +203,6 @@ Sometimes one only wants to model the changes in nucleotide or amino-acid freque
 
 >**NOTE**: The amino-acid order in this file is: A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V.
 
-Linked GTR exchangeabilities models
----------------------------------------
-<div class="hline"></div>
-
-Starting with version 2.3.1, IQ-TREE allows the user to estimate exchangeabilities under profile mixture models.
-
-### Exchangeability estimation
-
-To start with, we show an example:
-
-    iqtree -s <alignment> -m GTR20+C60+G4 --link-exchange-rates -te  <guide_tree> -me 0.99
-
-In this example exchangeabilities will be estimated for a profile mixture model `C60+G4` but any profile mixture model and rates can be used. To estimate a single set of linked exchangeabilities, in the model definition the matrix `GTR20` must be specified (resp. GTR for nucleotide data) together with the flag `--link-exchange-rates`. While a guide tree is not needed, we highly recommend using a fixed tree topology to estimate exchangeabilities.  Since matrix estimation can be time-consuming, we also recommend using the flag `-me 0.99` to reduce the optimization threshold for faster optimization. Simulations have shown that changing this parameter has no significant effect on exchangeability estimation.
-
-The user can determine the starting exchangeabilities before optimization. Choosing adequate exchangeabilities can make estimation considerably faster. For example:
-
-    iqtree -s example.phy -m GTR20+C60+G4 --link-exchange-rates --gtr20-model LG  -te  <guide_tree> -me 0.99
-
-specifies the LG matrix as the starting matrix via the flag `--gtr20-model` (the default starting matrix is POISSON, i.e. equal exchangeabilities). For this flag, the user can specify any matrix, even those matrices defined by the user via the `-mdef` flag. If the user is agnostic of the exchangeabilities, we recommend using the default matrix (although it can be time-consuming).
-
-Note that the user can estimate exchangeabilities jointly with weights of the profiles, branch lengths, and rates. This can be very time-consuming. If the goal is to optimize exchange abilities, one can fix the other parameters to reasonable estimates (for eg. fixing branch lengths  and rates has been shown to perform adequately for estimation of exchangeabilities) 
-
-There is an additional flag `--rates-file` that will produce a nexus file with the exchangeability matrix obtained from the optimization. This file can be later used for phylogenetic inference with the use of the `-mdef` flag.
-
-
-If you use this routine in a publication please cite:
-
-> __H. Banos et al.__ (2024) Estimating Linked Exchangeabilities for Profile Mixture Models. _Bioraxiv.
-
-
 Here, the NEXUS file contains a `models` block to define new models. More explicitly, we define four AA profiles `Fclass1` to `Fclass4`, each containing 20 AA frequencies. Then, the frequency mixture is defined with
 
     FMIX{empirical,Fclass1,Fclass2,Fclass3,Fclass4}
@@ -242,8 +211,7 @@ This means, we have five components: the first corresponds to empirical AA frequ
 
     iqtree -s some_protein.aln -mdef mymodels.nex -m JTT+CF4model+G
 
-The `-mdef` option specifies the NEXUS file containing user-defined models. Here, the `JTT` matrix is applied for all alignment sites and one varies the AA profiles along the alignment. One can use the NEXUS syntax to define all other profile mixture models such as `C10` to `C60`.
-
+The `-mdef` option specifies the NEXUS file containing user-defined models (see below). Here, the `JTT` matrix is applied for all alignment sites and one varies the AA profiles along the alignment. One can use the NEXUS syntax to define all other profile mixture models such as `C10` to `C60`.
 
 ### NEXUS model file
 
 
@@ -164,9 +164,9 @@ So, suppose that in the first step of the analysis you ran the command as above:
 
 That command will have figured out for you the model of evolution, all the parameters of that model, and the branch lengths of the corresponding tree. We can re-use all of that useful information in the final step. It just takes a little bit of effort to find what you need.
 
-First we'll get the model parameters we need. If you take a look at the end of the `concat.log` file you will find a little section called `ALISIM COMMAND`. You can find it like this on mac/linux (or just open the `concat.log` file in a text editor and scroll to the end:
+First we'll get the model parameters we need. If you take a look at the end of the `concat.iqtree` file you will find a little section called `ALISIM COMMAND`. You can find it like this on mac/linux (or just open the `concat.iqtree` file in a text editor and scroll to the end):
 
-	tail concat.log
+	tail concat.iqtree
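
Since `tail` prints only the last ten lines by default, searching for the section title directly may be more robust. The following is a sketch: the number of lines shown after the match (5) is a guess and may need adjusting, and `show_alisim_command` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the ALISIM COMMAND section wherever it sits in the file.
IQTREE_FILE=concat.iqtree   # report file from the first step

# Hypothetical helper: show the section title plus the next 5 lines.
show_alisim_command() {
    grep -A 5 "ALISIM COMMAND" "$1"
}

if [ -f "$IQTREE_FILE" ]; then
    show_alisim_command "$IQTREE_FILE"
fi
```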
 
 You should see something like this:
 
@@ -189,7 +189,7 @@ To put all of that together, we are going to change the final command of the tut
 	# compute site concordance factor using likelihood with v2.2.2
 	iqtree2 -te concat.treefile -s ALN_FILE --scfl 100 --prefix concord2
 
-To one of these, where we add the two extra commands via `-blfix` and `-m`, to fix all the parameters we already calculated. A reminder - do NOT use the exact commandlines above. You have to replace everything after the `-m` with what you found in your own `concat.log` file:
+To one of these, where we add two extra options, `-blfix` and `-m`, to fix all the parameters we already calculated. A reminder: do NOT use the exact command lines above. You have to replace everything after the `-m` with what you found in your own `concat.iqtree` file:
 
 	# faster analysis, using pre-computed model parameters, with per-locus alignments
 	# compute site concordance factor using likelihood with v2.2.2