Commit 73053d3

Update README
1 parent cb6d61b commit 73053d3

1 file changed

Lines changed: 15 additions & 15 deletions

README.md

@@ -7,6 +7,7 @@ Dependencies:

`blastn` https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
`cutadapt` http://cutadapt.readthedocs.io/en/stable/
`R` https://www.r-project.org/
`parallel` https://www.gnu.org/software/parallel/

To make obitools available everywhere, add both the obitools binary and the obitools `/export/bin` folder to your path.
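For example, in `~/.bashrc` this might look like the following (a sketch only; `$HOME/obitools` is a hypothetical install location, so point `OBITOOLS_DIR` at wherever obitools actually lives on your system):

```shell
# hypothetical install location; adjust OBITOOLS_DIR to your system
OBITOOLS_DIR="$HOME/obitools"
export PATH="$OBITOOLS_DIR:$OBITOOLS_DIR/export/bin:$PATH"
```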

@@ -125,19 +126,22 @@ Do the rest separately for your two amplicon types.

As above, use obiannotate to annotate your sequences.
`01_scripts/obiannotate_ident.sh`

After completing this step, one can use the following scripts to account for reads from each amplicon type:
`01_scripts/account_reads_annot.sh`
`02_scripts/account_reads_annot.R`

Move on to [Part 2](#part-2-main-analysis).

## Part 2. Main Analysis
![](00_archive/eDNA_metabarcoding_workflow_2.png)

### 2.1. Retain Only Unique Reads
Input is a single fastq file containing all samples for a specific amplicon, annotated in the fastq accession with the sample name.

Use obiuniq to keep one record per unique amplicon, retaining the count number (outputs fasta):
For paired-end data: `./01_scripts/03_retain_unique_PE.sh`
For single-end data: `./01_scripts/03_retain_unique_SE.sh`
Note: one can also edit this script to add other `-m` flags, such as `run` or `pcr_rep`; that is, anything you may want to summarize over using obitab later.

Audit: sum up the count values to make sure all reads are accounted for:
`grep -E '^>' 04_samples/your_file_ali_assi_uniq.fa | awk -F'\ count=' '{ print $2 }' - | awk -F';' '{ print $1 }' | paste -sd+ - | bc`
@@ -154,32 +158,23 @@ PE data does full filtering as above:

SE data filters only on size and count:
`./01_scripts/04_denoise_and_remove_err_SE.sh` (edit LMIN, LMAX, MIN_READS)

Audit: determine how many reads make it through each filtering step:
Reads in the fasta after the size selection/count filter:
`grep -E '^>' 04_samples/*_ali_assi_uniq_c10_55-75.fa | awk -F"\ count=" '{ print $2 }' - | awk -F";" '{ print $1 }' - | paste -sd+ - | bc`

Note: one can also test other lengths by streaming into grep:
`obigrep --lmin 55 --lmax 75 -p 'count>=5' 04_samples/yourfile.fa | grep -cE '^>' -`

### 2.3. Export data
Use obitab to output a tab-delimited text file that will be used as input to the R script below.
`./01_scripts/05_obitab_export.sh`

### 2.4. Assign each sequence to a taxon
Use blastn to align the H and S fasta file against nt (NCBI remote):
`blastn -db nt -query 04_samples/your_cleaned_HS.fa -out 05_annotated/your_lib_output.txt -remote -num_descriptions 10 -num_alignments 10`

Or if running a large set of queries, use parallel:
`SEQUENCE_FILE="04_samples/your_filtered_fasta.fa" ; OUTPUT="05_annotated/your_filtered_fasta_hits.txt" ; cat $SEQUENCE_FILE | parallel -j 12 -k --block 1k --recstart '>' --pipe 'blastn -db /home/ben/blastplus_databases/nt -query - -num_descriptions 10 -num_alignments 10 ' > $OUTPUT`
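Here `--recstart '>'` is what keeps parallel from splitting the stream mid-record. To illustrate the idea of record-aware chunking on its own (a toy sketch, not part of the pipeline scripts; file names are invented), an awk one-liner that starts a new chunk file every two fasta records:

```shell
printf '>a\nAC\n>b\nGT\n>c\nTT\n>d\nGG\n' > /tmp/demo.fa
# open a new chunk file every `size` header lines, so records stay whole
awk -v size=2 '/^>/ { if (n % size == 0) file = sprintf("/tmp/chunk_%02d.fa", n / size); n++ }
               { print > file }' /tmp/demo.fa
```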
Track output:
@@ -207,8 +202,13 @@ This will use the R script `read_counts_to_annotations.R`, run interactively.

Necessary inputs:
Amplicon annotation output from MEGAN, and amplicon read counts from `obitab`.

This script is currently highly customized to the two projects using this analysis. To adapt it to your data, provide the appropriate filenames, data types, and sample locations; consider it a template to modify for other studies.

In brief, this will merge these two inputs, attach locations, aggregate different amplicons with the same annotation, calculate proportions, and save out proportion plots and count/proportion tables.
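The merge-and-aggregate step can be illustrated at the shell for intuition (a toy sketch with invented two-column files, not a substitute for the R script): join amplicon-to-taxon annotations against amplicon read counts, sum counts per taxon, and divide by the total:

```shell
# toy inputs: amplicon -> taxon, and amplicon -> read count (keys pre-sorted for join)
printf 'amp1 TaxonA\namp2 TaxonA\namp3 TaxonB\n' > /tmp/annot.txt
printf 'amp1 10\namp2 30\namp3 60\n' > /tmp/counts.txt
join /tmp/annot.txt /tmp/counts.txt \
  | awk '{ sum[$2] += $3; total += $3 }
         END { for (t in sum) printf "%s %.2f\n", t, sum[t] / total }' \
  | sort   # TaxonA 0.40, TaxonB 0.60
```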

Within this script, one can apply a low-expression filter to remove any counts less than 10.
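The low-expression filter amounts to dropping rows below a count threshold; at the shell it can be sketched on a toy tab-separated count table (invented data, threshold hard-coded to match the 10 above):

```shell
printf 'TaxonA\t3\nTaxonB\t42\nTaxonC\t9\n' > /tmp/toy_counts.txt
# keep only rows whose count (column 2) is at least 10
awk -F'\t' '$2 >= 10' /tmp/toy_counts.txt   # prints only the TaxonB row
```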
Output will be saved to `06_output_figures` (figures) and `05_annotated` (count/proportion tables).

There is also a second, more customized script tailored to the HABs project to handle the variant pipeline types: `01_scripts/read_counts_to_annotations_HABs.R`. This script is not currently intended for broader use.

These R scripts may be developed further if there is interest in increasing the generality of this pipeline. Please contact the author with questions or comments.
