PGB-LIV/PanOryza-pan-genes-release-v1.0

PanOryza-pan-genes-release-v1.0

This repository hosts the code for recreating the analyses in the PanOryza manuscript. The code for GET_PANGENES is available from: https://github.com/Ensembl/plant-scripts/blob/master/pangenes/. The code for the Nipponbare merged genes is available from: https://github.com/Ensembl/plant-scripts/tree/master/scripts. The input files (.fasta and .gff format) for running GET_PANGENES are available from Zenodo (https://zenodo.org/records/14772953). Alternatively, the output files for Os4530.POR.1 (version 1.0) are available from the same Zenodo repository and can be used for downstream analyses of the pan-genes with the code provided here.

To reproduce the full analysis starting from the GET_PANGENES results, first prepare the tables and intermediate files described below; they are needed to recreate the manuscript figures.

Running get_pangenes with the RPRP (MAGIC-16 accessions) as input produces the following files:

  1. .cluster_list: parsed into tabular format with the function parse_clusters; the output table is named "df_merged"
  2. .matrix_genes.tr.tab: read directly as a table named "pangene_list"
  3. .matrix.tr.tab
  4. Individual clusters inside the folder 'oryzasativanipponbaremerged': the *.cds.faa files of the clusters are used to calculate and summarise cluster and individual protein lengths. The cluster sequence summary can be created in R with create_cluster_sum (NOTE: there are also several ways to do this in a Linux terminal) and then parsed into a dataframe with read_parse_clusters_summary
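The length summary in item 4 can be sketched in base R as follows. This is a minimal illustration, not the repository's create_cluster_sum: the helper name fasta_lengths and its output columns are assumptions made here, and it expects well-formed FASTA in which every record starts with a ">" header.

```r
# Hypothetical helper (NOT the repository's create_cluster_sum):
# returns one row per sequence with its identifier and length.
fasta_lengths <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]            # drop empty lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)              # record index of every line
  # Sum residue counts per record; header lines contribute 0.
  len <- tapply(ifelse(is_header, 0L, nchar(lines)), record, sum)
  data.frame(
    cluster     = basename(path),
    sequence_id = sub("^>(\\S+).*", "\\1", lines[is_header]),
    length      = as.integer(len),
    stringsAsFactors = FALSE
  )
}

# Toy input written to a temporary file so the sketch runs on its own.
tmp <- tempfile(fileext = ".cds.faa")
writeLines(c(">geneA some description", "MKT", "AAL", ">geneB", "MSTNP"), tmp)
lengths_df <- fasta_lengths(tmp)
print(lengths_df)

# On real data, the same helper could be applied to every cluster file:
# files <- list.files("oryzasativanipponbaremerged",
#                     pattern = "\\.cds\\.faa$", full.names = TRUE)
# cluster_summary <- do.call(rbind, lapply(files, fasta_lengths))
```

Any trailing "*" stop symbol in a sequence is counted as a residue here; strip it first if that matters for the summary.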

An additional table named "cluster_merged" is used in several places; it is created by combining "pangene_list" and "df_merged".
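As a rough sketch of how such a combined table might be built, the example below joins two toy data frames with base R's merge(). The README does not state the join key or the column layout, so the shared cluster_id column and all other column names and values are assumptions for illustration only; the repository scripts define the actual procedure.

```r
# Toy stand-ins for the real tables; column names and values are assumptions.
pangene_list <- data.frame(
  cluster_id = c("cluster1", "cluster2", "cluster3"),
  accessionA = c("geneA1", "geneA2", "-"),
  accessionB = c("geneB1", "-", "geneB3"),
  stringsAsFactors = FALSE
)
df_merged <- data.frame(
  cluster_id = c("cluster1", "cluster2", "cluster3"),
  n_genes    = c(2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep every row of pangene_list and attach the matching cluster metadata.
cluster_merged <- merge(pangene_list, df_merged, by = "cluster_id", all.x = TRUE)
print(cluster_merged)

# The real matrix file could be read with read.delim(); the path is a
# placeholder, because the file name prefix is not given in this README:
# pangene_list <- read.delim(matrix_genes_path, check.names = FALSE)
```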

InterProScan tabular results for the magic18 protein sequences were merged with the cluster files above. We recommend loading the workspace core_workspace.RData (available at Zenodo) in R/RStudio, which also loads these InterProScan results for the pan-genes. Alternatively, core_files.R can be used to read all the files needed for downstream analysis.
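The two options above could look like this in an R session. The file locations are assumptions: the sketch expects core_workspace.RData (downloaded from Zenodo) or core_files.R in the current working directory, and only reports what it finds.

```r
# Paths are assumptions; adjust them to where the files were saved.
workspace_file <- "core_workspace.RData"
reader_script  <- "core_files.R"

if (file.exists(workspace_file)) {
  # load() restores the saved objects and returns their names invisibly.
  loaded_objects <- load(workspace_file)
  status <- "workspace"
  print(loaded_objects)
} else if (file.exists(reader_script)) {
  # Fall back to reading the individual input files.
  source(reader_script)
  status <- "script"
} else {
  status <- "missing"
  message("Neither ", workspace_file, " nor ", reader_script, " was found.")
}
```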

To reproduce the figure-wise analyses, please refer to the scripts folder.
