Config files¶

There are three types of config files, General use, Module specific and System parameters.

General use: Are those that are used by the main script to feed specific locations or general configuration parameters.
Module specific: Are configurations that are used by the module. The names of these config files start with the wbench prefix.
System parameters: These configs hold the values of colors and others miscellaneous variables for ease of access.

General use¶

These two config files should be properly configured, to ensure the program runs. The install script will fill out all variables.

software_dirs.cfg:

#Path were install script will install software
SOFTWARE=
#Path to workbench http://srna-workbench.cmp.uea.ac.uk/
WBENCH_DIR=
#Path to java use 1.7 or greater
JAVA_DIR=
#Number of times program has been run
RUN=0

The SOFTWARE variable is the path to the directory were the install script will install all necessary dependancies. WBENCH_DIR

workdirs.cfg:

#LINES THE SWITH # ARE INFORMATIONAL ONLY
#Workdir is the path to the directory where this program will run data
#workdir must end with trailing "/"
workdir=${HOME}/miRPursuit_Projects/miRtest/
#Path to the mirbase database. Go to http://www.mirbase.org or download latest from: ftp://mirbase.org/pub/mirbase/CURRENT/
MIRBASE=${source_data}/mirbase/mature.fa
#Used by java
MEMORY="4g"
#Set this to the max number of processed that can be used
THREADS=2
#Path to the directory where input data is located
#Test directory in [pathToMiRPursuit]/testDataset
INSERTS_DIR=${SOURCE_DATA}/sRNA/
#Path to the genome to be used
#Test genome can be found here [pathToMiRPursuit]/testDataset/Genome/Arabidopsis_thaliana.TAIR10.dna_rm.chromosome.4.fa
GENOME=${SOURCE_DATA}/genomes/my_genome.fa
#Path to the genome to be used by mircat. Leave this, as ${GENOME} if no memory resctrictions apply to your case. Check manual on using parts
GENOME_MIRCAT=${GENOME/.fa/part-1.fa}
#The suffix of the filter to be used. Check /config/workbench_filter_*.cfg
FILTER_SUF=18_26_5
#Adaptor trimming
#You must set the --trim flag
ADAPTOR="TGGAATTCTCGGGTGCCAAGG"
#Deprecated - Soon removed
LCSCIENCE_LIB=
#These var are only used for target prediction (PAREsnip)
TRANSCRIPTOME=
DEGRADOME=

Module specific¶

There is a config file for each module in the miRPursuit/config directory. The default values are posted, for further reference, please consult the website of the respective tool.

wbench_filter.cfg - Filter your sRNA sequences. Length, abundance, T/R RNA:

#Broad range default values
min_length=18
max_length=26
min_abundance=5
max_abundance=2147483647
norm_abundance=false
filter_low_comp=true
filter_invalid=true
trrna=true
trrna_sense_only=false
filter_genome_hits=false
filter_norm_abund=false
filter_kill_list=false
add_discard_log=false
genome=null
kill_list=null
discard_log=null

wbench_mircat.cfg - miRCat predict novel miRNAs through alignment with genome to find putative precursors:

#Default values (Broad)
extend=100.0
min_energy=-25.0
min_paired=17
max_gaps=3
max_genome_hits=16
min_length=18
max_length=26
min_gc=20
max_unpaired=60
max_overlap_percentage=80
min_locus_size=1
orientation=80
min_hairpin_len=60
complex_loops=true
pval=0.05
min_abundance=1
cluster_sentinel=200
Thread_Count=12



#Default (plants)
extend=100.0
min_energy=-25.0
min_paired=17
max_gaps=3
max_genome_hits=16
min_length=20
max_length=22
min_gc=20
max_unpaired=50
max_overlap_percentage=80
min_locus_size=1
orientation=80
min_hairpin_len=60
complex_loops=true
pval=0.05
min_abundance=1
cluster_sentinel=200
Thread_Count=20

wbench_mirprof.cfg - miRProf identifies conserved miRNA, through alignment to the miRBase database of miRNA:

#Default values
mismatches=0
overhangs=true
group_mismatches=true
group_organisms=true
group_variant=true
group_mature_and_star=false
only_keep_best=true
min_length=18
max_length=26
min_abundance=5

wbench_tasi.cfg - ta-si predictor, identifies phased 21nt sRNAs characteristic of ta-siRNA loci:
```
#Default values
p_val_threshold=1.0E-4
min_abundance=2
```

paresnip.cfg - PAREsnip validates targets of regulation by sRNAs requires degradome and a transcriptome sequences:

  #Default values
min_sRNA_abundance=5
subsequences_are_secondary_hits=false
output_secondary_hits_to_file=false
use_weighted_fragments_abundance=true
category_0=true
category_1=true
category_2=true
category_3=true
category_4=false
discard_tr_rna=true
discard_low_complexity_srnas=false
discard_low_complexity_candidates=false
min_fragment_length=20
max_fragment_length=21
min_sRNA_length=19
max_sRNA_length=24
allow_single_nt_gap=false
allow_mismatch_position_11=false
allow_adjacent_mismatches=false
max_mismatches=4.0
calculate_pvalues=true
number_of_shuffles=100
pvalue_cutoff=0.05
do_not_include_if_greater_than_cutoff=true
number_of_threads=23
auto_output_tplot_pdf=false

patman_genome.cfg - Patman a pattern matcher for short sequences:

#Default values
#Set maximum edit distance to N (Default: 0)
EDITS=0
#Set maximum number of gaps to N (default: 0)
GAPS=0
#Do not match reverse-complements (default: FALSE)
SINGLESTRAND=FALSE
#Prefetch N nodes (default: 3) Related with performance
PREFETCH=3
#################
#Not implemented#
#################
#Interpret ambiguity codes in patterns (Flag for using ambicodes)
#ambicodes=FALSE

System parameters¶

These are generally hardcoded, don’t change these unless you know what you are doing.

term-colors.cfg - Colors for terminal and other useful vars.