Hich Reference
Sample file
Normally, the sample file is called “samples.tsv” (tab-delimited). Basic sample attributes are usually specified here. Default sample attributes are customizable and can also be set for individual sample ids in nextflow.config, which is useful for specifying defaults for the biorep and condition samples produced via merge.
Example 1. Because reference, chromsizes, index_dir, index_prefix, and fragfile files are unspecified and the assembly values are supported, Hich will download the reference and produce these needed files automatically.
Example 2. Here, needed reference files are given (possibly from a permanent lab repository), so they will be used rather than produced by Hich. Because there’s just one sample, there is no need to specify a biorep or techrep parameter.
Example 3. Here Hich is ingesting files in several formats, autodetecting the datatype.
Example 4. An experiment using a variety of enzymes for reference digestion and fragment tagging, as well as one sample not tagged or filtered (MNase).
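To make the layout of a sample file concrete, here is a minimal Python sketch that parses a small tab-delimited table. The column names follow the basic sample attributes documented on this page, but the rows, file names, and the read_samples helper are hypothetical and only illustrate the format; this is not how Hich itself reads the file.

```python
import csv
import io

# Hypothetical sample file contents. Column names follow the basic sample
# attributes in this reference; the values and file paths are made up.
SAMPLES_TSV = (
    "condition\tbiorep\ttechrep\tassembly\tfastq1\tfastq2\n"
    "KO\t1\t1\thg38\tko_1_1_R1.fastq.gz\tko_1_1_R2.fastq.gz\n"
    "KO\t1\t2\thg38\tko_1_2_R1.fastq.gz\tko_1_2_R2.fastq.gz\n"
    "NT\t1\t1\thg38\tnt_1_1_R1.fastq.gz\tnt_1_1_R2.fastq.gz\n"
)


def read_samples(text, sep="\t"):
    """Parse a delimited sample table into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    return [dict(row) for row in reader]


samples = read_samples(SAMPLES_TSV)
for sample in samples:
    print(sample["condition"], sample["biorep"], sample["techrep"], sample["fastq1"])
```

Each row becomes one sample, and every column header becomes a sample attribute name.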
nextflow.config
The nextflow.config file is one way to configure Nextflow, including by setting Hich-specific sample attributes. All sample attributes are described in this section.
Scopes
Hich uses specialized config scopes, each written as a name followed by a block in curly braces, to group related sample attributes and general Hich workflow parameters. Here is an example containing a subset of the real Hich default nextflow.config file plus an extra scope used to specify parameters for a merge.
params {
    general {
        // The general scope holds params
        // relevant to general Hich workflow
        // control, not sample attributes.
        sampleFile {
            // Path to sample file
            // and column separator.
            filename = "samples.tsv"
            sep = "\t"
        }
    }
    defaults {
        // The defaults scope gives default
        // sample attributes applied to all
        // samples if an explicit value is not
        // given in samples.tsv or in a scope
        // specific to the sample's id.
        // Default techrep and biorep labels
        // applied to any samples where they
        // are not specified in samples.tsv
        techrep = 1
        biorep = 1
        // Minimum mapq threshold to keep reads
        minMapq = 30
    }
    ko {
        // Apply these parameters to samples
        // with the id "KO" or "NT"
        ids = ["KO", "NT"]
        hicrep {
            exclude = ["chrM"]
        }
    }
}
general
last_step
Specifies the last processing step that should be executed when nextflow run hich.nf is invoked (as a stub, humid or full run). QC for that step will also be completed. Useful for test runs, debugging, and making processing decisions based on QC results. Commented out by default.
params {
    general {
        //last_step = "align"
    }
}
sampleFile
The filename and column separator for the sample file. The filename param can contain a path relative to the Nextflow projectDir.
params {
    general {
        sampleFile {
            filename = "samples.tsv"
            sep = "\t"
        }
    }
}
publish
Specifies the Nextflow publishDir mode and output directory for the results of various Hich processes.
params {
    general {
        publish {
            // Nextflow publishDir param for all processes
            // https://www.nextflow.io/docs/latest/process.html#publishdir
            mode = "copy"
            // Where to publish results of Hich processes
            genome = "resources/.hich"
            chromsizes = "resources/.hich"
            bwa_mem2_index = "resources/.hich/bwa-mem2/index"
            bwa_mem_index = "resources/.hich/bwa-mem/index"
            digest = "resources/.hich"
            bam = "results/bam"
            parse = "results/pairs/parse"
            dedup = "results/pairs/dedup"
            mcool = "results/matrix/mcool"
            hic = "results/matrix/hic"
            pairStats = "results/pairStats"
            qc = "results/qc"
        }
    }
}
qcAfter
params {
    general {
        // After these steps, generate read-level pairs
        // stats files and generate a combined MultiQC report
        // for all samples at each processing stage
        qcAfter = ["Parse",
                   "IngestPairs",
                   "OptionalFragtag",
                   "TechrepsToBioreps",
                   "Deduplicate",
                   "BiorepsToConditions",
                   "Select"]
    }
}
humid
params {
    general {
        // Number of reads to downsample to
        // when doing a humid run
        humid {
            n_reads = 100000
        }
    }
}
defaults
All sample attributes specified under this scope will be applied to any samples for which a value is not given in the sample file or one of the custom scopes.
custom scopes
Custom scopes work just like the defaults scope, except that they have a special ids list specifying the set of ids to which they should be applied. Custom scopes override the values in the sample file.
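The precedence described above (defaults, then the sample file, then any custom scope whose ids list contains the sample id) can be sketched in Python. This is only an illustration of the stated ordering, not Hich's implementation: the resolve_attributes helper, the treatment of empty cells as "not given", and the shallow merge of nested scopes are all assumptions.

```python
def resolve_attributes(sample_row, defaults, custom_scopes):
    """Combine attribute sources in the order described in this reference.

    Assumed order (lowest to highest priority):
    defaults scope < sample file row < matching custom scopes.
    """
    resolved = dict(defaults)
    # Assumption: an empty cell in the sample file counts as "not given".
    resolved.update({k: v for k, v in sample_row.items() if v not in (None, "")})
    for scope in custom_scopes:
        if resolved.get("id") in scope["ids"]:
            # Shallow merge; nested scopes such as hicrep are replaced whole.
            resolved.update({k: v for k, v in scope.items() if k != "ids"})
    return resolved


defaults = {"techrep": 1, "biorep": 1, "minMapq": 30}
custom_scopes = [{"ids": ["KO", "NT"], "minMapq": 10, "hicrep": {"exclude": ["chrM"]}}]

ko = resolve_attributes(
    {"id": "KO", "condition": "KO", "minMapq": "20", "biorep": ""},
    defaults,
    custom_scopes,
)
wt = resolve_attributes(
    {"id": "WT_1_1", "condition": "WT", "minMapq": "20"},
    defaults,
    custom_scopes,
)
print(ko)
print(wt)
```

In this sketch the KO sample ends up with the custom scope's minMapq even though the sample file gave a value, while the WT sample keeps its sample-file value because no custom scope lists its id.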
Sample attributes
In Hich, a sample is a single unit of data, such as a technical replicate, biological replicate, or experimental condition. Each sample has a number of sample attributes. These can be specified via columns in the sample file, or applied to a subset of sample ids via the nextflow.config file (or anywhere else Nextflow can be configured, including directly at the command line).
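Two of the basic attributes listed below, datatype and id, can be derived from other columns. The following Python sketch mirrors the autodetection rules and the {condition}_{biorep}_{techrep} template given under Basic. The function names are hypothetical, and falling back to fastq for ambiguous rows is an assumption based on fastq being listed as the default; how Hich handles such rows is not stated here.

```python
def detect_datatype(sample):
    """Guess the datatype from which input columns are filled in."""
    if sample.get("datatype"):
        return sample["datatype"]  # an explicit value is used as given
    any_fastq = bool(sample.get("fastq1")) or bool(sample.get("fastq2"))
    has_sambam = bool(sample.get("sambam"))
    has_pairs = bool(sample.get("pairs"))
    if has_sambam and not any_fastq and not has_pairs:
        return "sambam"
    if has_pairs and not any_fastq and not has_sambam:
        return "pairs"
    # Assumption: every other case falls back to the documented default.
    return "fastq"


def default_id(sample):
    """Build an id from the {condition}_{biorep}_{techrep} template."""
    return "{condition}_{biorep}_{techrep}".format(**sample)


print(detect_datatype({"fastq1": "r1.fastq.gz", "fastq2": "r2.fastq.gz"}))
print(detect_datatype({"sambam": "sample.bam"}))
print(detect_datatype({"pairs": "sample.pairs.gz"}))
print(default_id({"condition": "KO", "biorep": 1, "techrep": 2}))
```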
Basic
condition
biorep
1 default
techrep
1 default
assembly
hg38
fastq1 and fastq2
sambam
pairs
reference
Supported assembly values:
hg38,homo_sapiens,GRCh38
mm10
dm6
galGal5
bGalGal5
danRer11
chromsizes
minMapq
30 default
datatype
fastq: default + autodetected if “fastq1” and “fastq2” are specified but “sambam” and “pairs” are not.
sambam: autodetected if “sambam” is specified but “fastq1”, “fastq2”, and “pairs” are not.
pairs: autodetected if “pairs” is specified but “fastq1”, “fastq2”, and “sambam” are not.
id
{condition}_{biorep}_{techrep}
Alignment
aligner
Applies when datatype == fastq.
bwa
bwa-mem2 default
bwa-mem2 requires a 60-80 GB memory footprint, whereas indexing with bwa can be done in less than 32 GB.
index_dir
index_prefix
alignerThreads
10
bwaFlags
-SP5M default
Flags passed to bwa mem or bwa-mem2 mem. The default -SP5M is recommended by 4DN for aligning paired-end Hi-C reads with bwa mem or bwa-mem2 mem. See the bwa manual reference page for additional options.
Pairs processing
enzymes
fragfile
deduplicate
true default
false
pairsFormat
chrom1
pos1
chrom2
pos2
parseParams
--flip
--drop-readid
--drop-seq
--drop-sam
The --drop-* parameters are among the most impactful for making Hich fast and giving it a low disk footprint. It is not recommended to remove these parameters unless you know what you are doing, although additional parameters can be added.
pairtoolsDedupParams
pairtoolsSelectParams
selectFilters
keepPairTypes
UU, UR, RU
keepTrans
true default
false
Relevant to the trans-only option, which normalizes contact matrices based exclusively on trans contacts; these are in some cases thought to yield more biologically representative results.
keepCis
true default
false
minDistFR
1000
minDistRF
1000
minDistFF
0
minDistFF
0
chroms
discardSingleFrag
true default
false
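The selectFilters attributes above can be read as a per-pair keep/discard decision. The Python sketch below is one plausible interpretation, not Hich's or pairtools' implementation. It assumes that "+" and "-" strands map to F and R, that the minDist thresholds apply only to cis pairs and are compared against the absolute distance between pos1 and pos2, and that orientations without a listed threshold use 0. The Pair class and keep function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    strand1: str
    strand2: str
    pair_type: str


# Values taken from the attribute listing above; the grouping into one dict
# is only for this sketch.
FILTERS = {
    "keepPairTypes": {"UU", "UR", "RU"},
    "keepTrans": True,
    "keepCis": True,
    "minDist": {"FR": 1000, "RF": 1000, "FF": 0},
}


def orientation(pair):
    """Assumed mapping: '+' strand is F (forward), '-' strand is R (reverse)."""
    to_letter = {"+": "F", "-": "R"}
    return to_letter[pair.strand1] + to_letter[pair.strand2]


def keep(pair, filters=FILTERS):
    """Return True if the pair passes every filter in this sketch."""
    if pair.pair_type not in filters["keepPairTypes"]:
        return False
    if pair.chrom1 != pair.chrom2:
        return filters["keepTrans"]
    if not filters["keepCis"]:
        return False
    distance = abs(pair.pos2 - pair.pos1)
    # Orientations with no listed threshold are treated as having none.
    return distance >= filters["minDist"].get(orientation(pair), 0)


short_inward = Pair("chr1", 100, "chr1", 500, "+", "-", "UU")
long_inward = Pair("chr1", 100, "chr1", 5000, "+", "-", "UU")
trans_pair = Pair("chr1", 100, "chr2", 200, "+", "+", "UU")
print(keep(short_inward), keep(long_inward), keep(trans_pair))
```

Under these assumptions, a short-range FR pair is discarded while a longer one on the same chromosome is kept, which matches the usual motivation for orientation-specific distance cutoffs in Hi-C filtering.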
Matrix processing
juicerToolsPreParams
Parameters passed to the Juicer Tools pre command when forming a Hi-C contact matrix.
coolerCloadParams
Parameters passed to the cooler cload command for forming .cool format precursors to the .mcool contact matrix.
coolerZoomifyParams
--balance
--balance-args 'max-iters 2000 --trans-only'
Parameters passed to the cooler zoomify command for coarsening high-res .cool matrices into multi-resolution .mcool contact matrices. The chosen defaults will generate multi-res contact matrices containing both the raw contacts and balancing weights produced using the trans contacts only.
matrix
makeMcoolFileFormat
true default
false
makeHicFileFormat
true default
false default
resolutions
1000
2000
5000
10000
20000
50000
100000
200000
500000
1000000
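Because cooler zoomify builds coarser matrices by aggregating bins of a base-resolution matrix, each target resolution needs to be an integer multiple of the base bin size. The short Python check below applies that rule to the resolutions listed above. Treating the smallest listed resolution as the base is an assumption made for this sketch, and the non_multiples helper is hypothetical.

```python
MATRIX_RESOLUTIONS = [
    1000, 2000, 5000, 10000, 20000,
    50000, 100000, 200000, 500000, 1000000,
]


def non_multiples(resolutions):
    """Return the resolutions that are not integer multiples of the smallest."""
    base = min(resolutions)  # assumption: the smallest value is the base bin size
    return [r for r in resolutions if r % base != 0]


bad = non_multiples(MATRIX_RESOLUTIONS)
print("incompatible resolutions:", bad)
```

A custom list such as 1000 and 2500 would be flagged by this check, since 2500 cannot be formed from whole 1000 bp bins.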
Quality control
hicrep
call_on
is_techrep default
is_biorep default
is_condition default
resolutions
10000
100000
1000000
chroms
exclude
chromFilter
An expression over chromosome name (the chrom variable) and size (the size variable). It will be evaluated using Python’s eval function.
h
The hicrep h parameter to use.
dBPMax
The hicrep dBPMax parameter to use.
bDownSample
The hicrep bDownSample parameter to use.
Feature calling
compartments
resolution
cooltools_eigs_cis_params
--bigwig
insulation
resolution
cooltoolsInsulationParams
--bigwig
loops
call_on
is_techrep default
is_biorep default
is_condition default
use_format
mcool default
hic
mustache_params
--resolution 5000
--pThreshold .1
--sparsityThreshold .88