Configuration
We distinguish between Nextflow config options (NF-config), Hich config options (Hich-config), and sample attributes. NF-config controls the Nextflow workflow management system (the platform on which Hich runs) and is listed here. Hich-config controls the way Hich processes individual datasets and builds samples. Samples are bundles of sample attributes, which include paths to the data files used for processing (e.g. fastq files, aligner index files, contact matrices) as well as directives for how they should be processed (e.g. minimum MAPQ filters or how loops should be called).
To set up a run, you will typically use three configuration files. As sample attributes can be specified at the command line, none are required. A maximum of one configuration file of each type can be used.
To specify sample attributes for individual samples, the easiest way is a tab-separated value (TSV) sample file, specified with the --sample-file option. Examples are in the vignettes directory.
To specify Hich-config that offers more powerful control over how Hich assigns attributes to samples, such as global default values or complex parameter exploration plans, use a YAML params file, specified with the -params-file option. Examples are in the params directory.
To define NF-config, such as computational resource management profiles and how outputs are published, as well as a few rarely modified Hich-config options, use a Nextflow config file written in Nextflow configuration syntax. It is typically named nextflow.config or specified at the command line with the -c option. An example is in the hich directory.
Hich also allows many configuration options to be specified at the command line, using options like --defaults.aligner bwa-mem2.
The Nextflow training page on configuration options describes how conflicts are resolved.
Note
NF-config uses a - prefix, as for options like -params-file and -c. Hich-config uses a double -- prefix, as for options like --sample-file.
Sample file
The sample file is typically a tab-separated value (TSV) file that describes sample-specific attributes.
--sampleFile: The path to the sample file.
Formatting requirements:
The delimiter matches the --sampleFileSep parameter.
Every column with content has a header. There are no content-free headerless columns between columns with content.
Each row corresponds to one sample.
Each column corresponds to one sample attribute, with the header name matching the name expected by Hich for that sample attribute.
Note
You can choose a different delimiter by setting the --sampleFileSep parameter, like --sampleFileSep "," to use a comma-separated value (CSV) format for the sample file. This is specified in nextflow.config by default, and can be modified there or overridden at the command line.
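The formatting requirements above can be checked with a few lines of Python. This is a minimal sketch, not Hich's own parser: the function name read_sample_file, the example file names, and the choice to treat empty cells as unspecified attributes are all assumptions made for illustration.

```python
import csv
import io

def read_sample_file(text, sep="\t"):
    """Parse sample-file text into one dict of attributes per sample (row)."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    headers = next(reader)
    # Every column that holds content needs a header.
    if any(h.strip() == "" for h in headers):
        raise ValueError("every column with content needs a header")
    samples = []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) > len(headers):
            raise ValueError("row has more cells than there are headers")
        # Assumption of this sketch: an empty cell means "attribute unspecified".
        samples.append({h: v for h, v in zip(headers, row) if v != ""})
    return samples

# Hypothetical two-sample TSV; the fastq file names are made up.
tsv = (
    "condition\tbiorep\ttechrep\tfastq1\tfastq2\n"
    "c1\tb1\tt1\ta_r1.fq.gz\ta_r2.fq.gz\n"
    "c1\tb2\tt1\tb_r1.fq.gz\tb_r2.fq.gz\n"
)
samples = read_sample_file(tsv)
print(samples[0]["fastq1"])
```

Passing sep="," reads the same layout from a CSV file, mirroring --sampleFileSep ",".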
Params file
The params file is in YAML syntax and offers powerful ways to specify sample attributes and parameterize the workflow:
Attribute values applied to all samples, or a subset of samples, by default, by specifying values of sample attributes in its defaults block
Complex parameterizations for individual workflow steps
Relationships between samples to control downsampling, merging, and deduplication
Examples of premade params files can be found in the hich/params directory.
Nextflow config file
The Nextflow config file is in Nextflow configuration syntax and is typically named nextflow.config. It offers exactly the same capabilities as the params file. The reason for using both is for convenience: infrequently changed config options that are shipped with Hich are in nextflow.config, while config options more likely to be adjusted to suit individual runs are placed in params files. The nextflow.config file included with Hich includes a number of resource management profiles in the profiles section, as well as more general parameters in the params and params.general sections, including which containers are used, where outputs are published, when to generate multiQC reports, how many reads to keep for humid runs, and what separator delimits columns in the sample file.
Set samples at the command line
The following options allow fastq samples, and even sample-specific attributes encoded in the fastq filename, to be included directly from the command line. Samples declared this way are used in addition to those in the sample file, if one is specified.
--fastqInterleaved: Interleaved fastq files (i.e. r1 followed by r2 in the same file). Example: --fastqInterleaved fq/*.fq.gz
--fastqPairs: Paired fastq files (i.e. an r1 and an r2 file). Filenames are parsed using Nextflow’s fromFilePairs syntax. Example: --fastqPairs fq/*.r{1,2}.fq.gz
--samplesFromSRA: URL to paired fastq files hosted on SRA.
--paramsFromPath: Uses a syntax similar to Python’s parse library to extract sample attribute values from filenames using a parsing pattern. Note that with --fastqPairs, only the first file will be parsed (i.e. r1 if that was specified first between braces, as in *.r{1,2}.fq.gz). Example: --fastqPairs fq/*.r{1,2}.fq.gz --paramsFromPath {condition}_{biorep}_{techrep}.r1.fq.gz
--samples: Interprets the filename extension to read in fastq (".fastq", ".fq"), sam/bam (".sam", ".bam"), pairs (".pairs"), mcool (".mcool"), or hic (".hic") files. The extension can appear inside the filename ("*.fq.gz") or at the end of the filename ("*.fq"). Example: --samples data/*.pairs.gz
Set Hich-config at the command line
Hich-config, such as that specified in the params file or nextflow.config, can also be specified directly via the command line. This will override that option’s specification in the params file or nextflow.config file. Example: nextflow run contents/main.nf --defaults.minMapq 10 --general.publish.mode copy
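A dotted option name addresses a nested entry in the configuration, so --defaults.minMapq 10 targets minMapq inside the defaults block. The sketch below illustrates that mapping on a plain dictionary. It does not reproduce how Nextflow resolves options internally; the helper set_dotted and the starting values are hypothetical.

```python
def set_dotted(params, dotted_key, value):
    """Set a nested value addressed by a dotted key, creating blocks as needed."""
    *parents, leaf = dotted_key.split(".")
    node = params
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return params

# Hypothetical values as they might come from the params file and nextflow.config.
params = {
    "defaults": {"minMapq": 30, "aligner": "bwa-mem2"},
    "general": {"publish": {"mode": "symlink"}},
}

# Equivalent of: --defaults.minMapq 10 --general.publish.mode copy
set_dotted(params, "defaults.minMapq", 10)
set_dotted(params, "general.publish.mode", "copy")
print(params)
```

Only the addressed entries change; sibling entries such as defaults.aligner keep the values from the files.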
Hich-config reference
Typically specified in params file
defaults: Hashmap. Required. Sample attributes specified in the defaults block are applied to all samples by default, unless overridden for individual samples (e.g. in the sample file). YAML example:
defaults:
  minMapq: 30
  aligner: "bwa-mem2"
  bwaFlags: ["-SP5M"]
  pairtoolsSelectFilters:
    keepPairTypes: ["UU", "UR", "RU"]
    keepTrans: true
    keepCis: true
aggregate: Hashmap. Optional. Aggregate profiles are declared here and used to control whether and how samples will be downsampled, merged and deduplicated. The keys of the aggregate hashmap are profile names. Values are hashmaps defining how that profile behaves. YAML example:
aggregate:
  profile1:
    dedupMaxMismatch: 3
    dedupMethod: "max"
    techrepDedup: true
  profile2:
    mergeTechrepToBiorep: true
sampleSelectionStrategies: Hashmap. Optional. Sample selection strategies are declared here and used to control the samples that are used as inputs into various feature calling methods. The keys of the sampleSelectionStrategies hashmap are strategy names. Each value is a hashmap that maps sample attributes to their acceptable values, given either as a single acceptable value or as a list of acceptable values. YAML example:
sampleSelectionStrategies:
  strategy1:
    condition: ["c1", "c2"]
    biorep: "b1"
    aggregateProfile: "profile1"
  strategy2:
    condition: "c1"
    aggregateProfile: "profile2"
Analysis methods and analysis plans
An analysis plan is a way of parameterizing an analysis method, such as calling HiCRep SCC scores, loops, or compartment scores. It consists of parameter values as well as a sample selection strategy that determines the samples on which the analysis plan will be run. The sample selection strategy may be left out, or specified as a single strategy or a list of strategies. If it is a list, a sample must satisfy all the strategies to be included in the analysis plan. If no sample selection strategy is given, all samples will be used as inputs to that analysis plan. Hich manages conflicts by replacing earlier settings with those from later strategies in the list, so sampleSelectionStrategy: ["strategy1", "strategy2"] replaces any attribute requirements in strategy1 that also appear in strategy2 with the required values in strategy2.
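The replacement rule for lists of strategies can be sketched in Python. This is an illustration of the rule as described here, not Hich's implementation: combine_strategies and matches are hypothetical helper names, and the strategies are the ones from the sampleSelectionStrategies example.

```python
def combine_strategies(names, strategies):
    """Merge strategies in order; later strategies replace earlier requirements."""
    combined = {}
    for name in names:
        combined.update(strategies[name])
    return combined

def matches(sample, requirements):
    """True if the sample has an acceptable value for every required attribute."""
    for attribute, accepted in requirements.items():
        accepted = accepted if isinstance(accepted, list) else [accepted]
        if sample.get(attribute) not in accepted:
            return False
    return True

strategies = {
    "strategy1": {"condition": ["c1", "c2"], "biorep": "b1", "aggregateProfile": "profile1"},
    "strategy2": {"condition": "c1", "aggregateProfile": "profile2"},
}

requirements = combine_strategies(["strategy1", "strategy2"], strategies)
print(requirements)

sample = {"condition": "c1", "biorep": "b1", "aggregateProfile": "profile2"}
print(matches(sample, requirements))
```

Because strategy2 comes last, its condition and aggregateProfile requirements win, while the biorep requirement from strategy1 is kept.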
hicrep: Hashmap. Optional. Analysis plans for calling HiCRep SCC scores are specified in this block. Keys are names of analysis plans for hicrep. Values are analysis plans themselves. YAML example:
hicrep:
  sccPlan1:
    sampleSelectionStrategy: "strategy1"
    h: 1
    resolutions: [2000, 10000]
    bDownSample: true
  sccPlan2:
    sampleSelectionStrategy: ["strategy1", "strategy2"]
    h: 2
    resolutions: 20000
This defines two analysis plans. The first uses samples conforming to a sample selection strategy named “strategy1”. Using the example provided in the reference for sample selection strategies above, samples are required to have a condition label that is either “c1” or “c2”, the biological replicate label “b1”, and the aggregateProfile label “profile1”. The second combines “strategy1” and “strategy2”, with requirements in “strategy2” replacing those in “strategy1” for attributes that appear in both. Using the example, only samples with the condition “c1”, biological replicate label “b1”, and aggregateProfile label “profile2” will be chosen. All combinations of the other parameters will be used, so sccPlan1 will run on (h: 1, resolution: 2000, bDownSample: true) and (h: 1, resolution: 10000, bDownSample: true).
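The "all combinations" behavior amounts to a Cartesian product over the parameter values of a plan. A minimal sketch follows, assuming single values are treated as one-element lists. The helper expand_plan is hypothetical and is only meant to mirror the hicrep example, not plans whose list values are passed through verbatim to a tool.

```python
from itertools import product

def expand_plan(plan):
    """List every parameter combination of a plan, ignoring its sample selection."""
    params = {k: v for k, v in plan.items() if k != "sampleSelectionStrategy"}
    keys = list(params)
    # Treat a single value as a list with one element.
    value_lists = [v if isinstance(v, list) else [v] for v in params.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]

scc_plan1 = {
    "sampleSelectionStrategy": "strategy1",
    "h": 1,
    "resolutions": [2000, 10000],
    "bDownSample": True,
}

combinations = expand_plan(scc_plan1)
for combo in combinations:
    print(combo)
```

For sccPlan1 this yields two runs, one per resolution, each with h set to 1 and bDownSample set to true.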
loops: Hashmap. Optional. Analysis plans for calling loops are specified in this block. Keys are names of analysis plans for loops. Values are analysis plans themselves. YAML example:
loops:
  loopPlan1:
    sampleSelectionStrategy: "strategy1"
    mustacheParams: ["-r 2000", "-ch chr1"]
Analysis plans simply pass arguments in mustacheParams directly to Mustache, so the parameters specified there should be used for these analysis plans. Do not use the -f or -o arguments as these are hardcoded into Hich.
differentialLoops: Hashmap. Optional. Analysis plans for calling differential loop enrichments (diffloops) are specified in this block. Keys are names of analysis plans for differentialLoops. Values are analysis plans themselves. YAML example:
differentialLoops:
  diffloopPlan1:
    mustacheParams: ["-r 2000"]
Analysis plans simply pass arguments in mustacheParams directly to Mustache diffloops, so the parameters specified there should be used for these analysis plans. Do not use the -f1, -f2, or -o arguments as these are hardcoded into Hich.
compartments: Hashmap. Optional. Analysis plans for calling compartment scores are specified in this block. Keys are names of analysis plans for compartments. Values are analysis plans themselves. The resolution value must be specified. YAML example:
compartments:
  compartmentPlan1:
    resolution: 2000
    hichCompartmentsParams: ["--chroms chr1,chr2,chr3"]
Analysis plans simply pass arguments in hichCompartmentsParams directly to the Hich CLI utilities hich compartments method, so the parameters specified there should be used for these analysis plans. Do not pass --n_eigs as this is hardcoded into Hich (compartment scores based on the first 3 eigenvalues will be generated).
insulation: Hashmap. Optional. Analysis plans for calling insulation scores are specified in this block. Keys are names of analysis plans for insulation. Values are analysis plans themselves. YAML example:
insulation:
  insulationPlan1:
    resolution: 2000
    cooltoolsInsulationParams: ["--threshold 1"]
    window: 1000
Analysis plans simply pass arguments in cooltoolsInsulationParams directly to cooltools insulation, so the parameters specified there should be used for these analysis plans. Do not pass --output as this is hardcoded into Hich.
Typically specified in nextflow.config
sampleFileSep: Single-character string. Required to parse the sample file. Column separator. Use “\t” for tab (TSV files) or “,” for comma (CSV files). Other settings can be used as well.
humid: Boolean. Required. If true, then ingested, gzipped fastq files will be downsampled to the number of reads specified in general.humidDefault.
general: Hashmap. Required. Contains additional parameters.
general.humidDefault: List of strings. Required. Specified in provided nextflow.config. For gzipped fastq files, the number of reads to downsample to for a “humid” run.
general.hichContainer: String. Required. Specified in provided nextflow.config. Location of Hich CLI utilities Docker container.
general.chromsizesContainer: String. Required. Specified in provided nextflow.config. Location of ucsc-fasize Docker container used to produce chromsizes file from genome reference.
general.mustacheContainer: String. Required. Specified in provided nextflow.config. Location of Docker container used to call Mustache loops and differential loops.
general.juicerContainer: String. Required. Specified in provided nextflow.config. Location of Docker container used to call Juicer tools to produce .hic contact matrix.
general.hictkContainer: String. Required. Specified in provided nextflow.config. Location of Docker container used to convert between .mcool and .hic formats.
general.qcAfter: List of strings. Required. Specified in provided nextflow.config. Processes after which multiQC reports should be generated.
general.publish: Hashmap. Required. Contains additional parameters used to control where outputs are published.
general.publish.mode: String. Required. Mode used to publish outputs. See Nextflow publishDir documentation for options.
general.publish.genomeReference: String. Required if downloading genome reference. Target directory where downloaded genome references will be published.
general.publish.chromsizes: String. Required if auto-producing chromsizes. Target directory where chromsizes file will be published.
general.publish.bwaMem2Index: String. Required if auto-producing bwa-mem2 aligner index. Target directory where bwa-mem2 aligner index will be published.
general.publish.bwaIndex: String. Required if auto-producing bwa aligner index. Target directory where bwa aligner index will be published.
general.publish.bsboltIndex: String. Required if auto-producing bsbolt aligner index. Target directory where bsbolt aligner index will be published.
general.publish.fragmentIndex: String. Required if auto-producing restriction digest fragment index. Target directory where fragment index will be published.
general.publish.align: String. Required if aligning .fastq files. Target directory where sam/bam files will be published.
general.publish.parse: String. Required if parsing sam/bam files to .pairs format. Target directory where resulting .pairs files will be published.
general.publish.dedup: String. Required if deduplicating .pairs format files. Target directory where resulting .pairs files will be published.
general.publish.mcool: String. Required if generating .mcool files either from .pairs files or by converting from .hic format. Target directory where resulting .mcool files will be published.
general.publish.hic: String. Required if generating .hic files either from .pairs files or by converting from .mcool format. Target directory where resulting .hic files will be published.
general.publish.pairStats: String. Required if generating pairtools stats files. Target directory where resulting stats files will be published.
general.publish.qc: String. Required if generating multiQC reports. Target directory where resulting reports will be published.
Sample attributes reference
Relationships between samples
id: String. Required. Must be different for each sample. Used as filename prefix for output files. Autogenerated if not specified.
condition: String. Optional. Labels basal tier in the experimental design hierarchy.
biorep: String. Optional. Labels secondary tier in the experimental design hierarchy.
techrep: String. Optional. Labels tertiary tier in the experimental design hierarchy.
aggregateProfileName: String. Optional. Not typically specified by the user; built automatically by Hich based on the aggregate Hich-config block. Labels the aggregate profile applied to the sample.
Note
id is typically built algorithmically by concatenating condition, biorep, and techrep, as well as the aggregateProfileName, but can be manually specified.
For example, the following sample file (id unspecified) and aggregate profile block would result in the id values c1_b1_t1_profile1, c1_b1_t1_profile2, c1_b2_t1_profile1, and c1_b2_t1_profile2.
techrep biorep condition
t1 b1 c1
t1 b2 c1
aggregate:
  profile1:
    mergeTechrepToBiorep: true
    dedupMaxMismatch: 3
    techrepDedupMethod: "sum"
  profile2:
    mergeTechrepToBiorep: true
    dedupMaxMismatch: 0
    techrepDedupMethod: "max"
These attributes are used by aggregate profiles and sample selection strategies to control how samples are downsampled, merged, and deduplicated, as well as how features are called.
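The id values in the example can be reproduced with a short sketch. It assumes, based only on the ids shown in the example, that the labels are joined with underscores in the order condition, biorep, techrep, aggregate profile name. The helper build_id is hypothetical and does not cover ids of merged samples.

```python
def build_id(sample):
    """Join the available labels with underscores, skipping unspecified ones."""
    order = ("condition", "biorep", "techrep", "aggregateProfileName")
    return "_".join(sample[key] for key in order if sample.get(key))

# The two rows of the example sample file.
samples = [
    {"techrep": "t1", "biorep": "b1", "condition": "c1"},
    {"techrep": "t1", "biorep": "b2", "condition": "c1"},
]
profiles = ["profile1", "profile2"]

# Each sample is cloned once per aggregate profile and tagged with its name.
ids = [
    build_id({**sample, "aggregateProfileName": profile})
    for sample in samples
    for profile in profiles
]
print(ids)
```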
Resource files
assembly: String. Required. Genome assembly label. May be used to download the genome reference if genomeReference is unspecified. Example: “hg38”.
genomeReference: String. Required for building fragment index, chromsizes, and aligner index. Can be downloaded by Hich automatically for common genomes if assembly is included. Path or URL to genome reference fasta file. If genomeReference is unspecified but assembly is one of the supported options, Hich downloads the genome reference from the ENCODE project or NCBI. If multiple samples use the downloaded reference, it is downloaded only once and shared by all the samples that need it. Supported options for automatically downloading the genome reference: hg38, homo_sapiens, or GRCh38 (GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz), mm10 (mm10_no_alt_analysis_set_ENCODE.fasta.gz), dm6 (GCF_000001215.4_Release_6_plus_ISO1_MT_genomic.fna.gz), galGal5 or bgalGal5 (GCA_027408225.1_bGalGal5.pri_genomic.fna.gz), danRer11 (GCF_000002035.6_GRCz11_genomic.fna.gz), or ce10 (GCF_000002985.6_WBcel235_genomic.fna.gz).
indexDir: String. Required for alignment, optional otherwise. Path to directory containing aligner index.
indexPrefix: String. Required for alignment, optional otherwise. Prefix of all aligner index files (it is required that all aligner index files share a common prefix). Example: if the aligner index files are hg38.*, then indexPrefix should be hg38.
chromsizes: String. Required to build fragment index, parse sam/bam to pairs or ingest pairs, or build contact matrix. Path to tab-delimited headerless file with contig names in the first column and the length of each contig in base pairs in the second column. Automatically created based on genomeReference if unspecified and shared among samples with a common reference that all left genomeReference unspecified.
restrictionEnzymes: String. Required to build fragment index, but fragment index is optional. Space-delimited list of restriction enzyme names used in restriction digest for the sample. Any combination of enzymes in the REBASE database as accessed via biopython’s restriction enzymes module can be used (e.g. DpnII DdeI), as well as Arima, Phase Proximo 2021+ Plant (or Phase Plant), Phase Proximo 2021+ Animal (or Phase Animal), Phase Proximo 2021+ Microbiome (or Phase Microbiome), Phase Proximo 2021+ Human (or Phase Human) or Phase Proximo 2021+ Fungal (or Phase Fungal).
fragmentIndex: String. Optional. Path to BED file containing start and end positions of restriction fragments for the digest used for the sample. If the restrictionEnzymes option is specified but fragmentIndex is not, then Hich will create a fragmentIndex file based on the restrictionEnzymes and genomeReference and share it among samples with the same reference and enzymes.
Aligning reads
Hich toolkit: bwa mem, bwa-mem2, bsbolt
See also: Resource files under assembly, genomeReference, indexDir, indexPrefix.
fastq1: String. Required for alignment. Path to R1 or single-end read fastq file (may be gzipped).
fastq2: String. Required for alignment if samples are paired-end and non-interleaved. Path to R2 fastq file (may be gzipped). Leave blank or unspecified if using single-end reads.
aligner: String. Required for alignment. Aligner to use for aligning the sample. Options: bwa (slower, lower memory footprint), bwa-mem2 (fast, higher memory footprint), bsbolt (methyl Hi-C).
bwaFlags: List of strings. Required for alignment. CLI options passed to the aligner (note that all aligners, including BSBolt, are based on bwa mem). Typically, use -SP5M. Do not use the bwa option -t or the BSBolt options -OT, -O, -DB, -F1, or -F2, as these are hardcoded by Hich based on other sample attributes. Example: bwaFlags: ["-S", "-P", "-5", "-M"]
Filters
minMapq: Integer. Optional. Reads below this MAPQ cutoff will be discarded. Note that different aligners approximate MAPQ differently. The approach used by bwa is what’s relevant for Hich.
Hi-C contacts ingested from .pairs or parsed from .sam/.bam
Hich toolkit: samtools, pairtools
sambam: String. Optional. Specified by the user for .sam/.bam files to ingest as inputs into Hich rather than being built via alignment from .fastq data. Path to a .sam/.bam file containing aligned reads, which will be parsed using pairtools parse2 to obtain a 4DN .pairs file.
pairs: String. Optional. Specified by the user for 4DN .pairs files (may be gzipped) to ingest as inputs into Hich rather than being built via alignment from .fastq data.
pairtoolsParse2Params: List of flags passed to pairtools parse2. Uses minMapq, if specified for the sample, as the default value for the pairtools parse2 --min-mapq option, but this can be overridden by passing --min-mapq in pairtoolsParse2Params. Hardcoded options that should not be provided here: --flip, --assembly, --chroms-path.
Note
Hich will inspect .sam/.bam files to determine if they are sorted, and sort them automatically by name (required for inputs to pairtools parse) only if necessary. It will then sort the output by position.
Note
pairtools parse2 has a --drop-readid parameter, which can drastically shrink the disk space required for the .pairs file. This is useful, but for single cell data (see below), it was challenging to engineer a way to drop this column when it’s necessary to extract the cellID column value from the readID column of the .sam/.bam file used as input to parsing the .pairs file. For this reason, the --drop-readid parameter is not actually passed to pairtools parse2. Instead, --placeholder readID . is passed to hich reshape, which accomplishes the same result while permitting cellID to be extracted from the readID column if necessary.
Note
These attributes can be ignored for bulk data. For single cell-aware fragment filtering, deduplication and to maintain cell ID for future analysis, Hich must put a unique identifier for the cell attributed to each contact in the .pairs file into a column labeled cellID in the .pairs file. This identifier can be extracted by Hich automatically from the read ID or from a .sam/.bam tag using the sample attributes in this section using the Hich CLI command hich reshape.
cellBarcodeField: Required if parsing cell ID from a .sam/.bam file. Should be either readID or the name of a .sam/.bam tag. This field will be parsed for each read in the .sam/.bam file in order to extract the value of the cellID. The patterns used to accomplish this extraction are specified below.
cellBarcodeRegexPattern: Optional. Should be a Python regex compatible with re (regexes can be tested at regex101.com). Along with cellBarcodeGroup, the regex will be applied to parse the field specified in cellBarcodeField and the match will be put into the cellID field of the .pairs file. Overrides cellBarcodeParsePattern if both are specified.
cellBarcodeGroup: Optional. An integer specifying which match group from the regex specified by cellBarcodeRegexPattern should be used as the value of cellID. 0 uses the entire match. Defaults to 0 if cellBarcodeField and cellBarcodeRegexPattern are specified and cellBarcodeGroup is not.
cellBarcodeParsePattern: Optional. An alternative and potentially simpler way to parse cellBarcodeField by using Python’s parse library syntax. From the pattern specified, the {cellID} named part will be extracted and put into the cellID column in the .pairs file. Example: {}:{cellID} will extract the part after a colon (:) and put it into the cellID column.
globalDefaultReshapeToCellID: Optional. Must be specified in the params file or nextflow.config. If cellBarcodeField is specified for a sample but neither cellBarcodeRegexPattern nor cellBarcodeParsePattern is specified, then globalDefaultReshapeToCellID is used to determine how the cellID column will be parsed. Ignored if cellBarcodeRegexPattern or cellBarcodeParsePattern is given for the sample.
globalDefaultReshapeToCellID.option: Optional. Either --regex or --parse, which determines whether globalDefaultReshapeToCellID.pattern (below) will be parsed using Python’s re library or its parse library (see above options for details).
globalDefaultReshapeToCellID.pattern: Optional. Interpreted as either a Python re regex or a Python parse pattern, depending on the value of globalDefaultReshapeToCellID.option.
globalDefaultReshapeToCellID.group: Optional. The match group to use for the regex. Ignored if unspecified, and should be left unspecified if using parse.
reshapeParams: Optional additional params passed to hich reshape.
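The interaction between cellBarcodeRegexPattern and cellBarcodeGroup follows Python's re match groups. The sketch below shows the idea under stated assumptions: the read ID layout is made up, the helper extract_cell_id is hypothetical, and whether hich reshape searches anywhere in the field or anchors the pattern at its start is not specified here, so re.search is used as a guess.

```python
import re

def extract_cell_id(field_value, regex_pattern, group=0):
    """Return the chosen match group of the pattern, or None if it does not match."""
    match = re.search(regex_pattern, field_value)
    return match.group(group) if match else None

# Hypothetical read ID that ends with a colon followed by the cell barcode.
read_id = "read0001:ACGTACGT"
pattern = r":([ACGT]+)$"

print(extract_cell_id(read_id, pattern, group=0))  # entire match, including the colon
print(extract_cell_id(read_id, pattern, group=1))  # first capture group only
```

With this pattern, group 1 is the natural choice for cellID because group 0 would carry the separator along with the barcode.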
Filtering Hi-C contacts
Hich toolkit: pairtools
See also: Resource files under restrictionEnzymes, fragmentIndex
selectFilters: A multi-attribute of filters to apply to Hi-C contacts in .pairs files.
selectFilters.keepPairTypes: List of strings. Optional. Pairtools pair types to keep. Keeping UU, UR, and RU is recommended.
selectFilters.keepTrans: Boolean. Optional. If false, discards reads mapping to different chromosomes/contigs. If unspecified, these contacts will be kept.
selectFilters.keepCis: Boolean. Optional. If false, discards reads mapping to the same chromosome/contig. If unspecified, these contacts will be kept.
selectFilters.minDistFR: Integer. Optional. If specified, discards reads with the orientation FR whose distance between pos1 and pos2 is below this value.
selectFilters.minDistRF: Integer. Optional. If specified, discards reads with the orientation RF whose distance between pos1 and pos2 is below this value.
selectFilters.minDistFF: Integer. Optional. If specified, discards reads with the orientation FF whose distance between pos1 and pos2 is below this value.
selectFilters.minDistRR: Integer. Optional. If specified, discards reads with the orientation RR whose distance between pos1 and pos2 is below this value.
Note
Two technical artifacts that routinely appear in Hi-C experiments enriched in short-range contacts are undigested chromatin and self-ligated strands. These will appear in the multiQC reports generated by Hich as a strong enrichment in the FR and RF orientations below a certain distance threshold. By pausing the Hich run after parsing to pairs and inspecting this report, the minDist values can be chosen appropriately according to the QC data. Data with no strand bias should have very close to 25% of each orientation.
selectFilters.discardSingleFrag: Boolean. Optional. Discard contacts where both ends map to the same restriction fragment, as these likely originate from undigested chromatin. Requires that samples have been tagged with this information, which Hich will do automatically if fragmentIndex is specified.
pairtoolsSelectParams: List of strings. Optional. Additional parameters to pass to pairtools select. The following options are hardcoded in Hich and should not be specified here: --output-rest, --output, --nproc-in, --nproc-out.
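The combined effect of the pair type, cis/trans, and minimum distance filters can be sketched as a single predicate. This is an illustration only, not the pairtools select expression Hich builds. It assumes F and R correspond to the + and - strands, that the distance filters apply only to cis contacts, and that a contact is represented as a dictionary with the hypothetical keys shown.

```python
# Map (strand1, strand2) to the orientation labels used by the minDist filters.
ORIENTATIONS = {("+", "-"): "FR", ("-", "+"): "RF", ("+", "+"): "FF", ("-", "-"): "RR"}

def keep_contact(pair, filters):
    """Decide whether a contact passes the selectFilters-style settings."""
    keep_types = filters.get("keepPairTypes")
    if keep_types is not None and pair["pair_type"] not in keep_types:
        return False
    is_cis = pair["chrom1"] == pair["chrom2"]
    # Unspecified keepCis/keepTrans means the contacts are kept.
    if is_cis and not filters.get("keepCis", True):
        return False
    if not is_cis and not filters.get("keepTrans", True):
        return False
    if is_cis:
        orientation = ORIENTATIONS[(pair["strand1"], pair["strand2"])]
        min_dist = filters.get("minDist" + orientation)
        if min_dist is not None and abs(pair["pos2"] - pair["pos1"]) < min_dist:
            return False
    return True

filters = {"keepPairTypes": ["UU", "UR", "RU"], "keepTrans": True, "minDistFR": 1000}
short_fr = {"chrom1": "chr1", "pos1": 100, "strand1": "+",
            "chrom2": "chr1", "pos2": 600, "strand2": "-", "pair_type": "UU"}
print(keep_contact(short_fr, filters))
```

The short FR contact is rejected because its 500 bp separation is below minDistFR, while the same contact in the FF orientation would pass since no minDistFF is set.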
Downsampling, merging, and deduplicating samples
aggregate: A single Hich-config block, typically declared in the YAML params file. Each sub-entry is the name of an aggregate profile. The aggregate profile defines how samples are to be downsampled, merged and deduplicated. Samples with no aggregate profile are cloned and tagged as belonging to that aggregate profile, and the aggregate profile is incorporated into the cloned sample’s id and therefore becomes part of the filename.
Deduplication
techrepDedup: If true, techrep-level samples in the aggregate profile will be deduplicated after merging them into a biorep-level sample.
techrepDedupMethod: Controls the --method parameter for pairtools dedup when calling duplicates on technical replicate samples.
biorepDedup: If true, biorep-level samples in the aggregate profile will be deduplicated after merging them into a condition-level sample.
biorepDedupMethod: Controls the --method parameter for pairtools dedup when calling duplicates on biological replicate samples.
conditionDedup: If true, condition-level samples in the aggregate profile will be deduplicated after merging them into a merged-condition-level sample.
conditionDedupMethod: Controls the --method parameter for pairtools dedup when calling duplicates on condition samples.
dedupMaxMismatch: The maximum number of base pairs of mismatch permitted between contacts for them to be deemed duplicates. This value is interpreted in light of the techrepDedupMethod, biorepDedupMethod, or conditionDedupMethod.
dedupSingleCell: If true, then in addition to the difference in position between contacts being small enough, the contacts’ cellID column values must also match in order for one of the contacts to be discarded as a duplicate. See the section on optional single-cell attributes for options controlling how to parse cellID from the readID or sam/bam tags.
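How dedupMaxMismatch interacts with the dedup method can be shown with a pairwise comparison. The sketch assumes the pairtools dedup meaning of the two methods, where max takes the larger of the two per-side position differences and sum adds them. The function is_duplicate and the dictionary layout are hypothetical, and real deduplication works on sorted files rather than on arbitrary pairs of contacts.

```python
def is_duplicate(a, b, max_mismatch, method="max", single_cell=False):
    """Compare two contacts and report whether they count as duplicates."""
    same_layout = all(a[k] == b[k] for k in ("chrom1", "chrom2", "strand1", "strand2"))
    if not same_layout:
        return False
    # With single-cell deduplication, the cellID values must match as well.
    if single_cell and a.get("cellID") != b.get("cellID"):
        return False
    d1 = abs(a["pos1"] - b["pos1"])
    d2 = abs(a["pos2"] - b["pos2"])
    mismatch = max(d1, d2) if method == "max" else d1 + d2
    return mismatch <= max_mismatch

a = {"chrom1": "chr1", "pos1": 100, "strand1": "+",
     "chrom2": "chr1", "pos2": 5000, "strand2": "-", "cellID": "cellA"}
b = {**a, "pos1": 102, "pos2": 5002, "cellID": "cellB"}

print(is_duplicate(a, b, max_mismatch=3, method="max"))
print(is_duplicate(a, b, max_mismatch=3, method="sum"))
```

Both ends differ by 2 bp, so with dedupMaxMismatch set to 3 the pair counts as a duplicate under max (2 bp) but not under sum (4 bp).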
Downsampling
techrepDownsamplePairs: If true, techrep-level samples will be downsampled in a manner controlled by the following parameters.
techrepCisStrata: For downsampling, defines a partition over distance strata for contacts mapping to the same chromosome, which will be used to homogenize the number of contacts within each stratum across the techrep-level samples being downsampled together.
techrepReadConjuncts: For downsampling, selects which fields will be used to partition techrep-level contacts for downsampling.
techrepDownsampleToMeanDistribution: If true, then during downsampling, the mean fraction of contacts in each block in the partition will be used as the target distribution for each of the techrep-level samples being downsampled together.
techrepToSize: Controls the number of contacts each techrep-level sample in the aggregate profile will be downsampled to. If a float from 0-1, downsamples to approximately that fraction of the original size. If an integer greater than 1, downsamples to that number of contacts.
biorepDownsamplePairs: If true, biorep-level samples will be downsampled in a manner controlled by the following parameters.
biorepCisStrata: For downsampling, defines a partition over distance strata for contacts mapping to the same chromosome, which will be used to homogenize the number of contacts within each stratum across the biorep-level samples being downsampled together.
biorepReadConjuncts: For downsampling, selects which fields will be used to partition biorep-level contacts for downsampling.
biorepDownsampleToMeanDistribution: If true, then during downsampling, the mean fraction of contacts in each block in the partition will be used as the target distribution for each of the biorep-level samples being downsampled together.
biorepToSize: Controls the number of contacts each biorep-level sample in the aggregate profile will be downsampled to. If a float from 0-1, downsamples to approximately that fraction of the original size. If an integer greater than 1, downsamples to that number of contacts.
conditionDownsamplePairs: If true, condition-level samples will be downsampled in a manner controlled by the following parameters.
conditionCisStrata: For downsampling, defines a partition over distance strata for contacts mapping to the same chromosome, which will be used to homogenize the number of contacts within each stratum across the condition-level samples being downsampled together.
conditionReadConjuncts: For downsampling, selects which fields will be used to partition condition-level contacts for downsampling.
conditionDownsampleToMeanDistribution: If true, then during downsampling, the mean fraction of contacts in each block in the partition will be used as the target distribution for each of the condition-level samples being downsampled together.
conditionToSize: Controls the number of contacts each condition-level sample in the aggregate profile will be downsampled to. If a float from 0-1, downsamples to approximately that fraction of the original size. If an integer greater than 1, downsamples to that number of contacts.
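The two interpretations of the ToSize values (fraction versus absolute count) can be sketched as follows. The helper target_size is hypothetical. Capping the target at the number of available contacts is an assumption of this sketch, made because downsampling cannot add contacts.

```python
def target_size(to_size, n_contacts):
    """Translate a ToSize setting into a target number of contacts."""
    if isinstance(to_size, float) and 0 <= to_size <= 1:
        # A float from 0-1 is a fraction of the original size.
        return round(n_contacts * to_size)
    if isinstance(to_size, int) and to_size > 1:
        # An integer greater than 1 is an absolute number of contacts.
        # Assumption: never ask for more contacts than the sample has.
        return min(to_size, n_contacts)
    raise ValueError("expected a float from 0-1 or an integer greater than 1")

print(target_size(0.25, 1_000_000))
print(target_size(400_000, 1_000_000))
```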
Merging
mergeTechrepToBiorep: If true, techreps with the same condition, biorep, and aggregate profile will be merged to create a biorep-level sample. The techrep-level samples will be retained for further processing as well. If specified, techrep-level downsampling will occur prior to the merge, while deduplication occurs after the merge.
mergeBiorepToCondition: If true, bioreps with the same condition and aggregate profile will be merged to create a condition-level sample. The biorep-level samples will be retained for further processing as well. If specified, biorep-level downsampling will occur prior to the merge, while deduplication occurs after the merge.
mergeCondition: If true, condition-level samples with the same aggregate profile will be merged to create a new sample. The condition-level samples will be retained for further processing as well. If specified, condition-level downsampling will occur prior to the merge, while deduplication occurs after the merge.
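The grouping behind the three merge options can be illustrated with a short Python sketch. It is not Hich's code: the attribute names (`id`, `condition`, `biorep`, `aggregateProfile`) and the helper `group_for_merge` are hypothetical stand-ins chosen to mirror the descriptions above.

```python
from collections import defaultdict

# Attributes that must match for samples to be merged, per option.
# The attribute names here are illustrative assumptions.
MERGE_KEYS = {
    "mergeTechrepToBiorep": ["condition", "biorep", "aggregateProfile"],
    "mergeBiorepToCondition": ["condition", "aggregateProfile"],
    "mergeCondition": ["aggregateProfile"],
}


def group_for_merge(samples, keys):
    """Group sample ids by the values they share for the given attributes."""
    groups = defaultdict(list)
    for sample in samples:
        groups[tuple(sample[k] for k in keys)].append(sample["id"])
    return dict(groups)


# Hypothetical techrep-level samples.
techreps = [
    {"id": "c1_b1_t1", "condition": "c1", "biorep": 1, "aggregateProfile": "full"},
    {"id": "c1_b1_t2", "condition": "c1", "biorep": 1, "aggregateProfile": "full"},
    {"id": "c1_b2_t1", "condition": "c1", "biorep": 2, "aggregateProfile": "full"},
]

# Each group would become one biorep-level sample.
biorep_groups = group_for_merge(techreps, MERGE_KEYS["mergeTechrepToBiorep"])
```

Here the two techreps of biorep 1 fall into one group and the single techrep of biorep 2 into another. In every case the input samples are retained alongside the merged ones.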
Creating contact matrices
matrix: A code block defining which contact matrix formats will be produced.
matrix.makeMcoolFileFormat: Part of the matrix code block. If true, then an mcool-format multi-resolution cooler file will be created. Creating this file format is necessary for calling insulation and compartment scores with Hich due to its dependence on the cooler library.
matrix.makeHicFileFormat: Part of the matrix code block. If true, then a hic-format Hi-C file will be created. This is not necessary for Hich, but an advantage of the .hic format over the .mcool format is that it allows retrieving expected and o/e values.
matrix.resolutions: Part of the matrix code block. A list of resolutions to produce.
juicerToolsPreParams: Not part of the matrix code block. Additional parameters passed to the juicer tools pre command.
coolerCloadParams: Not part of the matrix code block. Additional parameters passed to cooler cload pairs, which is used to generate the highest-resolution cool-format file that serves as the input used to create the mcool-format file.
coolerZoomifyParams: Not part of the matrix code block. Additional parameters passed to cooler zoomify to create the mcool-format file from the cool-format file.
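To make the nesting concrete, the following Python sketch mirrors which keys sit inside the matrix block and which sit beside it. The key names come from the descriptions above; the resolution values, the `config` variable, and the helper `formats_to_make` are illustrative assumptions, and the three extra-parameter options are left unset because their value format is not specified here.

```python
# Illustrative values only; this is not a complete Hich configuration.
config = {
    "matrix": {
        "makeMcoolFileFormat": True,
        "makeHicFileFormat": False,
        "resolutions": [1000, 10000, 100000],  # example resolutions
    },
    # These three options sit outside the matrix block.
    "juicerToolsPreParams": None,
    "coolerCloadParams": None,
    "coolerZoomifyParams": None,
}


def formats_to_make(cfg):
    """List the contact matrix formats requested by the matrix block."""
    matrix = cfg.get("matrix", {})
    formats = []
    if matrix.get("makeMcoolFileFormat"):
        formats.append("mcool")
    if matrix.get("makeHicFileFormat"):
        formats.append("hic")
    return formats
```

With the values shown, only the mcool format is requested, which is the format needed for insulation and compartment calling.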
Generating multiQC reports on Hi-C contacts
general.qcAfter: List of steps after which a step-specific multiQC report will be generated using multiQC’s pairtools module.
Sample selection strategies
Sample selection strategies specify which samples feature calling is performed on.
sampleSelectionStrategies: Hich-config block. Keys are names of sample selection strategies. Each strategy is a hashmap in which keys are sample attribute names and values are the acceptable values of those attributes. A sample is selected only if each named attribute has one of its acceptable values.
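The matching rule for a strategy can be sketched as follows. This is a minimal Python illustration, not Hich's implementation: the strategy name `wtEarlyBioreps`, the attribute values, and the helper names are hypothetical, and it assumes each strategy value is a list of acceptable values.

```python
def matches(sample, strategy):
    """True if every attribute named in the strategy has an acceptable value."""
    return all(sample.get(attr) in accepted for attr, accepted in strategy.items())


def select_samples(samples, strategy):
    """Keep only the samples that satisfy the strategy."""
    return [s for s in samples if matches(s, strategy)]


# Hypothetical strategies block: strategy name -> {attribute: acceptable values}.
sample_selection_strategies = {
    "wtEarlyBioreps": {"condition": ["wt"], "biorep": [1, 2]},
}

# Hypothetical samples.
samples = [
    {"id": "a", "condition": "wt", "biorep": 1},
    {"id": "b", "condition": "ko", "biorep": 1},
    {"id": "c", "condition": "wt", "biorep": 3},
]

selected = select_samples(samples, sample_selection_strategies["wtEarlyBioreps"])
```

Only sample `a` satisfies both attribute constraints, so it is the only one selected.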
Calling HiCRep SCC scores
HiCRep SCC scores will be called on all pairs of samples matching the sample selection strategy, using all combinations of resolutions, h, dBPMax, and bDownSample specified in each parameterization. The results are output as a single TSV file associating each input sample pair, chromosome, and parameterization with the resulting SCC score.
hicrep: Hich-config block. Keys are names of hicrep parameterizations. Values are the parameter names and values to be used for that call to hicrep, which can include resolutions, chroms, exclude, chromFilter, h, dBPMax, bDownSample, and sampleSelectionStrategy.
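The enumeration described above (every sample pair crossed with every parameter combination) can be sketched in Python. This only illustrates the combinatorics, not how Hich schedules the calls: the helper `hicrep_jobs`, the sample names, and the parameter values are hypothetical, and it assumes each of the four parameters is given as a list.

```python
from itertools import combinations, product


def hicrep_jobs(sample_ids, params):
    """Enumerate every sample pair crossed with every parameter combination."""
    names = ["resolutions", "h", "dBPMax", "bDownSample"]
    grids = [params[name] for name in names]
    jobs = []
    for a, b in combinations(sample_ids, 2):
        for values in product(*grids):
            jobs.append((a, b) + values)
    return jobs


# Hypothetical parameterization; values are illustrative only.
params = {
    "resolutions": [10000, 100000],
    "h": [1],
    "dBPMax": [5000000],
    "bDownSample": [False, True],
}

jobs = hicrep_jobs(["s1", "s2", "s3"], params)
```

Three samples give three pairs, and the parameterization above gives 2 x 1 x 1 x 2 = 4 combinations, so 12 comparisons are enumerated. The count grows quadratically with the number of selected samples, which is worth keeping in mind when writing a broad sample selection strategy.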
Calling compartment scores
Compartment scores (bounded by [-1, 1], with positive values marking regions that are more gene dense than regions with negative values) will be generated for each parameterization on the samples matching its sample selection strategy.
compartments: Hich-config block. Keys are names of compartment-calling parameterizations. Values are the parameter names and values to be used for that parameterization of compartment calling, which can include resolution, hichCompartmentsParams, and sampleSelectionStrategy.
Calling insulation scores
Insulation scores will be generated for each parameterization on the samples matching its sample selection strategy.
insulation: Hich-config block. Keys are names of insulation score-calling parameterizations. Values are the parameter names and values to be used for that parameterization of insulation score calling, which can include sampleSelectionStrategy.
Calling loops
Mustache loop calls will be generated for each parameterization on the samples matching its sample selection strategy.
loops: Hich-config block. Keys are names of Mustache loop-calling parameterizations. Values are the parameter names and values to be used for that parameterization of loop calling, which can include sampleSelectionStrategy.
Calling differential loop enrichments (diffloops)
Mustache diffloop calls will be generated for each parameterization on all pairs of samples matching its sample selection strategy.
differentialLoops: Hich-config block. Keys are names of Mustache diffloop-calling parameterizations. Values are the parameter names and values to be used for that parameterization of diffloop calling, which can include sampleSelectionStrategy.
Recent outputs
latest
latestSambam
latestPairs
latestMatrix
Hich sample attributes built automatically (not typically manually specified by user)
isSingleCell