Hich CLI Utilities

hich compartments [options] REFERENCE MATRIX RESOLUTION

Calls compartment scores on Open2C-format Hi-C data.

Options:

  • --chroms: Chrom/contig names to use. If given, contigs not named here will not be used.

  • --exclude-chroms: Chrom/contig names to exclude. If given, contigs named here will not be used, even if specified in --chroms.

  • --keep-chroms-when: Python code that references a string chrom and evaluates to True if that chromosome name should be used.

  • --n_eigs: Number of eigenvectors to call. Each eigenvector is sign-flipped so that positive values correspond to regions of higher %GC content, which yields the compartment scores. Default: 1
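
The sign-flipping convention described for --n_eigs can be illustrated with a minimal sketch. It is not hich's or cooltools' implementation; the function names and the toy per-bin values are hypothetical, and it assumes orientation is decided by the sign of the Pearson correlation between an eigenvector and per-bin GC fraction.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def orient_by_gc(eigenvector, gc_fraction):
    """Return the eigenvector, negated if it anticorrelates with GC content.

    Assumption: a negative correlation means the sign is "wrong" and the
    whole vector is flipped, so positive scores track GC-rich bins.
    """
    if pearson(eigenvector, gc_fraction) < 0:
        return [-v for v in eigenvector]
    return list(eigenvector)


# Hypothetical per-bin values: the raw eigenvector falls as GC rises.
eig = [0.5, 0.2, -0.3, -0.4]
gc = [0.35, 0.40, 0.50, 0.55]
scores = orient_by_gc(eig, gc)
print(scores)
```

Because an eigenvector is only defined up to sign, some external track such as %GC is needed to make scores comparable across chromosomes and samples.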

Arguments:

  • REFERENCE: Genome fasta reference used to determine %GC content.

  • MATRIX: A .cool or .mcool file used as input to cooltools eigs-cis

  • RESOLUTION: Resolution at which compartment scores are called.

Example:

hich compartments --keep-chroms-when "chrom.startswith('chr')" hg38_noalts.fa.gz k562.mcool 10000
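
To make the --keep-chroms-when semantics concrete, here is a small sketch of how a condition string like the one above could be applied to a list of contig names. The helper select_chroms and the contig list are hypothetical, and the use of eval with chrom bound as a variable is an assumption about the mechanism, not a description of hich's code.

```python
def select_chroms(chroms, condition):
    """Keep contig names for which the Python expression evaluates truthy.

    Assumption: the expression sees a single variable, `chrom`, holding
    the contig name as a string.
    """
    code = compile(condition, "<keep-chroms-when>", "eval")
    return [name for name in chroms if eval(code, {"chrom": name})]


# Hypothetical contig names for illustration.
names = ["chr1", "chrX", "KI270728.1", "chrUn_GL000195v1"]

print(select_chroms(names, "chrom.startswith('chr')"))
print(select_chroms(names, "chrom.startswith('chr') and '_' not in chrom"))
```

Since the condition is executed as Python code, it should only come from a trusted source.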

hich digest [options] REFERENCE DIGEST

Creates restriction enzyme fragment index in BED format.

Options:

  • --output: Output file. Compression autodetected by file extension. If None, prints to stdout.

  • --startshift: Fixed distance to shift start of each fragment. Default: 0

  • --endshift: Fixed distance to shift end of each fragment. Default: 0

  • --cutshift: Fixed distance to shift the cut site of each fragment. Default: 1

Arguments:

  • REFERENCE: Reference genome to use as basis for digest

  • DIGEST: Space-delimited list of enzyme names and/or kit name

Examples:

hich digest --output hg38_arima.bed hg38_noalts.fa.gz Arima
hich digest --output hg38_hic3.bed hg38_noalts.fa.gz DpnII DdeI

Supported enzymes: all of the ~800 REBASE enzymes used by Biopython’s restriction module, as well as Arima, Phase Plant, Phase Animal, Phase Microbiome, Phase Human, Phase Fungal.
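
For readers unfamiliar with fragment indexes, the following sketch shows the basic idea of turning cut positions into BED-style intervals. It is a simplification under stated assumptions: a single enzyme, a plain string search for the recognition site instead of Biopython, and no handling of --startshift, --endshift or --cutshift. The function name digest_sequence, the cut_offset parameter and the toy sequence are hypothetical.

```python
def digest_sequence(name, seq, site, cut_offset=0):
    """Return BED-style (chrom, start, end) fragments for one sequence.

    Assumptions: `site` is an exact recognition sequence and the enzyme
    cuts `cut_offset` bases after the first base of each occurrence.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    # Fragment boundaries are the sequence ends plus every cut position.
    bounds = [0] + cuts + [len(seq)]
    return [(name, s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


# Toy sequence with two GATC sites (the DpnII recognition sequence).
for chrom, start, end in digest_sequence("chrT", "AAGATCTTGATCCC", "GATC"):
    print(f"{chrom}\t{start}\t{end}")
```

BED coordinates are 0-based and half-open, so adjacent fragments share a boundary value without overlapping.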

hich downsample [options] INPUT_PAIRS_PATH OUTPUT_PAIRS_PATH

Memory-efficient downsampling of a 4DN .pairs file. Uses an empirical distribution over the input .pairs dataset (--orig-stats) and one over the intended output dataset (--target-stats).

Options:

  • --conjuncts: PairsSegment traits that define the category for each record (space-separated string list). Default: record.chr1 record.chr2 record.pair_type stratum

  • --cis-strata: PairsSegment cis distance strata boundaries (space-separated string list). Default: 10 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000 2000000 5000000

  • --orig-stats: Stats file generated by hich stats or hich stats-aggregate containing the original count distribution. If not supplied, the distribution is computed from the input using --conjuncts and --cis-strata. If supplied, its conjuncts and cis strata override the defaults.

  • --target-stats: Stats file generated by hich stats or hich stats-aggregate containing target count distribution.

  • --to-size: Float on [0.0, 1.0] for fraction of records to sample, or positive integer number of counts to sample. If a target stats file is supplied, further downsamples it to the given count.

Arguments:

  • INPUT_PAIRS_PATH: A 4DN .pairs file used as downsampling input

  • OUTPUT_PAIRS_PATH: Filename for downsampled output
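
One way to downsample in a single streaming pass, given an original and a target count distribution, is to keep each record with probability target/original for its category. The sketch below illustrates that idea only; whether hich samples this way is an assumption, and the function names, category labels and counts are hypothetical.

```python
import random


def keep_probabilities(orig_counts, target_counts):
    """Per-category probability of keeping a record (capped at 1.0)."""
    probs = {}
    for category, n_orig in orig_counts.items():
        n_target = target_counts.get(category, 0)
        probs[category] = min(1.0, n_target / n_orig) if n_orig > 0 else 0.0
    return probs


def downsample(records, category_of, probs, seed=0):
    """Lazily yield kept records; nothing beyond one record is held in memory."""
    rng = random.Random(seed)
    for record in records:
        if rng.random() < probs.get(category_of(record), 0.0):
            yield record


# Hypothetical category counts, as a stats file might summarize them.
probs = keep_probabilities({"cis": 100, "trans": 50}, {"cis": 25, "trans": 50})
records = ["cis"] * 100 + ["trans"] * 50
kept = list(downsample(records, lambda r: r, probs, seed=1))
print(probs, len(kept))
```

With this scheme the output counts match the target only in expectation; an exact-count sampler would need a different approach.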

hich fragtag [options] FRAGFILE OUT_PAIRS IN_PAIRS

Tags records in a 4DN .pairs file using the fragment index in FRAGFILE.

Options:

  • --batch_size: Number of .pairs segments to read in a single batch. Default: 1000000

Arguments:

  • FRAGFILE: Fragment index file used for tagging

  • OUT_PAIRS: Output path for tagged .pairs file

  • IN_PAIRS: Input path for .pairs file to be tagged

hich gather [options]

Extracts logs for individual tasks from the work directory and compiles them into a single .json database and report.

Options:

  • --report, -r: Path of output HTML report

  • --db, -d: Path of output JSON database

  • --gather, -g: Glob expression for filename to gather

  • --work, -w: Root of work directory to gather from

  • --if-exists, -e: Behavior when database exists. Options: use-unchanged (default), overwrite, append

  • --silent, -s: Suppress messages on conflicts and errors
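
The interaction between --gather, --work, --db and --if-exists can be sketched as follows. Everything about the database layout here is hypothetical (a JSON list of path/log entries), as is the exact meaning given to each --if-exists choice; the sketch only shows one plausible reading of the options.

```python
import json
from pathlib import Path


def gather_logs(work, pattern, db_path, if_exists="use-unchanged"):
    """Collect files matching `pattern` under `work` into a JSON database.

    Assumed semantics: use-unchanged returns the existing database as is,
    overwrite rebuilds it, append adds new entries to the existing ones.
    """
    db_path = Path(db_path)
    existing = json.loads(db_path.read_text()) if db_path.exists() else []
    if db_path.exists() and if_exists == "use-unchanged":
        return existing
    entries = [] if if_exists == "overwrite" else existing
    # rglob searches the work directory recursively for the glob pattern.
    for path in sorted(Path(work).rglob(pattern)):
        entries.append({"path": str(path), "log": path.read_text()})
    db_path.write_text(json.dumps(entries, indent=2))
    return entries
```

A call such as gather_logs("work", "*.log", "db.json", if_exists="append") would then mirror the command-line options one to one.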

hich hicrep [options] PATHS

Compute HiCRep scores on multiple parameterizations and combinations of files

Options:

  • --resolutions: List of resolutions

  • --chroms, --include_chroms: Restrict to these chromosomes

  • --exclude, --exclude_chroms: Do not call on these chromosomes; takes priority over --chroms

  • --chrom-filter: Python code specifying a condition for selecting chromosomes using chrom and size (chromsize) as variables

  • --h: Smoothing parameters for HiCRep (list)

  • --d-bp-max: dBPMax parameters for HiCRep (list)

  • --b-downsample: bDownsample parameters for HiCRep (list, boolean)

  • --nproc: Number of processes to use for calling HiCRep scores

  • --output: Output filename

Arguments:

  • PATHS: List of input .mcool or .hic files to use as input to HiCRep
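
Since several options accept lists, the number of HiCRep runs grows multiplicatively. The sketch below enumerates the jobs implied by a set of inputs, assuming every unordered pair of files is compared under every parameter combination; that pairing rule, the function name hicrep_jobs and the file names are assumptions for illustration.

```python
from itertools import combinations, product


def hicrep_jobs(paths, resolutions, h, d_bp_max, b_downsample):
    """List one job per (file pair, parameter combination)."""
    jobs = []
    grid = product(resolutions, h, d_bp_max, b_downsample)
    for (file1, file2), (res, h_val, d_max, down) in product(
        combinations(paths, 2), grid
    ):
        jobs.append(
            {
                "file1": file1,
                "file2": file2,
                "resolution": res,
                "h": h_val,
                "dBPMax": d_max,
                "bDownsample": down,
            }
        )
    return jobs


# Hypothetical inputs: 3 files, 2 resolutions, 2 smoothing values.
jobs = hicrep_jobs(
    ["a.mcool", "b.mcool", "c.mcool"],
    resolutions=[10000, 100000],
    h=[1, 2],
    d_bp_max=[5000000],
    b_downsample=[False],
)
print(len(jobs))
```

Enumerating jobs up front like this also makes it straightforward to distribute them across the processes requested with --nproc.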

hich reshape [options]

Modify contents of 4DN .pairs file

Options:

  • --read_from

  • --output_to

  • --parse: Format is --parse [FROM_COL] [TO_COL] '[PATTERN]', as in --parse 'readID' 'cellID' '{cellID}:{ignore}'

  • --placeholder: Format is --placeholder [COL] [PLACEHOLDER], which replaces every value in the column with the placeholder string

  • --regex: Format is --regex [FROM_COL] [TO_COL] [REGEX] [GROUP_INDEX], which extracts the specified group (0 = whole match) of the regex from FROM_COL and sets it as the value in TO_COL

  • --drop: Column to drop

  • --select: Space-separated list of column names to output, in the order specified

  • --batch-size: Number of records per batch
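
The behavior described for --regex corresponds closely to a regular-expression group extraction, which the following sketch illustrates on a single record held as a dict. The helper regex_column, the record layout and the choice to write an empty string when nothing matches are assumptions, not documented hich behavior.

```python
import re


def regex_column(record, from_col, to_col, pattern, group_index=0):
    """Return a copy of `record` with `to_col` set from a regex group.

    group_index 0 is the whole match; 1, 2, ... are capture groups.
    Assumption: a non-matching value yields an empty string.
    """
    match = re.search(pattern, record[from_col])
    updated = dict(record)
    updated[to_col] = match.group(group_index) if match else ""
    return updated


# Hypothetical record whose readID starts with a cell barcode.
record = {"readID": "AACGT:read001", "chr1": "chr1", "pos1": "100"}
tagged = regex_column(record, "readID", "cellID", r"^([ACGT]+):", 1)
print(tagged["cellID"])
```

The same extraction expressed with --parse would use a pattern such as '{cellID}:{ignore}' in place of the regular expression.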

hich stats_aggregate [options] STATS-PATHS

Aggregate hich stats files called over .pairs with same conjuncts

Options:

  • --to-group-mean

  • --to-group-min

  • --to-size

  • --prefix

  • --outlier

  • --stats-paths

Arguments:

  • STATS-PATHS

hich stats [options] PAIRS

Classify pairs and count the events.

Output has conjuncts as headers, one row per event, and a column “count” containing the count of each event.

Can read 4DN .pairs format from plaintext or from a variety of compressed formats with Python’s smart_open package.

Example:

hich stats --conjuncts "chr1 chr2" --cis-strata "10000 20000" my_pairs_file.pairs.gz

Options:

  • --conjuncts: PairsSegment traits that define the category for each record (comma-separated string list)

  • --cis-strata: PairsSegment cis distance strata boundaries for use with ‘stratum’ conjunct (comma-separated string list)

  • --output: Output file for tab-separated stats file. If not given, outputs to stdout.

Arguments:

  • PAIRS: Input 4DN .pairs file, plaintext or compressed
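
The classify-and-count step can be sketched with a bisect lookup for the distance stratum and a Counter keyed on the conjunct values. The labeling rule is an assumption: here a cis pair is labeled with the largest boundary not exceeding its distance (0 below the first boundary) and trans pairs get no stratum. The function names and the toy pairs are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def stratum(distance, boundaries):
    """Label a cis distance by the largest boundary <= distance (assumed rule)."""
    i = bisect_right(boundaries, distance)
    return boundaries[i - 1] if i > 0 else 0


def count_events(pairs, boundaries):
    """Count records per (chr1, chr2, stratum) event."""
    counts = Counter()
    for chr1, pos1, chr2, pos2 in pairs:
        # Only cis pairs have a meaningful distance stratum.
        label = stratum(abs(pos2 - pos1), boundaries) if chr1 == chr2 else None
        counts[(chr1, chr2, label)] += 1
    return counts


# Toy records: (chr1, pos1, chr2, pos2).
pairs = [
    ("chr1", 100, "chr1", 5100),
    ("chr1", 100, "chr1", 15100),
    ("chr1", 1000, "chr1", 16000),
    ("chr1", 100, "chr2", 500),
]
counts = count_events(pairs, [10000, 20000])
for event, count in sorted(counts.items(), key=str):
    print(*event, count, sep="\t")
```

Each key of the resulting Counter corresponds to one row of the stats output, with the count column holding its value.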