Hich CLI Utilities
==================

``hich compartments [options] REFERENCE MATRIX RESOLUTION``
.................................................................

Calls compartment scores on Open2C-format Hi-C data.

Options:

+ ``--chroms``: Chrom/contig names to use. If given, contigs not named here will not be used.
+ ``--exclude-chroms``: Chrom/contig names to exclude. If given, contigs named here will not be used, even if specified in ``--chroms``.
+ ``--keep-chroms-when``: Python code that references a string ``chrom`` and evaluates to True if that chromosome name should be used.
+ ``--n_eigs``: Number of eigenvectors to compute. To create compartment scores, each eigenvector is sign-flipped so that positive scores correspond to regions of higher %GC content. Default: 1

Arguments:

+ ``REFERENCE``: Genome fasta reference used to determine %GC content.
+ ``MATRIX``: A .cool or .mcool file used as input to `cooltools eigs-cis <https://cooltools.readthedocs.io/en/latest/cli.html#cooltools-eigs-cis>`_
+ ``RESOLUTION``: Resolution at which compartment scores are called.

Example:

.. code-block:: bash
    
    hich compartments --keep-chroms-when "chrom.startswith('chr')" hg38_noalts.fa.gz k562.mcool 10000
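
The three chromosome filters above can be pictured with a short Python sketch. This is an illustration only, not Hich's implementation: ``select_chroms`` is a hypothetical helper, and combining the filters with a logical AND is an assumption. The only precedence rule taken from the option descriptions is that ``--exclude-chroms`` wins over ``--chroms``.

.. code-block:: python

    def select_chroms(names, chroms=None, exclude_chroms=None, keep_chroms_when=None):
        """Return the contig names that pass all filters, in input order.

        Hypothetical helper for illustration; not part of Hich.
        """
        selected = []
        for chrom in names:
            # --chroms: if given, contigs not named here are not used.
            if chroms is not None and chrom not in chroms:
                continue
            # --exclude-chroms: takes priority over names listed in --chroms.
            if exclude_chroms is not None and chrom in exclude_chroms:
                continue
            # --keep-chroms-when: a Python expression referencing `chrom`.
            # eval() runs arbitrary code, so only pass expressions you trust.
            if keep_chroms_when is not None and not eval(
                keep_chroms_when, {"__builtins__": {}}, {"chrom": chrom}
            ):
                continue
            selected.append(chrom)
        return selected


    contigs = ["chr1", "chr2", "chrM", "KI270728.1"]
    kept = select_chroms(
        contigs,
        exclude_chroms={"chrM"},
        keep_chroms_when="chrom.startswith('chr')",
    )
    print(kept)  # ['chr1', 'chr2']

Here ``chrM`` is dropped by the exclusion list and ``KI270728.1`` fails the ``startswith`` test, so only the two autosomes remain.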

``hich digest [options] REFERENCE DIGEST``
..........................................

Creates a restriction enzyme fragment index in BED format.

Options:

+ ``--output``: Output file. Compression autodetected by file extension. If None, prints to stdout.
+ ``--startshift``: Fixed distance to shift start of each fragment. Default: 0
+ ``--endshift``: Fixed distance to shift end of each fragment. Default: 0
+ ``--cutshift``: Fixed distance to shift each cut site. Default: 1

Arguments:

+ ``REFERENCE``: Reference genome to use as basis for digest
+ ``DIGEST``: Space-delimited list of enzyme names and/or kit name

Examples:

.. code-block:: bash
    
    hich digest --output hg38_arima.bed hg38_noalts.fa.gz Arima
    hich digest --output hg38_hic3.bed hg38_noalts.fa.gz DpnII DdeI
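
To make the output format concrete, the following Python sketch builds a fragment index for a toy sequence. It is a simplified stand-in and not Hich's implementation, which relies on Biopython's restriction module. ``digest_to_bed`` and its ``cut_offset`` parameter are hypothetical names, the shift options are not modeled, and the sketch assumes a palindromic recognition site such as DpnII's ``GATC``, which is cut immediately before the ``G``.

.. code-block:: python

    def digest_to_bed(chrom, sequence, site, cut_offset=0):
        """Yield (chrom, start, end) fragments in BED coordinates.

        BED intervals are 0-based and half-open. Hypothetical helper for
        illustration; not part of Hich.
        """
        sequence = sequence.upper()
        # Collect every cut position, allowing overlapping site matches.
        cuts = []
        pos = sequence.find(site)
        while pos != -1:
            cuts.append(pos + cut_offset)
            pos = sequence.find(site, pos + 1)
        # Emit the fragment between each pair of consecutive cuts.
        start = 0
        for cut in cuts:
            if cut > start:
                yield (chrom, start, cut)
                start = cut
        # Emit the final fragment up to the end of the contig.
        if start < len(sequence):
            yield (chrom, start, len(sequence))


    fragments = list(digest_to_bed("chrToy", "AAGATCTTTGATCCC", "GATC"))
    for chrom, start, end in fragments:
        print(f"{chrom}\t{start}\t{end}")
    # chrToy  0   2
    # chrToy  2   9
    # chrToy  9   15

The two ``GATC`` sites at positions 2 and 9 split the 15 bp toy contig into three fragments that tile it without gaps or overlaps.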

Supported enzymes:

+ All ~800 REBASE enzymes supported by `biopython's restriction module <https://github.com/biopython/biopython/blob/master/Doc/cookbook/Restriction/Restriction.md>`_, as well as ``Arima``, ``Phase Plant``, ``Phase Animal``, ``Phase Microbiome``, ``Phase Human``, ``Phase Fungal``.

``hich downsample [options] INPUT_PAIRS_PATH OUTPUT_PAIRS_PATH``
.......................................................................

Memory-efficient downsampling of a 4DN .pairs file. Requires an empirical distribution over the input .pairs dataset (``--orig-stats``) and one over the intended output dataset (``--target-stats``).

Options:

+ ``--conjuncts``: PairsSegment traits that define the category for each record (space-separated string list). Default: ``record.chr1 record.chr2 record.pair_type stratum``
+ ``--cis-strata``: PairsSegment cis distance strata boundaries (space-separated string list). Default: ``10 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000 2000000 5000000``
+ ``--orig-stats``: Stats file generated by ``hich stats`` or ``hich stats-aggregate`` containing the original count distribution. If not supplied, it is computed from the conjuncts and cis strata. If supplied, it overrides the default conjuncts and cis strata.
+ ``--target-stats``: Stats file generated by hich stats or hich stats-aggregate containing target count distribution.
+ ``--to-size``: Float on [0.0, 1.0] for fraction of records to sample, or positive integer number of counts to sample. If a target stats file is supplied, further downsamples it to the given count.

Arguments:

+ ``INPUT_PAIRS_PATH``: A 4DN .pairs file used as downsampling input
+ ``OUTPUT_PAIRS_PATH``: Filename for downsampled output
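
One way to downsample from two count distributions without loading the whole file is to stream the records and keep each one with a probability that depends on its category. The Python sketch below shows that idea under stated assumptions: it is not Hich's algorithm, ``keep_probabilities`` and ``downsample`` are hypothetical names, and independent per-record sampling only matches the target counts approximately.

.. code-block:: python

    import random


    def keep_probabilities(orig_counts, target_counts):
        """Map each category to the fraction of its records to keep."""
        probs = {}
        for category, n_orig in orig_counts.items():
            n_target = target_counts.get(category, 0)
            # Never keep more records than the input actually contains.
            probs[category] = min(1.0, n_target / n_orig) if n_orig else 0.0
        return probs


    def downsample(records, categorize, probs, rng):
        """Stream records, keeping each with its category's probability."""
        for record in records:
            if rng.random() < probs.get(categorize(record), 0.0):
                yield record


    # Toy data: each record is just its (chr1, chr2) category.
    orig = {("chr1", "chr1"): 4, ("chr1", "chr2"): 2}
    target = {("chr1", "chr1"): 2, ("chr1", "chr2"): 2}
    probs = keep_probabilities(orig, target)

    records = [("chr1", "chr1")] * 4 + [("chr1", "chr2")] * 2
    sampled = list(downsample(records, lambda r: r, probs, random.Random(0)))

Only the two count tables and one record at a time are held in memory, which is what makes this style of downsampling cheap for large .pairs files.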


``hich fragtag [options] FRAGFILE OUT_PAIRS IN_PAIRS``
......................................................

Tags the records of a .pairs file using the restriction fragments in ``FRAGFILE``.

Options:

+ ``--batch_size``: Number of .pairs segments to read in a single batch. Default: 1000000

Arguments:

+ ``FRAGFILE``: Path to the fragment index file used for tagging
+ ``OUT_PAIRS``: Output path for tagged .pairs file
+ ``IN_PAIRS``: Input path for .pairs file to be tagged

``hich gather [options]``
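.........................

Extracts logs for individual tasks from the work directory and compiles them into a single .json database and report.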

Options:

+ ``--report, -r``: Path of output HTML report
+ ``--db, -d``: Path of output JSON database
+ ``--gather, -g``: Glob expression for filename to gather
+ ``--work, -w``: Root of work directory to gather from
+ ``--if-exists, -e``: Behavior when database exists. Options: ``use-unchanged (default), overwrite, append``
+ ``--silent, -s``: Suppress messages on conflicts and errors

``hich hicrep [options] PATHS``
.................................

Compute HiCRep scores on multiple parameterizations and combinations of files

Options:

+ ``--resolutions``: List of resolutions
+ ``--chroms, --include_chroms``: Restrict to these chromosomes
+ ``--exclude, --exclude_chroms``: Do not call on these chromosomes, has priority over ``--chroms``
+ ``--chrom-filter``: Python code specifying a condition for selecting chromosomes using ``chrom`` and ``size`` (chromsize) as variables
+ ``--h``: Smoothing parameters for HiCRep (list)
+ ``--d-bp-max``: dBPMax parameters for HiCRep (list)
+ ``--b-downsample``: bDownsample parameters for HiCRep (list, boolean)
+ ``--nproc``: Number of processes to use for calling HiCRep scores
+ ``--output``: Output filename

Arguments:

+ ``PATHS``: List of input .mcool or .hic files to use as input to HiCRep
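
The list-valued options suggest that scores are computed over a grid of parameter settings for each pair of input files. The Python sketch below enumerates such a grid. It assumes that every unordered pair of files is compared under every parameter combination, which this reference does not state explicitly, and ``hicrep_jobs`` is a hypothetical name.

.. code-block:: python

    from itertools import combinations, product


    def hicrep_jobs(paths, resolutions, h, d_bp_max, b_downsample):
        """Yield one (path1, path2, resolution, h, dBPMax, bDownsample) per job."""
        # Every unordered pair of files (an assumption about Hich's behavior).
        file_pairs = combinations(paths, 2)
        # Every combination of the list-valued parameters.
        param_grid = product(resolutions, h, d_bp_max, b_downsample)
        for (path1, path2), params in product(file_pairs, param_grid):
            yield (path1, path2, *params)


    jobs = list(
        hicrep_jobs(
            ["a.mcool", "b.mcool", "c.mcool"],
            resolutions=[10000, 100000],
            h=[1],
            d_bp_max=[5000000],
            b_downsample=[False],
        )
    )
    print(len(jobs))  # 6

Three files give three pairs, and two resolutions give two settings per pair, so the job count grows multiplicatively with each added list value.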

``hich reshape [options]``
..........................

Modify contents of 4DN .pairs file

Options:

+ ``--read_from``
+ ``--output_to``
+ ``--parse``: Format is ``--parse [FROM_COL] [TO_COL] '[PATTERN]'``, as in ``--parse 'readID' 'cellID' '{cellID}:{ignore}'``
+ ``--placeholder``: Format is ``--placeholder [COL] [PLACEHOLDER]``, which replaces every value in the column with the placeholder string
+ ``--regex``: Format is ``--regex [FROM_COL] [TO_COL] [REGEX] [GROUP_INDEX]``, which extracts the specified group index (0 = whole match) of the given regex from FROM_COL and sets it as the value in TO_COL
+ ``--drop``: Column to drop
+ ``--select``: Space-separated list of output column names to output in the order specified
+ ``--batch-size``: Number of records per batch
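
The pattern and regex options both derive a new column value from an existing one. The Python sketch below mimics the two behaviors on a single value using only the standard library. It is an approximation and not Hich's implementation: ``parse_pattern`` and ``extract_group`` are hypothetical names, and the handling of ``{field}`` patterns (non-greedy matching over the whole value) is an assumption.

.. code-block:: python

    import re


    def parse_pattern(value, pattern):
        """Match a '{name}'-style pattern and return the named fields."""
        # re.split with a capture group alternates literal text (even
        # indices) and field names (odd indices).
        parts = re.split(r"\{(\w+)\}", pattern)
        regex = "".join(
            f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
            for i, part in enumerate(parts)
        )
        match = re.fullmatch(regex, value)
        return match.groupdict() if match else None


    def extract_group(value, regex, group_index):
        """Return the requested regex group (0 = whole match), or None."""
        match = re.search(regex, value)
        return match.group(group_index) if match else None


    read_id = "cellA:read123"
    fields = parse_pattern(read_id, "{cellID}:{ignore}")
    print(fields["cellID"])  # cellA

    print(extract_group(read_id, r"^([^:]+):", 1))  # cellA
    print(extract_group(read_id, r"^([^:]+):", 0))  # cellA:

Both routes recover ``cellA`` from the read ID; the regex form additionally shows how group index 0 returns the whole match rather than a captured group.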

``hich stats_aggregate [options] STATS-PATHS``
..............................................

Aggregates ``hich stats`` files computed over .pairs files that share the same conjuncts.

Options:

+ ``--to-group-mean``
+ ``--to-group-min``
+ ``--to-size``
+ ``--prefix``
+ ``--outlier``
+ ``--stats-paths``

Arguments:

+ ``STATS-PATHS``

``hich stats [options] PAIRS``
..............................

Classify pairs and count the events.

Output has conjuncts as headers, one row per event, and a column "count" containing the count of each event.

Can read 4DN .pairs format from plaintext or from a variety of compressed formats with Python's smart_open package.

Example:

.. code-block:: bash

    hich stats --conjuncts "chr1 chr2" --cis-strata "10000 20000" my_pairs_file.pairs.gz

Options:

+ ``--conjuncts``: PairsSegment traits that define the category for each record (comma-separated string list)
+ ``--cis-strata``: PairsSegment cis distance strata boundaries for use with 'stratum' conjunct (comma-separated string list)
+ ``--output``: Output file for tab-separated stats file. If not given, outputs to stdout.


Arguments:

+ ``PAIRS``
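
The classify-and-count step can be sketched in Python as follows. This is an illustration under stated assumptions rather than Hich's implementation: ``stratum`` and ``count_events`` are hypothetical names, records are reduced to four fields, and labeling each cis pair by the smallest boundary at or above its distance is an assumed convention.

.. code-block:: python

    import bisect
    from collections import Counter


    def stratum(distance, boundaries):
        """Return the smallest boundary >= distance, or None beyond the last."""
        i = bisect.bisect_left(boundaries, distance)
        return boundaries[i] if i < len(boundaries) else None


    def count_events(pairs, boundaries):
        """Count records per (chr1, chr2, stratum) event."""
        counts = Counter()
        for chr1, pos1, chr2, pos2 in pairs:
            # Only cis pairs have a meaningful distance.
            s = stratum(abs(pos2 - pos1), boundaries) if chr1 == chr2 else None
            counts[(chr1, chr2, s)] += 1
        return counts


    pairs = [
        ("chr1", 100, "chr1", 5100),   # cis, distance 5000
        ("chr1", 200, "chr1", 15200),  # cis, distance 15000
        ("chr1", 300, "chr1", 9300),   # cis, distance 9000
        ("chr1", 50, "chr2", 70),      # trans
    ]
    counts = count_events(pairs, [10000, 20000])

    # Conjuncts as headers, one row per event, plus a "count" column.
    print("chr1\tchr2\tstratum\tcount")
    for (chr1, chr2, s), n in counts.items():
        print(f"{chr1}\t{chr2}\t{'' if s is None else s}\t{n}")

The two short-range cis pairs share one row with a count of 2, while the longer cis pair and the trans pair each get their own row.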