Hich CLI Utilities
==================

``hich compartments [options] REFERENCE MATRIX RESOLUTION``
.................................................................

Calls compartment scores on Open2C-format Hi-C data.

Options:

+ ``--chroms``: Chrom/contig names to use. If given, contigs not named here will not be used.
+ ``--exclude-chroms``: Chrom/contig names to exclude. If given, contigs named here will not be used, even if specified in ``--chroms``.
+ ``--keep-chroms-when``: Python code that references a string ``chrom`` and evaluates to ``True`` if that chromosome name should be used.
+ ``--n_eigs``: Number of eigenvectors to call. To create compartment scores, these are sign-flipped so that positive scores correspond to regions of higher %GC content. Default: 1

Arguments:

+ ``REFERENCE``: Genome FASTA reference used to determine %GC content.
+ ``MATRIX``: A .cool or .mcool file used as input to ``cooltools eigs-cis``.
+ ``RESOLUTION``: Resolution at which compartment scores are called.

Example:

.. code-block:: bash

    hich compartments --keep-chroms-when "chrom.startswith('chr')" hg38_noalts.fa.gz k562.mcool 10000

``hich digest [options] REFERENCE DIGEST``
.............................................

Creates a restriction enzyme fragment index in BED format.

Options:

+ ``--output``: Output file. Compression is autodetected from the file extension. If not given, prints to stdout.
+ ``--startshift``: Fixed distance to shift start of each fragment. Default: 0
+ ``--endshift``: Fixed distance to shift end of each fragment. Default: 0
+ ``--cutshift``: Fixed distance to shift end of each fragment. Default: 1

Arguments:

+ ``REFERENCE``: Reference genome to use as the basis for the digest.
+ ``DIGEST``: Space-delimited list of enzyme names and/or a kit name.

Examples:

.. code-block:: bash

    hich digest --output hg38_arima.bed hg38_noalts.fa.gz Arima
    hich digest --output hg38_hic3.bed hg38_noalts.fa.gz DpnII DdeI

Supported enzymes:

+ All of the ~800 REBASE enzymes used by Biopython's restriction module, as well as ``Arima``, ``Phase Plant``, ``Phase Animal``, ``Phase Microbiome``, ``Phase Human``, ``Phase Fungal``.

``hich downsample [options] INPUT_PAIRS_PATH OUTPUT_PAIRS_PATH``
.......................................................................

Memory-efficient downsampling of a 4DN .pairs file. Requires an empirical distribution over the input .pairs dataset (``--orig-stats``) and one over the intended output dataset (``--target-stats``).

Options:

+ ``--conjuncts``: PairsSegment traits that define the category for each record (space-separated string list). Default: ``record.chr1 record.chr2 record.pair_type stratum``
+ ``--cis-strata``: PairsSegment cis distance strata boundaries (space-separated string list). Default: ``10 20 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1000000 2000000 5000000``
+ ``--orig-stats``: Stats file generated by hich stats or hich stats-aggregate containing the original count distribution. Computed from conjuncts and cis_strata if not supplied. Overrides default conjuncts and cis_strata if they are supplied.
+ ``--target-stats``: Stats file generated by hich stats or hich stats-aggregate containing the target count distribution.
+ ``--to-size``: Float on [0.0, 1.0] for the fraction of records to sample, or a positive integer number of counts to sample. If a target stats file is supplied, further downsamples it to the given count.

Arguments:

+ ``INPUT_PAIRS_PATH``: A 4DN .pairs file used as downsampling input.
+ ``OUTPUT_PAIRS_PATH``: Filename for downsampled output.

``hich fragtag [options] FRAGFILE OUT_PAIRS IN_PAIRS``
.......................................................

Options:

+ ``--batch_size``: Number of .pairs segments to read in a single batch. Default: 1000000

Arguments:

+ ``OUT_PAIRS``: Output path for tagged .pairs file
+ ``IN_PAIRS``: Input path for .pairs file to be tagged

``hich gather [options]``
.........................

Extracts logs for individual tasks from the work directory and compiles them into a single .json database and report.

Options:

+ ``--report, -r``: Path of output HTML report
+ ``--db, -d``: Path of output JSON database
+ ``--gather, -g``: Glob expression for filename to gather
+ ``--work, -w``: Root of work directory to gather from
+ ``--if-exists, -e``: Behavior when the database exists. Options: ``use-unchanged`` (default), ``overwrite``, ``append``
+ ``--silent, -s``: Report messages on conflicts and errors

``hich hicrep [options] PATHS``
.................................

Computes HiCRep scores on multiple parameterizations and combinations of files.

Options:

+ ``--resolutions``: List of resolutions
+ ``--chroms, --include_chroms``: Restrict to these chromosomes
+ ``--exclude, --exclude_chroms``: Do not call on these chromosomes; takes priority over ``--chroms``
+ ``--chrom-filter``: Python code specifying a condition for selecting chromosomes using ``chrom`` and ``size`` (chromsize) as variables
+ ``--h``: Smoothing parameters for HiCRep (list)
+ ``--d-bp-max``: dBPMax parameters for HiCRep (list)
+ ``--b-downsample``: bDownsample parameters for HiCRep (list, boolean)
+ ``--nproc``: Number of processes to use for calling HiCRep scores
+ ``--output``: Output filename

Arguments:

+ ``PATHS``: List of .mcool or .hic files to use as input to HiCRep

``hich reshape [options]``
..........................

Modifies the contents of a 4DN .pairs file.

Options:

+ ``--read_from``
+ ``--output_to``
+ ``--parse``: Format is ``--parse [FROM_COL] [TO_COL] '[PATTERN]'``, as in ``--parse 'readID' 'cellID' '{cellID}:{ignore}'``
+ ``--placeholder``: Format is ``--placeholder [COL] [PLACEHOLDER]``, which replaces every value in the column with the placeholder string
+ ``--regex``: Format is ``--regex [FROM_COL] [TO_COL] [REGEX] [GROUP_INDEX]``, which matches the regex against FROM_COL, extracts the specified group index (0=whole pattern), and sets it as the value in TO_COL
+ ``--drop``: Column to drop
+ ``--select``: Space-separated list of column names to output, in the order specified
+ ``--batch-size``: Number of records per batch

``hich stats_aggregate [options] STATS-PATHS``
..............................................

Aggregates hich stats files called over .pairs files with the same conjuncts.

Options:

+ ``--to-group-mean``
+ ``--to-group-min``
+ ``--to-size``
+ ``--prefix``
+ ``--outlier``
+ ``--stats-paths``

Arguments:

+ ``STATS-PATHS``

``hich stats [options]``
.........................

Classifies pairs and counts the events. The output has the conjuncts as headers, one row per event, and a column "count" containing the count of each event. Reads 4DN .pairs format from plaintext or from a variety of compressed formats using Python's smart_open package.

Example: ``hich stats --conjuncts "chr1 chr2" --cis-strata "10000 20000" my_pairs_file.pairs.gz``

Options:

+ ``--conjuncts``: PairsSegment traits that define the category for each record (comma-separated string list)
+ ``--cis-strata``: PairsSegment cis distance strata boundaries for use with the 'stratum' conjunct (comma-separated string list)
+ ``--output``: Output file for the tab-separated stats file. If not given, outputs to stdout.

Arguments:

+ ``PAIRS``
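
The event counting described for ``hich stats`` can be pictured with a minimal Python sketch. This is not hich's implementation: the function names ``stratum`` and ``count_events``, the tuple layout of the records, and the rule that a cis pair falls into the smallest boundary greater than or equal to its distance are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import Counter


def stratum(distance, boundaries):
    """Return the smallest boundary >= distance (assumed convention).

    Returns None when the distance exceeds every boundary.
    `boundaries` must be sorted in ascending order.
    """
    i = bisect_left(boundaries, distance)
    return boundaries[i] if i < len(boundaries) else None


def count_events(records, boundaries):
    """Count records per (chr1, chr2, stratum) category.

    Each record is a hypothetical (chr1, pos1, chr2, pos2) tuple.
    Trans pairs (chr1 != chr2) get a stratum of None.
    """
    counts = Counter()
    for chr1, pos1, chr2, pos2 in records:
        s = stratum(abs(pos2 - pos1), boundaries) if chr1 == chr2 else None
        counts[(chr1, chr2, s)] += 1
    return counts


# Toy records standing in for rows of a .pairs file.
records = [
    ("chr1", 100, "chr1", 5_100),   # cis, distance 5000
    ("chr1", 200, "chr1", 9_000),   # cis, distance 8800
    ("chr1", 300, "chr1", 15_300),  # cis, distance 15000
    ("chr1", 400, "chr2", 700),     # trans
]
counts = count_events(records, [10_000, 20_000])

# Tab-separated table: conjuncts as headers plus a "count" column.
print("chr1\tchr2\tstratum\tcount")
for (c1, c2, s), n in counts.items():
    print(f"{c1}\t{c2}\t{'' if s is None else s}\t{n}")
```

Each distinct combination of conjunct values becomes one row, which is the shape of table that ``--orig-stats`` and ``--target-stats`` expect as a count distribution.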