Configuration¶
Parameters are loaded from a YAML file passed via --params / -p.
Required Keys¶
target: Target IP name (for exampleMLLN)model_name: Model label in output project namingsweep_fraction: Fraction of training rows used during sweepgtf_file: GTF patheval_chr: Evaluation chromosometest_chr: Test chromosomeoutput_dir: Base output root
If any required key is missing, runtime will fail while loading params.
Common Keys¶
n_sweeps: Number of sweep runslogging:wandbortensorboard— wandb is supported by both backends; tensorboard is GANDALF only
Genomic/Data Keys¶
dataset: QuantNado dataset pathgtf_file: GTF annotation file pathwindows_bed: BED file of TSS windows — optional; auto-generated from GTF if omittedeval_chr: Evaluation chromosometest_chr: Test chromosomemin_target: Minimum target sample countmin_features: Minimum feature IP countexclude_ips: List of IP names to exclude from modellingprefixes: List of assay prefixes to include from sample nameswindow_size: Genomic window size around TSS in bpstep_size: Step size for sliding window in bptile_size: Size of each tile in bp; set larger thanstep_sizefor overlapping tileschunk_size_rows: optional — limit the number of rows processed per chunk during signal extraction; useful for memory-constrained environments. Defaults to the full dataset if omitted.
Sample Naming Convention¶
Sample names in the QuantNado dataset must follow a specific format to be parsed correctly:
{ASSAY}-{CELL_TYPE}_{IP}_{REPLICATE}
Assay types (required): Only samples with prefixes listed in prefixes are included. Defaults are CAT, ChIP, and CM.
Cell type: Multi-part names can use dashes within the cell type (trailing digits are stripped).
IP: Immunoprecipitation target name (e.g., MLLN, H3K4me3, etc.).
Replicate: Numeric replicate identifier.
Example valid names:
- ChIP-K562_MLLN_1 — ChIP assay, K562 cells, MLLN IP, rep 1
- CAT-hESC-BMP4_H3K27ac_2 — CAT assay, hESC-BMP4 cells, H3K27ac IP, rep 2
- CM-THP1_CEBPD_3 — CUT&RUN (CM) assay, THP1 cells, CEBPD IP, rep 3
IPs listed in exclude_ips (default: ["AF4C", "MLLC"]) will be filtered out.
Derived Runtime Paths¶
At runtime the code computes:
PROJECT = <model_name>_<target>RES_DIR = <output_dir>/<PROJECT>FIG_DIR = <RES_DIR>/figuresLOGGING_DIR = <RES_DIR>/logsDATA_DIR = <RES_DIR>/dataEVAL_DIR = <RES_DIR>/evaluate