Quick Start
1. Prepare Parameters
Create or edit a params file:
tabnado-init
# tabnado parameters
# Required keys
target: MLLN
model_name: GANDALF
sweep_fraction: 0.2
gtf_file: data/gencode.vM25.annotation.gtf.gz
eval_chr: chr8
test_chr: chr9
output_dir: results
# Optional keys
dataset: data/dataset
# windows_bed: data/tss_windows.bed
n_sweeps: 10
logging: wandb
min_target: 1
min_features: 10
exclude_ips: ["AF4C", "MLLC"]
prefixes: ["CAT", "ChIP", "CM"]
window_size: 3000
step_size: 100
tile_size: 100
# chunk_size_rows: 1000000 # optional: limit rows per chunk for memory-constrained loading
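A quick sanity check before launching a long run can catch a missing required key early. The sketch below is a hypothetical helper, not part of tabnado: `check_required_keys` is an assumed name, and it assumes the params file keeps each key as an uncommented, top-level `key: value` line, as in the listing above.

```shell
# Hypothetical helper (not part of tabnado): report required keys that are
# missing from a params file. Assumes flat, top-level "key: value" lines.
check_required_keys() {
    params_file=$1
    if [ ! -f "$params_file" ]; then
        echo "params file not found: $params_file" >&2
        return 2
    fi
    missing=0
    for key in target model_name sweep_fraction gtf_file eval_chr test_chr output_dir; do
        # Match only an uncommented key at the start of a line.
        if ! grep -Eq "^${key}:" "$params_file"; then
            echo "missing required key: ${key}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_required_keys params.yaml && echo "params look complete"` prints the message only when all seven required keys are present. The check is purely textual and does not validate values or YAML syntax.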
2. Run Full Pipeline
tabnado-run --params params.yaml
3. Check Outputs
Look under:
results/<MODEL_NAME>_<TARGET>/
You should see:
- best_hyperparameters.json
- final_model/
- figures/scatter_test.png
- figures/embeddings_umap.png
- shap/shap_mean_abs.csv
- figures/shap_clustermap.png
- shap/spatial_shap_by_offset_<target>.csv
- figures/shap_spatial_heatmap_<target>.png
- figures/shap_offset_line_<target>.png
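The expected outputs can also be checked from the command line. This is a minimal sketch under stated assumptions: `check_outputs` is a hypothetical helper, the run directory is passed explicitly (for the example params it would presumably be `results/GANDALF_MLLN`), and the `<target>` part of the file names is assumed to match the string you pass as the second argument.

```shell
# Hypothetical helper (not part of tabnado): list expected outputs that are
# missing from a run directory.
# Arguments: <run_dir> <target>, where <target> fills the <target> placeholder.
check_outputs() {
    run_dir=$1
    target=$2
    missing=0
    for rel in \
        best_hyperparameters.json \
        final_model \
        figures/scatter_test.png \
        figures/embeddings_umap.png \
        shap/shap_mean_abs.csv \
        figures/shap_clustermap.png \
        "shap/spatial_shap_by_offset_${target}.csv" \
        "figures/shap_spatial_heatmap_${target}.png" \
        "figures/shap_offset_line_${target}.png"; do
        # -e accepts both files and directories (final_model is a directory).
        if [ ! -e "$run_dir/$rel" ]; then
            echo "missing: $run_dir/$rel" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as `check_outputs results/GANDALF_MLLN MLLN` returns a non-zero status and names each absent path on stderr. It only tests for existence and says nothing about whether the files are complete.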
4. Run on Slurm with a Container
Pull the image once on the cluster login node:
apptainer pull tabnado.sif docker://ghcr.io/cchahrour/tabnado:latest
If your cluster provides singularity instead of apptainer, substitute singularity for apptainer in the commands.
Note on quantnado:
- If data_dir already contains cached parquet splits (dataset_train.parquet, dataset_eval.parquet, dataset_test.parquet), the pipeline can run from those cached files.
- If those parquet files do not exist, tabnado-data needs quantnado available to build them from the raw dataset store.
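To see which of the two cases applies before submitting a job, you can test for the three cached split files directly. The helper below is a hypothetical sketch, not a tabnado command; it assumes you pass the directory that the note calls data_dir.

```shell
# Hypothetical helper (not part of tabnado): succeed only if all three cached
# parquet splits named in the note exist in the given directory.
have_cached_splits() {
    data_dir=$1
    for split in train eval test; do
        [ -f "$data_dir/dataset_${split}.parquet" ] || return 1
    done
    return 0
}
```

For instance, `have_cached_splits /path/to/dataset || echo "splits not cached: quantnado will be needed"` prints the reminder when any of the three files is absent.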
Example single-job submission script (run_tabnado.sbatch):
#!/bin/bash
#SBATCH --job-name=tabnado
#SBATCH --partition=compute
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=24:00:00
#SBATCH --output=logs/%x-%j.out
set -euo pipefail
cd /path/to/tabnado
DATA_DIR=/path/to/dataset
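# Bind the working directory and the dataset directory into the container,
# then run the full pipeline from the working directory.
apptainer exec \
    --bind "$PWD:$PWD" \
    --bind "$DATA_DIR:$DATA_DIR" \
    --pwd "$PWD" \
    tabnado.sif \
    tabnado-run --params params.yaml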
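Create the logs directory before submitting (Slurm does not create the directory named in --output), then submit with:
mkdir -p logs
sbatch run_tabnado.sbatch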
Example array job for multiple parameter files (with --array=0-3, the tasks read experiments/params_0.yaml through experiments/params_3.yaml):
#!/bin/bash
#SBATCH --job-name=tabnado-array
#SBATCH --array=0-3
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=24:00:00
#SBATCH --output=logs/%x-%A_%a.out
set -euo pipefail
cd /path/to/tabnado
DATA_DIR=/path/to/dataset
# Each array task runs the pipeline on the params file matching its index.
apptainer exec \
    --bind "$PWD:$PWD" \
    --bind "$DATA_DIR:$DATA_DIR" \
    --pwd "$PWD" \
    tabnado.sif \
    tabnado-run --params experiments/params_${SLURM_ARRAY_TASK_ID}.yaml
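Because each array task looks up its params file by index, a gap in the numbering only shows up once that task starts. The sketch below checks the files up front; `check_array_params` is a hypothetical helper, and it assumes the `params_<index>.yaml` naming used in the array script above.

```shell
# Hypothetical helper (not part of tabnado or Slurm): verify that a params
# file exists for every index in an inclusive array range.
# Arguments: <dir> <first_index> <last_index>
check_array_params() {
    dir=$1
    first=$2
    last=$3
    missing=0
    i=$first
    while [ "$i" -le "$last" ]; do
        if [ ! -f "$dir/params_${i}.yaml" ]; then
            echo "missing: $dir/params_${i}.yaml" >&2
            missing=1
        fi
        i=$((i + 1))
    done
    return "$missing"
}
```

For the range in the example, `check_array_params experiments 0 3` returns zero only when all four params files are present, so it can gate the sbatch call for the array script.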