mllm_shapx - Experiments Runner
===============================

Overview
--------

``mllm_shapx`` is the configuration-driven experiment runner used in this
monorepo to execute reproducible SHAP runs (exact and sampling-based) over
curated dataset shards. It focuses on:

- Loading dataset shards (Hugging Face datasets)
- Expanding a JSON config into a sweep of concrete runs
- Executing each run with checkpoint/resume support
- Writing structured artifacts to disk
- Optional Weights & Biases logging + artifact upload

The diagram below shows the DAG of the runner's operation:

.. image:: _static/dag.png
   :alt: Runner Operation DAG
   :width: 30%
   :align: center
   :class: padded-image

Where it lives
--------------

In this repository, the code lives under:

- ``experiments/mllm_shapx/``

The CLI entrypoint is:

- ``python -m mllm_shapx.cli``

Environment / setup
-------------------

The runner is designed to work in the monorepo environment. If you run
``mllm_shapx`` without installing the ``mllm_shap`` package into the active
environment, set ``MLLM_SHAP_SRC`` to point at the package sources so that
imports resolve:

.. code-block:: bash

   export MLLM_SHAP_SRC=../../mllm_shap/src
   export LOG_LEVEL=INFO

A minimal example environment file is provided at:

- ``experiments/mllm_shapx/.example.env``

Running the CLI
---------------

Validate a config
^^^^^^^^^^^^^^^^^

.. code-block:: bash

   uv run python -m mllm_shapx.cli validate --config experiments/mllm_shapx/configs/mc_minimal.json

Optionally, validate and also test that the dataset shard can be fetched/read:

.. code-block:: bash

   uv run python -m mllm_shapx.cli validate \
       --config experiments/mllm_shapx/configs/mc_minimal.json \
       --check-dataset

Run experiments
^^^^^^^^^^^^^^^

.. code-block:: bash

   uv run python -m mllm_shapx.cli run --config experiments/mllm_shapx/configs/mc_minimal.json

Resume an interrupted run
^^^^^^^^^^^^^^^^^^^^^^^^^

Resume uses the per-run ``checkpoint.json`` plus the presence of
already-written sample results on disk:

.. code-block:: bash

   uv run python -m mllm_shapx.cli run \
       --config experiments/mllm_shapx/configs/mc_minimal.json \
       --resume

Batching multiple configs
^^^^^^^^^^^^^^^^^^^^^^^^^

A helper script exists for sequential runs with retry and logging:

- ``experiments/mllm_shapx/run_configs.sh``

For cluster execution, the repository includes Slurm wrappers under:

- ``experiments/run_mllm_shapx.sbatch``
- ``experiments/run_mllm_shapx_exact.sbatch``

Outputs and artifacts
---------------------

For each expanded (concrete) run, ``mllm_shapx`` creates a run directory:

.. image:: _static/artifact_layout.png
   :alt: Artifact Layout
   :width: 70%
   :align: center
   :class: padded-image

Key files:

- ``spec.json``: the fully resolved configuration for the concrete run (after sweep expansion)
- ``checkpoint.json``: resume state (completed indices, next index)
- ``samples/*.json``: per-row serialized results (attributions + metadata)
- ``summary/aggregate_metrics.json``: run-level aggregates (runtime and attribution summaries)

Configuration model (high level)
--------------------------------

The runner parses a JSON config into an ``ExperimentSet`` dataclass (see
``experiments/mllm_shapx/config.py``).
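For orientation, a minimal config might look roughly as follows. This is an
illustrative sketch only: the key names mirror the high-level field
descriptions in this section, but the exact nesting and required keys are
defined by the ``ExperimentSet`` dataclass in
``experiments/mllm_shapx/config.py``, which remains authoritative:

.. code-block:: json

   {
     "experiment_set_id": "mc_minimal",
     "output_root": "outputs/mllm_shapx",
     "connector": "hf_text",
     "device": "cuda",
     "dataset": {"subset": "single_sentence", "split": "test"},
     "selection": {"max_samples": 8},
     "experiments": [
       {"shap": {"explainer": "standard_mc"}}
     ]
   }

Each entry under ``experiments`` may itself expand into multiple concrete runs
when sweep values are given.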
At a high level:

- The top level includes ``experiment_set_id``, ``output_root``, ``connector``, and ``device``
- ``dataset`` selects the dataset repo/subset/split and loading knobs (parquet, ``trust_remote_code``)
- ``selection`` controls which rows to process (start index, max samples, shuffle seed)
- ``modality`` controls input/output modality combinations
- ``shap`` chooses the SHAP mode + embedding similarity + reducer + normalizer
- ``experiments`` is a list of variants; each variant may expand into many runs (sweeps)

Important enums (authoritative)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See ``experiments/mllm_shapx/constants.py``:

- Explainers: ``exact``, ``limited_mc``, ``standard_mc``, ``limited_cc``, ``standard_cc``, ``limited_neyman``, ``standard_neyman``, ``hierarchical``
- Connectors: ``liquid_audio`` (LiquidAudio), ``hf_text`` (Transformers text-only)
- Similarities: ``CosineSimilarity``, ``TfIdfCosineSimilarity``, ``EuclideanSimilarity``
- Modes: ``CONTEXTUAL`` and ``STATIC``
- Dataset subsets: ``single_sentence``, ``multi_lingual``, ``multi_sentence``

Weights & Biases integration
----------------------------

If enabled in the config, W&B logging is initialized once per run and:

- logs per-sample metrics during execution
- uploads summary JSON and (optionally) sample directories as artifacts

An example of a run logged to W&B:

.. image:: _static/wandb.png
   :alt: W&B Example
   :width: 70%
   :align: center
   :class: padded-image

Troubleshooting
---------------

- Import errors for ``mllm_shap``: set ``MLLM_SHAP_SRC`` or install the package into the environment.
- Stuck/partial runs: use ``--resume``; to restart cleanly, remove the run's ``checkpoint.json``.
- More verbosity: set ``LOG_LEVEL=DEBUG``.
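A clean restart of a stuck run can be sketched as below. The run directory
path here is hypothetical (it depends on your configured ``output_root``), and
a throwaway directory stands in for a real run directory:

.. code-block:: bash

   # Stand-in for a real run directory under your output_root.
   RUN_DIR="$(mktemp -d)/run_000"
   mkdir -p "$RUN_DIR"
   touch "$RUN_DIR/checkpoint.json"     # simulate a stale checkpoint
   rm -f "$RUN_DIR/checkpoint.json"     # remove it to force a fresh start

   # Then re-run with extra verbosity, e.g.:
   # LOG_LEVEL=DEBUG uv run python -m mllm_shapx.cli run \
   #     --config experiments/mllm_shapx/configs/mc_minimal.json

Note that removing ``checkpoint.json`` discards the resume state, so the run
starts again from its first row.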