
This dataset accompanies the manuscript "MozzaVID: Mozzarella Volumetric Image Dataset".

The data is provided in three versions/sizes (Small, Base, Large), reflecting the setup proposed in the paper. Use the "models" folder to recreate reported results.

For details, see the links below:
1. Webpage of the project - https://papieta.github.io/MozzaVID/
2. Manuscript - https://arxiv.org/abs/2412.04880
3. GitHub repository (data loaders, examples) - https://github.com/PaPieta/MozzaVID/tree/main

--------

Unique scans can be explored through the "raw_dataset" folder. It contains 25 subfolders, one for each coarse-grained class. Each subfolder holds .tiff files containing cleaned-up CT scans. Each scan is ~5.1 GB in size: a (2156, 1601, 1601) px volume saved with the uint8 data type. The scans are cropped to contain only the cheese microstructure (no surrounding air), and their intensity is unified.
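
As a quick sanity check on the numbers above, the sketch below computes the raw size of one volume from its shape and data type. It assumes the voxels are stored uncompressed (one byte per uint8 voxel) and ignores any TIFF header overhead; under that assumption, the "~5.1 GB" figure corresponds to binary units (GiB).

```python
import math

# Shape and data type as stated in this README.
shape = (2156, 1601, 1601)   # voxels per axis
bytes_per_voxel = 1          # uint8

# Assumption: uncompressed storage, no file-format overhead.
n_bytes = math.prod(shape) * bytes_per_voxel

print(f"{n_bytes} bytes")
print(f"{n_bytes / 1e9:.2f} GB (decimal units)")
print(f"{n_bytes / 2**30:.2f} GiB (binary units)")
```

A full scan therefore needs roughly this much free RAM if loaded in one piece, so reading only a sub-volume may be preferable on smaller machines.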

The file names should be interpreted as {cheese_idx}_{sample_idx}_{scan_idx}.tiff, so e.g. the file 1_3_4.tiff is:
* Cheese idx/coarse-grained class: 1
* Sample idx/fine-grained class: 3
* Scan/local tomography idx: 4
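
The naming scheme above can be parsed programmatically. The following is a minimal sketch, not part of the official data loaders; the names `ScanId` and `parse_scan_name` are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class ScanId(NamedTuple):
    """Indices encoded in a raw scan file name (hypothetical helper type)."""
    cheese_idx: int   # coarse-grained class
    sample_idx: int   # fine-grained class
    scan_idx: int     # local tomography index


def parse_scan_name(path: str) -> ScanId:
    """Split '{cheese_idx}_{sample_idx}_{scan_idx}.tiff' into its three indices."""
    stem = Path(path).stem            # drop directories and the extension
    parts = stem.split("_")
    if len(parts) != 3:
        raise ValueError(f"Unexpected scan file name: {path!r}")
    return ScanId(*(int(p) for p in parts))


scan = parse_scan_name("1_3_4.tiff")
print(scan.cheese_idx, scan.sample_idx, scan.scan_idx)
```

The returned tuple can be used directly as a label source, for example taking `cheese_idx` as the coarse-grained target.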

---------

To download the data, click on a chosen file/folder, then click the download button in the top right corner.

Alternatively, the data can be fetched from the command line by picking one of the three names listed in brackets, for example with:
> wget https://archive.compute.dtu.dk/downloads/public/projects/MozzaVID/[Small, Base, Large].zip
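
The same download can be scripted. The sketch below builds the archive URL for a chosen version from the pattern shown above and fetches it with the Python standard library; the helper names `archive_url` and `download` are hypothetical, and the archives may be large, so check the available disk space first.

```python
import os
from urllib.request import urlretrieve

# Base URL and version names taken from the wget pattern in this README.
BASE_URL = "https://archive.compute.dtu.dk/downloads/public/projects/MozzaVID"
VERSIONS = ("Small", "Base", "Large")


def archive_url(version: str) -> str:
    """Return the .zip URL for one dataset version (hypothetical helper)."""
    if version not in VERSIONS:
        raise ValueError(f"version must be one of {VERSIONS}, got {version!r}")
    return f"{BASE_URL}/{version}.zip"


def download(version: str, dest_dir: str = ".") -> str:
    """Download one archive into dest_dir and return the local file path."""
    target = os.path.join(dest_dir, f"{version}.zip")
    urlretrieve(archive_url(version), target)
    return target


# Listing the URLs does not touch the network; call download("Small") to fetch.
for v in VERSIONS:
    print(archive_url(v))
```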

---------

If you are low on disk space, you can stream the dataset splits during training using our WebDataset setup on HuggingFace (check our GitHub for details):
https://huggingface.co/datasets/dtudk/[MozzaVID_Small, MozzaVID_Base, MozzaVID_Large]

If you use our data, please consider citing our work:

@misc{pieta2024b,
      title={MozzaVID: Mozzarella Volumetric Image Dataset}, 
      author={Pawel Tomasz Pieta and Peter Winkel Rasmussen and Anders Bjorholm Dahl and Jeppe Revall Frisvad and Siavash Arjomand Bigdeli and Carsten Gundlach and Anders Nymark Christensen},
      year={2024},
      howpublished={arXiv:2412.04880 [cs.CV]},
      eprint={2412.04880},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.04880}, 
}