Typst Image Dataset

This dataset was generated using a fork of tex2typ and the hoang-quoc-trung/fusion-image-to-latex-datasets dataset, which is itself a compilation of equation images and their LaTeX labels.

The hoang-quoc-trung dataset is difficult to work with because its image data is stored in a single large, compressed RAR archive, which does not allow efficient random read access. It also appears to contain a large number of corrupted filenames inside the archive; these have been fixed in this dataset.

This dataset instead uses the WebDataset format for convenient and efficient storage of the image files and their associated metadata.
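To illustrate what the WebDataset layout means in practice: each shard is a plain tar archive, and consecutive files that share a key (the basename up to its first dot) belong to the same sample, e.g. `000001.png` and `000001.json`. The stdlib-only sketch below groups one shard's files into samples. The function name `iter_webdataset_samples` is hypothetical, and the field extensions used in this dataset (such as `png` or `json`) are assumptions, not taken from the dataset itself.

```python
import tarfile
from typing import Dict, Iterator, Optional, Tuple


def iter_webdataset_samples(shard_path: str) -> Iterator[Tuple[str, Dict[str, bytes]]]:
    """Hypothetical helper: yield (key, {extension: bytes}) per sample in a tar shard.

    Assumes the WebDataset convention that files of one sample are stored
    contiguously and share a key (basename up to the first dot).
    """
    current_key: Optional[str] = None
    fields: Dict[str, bytes] = {}
    with tarfile.open(shard_path, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-regular entries
            # Split "dir/000001.png" into key "dir/000001" and extension "png".
            dirname, _, basename = member.name.rpartition("/")
            stem, dot, ext = basename.partition(".")
            if not dot:
                continue  # files without an extension are not sample fields
            key = f"{dirname}/{stem}" if dirname else stem
            if key != current_key:
                # A new key starts a new sample; emit the finished one first.
                if current_key is not None and fields:
                    yield current_key, fields
                current_key, fields = key, {}
            fileobj = tar.extractfile(member)
            if fileobj is not None:
                fields[ext] = fileobj.read()
    # Emit the last sample in the shard.
    if current_key is not None and fields:
        yield current_key, fields
```

Because every shard is an ordinary tar file that can be read sequentially, samples can be streamed without unpacking a monolithic archive first. In practice the `webdataset` Python package handles this grouping (plus shard streaming and decoding) for you; the sketch above only shows the on-disk convention.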

The code used to generate this dataset can be found here: https://github.com/JeppeKlitgaard/DTU-02456-Deep-Learning-Project (the repository is currently private but should be made public after the examination; if it is not, contact me at huggingface@jeppe.science).
