@j824h indeed, I was not planning to parse the PDF, that seems like a tedious route
@guenp Copyright is also a concern.
Unlike the abstract, the article content is not in the public domain by default.
I think we can only repost the articles explicitly licensed for re-use.
https://arxiv.org/help/bulk_data#bulk-full-text-access
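A rough sketch of what "explicitly licensed for re-use" could look like as a filter. It assumes each metadata record carries a `license` URL field (the record shape, the `is_explicitly_reusable` helper, and the sample IDs are all made up for illustration), and it treats any Creative Commons URL as an explicit re-use license. The NC and ND variants still restrict what you may do, so this is only a first pass, not legal advice.

```python
from urllib.parse import urlparse


def is_explicitly_reusable(license_url):
    """Return True if the license URL points at a Creative Commons license.

    Assumption: a creativecommons.org URL is taken as an explicit re-use
    license; anything else (or a missing license) is treated as not reusable.
    """
    if not license_url:
        return False
    host = urlparse(license_url).netloc.lower()
    return host == "creativecommons.org" or host.endswith(".creativecommons.org")


# Hypothetical records; real metadata may use different field names.
records = [
    {"id": "hypothetical-1", "license": "http://creativecommons.org/licenses/by/4.0/"},
    {"id": "hypothetical-2", "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/"},
    {"id": "hypothetical-3", "license": None},
]

# Keep only the records whose license passes the filter.
reusable_ids = [r["id"] for r in records if is_explicitly_reusable(r["license"])]
```

Defaulting to "not reusable" when the license is missing or unrecognized seems like the safer choice here.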
@j824h huh good to know.. how does arxiv-vanity deal with this limitation?
@guenp arXiv itself does not provide figure and table data separately, so you will generally need to process the PDF file from https://export.arxiv.org/pdf/*
Deciding which visible part of the document is a figure might, again, be non-trivial.
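For the download step only, something like the following might work. It assumes the `*` in the URL above stands for the arXiv identifier; `pdf_url`, `download_pdf`, and the User-Agent string are placeholder names, and figure detection is deliberately left out.

```python
import urllib.request

# Base URL taken from the message above; the identifier is appended to it.
EXPORT_PDF_BASE = "https://export.arxiv.org/pdf/"


def pdf_url(arxiv_id):
    """Build the export URL for one paper (assumes '*' means the arXiv ID)."""
    return EXPORT_PDF_BASE + arxiv_id.strip()


def download_pdf(arxiv_id, dest_path):
    """Fetch the PDF and write it to dest_path; returns dest_path."""
    request = urllib.request.Request(
        pdf_url(arxiv_id),
        headers={"User-Agent": "figure-extraction-sketch"},  # placeholder value
    )
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())
    return dest_path
```

Calling `download_pdf(some_id, "paper.pdf")` would save the file locally. Finding the figures inside it is the part that still needs a PDF layout tool or some heuristics.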