This dataset originates from the AMETHYST project. It comprises a collection of PDFs and images that are processed with machine learning and NLP to extract tables containing information about Epoxy/Amine (EA) compounds and their properties.
The folders containing the PDFs are named after the DOI prefix of the journal from which the PDFs were downloaded.
The list of downloaded DOIs is located in the "DOIs" folder (DOI prefixes: 10.1016, 10.1038, 10.1295, 10.3390).
The images of the epoxy-amine property tables are in the zip file. This file contains all the images: the 'R' folder contains the relevant tables, while the 'IR' folder contains the irrelevant ones.
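The R/IR split can be turned into a label per image. The following is a minimal sketch, not part of the dataset tooling: the function name relevance_label and the example paths are hypothetical, and it assumes that 'R' or 'IR' appears as a folder name somewhere in each image's path inside the archive.

```python
from pathlib import PurePosixPath
from typing import Optional


def relevance_label(member_name: str) -> Optional[str]:
    """Map an archive member path to 'relevant' or 'irrelevant'.

    Assumption: the label is carried by a parent folder named 'R' or 'IR'.
    Returns None when neither folder name is present in the path.
    """
    # Zip member names use forward slashes, so PurePosixPath splits them correctly.
    folders = PurePosixPath(member_name).parts[:-1]  # drop the file name itself
    if "IR" in folders:
        return "irrelevant"
    if "R" in folders:
        return "relevant"
    return None


# Hypothetical member paths, for illustration only.
print(relevance_label("images/R/example_3_0.png"))
print(relevance_label("images/IR/example_3_1.png"))
```

The member paths themselves can be listed with the standard library's zipfile.ZipFile(...).namelist() once the archive has been downloaded.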
Each image is named after the reference (DOI) of the publication from which it originates, followed by _num1_num2, where num1 is the page number in the document and num2 is the zero-based position of the table on the page. For example, num2=0 for the first image on the page and num2=1 for the second.
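The naming scheme can be decoded by splitting the file name from the right. The following is a minimal sketch under stated assumptions: the function name parse_table_image_name, the TableImageName type, and the example file name are hypothetical, the file is assumed to carry an extension such as .png, and the way the DOI itself is written inside the file name is not specified here, so the reference part is returned unchanged.

```python
from pathlib import PurePath
from typing import NamedTuple


class TableImageName(NamedTuple):
    reference: str  # DOI-derived reference, exactly as it appears in the file name
    page: int       # num1: page number in the source document
    position: int   # num2: zero-based position of the table on that page


def parse_table_image_name(filename: str) -> TableImageName:
    """Split '<reference>_<num1>_<num2>.<ext>' into its three parts."""
    # Assumption: the file has an extension, which .stem removes.
    stem = PurePath(filename).stem
    # Split from the right so underscores inside the reference are preserved.
    parts = stem.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected image name: {filename!r}")
    reference, num1, num2 = parts
    return TableImageName(reference, int(num1), int(num2))


# Hypothetical file name, for illustration only.
info = parse_table_image_name("10.1016_j.example.2020.01.001_5_0.png")
print(info.reference, info.page, info.position)
```

Splitting from the right is chosen because only the last two underscore-separated fields are guaranteed by the naming scheme, whereas the reference may itself contain underscores.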