Version
version1

A set of deteriorated versions of the publicly available non-commercial IMDB database, containing different amounts of duplicates.

The datasets were extracted from PostgreSQL databases containing the relations titles, name_basics, title_episode, title_ratings, and title_principals, built from the IMDB database at https://datasets.imdbws.com/ (version downloaded on April 7th, 2024). These databases were deliberately deteriorated to experiment with the Red2Hunt method, which generates a redundancy-free database from any relational operational database containing surrogate keys and duplicates.
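
As an illustration of the duplicates mentioned above, the sketch below counts rows that are exact copies of each other in the titles relation, after one of the dumps has been restored into a local PostgreSQL instance. The connection parameters and the database name imdb_deteriorated are assumptions; duplicates that differ only in their surrogate key would additionally require projecting out the key column.

    # Minimal sketch: count exact duplicate rows in one deteriorated relation.
    # Database name and connection parameters are hypothetical.
    import psycopg2

    conn = psycopg2.connect(dbname="imdb_deteriorated", user="postgres",
                            password="postgres", host="localhost")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM titles")
        total = cur.fetchone()[0]
        cur.execute("SELECT COUNT(*) FROM (SELECT DISTINCT * FROM titles) AS d")
        distinct = cur.fetchone()[0]
        print(f"titles: {total} rows, {total - distinct} exact duplicate rows")
    conn.close()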

Publication date
03/06/2024
Author(s)
Mathilde MARCY, Jean-Marc PETIT
Total size and number of files
70 GB, 7 dump files
Version
version1

A set of computer-generated cave and tunnel systems

  • available at various resolutions,
  • as watertight triangulations and/or point-clouds,
  • in the PLY and 3DTiles file formats.
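
For the PLY distributions, a minimal loading sketch (the file name cave.ply is hypothetical and trimesh is only one of several libraries able to read PLY; it is not necessarily the tool used to produce the data):

    # Minimal sketch: load one of the PLY triangulations and check watertightness.
    # The file name is hypothetical; trimesh is just one possible reader.
    import trimesh

    mesh = trimesh.load("cave.ply")
    print(f"{len(mesh.vertices)} vertices, {len(mesh.faces)} faces, "
          f"watertight: {mesh.is_watertight}")
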
Publication date
16/04/2024
Author(s)
TeaTime and LIRIS (VCity team)
Total size and number of files
3 GB
Version
v1

This dataset has been generated from 3 construction models transferred from Autodesk Revit to NVIDIA Isaac Sim. It contains 8751 samples of RGB images with associated semantic segmentation masks and label files for 13 classes (rectangular_sheath, circular_sheath, pipe, air_vent, fan_coil, stair, wall, floor, pipe_accessory, framework, radiant_panel, climate_engineering_equipment, ceiling, handrail, roof, cable_tray, pole).
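
A minimal sketch of reading one sample together with its segmentation mask and listing the class ids it contains. The file names and the assumption that the mask stores one integer class id per pixel are hypothetical; adapt them to the dataset's actual layout.

    # Minimal sketch: load an RGB image and its segmentation mask, list class ids.
    # File names and mask encoding (one class id per pixel) are assumptions.
    import numpy as np
    from PIL import Image

    rgb = np.array(Image.open("sample_0000_rgb.png"))
    mask = np.array(Image.open("sample_0000_semantic.png"))
    ids, counts = np.unique(mask, return_counts=True)
    for class_id, count in zip(ids, counts):
        print(f"class {class_id}: {count} pixels")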

Publication date
20/12/2023
Author(s)
Mathis Baubriaud
Total size and number of files
14.8 GB
Version
version2

Backends declared in the pyproject.toml files

This dataset contains CSV and SQLite files with data about project build backends, extracted from "Metadata about every file uploaded to PyPI":

  • extract-pyproject-all-versions.csv, extract-pyproject-all-versions.db: for projects that have a pyproject.toml file and were uploaded after 2018, get the project_name, max project_version, max uploaded_on, list of distinct project_version, list of distinct uploaded_on, list of distinct path, ...
  • extract-pyproject-latest.csv, extract-pyproject-latest.db: for each project found in extract-pyproject-all-versions, get the data of the latest uploaded_on date (1)
  • pyproject_backends.csv, pyproject_backends.db: the build backend found in extract-pyproject-latest.db for each project, considering only the pyproject.toml file at the root of the project (2)

Source code for the data extraction.

(1) There are several pyproject.toml files for some projects (e.g. poetry), often in test folders.
(2) The test is quite basic, but only a few projects have several pyproject.toml files matching it.
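
As an illustration, a minimal sketch querying one of the SQLite files for the most common build backends. The table and column names (table backends, column backend) are assumptions; inspect the actual schema or the linked source code before relying on them.

    # Minimal sketch: count build backends in pyproject_backends.db.
    # Table and column names are hypothetical; check the real schema first.
    import sqlite3

    con = sqlite3.connect("pyproject_backends.db")
    rows = con.execute(
        "SELECT backend, COUNT(*) AS n FROM backends "
        "GROUP BY backend ORDER BY n DESC LIMIT 10"
    ).fetchall()
    for backend, n in rows:
        print(f"{backend}: {n} projects")
    con.close()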

Further analysis of the PyPI metadata

After publishing the first charts, I wanted to complete the initial statistics by finding out how many projects had no source package and how many had no pyproject.toml.

This dataset contains CSV and SQLite files extracted from the same source (Parquet files from "Metadata about every file uploaded to PyPI"):

  • extract-project-releases-2018-and-later.csv, extract-project-releases-2018-and-later.db: extract the metadata of the projects uploaded to PyPI since 2018, including project_name, project_version, project_release, release type (source or wheel), ...

These files weigh 1.1 GB and 1.3 GB respectively.
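
A minimal pandas sketch of deriving the "projects with no source package" statistic from the CSV; the column names project_name and release_type are assumptions based on the description above.

    # Minimal sketch: count projects that never uploaded a source release.
    # Column names are assumptions based on the description above.
    import pandas as pd

    df = pd.read_csv("extract-project-releases-2018-and-later.csv")
    all_projects = df["project_name"].nunique()
    with_source = df.loc[df["release_type"] == "source", "project_name"].nunique()
    print(f"{all_projects - with_source} of {all_projects} projects have no source release")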

Source code for the second data extraction.

Publication date
30/12/2023
Author(s)
Françoise CONIL
Total size and number of files
2.8 GB, 8 files
Version
version1

This dataset holds

  • a 3D lasergrammetric dataset of the so-called "Creux des Elaphes" cave, authored by EDYTEM / USMB / CNRS (in the LAS laser point cloud format)
  • its conversion to the 3DTiles format
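
A minimal sketch of opening the point cloud with laspy. Reading the compressed LAZ file requires a LAZ backend (for example pip install laspy[lazrs]), and the file name is hypothetical.

    # Minimal sketch: read the LAZ point cloud and report its size and extent.
    # Requires a LAZ backend for laspy; the file name is hypothetical.
    import laspy

    las = laspy.read("creux_des_elaphes.laz")
    print(f"{len(las.points)} points")
    print("x range:", las.x.min(), "-", las.x.max())
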
Publication date
24/11/2023
Author(s)
EDYTEM / USMB / CNRS and LIRIS (VCity team)
Total size and number of files
781 MB for the original LAZ file (94,465,067 RGB points)
Version
version1

This dataset originates from the AMETHYST project. It comprises a collection of PDFs and images that are processed with machine learning and NLP techniques to extract tables containing information about Epoxy/Amine (EA) compounds and their properties.
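
For orientation, a minimal sketch of the kind of table extraction this corpus targets. pdfplumber is used purely as an illustration and is not the AMETHYST pipeline; the file name is hypothetical.

    # Minimal sketch: list candidate tables found in one PDF of the corpus.
    # pdfplumber is only an illustrative choice; the file name is hypothetical.
    import pdfplumber

    with pdfplumber.open("epoxy_amine_paper.pdf") as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                print(f"page {page_number}: table with {len(table)} rows")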

Publication date
20/09/2023
Author(s)
Aymar TCHAGOUE, Véronique EGLIN, Jean-Marc PETIT, Jannick DUCHET, Sébastien PRUVOST, Jean-Francois GERARD
Total size and number of files
4.3 GB, 4 folders, 1612 files
Version
version1

FruitBin contains more than 1M images and 40M instance-level 6D pose annotations over both symmetric and asymmetric fruits, with or without texture. Rich annotations and metadata (including 6D pose, segmentation mask, point cloud, 2D and 3D bounding boxes, occlusion rate) allow tuning the proposed dataset for benchmarking the robustness of object instance segmentation and 6D pose estimation models (with respect to variations in lighting, texture, occlusion, camera pose and scenes). We further propose three scenarios presenting significant challenges for 6D pose estimation models: new scene generalization, new camera viewpoint generalization, and occlusion robustness. We show the results of these three scenarios for two 6D pose estimation baselines using RGB or RGBD images. To the best of our knowledge, FruitBin is the first dataset for the challenging task of fruit bin picking and the largest dataset for 6D pose estimation with the most comprehensive challenges, tunable over scenes, camera poses and occlusions.
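
As a reminder of what a 6D pose annotation encodes, the sketch below applies a rotation R and translation t to object-frame points to obtain camera-frame points. The values are placeholders and do not reproduce the dataset's actual annotation format.

    # Minimal sketch: a 6D pose is a rotation R (3x3) plus a translation t (3,);
    # applying it maps object-frame points into the camera frame.
    # All values below are placeholders, not taken from FruitBin.
    import numpy as np

    R = np.eye(3)                          # placeholder rotation
    t = np.array([0.1, 0.0, 0.5])          # placeholder translation
    model_points = np.random.rand(100, 3)  # placeholder object points

    camera_points = model_points @ R.T + t
    print(camera_points.shape)  # (100, 3)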

License: CC BY-NC-SA

Publication date
08/06/2023
Author(s)
Guillaume Duret, Mahmoud Ali, Nicolas Cazin, Alexandre Chapin, Florence Zara, Emmanuel Dellandrea, Jan Peters, Liming Chen
Total size and number of files
Raw data ~7 × 250 GB, benchmarks ~7 × 20 GB
Version
version1

Estimating fluid dynamics is classically done through the simulation and integration of numerical models solving the Navier-Stokes equations, which is computationally complex and time-consuming even on high-end hardware. This is a notoriously hard problem that has recently been addressed with machine learning, in particular graph neural networks (GNN) and variants trained and evaluated on datasets of static objects in static scenes with fixed geometry. We attempt to go beyond existing work in complexity and introduce a new model, method and benchmark. We propose EAGLE, a large-scale dataset of ∼1.1 million 2D meshes resulting from simulations of unsteady fluid dynamics caused by a moving flow source interacting with nonlinear scene structure, comprising 600 different scenes of three different types. To perform future forecasting of pressure and velocity on the challenging EAGLE dataset, we introduce a new mesh transformer. It leverages node clustering, graph pooling and global attention to learn long-range dependencies between spatially distant data points without needing the large number of iterations that existing GNN methods require. We show that our transformer outperforms state-of-the-art methods on both existing synthetic and real datasets and on EAGLE. Finally, we highlight that our approach learns to attend to airflow, integrating complex information in a single iteration.
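
As a rough illustration of the global attention ingredient mentioned above, here is a generic scaled dot-product self-attention over pooled node features written with numpy. It is not the authors' mesh transformer, only a sketch of the mechanism.

    # Minimal sketch: scaled dot-product self-attention over pooled node features.
    # This illustrates "global attention"; it is not the EAGLE architecture.
    import numpy as np

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
        return weights @ V                               # weighted sum of values

    clusters = np.random.rand(32, 64)              # 32 pooled clusters, 64-d features
    out = attention(clusters, clusters, clusters)  # every cluster attends to all
    print(out.shape)  # (32, 64)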

Publication date
30/01/2023
Author(s)
Steeven Janny, Aurélien Bénéteau, Madiha Nadri, Julie Digne, Nicolas Thome, Christian Wolf
Version
version1

A collection of urban data graphs in RDF/OWL formats derived from CityGML Grand Lyon Open data
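
A minimal sketch of loading one of the RDF graphs with rdflib and counting its triples; the file name and the Turtle serialization are assumptions.

    # Minimal sketch: load one RDF graph and count triples and distinct predicates.
    # The file name and serialization are hypothetical.
    from rdflib import Graph

    g = Graph()
    g.parse("grand_lyon_citygml.ttl", format="turtle")
    predicates = {p for _, p, _ in g}
    print(f"{len(g)} triples, {len(predicates)} distinct predicates")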

Publication date
12/12/2022
Author(s)
Diego Vinasco-Alvarez, John Samuel, Sylvie Servigne, Gilles Gesquière
Version
version1

We provide a large-scale dataset of textured meshes with over 343k stimuli generated from 55 source models quantitatively characterized in terms of geometric, color, and semantic complexity to ensure their diversity. The dataset covers a wide range of compression-based distortions applied on the geometry, texture mapping and texture image. The database can be used to train no-reference quality metrics and develop rate-distortion models for meshes.
From the established dataset, we carefully selected a challenging subset of 3000 stimuli that we annotated in a large-scale crowdsourced subjective experiment based on the double stimulus impairment scale (DSIS) method. Over 148k quality scores were collected from 4513 participants. To the best of our knowledge, it is the largest quality assessment dataset of textured meshes associated with subjective scores and Mean Opinion Scores (MOS) to date. This database is valuable for training and benchmarking quality metrics.
Quality scores of the remaining stimuli in the dataset (i.e. those not involved in the subjective experiment) were predicted (Pseudo-MOS) using a quality metric called Graphics-LPIPS, based on deep learning, trained and tested on the subset of annotated stimuli.
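
For orientation, the Mean Opinion Score of a stimulus is simply the mean of the raw ratings it received in the subjective experiment. A minimal sketch, assuming a hypothetical CSV layout with columns stimulus_id and score:

    # Minimal sketch: compute per-stimulus MOS as the mean of raw DSIS ratings.
    # The CSV layout (columns stimulus_id, score) is hypothetical.
    import pandas as pd

    ratings = pd.read_csv("raw_scores.csv")
    mos = ratings.groupby("stimulus_id")["score"].mean()
    print(mos.head())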

This dataset was created at the LIRIS lab, Université de Lyon. It is associated with the following reference; please cite it if you use the dataset.

Yana Nehmé, Johanna Delanoy, Florent Dupont, Jean-Philippe Farrugia, Patrick Le Callet, Guillaume Lavoué, Textured mesh quality assessment: Large-scale dataset and deep learning-based quality metric, ACM Transactions on Graphics, Volume 42, Issue 3, Article No. 31, pp 1–20, 2023.

 

Publication date
08/03/2022
Author(s)
Yana Nehmé, Johanna Delanoy, Florent Dupont, Jean-Philippe Farrugia, Patrick Le Callet, Guillaume Lavoué
Total size and number of files
76.9 GB