Abstract
Task-oriented grasping presents a significant challenge in robotics, limited by the deficiencies of current datasets. Existing datasets typically offer SE(3) grasp poses that lack the detailed, task-specific ground truth essential for effective manipulation. To address this issue, the SenseTask dataset was developed as a robust, meticulously crafted resource tailored to task-oriented grasping. SenseTask delivers 170 high-fidelity 3D models spanning 14 categories, each equipped with precise 3D annotations of task-specific parts. These annotations, created with a custom tool, take the form of a per-task segmentation mask over the entire object mesh, enabling the generation of optimized SE(3) grasp poses for functional interactions. The dataset covers four pivotal tasks related to various grasping actions and includes 480K grasp instances generated with a grasp generation pipeline that applies non-maximum suppression for optimal grasp selection. The dataset's strength is validated by fine-tuning the Mask2Former and CLIPSeg models on SenseTask, achieving mean IoU scores of 92.44% and 82.22%, respectively, along with a classification accuracy of 99.89% for Mask2Former. This performance underscores the dataset's ability to enable precise, context-driven grasping solutions. SenseTask serves as a valuable resource for task-oriented grasping research, offering enhanced flexibility for training diverse models. With its high-fidelity models, rich annotations, and extensive grasp instances, it paves the way for breakthroughs in robotic manipulation across diverse domains.
Key Highlights
- 170 High-Fidelity 3D Models - Comprehensive collection of meticulously crafted 3D object models with precise geometric details and realistic textures
- 14 Object Categories - Diverse range including Bottle, Cup, Fork, Hammer, Jar, Knife, Mug, Pan, Pliers, Roller, Scissors, ScrewDriver, Spoon, EyeGlasses
- 480,000+ Grasp Instances - Extensive collection of optimized SE(3) grasp poses generated through an advanced pipeline with non-maximum suppression
- 4 Task Types - Stabilizing, Pouring, Tool Usage, Opening/Closing
- Task-Specific Annotations - Precise 3D annotations for task-specific parts in the form of segmentation masks
- Validated Performance - Mask2Former achieves 92.44% mIoU and 99.89% accuracy; CLIPSeg achieves 82.22% mIoU
Keywords
Functional analysis
Robotic manipulation
Task-oriented grasping
3D object dataset
Computer vision
Machine learning
Robotics dataset
Grasp planning
Dataset Statistics
High-Fidelity 3D Models
Comprehensive collection of meticulously crafted 3D object models with precise geometric details and realistic textures, enabling accurate simulation and real-world application. Format: GLB/OBJ, Quality: High-Fidelity, Textures: Included.
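A minimal loading sketch is shown below, assuming the `trimesh` package; the file path is hypothetical and the actual archive layout may differ.

```python
# Minimal loading sketch (assumes the trimesh package; file path is hypothetical).
import trimesh

# force='mesh' flattens GLB scenes into a single Trimesh object.
mesh = trimesh.load("SenseTask/Mug/mug_001.glb", force="mesh")

print(f"vertices: {len(mesh.vertices)}, faces: {len(mesh.faces)}")
print(f"watertight: {mesh.is_watertight}, bounds:\n{mesh.bounds}")
```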
Object Categories
Diverse range of everyday objects spanning multiple functional categories, each carefully selected to represent common manipulation scenarios in robotics applications. Coverage: 14 Categories including Bottle, Cup, Fork, Hammer, Jar, Knife, Mug, Pan, Pliers, Roller, Scissors, ScrewDriver, Spoon, EyeGlasses.
Task Types
Four fundamental manipulation tasks covering essential robotic interactions: stabilizing objects, pouring liquids, using tools, and opening/closing containers. Tasks: Stabilizing, Pouring, Tool Usage, Opening/Closing. Annotations: Task-Specific (see the sketch below for how such annotations can be consumed).
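As an illustration of consuming a task annotation, the sketch below extracts the sub-mesh belonging to one task, assuming (hypothetically) that each annotation is stored as a per-face boolean mask in a NumPy file; the dataset's actual annotation format may differ.

```python
# Illustrative only: assumes a per-face boolean mask saved as .npy (hypothetical layout).
import numpy as np
import trimesh

mesh = trimesh.load("SenseTask/Mug/mug_001.glb", force="mesh")
pouring_mask = np.load("SenseTask/Mug/mug_001_pouring_mask.npy")  # shape: (num_faces,)

# Keep only the faces annotated for the "Pouring" task to obtain the functional part.
task_part = mesh.submesh([np.flatnonzero(pouring_mask)], append=True)
print(f"task part has {len(task_part.faces)} of {len(mesh.faces)} faces")
```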
Grasp Instances
Extensive collection of optimized SE(3) grasp poses generated through an advanced pipeline with non-maximum suppression, ensuring high-quality task-oriented grasping solutions. Total Grasps: 480,000+, Generation: Automated Pipeline, Optimization: NMS Applied.
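For readers unfamiliar with the representation, a grasp pose in SE(3) is commonly stored as a 4x4 homogeneous transform (rotation plus translation of the gripper frame). The sketch below shows that representation together with a generic greedy non-maximum suppression over translation distance; it is an illustration only, not the dataset's actual generation pipeline.

```python
# Generic illustration of SE(3) grasp poses and a simple greedy NMS; not the SenseTask pipeline.
import numpy as np

def make_grasp(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 SE(3) transform."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def nms_grasps(poses: np.ndarray, scores: np.ndarray, min_dist: float = 0.02) -> list[int]:
    """Greedy NMS: keep the highest-scoring grasp, drop neighbours closer than min_dist (metres)."""
    order = np.argsort(-scores)
    keep: list[int] = []
    for i in order:
        center = poses[i][:3, 3]
        if all(np.linalg.norm(center - poses[j][:3, 3]) >= min_dist for j in keep):
            keep.append(i)
    return keep

# Toy usage: three identical-orientation grasps, two of them nearly coincident.
poses = np.stack([
    make_grasp(np.eye(3), np.array([0.00, 0.0, 0.1])),
    make_grasp(np.eye(3), np.array([0.005, 0.0, 0.1])),  # within 2 cm of the first
    make_grasp(np.eye(3), np.array([0.10, 0.0, 0.1])),
])
print(nms_grasps(poses, scores=np.array([0.9, 0.8, 0.7])))  # -> [0, 2]
```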
Mask2Former Performance
State-of-the-art segmentation model fine-tuned on SenseTask, achieving an exceptional mean Intersection over Union (mIoU) score that demonstrates the dataset's quality and utility. mIoU: 92.44%, Accuracy: 99.89%, Model: Mask2Former.
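A minimal inference sketch with a publicly available Mask2Former checkpoint from Hugging Face Transformers is shown below; fine-tuning on SenseTask (as reported above) would swap the checkpoint and label set for the dataset's task-part classes, and the input image path is hypothetical.

```python
# Minimal Mask2Former inference sketch using a public checkpoint; the render path is hypothetical.
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

checkpoint = "facebook/mask2former-swin-tiny-ade-semantic"  # fine-tuning would swap in SenseTask classes
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

image = Image.open("mug_render.png")  # a rendered view of a dataset object (hypothetical file)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process to a per-pixel class-id map at the original image resolution.
seg_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(seg_map.shape, seg_map.unique())
```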
CLIPSeg Performance
Vision-language model performance on SenseTask, showcasing the dataset's effectiveness for prompt-based segmentation and task-oriented understanding. mIoU: 82.22%, Model: CLIPSeg, Type: Vision-Language.
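The sketch below shows generic prompt-based segmentation with the public CLIPSeg checkpoint from Hugging Face Transformers; the image path and text prompts are hypothetical examples, and fine-tuning on SenseTask would adapt the model to the dataset's task vocabulary.

```python
# Generic CLIPSeg prompt-based segmentation sketch; image path and prompts are hypothetical.
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("mug_render.png")  # a rendered view of a dataset object (hypothetical file)
prompts = ["the part grasped for pouring", "the part grasped for stabilizing"]

inputs = processor(text=prompts, images=[image] * len(prompts), padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # one low-resolution mask logit map per prompt

masks = torch.sigmoid(logits)  # probability maps, shape (num_prompts, 352, 352)
print(masks.shape)
```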
Resources
Access the paper, code, dataset, and additional materials for SenseTask.
Dataset
Download the complete dataset from Google Drive
170 high-fidelity 3D models with annotations
Citation
If you use the SenseTask dataset in your research, please cite:
@article{beigy_sensetask,
title={SenseTask: 3D Object Dataset with Task-Specific Annotations for Robotic Grasping},
author={Beigy, AliReza and Azimmohseni, Farbod and Tale Masouleh, Mehdi and Kalhor, Ahmad},
journal={Available at SSRN 5929537},
year={2025},
doi={10.2139/ssrn.5929537},
url={https://doi.org/10.2139/ssrn.5929537}
}
About
The SenseTask dataset was developed by the Human and Robot Interaction Lab at the University of Tehran. It serves as a valuable resource for task-oriented grasping research, offering enhanced flexibility for training diverse models. With its high-fidelity models, rich annotations, and extensive grasp instances, it paves the way for breakthroughs in robotic manipulation across diverse domains.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Website: https://sensetask.net/