Premise: Deep learning has become increasingly important in the analysis of digitized herbarium collections, which comprise millions of scans that provide valuable resources for studying plant evolution and biodiversity. However, leveraging deep learning algorithms to analyze these scans presents significant challenges, partly due to the heterogeneous nature of the non-plant material that forms the background of the scans. We hypothesize that removing such backgrounds can improve the performance of these algorithms.
Methods: We propose a novel method based on deep learning to segment and generate plant masks from herbarium scans and subsequently remove the non-plant backgrounds. The semi-automatic preprocessing stages involve the identification and removal of non-plant elements, substantially reducing the manual effort required to prepare the training dataset.
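The background-removal step can be illustrated with a minimal, dependency-free sketch. It assumes a binary plant mask is already available (here written by hand, not produced by the authors' model), that mask value 1 marks plant pixels, and that removed pixels are filled with white. The function name `remove_background`, the mask convention, and the fill color are illustrative assumptions and are not taken from the paper.

```python
def remove_background(image, mask, fill=(255, 255, 255)):
    """Return a copy of `image` with every non-plant pixel replaced by `fill`.

    image: list of rows, each a list of (R, G, B) tuples.
    mask:  list of rows of 0/1 values; 1 = plant pixel (assumed convention).
    fill:  color used for removed background pixels (assumed: white).
    """
    same_shape = len(image) == len(mask) and all(
        len(img_row) == len(mask_row) for img_row, mask_row in zip(image, mask)
    )
    if not same_shape:
        raise ValueError("image and mask must have the same height and width")
    # Keep the original pixel where the mask is 1, otherwise use the fill color.
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 2x2 "scan": two plant pixels plus a paper pixel and a label pixel.
scan = [
    [(30, 120, 40), (250, 250, 245)],
    [(200, 30, 30), (35, 110, 50)],
]
plant_mask = [
    [1, 0],
    [0, 1],
]
cleaned = remove_background(scan, plant_mask)
```

On full-resolution scans the same per-pixel selection would normally be done with an array library rather than nested lists; the logic is unchanged.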
Results: The segmentation step achieved an F1 score of up to 96.6%. Moreover, when used in classification models for plant morphological trait identification, the segmented images improved classification accuracy by up to 3% and F1 score by up to 7% compared to non-segmented images.
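For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall, F1 = 2TP / (2TP + FP + FN). The abstract does not state how the segmentation F1 was computed; the sketch below assumes a pixel-wise comparison between a predicted mask and a reference mask, and the function name `mask_f1` and the toy masks are hypothetical.

```python
def mask_f1(pred, truth):
    """Pixel-wise F1 between two binary masks given as lists of 0/1 rows.

    Assumes both masks have the same shape and that 1 marks plant pixels.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # plant pixel correctly predicted
            elif p and not t:
                fp += 1  # background predicted as plant
            elif t and not p:
                fn += 1  # plant pixel missed
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as perfect agreement.
    return 2 * tp / denom if denom else 1.0


predicted = [[1, 1], [0, 0]]
reference = [[1, 0], [1, 0]]
score = mask_f1(predicted, reference)  # TP=1, FP=1, FN=1, so F1 = 2/4 = 0.5
```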
Discussion: Our approach isolates the plant material in herbarium scans by removing background elements, which benefits downstream classification tasks. We demonstrate that image segmentation significantly enhances the performance of plant morphological trait identification models.
Keywords: deep learning; herbarium scans; semantic segmentation; trait classification.
© 2025 The Author(s). Applications in Plant Sciences published by Wiley Periodicals LLC on behalf of Botanical Society of America.