A bimodal image dataset for seed classification from the visible and near-infrared spectrum

Maksim Kukushkin1,2, Martin Bogdan2, Simon Goertz3, Jan-Ole Callsen3, Eric Oldenburg4, Matthias Enders3,†, Thomas Schmid1,5,†
1Martin Luther University Halle-Wittenberg, 2Leipzig University, 3NPZ Innovation GmbH, 4Norddeutsche Pflanzenzucht Hans-Georg Lembke KG, 5Lancaster University in Leipzig

†These authors contributed equally to this work.


A brief video introduction to labor-intensive manual seed purity testing. The goal of the KIRa project is to train Machine Learning models to automate this process.

BiSID-5k - bimodal dataset for seed classification

Multi-Class

BiSID-5k is a high-quality, multi-class seed image dataset designed for research in computer vision and agriculture. It contains 5,000 images, 500 for each of ten seed species: Black grass (Alopecurus myosuroides L.), Common knotgrass (Polygonum aviculare L.), Common hemp-nettle (Galeopsis tetrahit L.), Bistort (Bistorta officinalis L.), Red campion (Silene dioica L.), Red clover (Trifolium pratense L.), Herb robert (Geranium robertianum L.), Common chickweed (Stellaria media L.), Meadow foxtail (Alopecurus pratensis L.) and Winter oilseed rape (Brassica napus L.). Each seed is acquired in both the visible and near-infrared spectra, providing multimodal data for analysis, classification, and machine learning applications.

Multi-Modal

Our dataset has a bimodal design that pairs high-spatial-resolution RGB images with the rich spectral information of hyperspectral (HS) data. In addition, multispectral (MS) and spectroscopic modalities can be derived from the hyperspectral data for further analysis of seed characteristics.
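As an illustration of how such derived modalities could be obtained, the following minimal sketch averages contiguous groups of hyperspectral channels into broader multispectral bands and collapses the spatial dimensions into a single mean spectrum. The array layout (height, width, channels), the function names, and the averaging scheme are assumptions made for this example; they are not necessarily the procedure used to build BiSID-5k.

```python
from typing import Optional

import numpy as np


def to_multispectral(cube: np.ndarray, n_bands: int) -> np.ndarray:
    """Average contiguous groups of channels into n_bands broader bands.

    cube is assumed to have shape (height, width, channels).
    """
    groups = np.array_split(cube, n_bands, axis=2)
    return np.stack([group.mean(axis=2) for group in groups], axis=2)


def to_spectrum(cube: np.ndarray, mask: Optional[np.ndarray] = None) -> np.ndarray:
    """Collapse the spatial dimensions into one mean spectrum.

    mask is an optional boolean (height, width) array selecting seed pixels.
    """
    if mask is None:
        return cube.mean(axis=(0, 1))
    return cube[mask].mean(axis=0)


# Toy cube: 4x4 pixels, 6 channels, each channel filled with its own index.
toy_cube = np.broadcast_to(np.arange(6, dtype=float), (4, 4, 6))
toy_ms = to_multispectral(toy_cube, n_bands=3)  # shape (4, 4, 3)
toy_spectrum = to_spectrum(toy_cube)            # shape (6,)
```

With a real hyperspectral cube, a seed mask would typically be passed to `to_spectrum` so that background pixels do not dilute the spectrum.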

Experiments

Baselines

First, we trained Linear Regression, Decision Tree, Random Forest, and Multi-Layer Perceptron (MLP) models on the spectroscopic version of the BiSID-5k dataset to evaluate classification performance. Additionally, we trained various ResNet architectures (ResNet-18, ResNet-34, and ResNet-50) on the RGB, multispectral (MS), and hyperspectral (HS) modalities of the BiSID-5k dataset for performance evaluation.
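A baseline comparison of this kind might look like the sketch below, which fits three of the listed model families on per-seed spectra and reports test accuracy. It assumes scikit-learn is available; the synthetic data, the 80/20 stratified split, and all hyperparameters are placeholders chosen for illustration and do not reproduce the authors' experimental setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def evaluate_baselines(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Fit each baseline on a stratified split and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "mlp": MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=seed),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


# Synthetic stand-in for spectroscopic samples: 10 classes, 30 spectra each,
# 50 channels per spectrum (all hypothetical sizes).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 30)
spectra = rng.normal(size=(labels.size, 50)) + 0.5 * labels[:, None]
scores = evaluate_baselines(spectra, labels)
```

The returned dictionary maps each model name to its accuracy on the held-out split, which makes it straightforward to tabulate the baselines side by side.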

Impact of different spectral and spatial resolutions

To measure the impact of spectral and spatial resolution on classification performance, we trained each ResNet model on the BiSID-5k dataset at different spectral and spatial resolutions.
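One simple way to produce lower-resolution variants of an image cube is sketched below: spatial resolution is reduced by average pooling and spectral resolution by keeping every n-th channel. Both reduction schemes and the function names are assumptions for this example; the page does not state how the resolution variants were generated.

```python
import numpy as np


def reduce_spatial(cube: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool height and width by an integer factor.

    cube is assumed to have shape (height, width, channels); rows and
    columns that do not fill a complete block are cropped.
    """
    height, width, channels = cube.shape
    new_h, new_w = height // factor, width // factor
    cropped = cube[: new_h * factor, : new_w * factor, :]
    blocks = cropped.reshape(new_h, factor, new_w, factor, channels)
    return blocks.mean(axis=(1, 3))


def reduce_spectral(cube: np.ndarray, step: int) -> np.ndarray:
    """Keep every step-th spectral channel."""
    return cube[:, :, ::step]


# Toy cube: 8x8 pixels with 6 channels.
toy = np.arange(8 * 8 * 6, dtype=float).reshape(8, 8, 6)
half_res = reduce_spatial(toy, factor=2)    # shape (4, 4, 6)
fewer_bands = reduce_spectral(toy, step=3)  # shape (8, 8, 2)
```

Each reduced cube can then be fed to the same model architecture, with only the input size or the number of input channels adjusted.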

Acknowledgements

We thank Denys Chaldykin (NPZ Innovation GmbH) for operating the seed image acquisition pipeline and for seed handling. This work is supported by funds from the German Federal Ministry of Food and Agriculture (BMEL), based on a decision of the Parliament of the Federal Republic of Germany. The German Federal Office for Agriculture and Food (BLE) provides coordinating support for artificial intelligence (AI) in agriculture as the funding organisation, grant number 28DK116C20. The manuscript was created as part of the research project "KIRa - KI-gestützte Plattform zur Klassifikation und Sortierung von Pflanzensamen: Bewertung der Saatgutreinheit am Musterfall Raps" (English: "AI-supported platform for classifying and sorting plant seeds: Evaluation of seed purity using oilseed rape as a model case"). Further information about the project can be found at npz-innovation.de/projectKIRA.html.

Citation


@misc{kukushkin2025bisid5k_dataset,
  doi = {10.25532/OPARA-810},
  url = {https://opara.zih.tu-dresden.de/handle/123456789/1410},
  author = {Kukushkin, Maksim and Bogdan, Martin and Goertz, Simon and Callsen, Jan-Ole and Oldenburg, Eric and Enders, Matthias and Schmid, Thomas},
  title = {BiSID-5k: A Bimodal Image Dataset for Seed Classification from the Visible and Near-Infrared Spectrum},
  publisher = {Universität Leipzig},
  year = {2025},
  copyright = {Creative Commons Attribution 4.0 International}
}