By combining transformer-based amodal segmentation models with synthetic training data generated by diffusion models, the study achieved markedly more accurate on-plant fruit size estimation. The transformer-driven Amodal Mask2Former model reduced size estimation errors by nearly 50%, setting a new benchmark for automated phenotyping under heavy occlusion and paving the way for smarter greenhouse automation.
Accurately estimating fruit size directly on plants is essential for precision agriculture, enabling data-driven crop management and improving yield prediction. Traditional fruit detection and measurement in greenhouses remain challenging due to leaf occlusion, particularly in creeping cultivation systems where manual monitoring is labor-intensive. Although convolutional neural networks (CNNs) have long dominated agricultural image analysis, they often fail to infer occluded fruit regions.
Recently, transformer-based vision architectures, adapted from models originally developed for natural language processing, have demonstrated exceptional capacity for image understanding, motivating their use in crop phenotyping. In parallel, generative diffusion models have emerged as powerful tools for data augmentation, capable of producing realistic and diverse training images.
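To make the augmentation idea concrete, the sketch below shows how a text-to-image diffusion model could be used to generate synthetic greenhouse images for training. It is only an illustration of the general technique, not the study's pipeline: the checkpoint name, prompt, and image count are assumptions.

```python
# Minimal sketch: generating synthetic greenhouse fruit images with a
# text-to-image diffusion model for training-set augmentation.
# The checkpoint, prompt, and batch size are illustrative assumptions,
# not details taken from the study.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # assumed public checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a ripe fruit partially hidden by leaves in a greenhouse, photorealistic"
for i in range(8):  # generate a small batch of augmentation images
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"synthetic_fruit_{i:02d}.png")
```

Images produced this way would typically be mixed with real annotated photos before fine-tuning the segmentation model, increasing the diversity of occlusion patterns seen during training.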
Building on these challenges and opportunities, the study estimates fruit size under occlusion by integrating vision transformers with diffusion-based generative data augmentation.
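The step from an amodal mask to a size reading can be illustrated with simple geometry. The sketch below is a minimal example, assuming a binary amodal mask (including occluded pixels) and a known camera scale in mm per pixel; it is not the study's exact measurement procedure.

```python
# Minimal sketch: deriving a fruit size estimate from a predicted amodal
# instance mask. Assumes a binary mask (True = fruit, including occluded
# pixels) and a known scale in mm per pixel; both are illustrative
# assumptions, not values from the study.
import numpy as np

def fruit_diameter_mm(amodal_mask: np.ndarray, mm_per_pixel: float) -> float:
    """Equivalent-circle diameter of the fruit, in millimetres."""
    area_px = float(amodal_mask.sum())            # amodal area in pixels
    diameter_px = 2.0 * np.sqrt(area_px / np.pi)  # circle with the same area
    return diameter_px * mm_per_pixel

# Toy usage: a synthetic circular mask of radius 60 px at 0.5 mm/px
yy, xx = np.mgrid[:200, :200]
mask = (xx - 100) ** 2 + (yy - 100) ** 2 <= 60 ** 2
print(round(fruit_diameter_mm(mask, mm_per_pixel=0.5), 1))  # ~60.0 mm
```

Because the amodal mask covers the occluded portion of the fruit as well as the visible one, an area-based estimate like this is far less biased by foliage than one computed from the visible region alone.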
Read more at News Wise