Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases


  • Chen M Zhang K He Z Jing Y Wang X (2024) RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search Proceedings of the VLDB Endowment 10.14778/3681954.3681959 17 :11 (2735-2749) Online publication date: 1-Jul-2024 https://dl.acm.org/doi/10.14778/3681954.3681959
  • Wei J Peng B Lee X Palpanas T (2024) DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search Proceedings of the VLDB Endowment 10.14778/3665844.3665854 17 :9 (2241-2254) Online publication date: 1-May-2024 https://dl.acm.org/doi/10.14778/3665844.3665854
  • Gao J Long C (2024) RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search Proceedings of the ACM on Management of Data 10.1145/3654970 2 :3 (1-27) Online publication date: 30-May-2024 https://dl.acm.org/doi/10.1145/3654970

Index Terms

Information systems

Data management systems

Database management system engines

Database query processing

Query optimization

Information systems applications

Data mining

Nearest-neighbor search

Recommendations

Complementary hashing for approximate nearest neighbor search.

Recently, hashing based Approximate Nearest Neighbor (ANN) techniques have been attracting lots of attention in computer vision. The data-dependent hashing methods, e.g., Spectral Hashing, expects better performance than the data-blind counterparts, e.g.,...

Efficient approximate nearest neighbor search with integrated binary codes

Nearest neighbor search in Euclidean space is a fundamental problem in multimedia retrieval. The difficulty of exact nearest neighbor search has led to approximate solutions that sacrifice precision for efficiency. Among such solutions, approaches that ...

Order preserving hashing for approximate nearest neighbor search

In this paper, we propose a novel method to learn similarity-preserving hash functions for approximate nearest neighbor (NN) search. The key idea is to learn hash functions by maximizing the alignment between the similarity orders computed from the ...

Published in Proceedings of the ACM on Management of Data

UC Santa Barbara, United States

Association for Computing Machinery

New York, NY, United States

Author tags

  • $\tau$-monotonic
  • edge occlusion rule
  • proximity graph
  • Research-article

Funding Sources

  • Hong Kong RGC
  • NSF of Hunan Province
  • NSF of Guangdong Province

Article metrics

  • 4 Total Citations
  • 688 Total Downloads
  • Downloads (Last 12 months) 616
  • Downloads (Last 6 weeks) 57
  • Peng Y Lin S Chen Q Wang S Xu L Ren X Li Y Xu J (2024) ChatGraph: Chat with Your Graphs 2024 IEEE 40th International Conference on Data Engineering (ICDE) 10.1109/ICDE60146.2024.00424 (5445-5448) Online publication date: 13-May-2024 https://doi.org/10.1109/ICDE60146.2024.00424
  • Song Y Wang K Yao B Chen Z Xie J Li F (2024) Efficient Reverse $k$ Approximate Nearest Neighbor Search Over High-Dimensional Vectors 2024 IEEE 40th International Conference on Data Engineering (ICDE) 10.1109/ICDE60146.2024.00325 (4262-4274) Online publication date: 13-May-2024 https://doi.org/10.1109/ICDE60146.2024.00325


More-efficient approximate nearest-neighbor search

New approach speeds graph-based search by 20% to 60%, regardless of graph construction method.

https://www.amazon.science/blog/more-efficient-approximate-nearest-neighbor-search

Many of today’s machine learning (ML) applications involve nearest-neighbor search: data are represented as points in a high-dimensional space; a query (say, a photograph or text string to be matched to a data point) is embedded in that space; and the data points closest to the query are retrieved as candidate solutions.

Often, however, computing the distance between the query and every point in the dataset is prohibitively time consuming, so model builders instead use approximate nearest-neighbor search techniques. One of the most popular of these is graph-based approximation, in which the data points are organized into a graph. The search algorithm traverses the graph, regularly updating a list of the points nearest the query that it has encountered so far.

In a paper we presented at this year’s Web Conference, we describe a new technique that makes graph-based nearest-neighbor search much more efficient. The technique is based on the observation that, when calculating the distance between the query and points that are farther away than any of the candidates currently on the list, an approximate distance measure will usually suffice. Accordingly, we propose a method for computing approximate distance very efficiently and show that it reduces the time required to perform approximate nearest-neighbor search by 20% to 60%.

Graph-based search

Broadly speaking, approximate k-nearest-neighbor search algorithms, which find the k neighbors nearest the query vector, fall into three categories: quantization methods, space-partitioning methods, and graph-based methods. On several benchmark datasets, graph-based methods have yielded the best performance so far.

Given the embedding of a query, q, graph-based search picks a point in the graph, c, and explores all its neighbors, that is, the nodes with which it shares edges. The algorithm calculates those nodes’ distance from the query and adds the closest ones to the list of candidates. Then, from those candidates, it selects the one closest to the query and explores its neighbors, updating the list as necessary. This procedure continues until the distances between the unexplored graph nodes and the query vector begin increasing, an indication that the algorithm is leaving the neighborhood of the true nearest neighbor.
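The traversal described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in any particular library: the names `graph_search`, `list_size`, and the adjacency-dictionary layout are assumptions made for the example, and Euclidean distance is assumed as the metric.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def graph_search(points, neighbors, query, entry, k=1, list_size=8):
    """Best-first search over a proximity graph (hypothetical helper).

    points:    list of vectors, indexed by node id
    neighbors: dict mapping node id -> list of adjacent node ids
    entry:     node id where the traversal starts
    list_size: how many of the closest nodes seen so far are kept
    """
    d0 = euclidean(points[entry], query)
    visited = {entry}
    frontier = [(d0, entry)]   # min-heap of nodes still to expand
    best = [(-d0, entry)]      # max-heap (negated distances) of closest nodes so far
    while frontier:
        dist, node = heapq.heappop(frontier)
        # Stop when the closest unexpanded node is farther than the worst kept candidate.
        if len(best) >= list_size and dist > -best[0][0]:
            break
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = euclidean(points[nb], query)
            if len(best) < list_size or d < -best[0][0]:
                heapq.heappush(frontier, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > list_size:
                    heapq.heappop(best)  # drop the farthest kept candidate
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked[:k]]
```

A larger `list_size` explores more of the graph and tends to raise recall at the cost of more distance computations, which is the cost that FINGER targets.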

Past research on graph-based approximation has concentrated on methods for assembling the underlying graph. Some methods, for instance, add connections between a given node and distant nodes, to help ensure that the search doesn’t get stuck in a local minimum; some methods concentrate on pruning highly connected nodes to prevent the same node from being visited over and over. Each of these methods has its advantages, but none is a clear winner across the board.

We instead focus on a technique that will work with all graph construction methods, since it increases the efficiency of the search process itself. We call that technique FINGER, for fast inference for graph-based approximated nearest neighbor search.

Approximating distance

Consider the case of a query vector, q, a node whose neighbors are being explored, c, and one of c’s neighbors, d, whose distance from q we wish to compute.

Both q and d can be represented as the sums of projections along c and “residual vectors” perpendicular to c. This is, essentially, to treat c as a basis vector of the space.

If the algorithm is exploring neighbors of c, that means it has already calculated the distance between c and q. In our paper, we show that, if we take advantage of that existing calculation, along with certain manipulations of node vectors’ values, which can be precomputed and stored, estimating the distance between q and d is simply a matter of estimating the angle between their residual vectors.

And that angle, we argue, can be reasonably approximated from the angles between the residual vectors of c’s immediate neighbors, those that share edges with c in the graph. The idea is that, if q is close enough to c that c is worth exploring, then if q were part of the graph, it would probably be one of c’s nearest neighbors. Consequently, the relationships between the residual vectors of c’s other neighbors tell us something about the relationships between the residual vector of one of those neighbors, d, and q’s residual vector.
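To make the decomposition concrete, the sketch below splits q and d into projections along c and residuals perpendicular to c, then rebuilds the squared distance from the projection term, the two residual norms, and the cosine of the angle between the residuals. The helper names are hypothetical, and the cosine is computed exactly here only to check the identity; the point of FINGER is to replace that one quantity with a cheap estimate, which this sketch does not attempt.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def split(v, c):
    """Split v into its projection onto c and the residual perpendicular to c."""
    t = dot(v, c) / dot(c, c)
    proj = [t * x for x in c]
    res = [x - p for x, p in zip(v, proj)]
    return proj, res


def squared_distance_via_residuals(q, d, c):
    """Rebuild ||q - d||^2 from projections on c and the residual angle."""
    q_proj, q_res = split(q, c)
    d_proj, d_res = split(d, c)
    # Part of the distance that lies along c.
    proj_term = sum((a - b) ** 2 for a, b in zip(q_proj, d_proj))
    nq = math.sqrt(dot(q_res, q_res))
    nd = math.sqrt(dot(d_res, d_res))
    # Exact cosine of the angle between the residuals (the quantity to approximate).
    cos_theta = dot(q_res, d_res) / (nq * nd) if nq and nd else 0.0
    # Law of cosines on the residuals, plus the projection term.
    return proj_term + nq ** 2 + nd ** 2 - 2 * nq * nd * cos_theta
```

Everything except `cos_theta` depends on q only through quantities tied to c (its projection and residual norm) or on d only through values that can be stored ahead of time, which is why the angle is the only part left to estimate.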

To evaluate our approach, we compared FINGER’s performance to that of three prior graph-based approximation methods on three different datasets. Across a range of different recall10@10 rates — or the rate at which the model found the query’s true nearest neighbor among its 10 top candidates — FINGER searched more efficiently than all of its predecessors. Sometimes the difference was quite dramatic — 50%, on one dataset, at the high recall rate of 98%, and almost 88% on another dataset, at the recall rate of 86%.
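For readers who want to reproduce such a measurement, recall at k is commonly computed as the fraction of the true k nearest neighbors that appear among the k results returned. The function below is a small sketch under that common definition, which is an assumption here since the post does not spell out its exact formula; the name `recall_at_k` is illustrative.

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k neighbors found among the top-k retrieved ids."""
    found = set(retrieved[:k]) & set(ground_truth[:k])
    return len(found) / k
```

In a benchmark, this value is averaged over all queries and plotted against query throughput.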


Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs

30 Mar 2016 · Yu. A. Malkov, D. A. Yashunin

We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for additional search structures, which are typically used at the coarse search stage of most proximity graph techniques. Hierarchical NSW incrementally builds a multi-layer structure consisting of a hierarchical set of proximity graphs (layers) for nested subsets of the stored elements. The maximum layer in which an element is present is selected randomly with an exponentially decaying probability distribution. This allows producing graphs similar to the previously studied Navigable Small World (NSW) structures while additionally having the links separated by their characteristic distance scales. Starting the search from the upper layer together with utilizing the scale separation boosts the performance compared to NSW and allows a logarithmic complexity scaling. Additional employment of a heuristic for selecting proximity graph neighbors significantly increases performance at high recall and in the case of highly clustered data. Performance evaluation has demonstrated that the proposed general metric space search index is able to strongly outperform previous open-source state-of-the-art vector-only approaches. Similarity of the algorithm to the skip list structure allows a straightforward balanced distributed implementation.
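The layer selection mentioned in the abstract can be illustrated with a short sketch. One common way to draw a level with exponentially decaying probability is to take the floor of the negative log of a uniform random number, scaled by a normalization factor. The function name `random_level` and the parameter name `m_l` are assumptions for this example, not identifiers from any specific library.

```python
import math
import random


def random_level(m_l, rng=random):
    """Draw a maximum layer with exponentially decaying probability.

    m_l is a scale factor: larger values produce taller hierarchies.
    rng is anything with a random() method returning a float in [0, 1).
    """
    u = 1.0 - rng.random()  # shift to (0, 1] so that log(0) cannot occur
    return int(-math.log(u) * m_l)
```

With `m_l = 1 / math.log(M)` for some integer `M`, the chance of an element reaching layer 1 or higher is `1 / M`, so each layer holds roughly `1 / M` of the elements of the layer below it, much like the levels of a skip list.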


📚 Awesome papers and technical blogs on vector DB (database), semantic-based vector search or approximate nearest neighbor search (ANN Search, ANNS). Vector search is the key component of large-scale information retrieval, cross-modal retrieval, LLMs-based RAG, vector databases.

matchyc/vector-search-papers

Vector search, approximate nearest neighbor search papers.

A curated collection of awesome papers in the field of vector search, known as approximate nearest neighbor search (ANN search, ANNS). This repository aims to gather high-quality research papers, articles, and resources that provide valuable insights and advancements. This technology is a critical component in vector databases, retrieval-augmented generation (RAG), large-scale information retrieval, recommendation systems, drug discovery, image search, etc.

The latest update: 2024-8-13

What is vector search and its applications

First of all, what is vector search, and why is it so important in the booming age of AI?

Simple explanation:

  • what-is-vector-search
  • a-gentle-introduction-to-vector-search
  • Explanation in Quora
  • k-nn-vs-approximate-nearest-neighbors

Applications:

  • 5-use-cases-for-vector-search
  • Introduction to Vector Search for Developers
Title | High-Level Category | Remarks
On Efficient Retrieval of Top Similarity Vectors | MIPS | MIPS for top-1
In-Storage Acceleration of Graph-Traversal-Based Approximate Nearest Neighbor Search | NAND-Flash acceleration | Using storage compute
DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries | multi-vector
Approximate Nearest Neighbor Search on High Dimensional Data — Experiments, Analyses, and Improvement | Survey
Graph-based Nearest Neighbor Search: From Practice to Theory | Theoretical
FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search | Graph-based
HVS: hierarchical graph structure based on Voronoi diagrams for solving approximate nearest neighbor search | Graph-based
DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node | Graph-based | SSD-based
Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs | Graph-based
SONG: Approximate Nearest Neighbor Search on GPU | Graph-based
Graph-based Nearest Neighbor Search: Promises and Failures | Graph-based
Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination | Graph-based
A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search | Survey
Fast approximate nearest neighbor search with the navigating spreading-out graph | Graph-based
Non-metric Similarity Graphs for Maximum Inner Product Search | Graph-based
Understanding and Improving Proximity Graph-based Maximum Inner Product Search | Graph-based
Learning to Route in Similarity Graphs | Graph-based+DeepLearning(GCN)
Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data | Graph-based
Fast Approximate Nearest Neighbor Search with a Dynamic Exploration Graph using Continuous Refinement | Graph-based
Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases | Graph-based
Scaling Graph-Based ANNS Algorithms to Billion-Size Datasets: A Comparative Analysis | Graph-based
SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search | Graph-Tree-based | SSD-based
Hierarchical Clustering-Based Graphs for Large Scale Approximate Nearest Neighbor Search | Graph-based
Fusion of graph-based indexing and product quantization for ANN search | Graph-based
Towards Efficient Index Construction and Approximate Nearest Neighbor Search in High-Dimensional Spaces | Graph-based
Scaling Graph-Based ANNS Algorithms to Billion-Size Datasets: A Comparative Analysis | Survey
Automating Nearest Neighbor Search Configuration with Constrained Optimization | Learning
Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation | Graph-based
Norm Adjusted Proximity Graph for Fast Inner Product Retrieval | Graph-based
On Efficient Retrieval of Top Similarity Vectors | Graph-based
SONG: Approximate Nearest Neighbor Search on GPU | GPU
RTNN: Accelerating Neighbor Search Using Hardware Ray Tracing | GPU
Billion-scale similarity search with GPUs | GPU
Fast neural ranking on bipartite graph indices | Neural Rank
Fast Item Ranking under Neural Network based Measures | Neural Rank
Non-metric Similarity Graphs for Maximum Inner Product Search | MIPS
Möbius Transformation for Fast Inner Product Search on Graph | MIPS
Understanding and Improving Proximity Graph-based Maximum Inner Product Search | MIPS
Reinforcement Routing on Proximity Graph for Efficient Recommendation | Learning
From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective | Learning
Constructing Tree-based Index for Efficient and Effective Dense Retrieval | Learning
Reverse Maximum Inner Product Search: Formulation, Algorithms, and Analysis | MIPS
FARGO: Fast Maximum Inner Product Search via Global Multi-Probing | LSH
SRS: solving c-approximate nearest neighbor queries in high dimensional Euclidean space with a tiny index | LSH
From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective | LSH
LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index | LSH
HD-index: pushing the scalability-accuracy boundary for approximate kNN search in high-dimensional spaces | LSH
Falconn++: A Locality-sensitive Filtering Approach for Approximate Nearest Neighbor Search | LSH
Deep Semantic-Preserving Ordinal Hashing for Cross-Modal Similarity Search | LSH
Supervised Hierarchical Deep Hashing for Cross-Modal Retrieval | LSH
A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search | Survey
Transformer Memory as a Differentiable Search Index | Model-as-Index
Recommender Systems with Generative Retrieval | Model-as-Index
SPREADING VECTORS FOR SIMILARITY SEARCH | Learning + Dimensionality Reduction
Model-enhanced Vector Index | Fusion Retrieval
GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning | Prune edges with learning
Low-Precision Quantization for Efficient Nearest Neighbor Search | scalar quantization

Please note that some entries may require access or membership to view the full content.

How to Contribute

We welcome contributions to expand and improve this collection. If you have any papers or resources that you believe should be included, please follow these guidelines:

  • Fork the repository.
  • Add your paper/resource to the appropriate category or create a new category if needed.
  • Include a link to the paper/resource (if available) or any relevant information.
  • Submit a pull request.

MIT license.


SOAR: New algorithms for even faster vector search with ScaNN

April 10, 2024

Philip Sun and Ruiqi Guo, Software Engineers, Google Research


Efficient vector similarity search is critical for many machine learning (ML) applications at Google. It’s commonly used to search over embeddings, which are vector representations of real-world entities, such as images, websites, or other media. These vector representations allow computers to mathematically compare and find similarities between objects, for example grouping together portraits of the same subject. Once the dataset of embeddings becomes too large for the brute-force approach of comparing the query to every embedding in the dataset, more efficient vector similarity search methods become necessary for further scaling.

The desire to highlight Google’s innovations in vector search algorithms is what led to the open-sourcing of the ScaNN vector search library in 2020 (more in this prior Research Blog post). Since then, larger datasets and the popularization of vector search in novel use cases (such as retrieval-augmented generation) have both driven demand for more scalable vector search algorithms.

ScaNN has been actively maintained and improved since its release, but today we are particularly excited to announce a major algorithmic advancement to ScaNN: Spilling with Orthogonality-Amplified Residuals (SOAR). In “SOAR: Improved Indexing for Approximate Nearest Neighbor Search,” presented at NeurIPS 2023, we describe how introducing some redundancy to ScaNN’s vector index leads to improved vector search efficiency, all while minimally impacting index size and other key vector index metrics. SOAR helps ScaNN step up to meet the ever-growing scaling and performance demands placed upon vector search libraries.

Originally featured in the 2020 ScaNN open-source release blog post, the above illustration shows how vector search over embeddings can help answer natural-language queries over a hypothetical literary database. Today we are showcasing research that allows ScaNN to perform vector search even faster.

SOAR and redundancy

The key intuition to SOAR is to introduce some mathematically crafted and implementation-optimized redundancy to ScaNN’s approximate vector search routine. Redundancy is the concept of using multiple replicas, which can each function as backup when another replica fails, to decrease the chance of total failure of a given system. Redundancy is used in many engineering disciplines, from having multiple systems that can adjust wing flaps to ensure an airplane’s stability, to storing copies of data across multiple physical drives to protect against data loss. On the other hand, redundancy may provide a false sense of security when its replicas are vulnerable to common, correlated failures — for example, storing multiple copies of data in the same datacenter still exposes them to the common threat of natural disaster in the area, and is therefore less robust than geographically distributed replicas of that data.


One common example of redundancy is replicating data across multiple datacenters around the world to decrease the chance of data loss. SOAR uses redundancy to decrease the risk of failing to find a nearest-neighbor vector.

In the context of approximate vector search, “failure” means failing to find the true nearest neighbors for a query, and instead retrieving less similar vectors. In accordance with the above themes, SOAR’s redundancy makes it less likely for ScaNN to miss the nearest neighbors, and SOAR’s specific mathematical formulation, discussed below, minimizes correlated failures that would otherwise harm search efficiency. As a result, SOAR improves the search accuracy achievable at a fixed search cost, or equivalently decreases search cost needed to achieve the same search accuracy.

SOAR: Mathematical details

We first describe ScaNN’s approximate search paradigm without SOAR in order to highlight SOAR’s differences. We focus on how ScaNN solves the maximum inner product search (MIPS) variant of vector similarity search, which defines vector similarity as the query’s inner product with a database vector. This is both because SOAR targets MIPS and because MIPS is adaptable to a variety of other similarities, including cosine and Euclidean distance.
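As an illustration of how Euclidean search can be recast as MIPS, the sketch below uses one standard reduction: append the term -||x||²/2 to every database vector and the constant 1 to the query, so that the inner product of the augmented vectors ranks points in the same order as Euclidean distance. This is offered as a textbook-style example, not as a description of what ScaNN does internally, and the function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def augment_data(x):
    """Append -||x||^2 / 2 so that inner product encodes squared distance."""
    return list(x) + [-0.5 * dot(x, x)]


def augment_query(q):
    """Append 1 so the extra data coordinate is added to the inner product."""
    return list(q) + [1.0]


def nearest_by_euclidean(query, data):
    return min(range(len(data)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(query, data[i])))


def nearest_by_augmented_mips(query, data):
    q_aug = augment_query(query)
    return max(range(len(data)), key=lambda i: dot(q_aug, augment_data(data[i])))
```

The reduction works because ⟨q, x⟩ - ||x||²/2 equals (||q||² - ||q - x||²)/2, and ||q||² is the same for every database vector. For cosine similarity, normalizing all vectors to unit length makes the inner product equal to the cosine directly.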

Before ScaNN can answer vector search queries, it first goes through an indexing phase, where the input dataset is pre-processed to construct data structures that facilitate efficient vector search. A crucial part of this indexing step is clustering via k-means; before SOAR, the clustering was performed such that each vector in the dataset was assigned to exactly one k-means cluster. Each cluster has a cluster center, which acts as a coarse approximation for the vectors assigned to the cluster; the vectors in a cluster are only evaluated if their cluster center is among the N closest centers to the query vector (with N a parameter; higher N gives greater search accuracy at the expense of greater computational cost).
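The single-assignment scheme described above can be sketched as follows. The cluster centers are taken as given (k-means training is omitted), squared Euclidean distance is assumed for both the assignment and the ranking, and the names `build_partitions`, `partitioned_search`, and `n_probe` are illustrative rather than ScaNN's API.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_partitions(data, centers):
    """Assign every vector to exactly one cluster: the one with the nearest center."""
    partitions = {i: [] for i in range(len(centers))}
    for idx, x in enumerate(data):
        nearest = min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))
        partitions[nearest].append(idx)
    return partitions


def partitioned_search(query, data, centers, partitions, n_probe, k):
    """Score only the vectors whose cluster center is among the n_probe closest."""
    ranked = sorted(range(len(centers)), key=lambda i: sq_dist(query, centers[i]))
    candidates = [idx for i in ranked[:n_probe] for idx in partitions[i]]
    candidates.sort(key=lambda idx: sq_dist(query, data[idx]))
    return candidates[:k]
```

Raising `n_probe` toward the number of clusters turns this into brute-force search; keeping it small is what saves work, and it is also what makes a poorly represented vector easy to miss.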

An animation demonstrating how clustering was performed prior to SOAR to perform efficient vector search in ScaNN.

To illustrate when such an algorithm has difficulty finding the nearest neighbors, consider a vector x assigned to a cluster with center c. Denote the difference between x and c as the residual, or r. For a query q, the difference between the query-vector similarity ⟨q, x⟩ and the estimated similarity ⟨q, c⟩ is ⟨q, x⟩ - ⟨q, c⟩ = ⟨q, x - c⟩ = ⟨q, r⟩, which is maximized when r is parallel to q, as illustrated below:

When the query q is parallel to r, the estimated inner product ⟨q, c⟩ greatly underestimates the true inner product ⟨q, x⟩. SOAR helps find the nearest neighbors in these cases, which otherwise tend to diminish vector search accuracy.

In such situations, if x is a nearest neighbor to q, x is typically hard to find, because even though ⟨q, x⟩ is high, ⟨q, r⟩ is also high, resulting in the query-center similarity ⟨q, c⟩ being low, so this particular cluster is likely to be pruned and not searched further. There are a number of ways to mitigate this problem. For example, computing a higher-quality clustering tends to decrease the magnitude of r, which lowers the average estimation error, while anisotropic vector quantization (AVQ) “shapes” the error such that it tends to be largest when the query is dissimilar to x, and therefore less likely to impact results.
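A tiny numeric check of the identity above may help. The vectors are made up for the example, and the helper name `estimation_error` is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def estimation_error(q, x, c):
    """Gap between the true score <q, x> and the center-based estimate <q, c>."""
    r = [xi - ci for xi, ci in zip(x, c)]  # residual r = x - c
    return dot(q, r)


c = (1.0, 1.0)
x = (3.0, 1.0)           # residual r = (2, 0)
q_parallel = (1.0, 0.0)  # points the same way as r
q_orthogonal = (0.0, 1.0)

# Parallel query: the center underestimates the true inner product by <q, r>.
gap_parallel = dot(q_parallel, x) - dot(q_parallel, c)
# Orthogonal query: the center's estimate is exact.
gap_orthogonal = dot(q_orthogonal, x) - dot(q_orthogonal, c)
```

Here `gap_parallel` equals `estimation_error(q_parallel, x, c)` and is the full length of r, while `gap_orthogonal` is zero, matching the claim that the error is largest when q and r are parallel.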

SOAR addresses this issue by taking a completely different approach: allowing vectors to be assigned to more than one cluster. Intuitively, this is effective by the principle of redundancy: secondary assignments may act as “backup clusters” that facilitate efficient, accurate vector search when the primary assignment performs poorly (when q is highly parallel with r of the primary assignment).

This redundancy arises from the fact that the second assignment provides a new vector-center difference r’. As long as this r’ isn’t near-parallel with q when r is near-parallel with q, this secondary center should help ScaNN locate the nearest neighbors to q. However, SOAR goes a step further than this naïve redundancy, and modifies the assignment loss function for secondary assignments to explicitly optimize for independent, effective redundancy: it aims to find secondary clusters whose r’ are perpendicular to r, so that when q is near-parallel to r and the primary center has high error, q will be near-orthogonal to r’ and the secondary center will have low error. The effect of SOAR’s modified loss is visualized below:

SOAR not only introduces redundancy via secondary assignments, but does so via a modified loss to maximize the effectiveness of redundancy. Unmodified loss (shown first) often leads to ineffective secondary assignments like c’ with high error. SOAR’s modified loss (later) selects c’’, a much better choice.

There are additional details to SOAR’s implementation and theoretical analysis, but the core idea behind SOAR is as simple as described above: assign vectors to multiple clusters with a modified loss. This technique is named SOAR because prior research has described multiple-assignment as “spilling,” and the modified loss encourages orthogonal (perpendicular) residuals, leading to the name Spilling with Orthogonality-Amplified Residuals.
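The idea of a secondary assignment that prefers orthogonal residuals can be sketched with a simple penalized loss. The loss below, the squared length of the new residual plus a weight `lam` times the squared component of that residual along the primary residual, is an assumption chosen for illustration and is not claimed to be the exact loss from the paper; the function and parameter names are likewise hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def secondary_assignment(x, centers, primary, lam=1.0):
    """Pick a second cluster for x, penalizing residuals parallel to the primary one.

    lam = 0 reduces to plain "second-nearest center" spilling.
    """
    r = sub(x, centers[primary])
    r_norm_sq = dot(r, r)

    def loss(i):
        r2 = sub(x, centers[i])
        # Squared length of the component of r2 that lies along r.
        parallel = dot(r, r2) ** 2 / r_norm_sq if r_norm_sq else 0.0
        return dot(r2, r2) + lam * parallel

    return min((i for i in range(len(centers)) if i != primary), key=loss)
```

For example, with `x = (0, 0)`, a primary center at `(1, 0)`, and candidate centers at `(-1.1, 0)` and `(0, 1.3)`, the unpenalized choice (`lam=0`) is the slightly closer center on the same axis as the primary residual, while `lam=1` selects the center whose residual is perpendicular to it.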

Experimental results

SOAR enables ScaNN to maintain its existing advantages, including low memory consumption, fast indexing speed, and hardware-friendly memory access patterns, while endowing ScaNN with an additional algorithmic edge. As a result, ScaNN makes the best tradeoff among the three major metrics for vector search performance, as highlighted below for the ann-benchmarks glove-100 dataset. The only libraries to come near ScaNN’s querying speed require over 10× the memory and 50× the indexing time. Meanwhile, ScaNN achieves querying throughputs several times higher than libraries of comparable indexing time.

ScaNN with SOAR makes by far the best query speed / indexing speed trade-off among all libraries benchmarked, all while having the smallest memory footprint.

ScaNN also achieves state-of-the-art performance in the two tracks of Big-ANN 2023 benchmarks to which it’s applicable. These benchmarks involve larger datasets, which, in fact, further increase SOAR’s efficacy. The effect of dataset size on SOAR, as well as benchmarks on up to billion-vector datasets, are all discussed further in the paper.

For both the out-of-distribution and streaming tracks of Big-ANN 2023, SOAR allows ScaNN to achieve the highest result in each track’s respective ranking metric. The website’s leaderboard is pending update here.

SOAR provides ScaNN a robust “backup” route to identify nearest neighbors when ScaNN’s traditional clustering-based approach has the most difficulty. This allows ScaNN to perform even faster vector search, all while maintaining low index size and indexing time, leading to the all-around best set of tradeoffs among vector search algorithms.

We invite the community to use ScaNN to solve its vector search challenges. ScaNN is open-sourced on GitHub and can be easily installed via Pip. In addition, ScaNN vector search technology is available in Google Cloud products: Vertex AI Vector Search leverages ScaNN to offer a fully managed, high-scale, low-latency, vector similarity matching service, and AlloyDB recently launched ScaNN for AlloyDB index, a vector database on top of the popular PostgreSQL-compatible database. We are excited to see how more efficient vector search enables the next generation of machine learning applications.

Acknowledgements

This post reflects the work of the entire ScaNN team: David Simcha, Felix Chern, Philip Sun, Ruiqi Guo, Sanjiv Kumar and Zonglin Li. We’d also like to thank Apurv Suman, Dave Dopson, John Guilyard, Rasto Lenhardt, Susie Flynn, and Yannis Papakonstantinou.


Comparative Analysis Between K-Nearest Neighbor (KNN) and Deep Learning Classifiers for Emotion Classification in Virtual Reality Using Electrodermography (EDG) and Heart Rate

  • Conference paper
  • First Online: 03 September 2024

  • Aaron Frederick Bulagang 40 ,
  • James Mountstephens 41 &
  • Jason Teo 42  

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 1199))

Included in the following conference series:

  • International Conference on Advances in Computational Science and Engineering

Virtual Reality (VR) has emerged in recent years as a stimulus for evoking emotion in emotion recognition studies. The objective of this research is to compare classifiers, namely Deep Learning and KNN, on Heart Rate (HR) and electrodermography (EDG) signals. The choice of classifier plays an important part in the accuracy obtained. In this research, 30 participants volunteered for an experiment that used VR as the stimulus while their HR and EDG signals were recorded. The recorded signals were then classified using KNN and Deep Learning classifiers.



Acknowledgements

The Fundamental Research Grant Scheme (FRGS) from the Ministry of Higher Education, Malaysia (Kementerian Pengajian Tinggi, Malaysia) has funded this work [grant reference FRGS/1/2019/ICT02/UMS/01/1 (FRG0512)].

Author information

Authors and affiliations.

Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia

Aaron Frederick Bulagang

Creative Advanced Machine Intelligence Research Centre, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia

James Mountstephens

Evolutionary Computing Laboratory, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia


Corresponding author

Correspondence to Jason Teo .

Editor information

Editors and affiliations.

Technology Park Malaysia, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia

Vinesh Thiruchelvam

Faculty of Computing and Informatics, Creative Advanced Machine Intelligence Research Centre, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia

Rayner Alfred

Higher Colleges of Technology, Abu Dhabi, Abu Dhabi, United Arab Emirates

Zamhar Iswandono Bin Awang Ismail

Department of Informatics, Mulawarman University, Samarinda, Indonesia

Haviluddin Haviluddin

School of Engineering and Technology, Sunway University, Petaling Jaya, Selangor, Malaysia

Aslina Baharum


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

Bulagang, A.F., Mountstephens, J., Teo, J. (2024). Comparative Analysis Between K-Nearest Neighbor (KNN) and Deep Learning Classifiers for Emotion Classification in Virtual Reality Using Electrodermography (EDG) and Heart Rate. In: Thiruchelvam, V., Alfred, R., Ismail, Z.I.B.A., Haviluddin, H., Baharum, A. (eds) Proceedings of the 4th International Conference on Advances in Computational Science and Engineering. ICACSE 2023. Lecture Notes in Electrical Engineering, vol 1199. Springer, Singapore. https://doi.org/10.1007/978-981-97-2977-7_41


DOI : https://doi.org/10.1007/978-981-97-2977-7_41

Published : 03 September 2024

Publisher Name : Springer, Singapore

Print ISBN : 978-981-97-2976-0

Online ISBN : 978-981-97-2977-7

eBook Packages : Intelligent Technologies and Robotics Intelligent Technologies and Robotics (R0)

  • Corpus ID: 41762443

An Approach to Improving Nearest Neighbor Search

  • Published 2015
  • Computer Science


  • Open access
  • Published: 05 September 2024

Nondestructive detection of saline-alkali stress in wheat ( Triticum aestivum L.) seedlings via fusion technology

  • Ying Gu 1 , 2 ,
  • Guoqing Feng 2 ,
  • Peichen Hou 2 ,
  • Yanan Zhou 2 ,
  • He Zhang 2 , 3 ,
  • Xiaodong Wang 2 ,
  • Bin Luo 2 &
  • Liping Chen 1 , 2  

Plant Methods volume  20 , Article number:  136 ( 2024 ) Cite this article


Wheat ( Triticum aestivum L.) is an important grain crop worldwide, and its growth and development at different stages are seriously affected by saline-alkali stress, especially at the seedling stage. Therefore, nondestructive detection of wheat seedlings under saline-alkali stress can provide more comprehensive technical support for wheat breeding, cultivation and management.

This research focused on moisture signal prediction and classification of saline-alkali stress in wheat seedlings using fusion techniques. After collecting and analyzing transverse relaxation time (T2) and multispectral imaging (MSI) information of wheat seedlings, four regression models were used to predict the moisture signal. K-Nearest Neighbor (KNN) and Gaussian-Naïve Bayes (GNB) models were combined with fivefold cross validation to classify wheat seedling stress. The results showed that wheat seedlings increase their bound water content through a certain mechanism to enhance their tolerance to saline-alkali stress. Under the same Na concentration, the effect of alkali stress on the moisture, growth and spectrum of wheat seedlings was stronger than that of salt stress. The Gradient Boosting Regression Tree model performed best in predicting wheat moisture signals, with a prediction coefficient of determination (R2p) of 0.98 and a root mean square error of 109.60. It also had a short training time (1.48 s) and an efficient prediction speed (1300 obs/s). KNN and GNB showed significantly better predictive performance when classifying the fused dataset than when using either single dataset. In particular, the GNB model performed best on the fused dataset, with Precision, Recall, Accuracy, and F1-score of 90.30%, 88.89%, 88.90%, and 0.90, respectively.

Conclusions

Under the same Na concentration, the effects of alkali stress on water content, spectrum, and growth of wheat were stronger than that of salt stress, which was more unfavorable to the growth of wheat. The fusion of low-field nuclear magnetic resonance and MSI technology can improve the classification of wheat stress, and provide an effective technical method for rapid and accurate monitoring of wheat seedlings under saline-alkali stress.


Introduction

Saline-alkali stress caused by soil salinization is a prominent abiotic stressor that has a detrimental impact on crop growth and consequently affects the global agricultural economy [ 14 , 25 , 40 ]. Globally, over 1 billion hectares of land are affected by soil salinity, with more than 2 million hectares added annually [ 32 , 50 ]. Wheat ( Triticum aestivum L.), an important food crop, also faces the negative effects of saline-alkali stress [ 19 , 49 ]. Saline-alkali stress leads to excessive accumulation of salt and alkaline substances in soil, which affects the growth and physiological metabolism of wheat seedlings and, in turn, the yield and quality of the crop. In China, most wheat regions are in the period of soil salinity return following the emergence of wheat seedlings, when the soil salt content reaches its maximum. The seedling stage is the stage of the wheat life cycle most vulnerable to saline-alkali stress, and stress at this time has a significant impact on seedling growth [ 39 ]. Therefore, accurate and nondestructive detection of the wheat seedling response to saline-alkali stress is of great significance for effectively evaluating crop growth status and formulating reasonable saline-alkali resistance strategies.

In existing studies, traditional physiological and biochemical analysis methods are usually used to detect and evaluate the growth status of crops under saline-alkali stress [ 22 , 36 ]. For example, the growth status of crops is evaluated by measuring chlorophyll content, membrane permeability, soluble sugar content, proline content and other indicators [ 9 , 33 ]. These methods have high accuracy and specificity, directly reflect the physiological and biochemical state of crops, and are of great value for understanding the mechanism of crop response to saline-alkali stress and formulating corresponding management measures [ 36 ]. However, they have some limitations: they are destructive, time-consuming and laborious, and cannot support continuous monitoring [ 13 ]. Therefore, it is very important to find an accurate, rapid and nondestructive technical method to monitor crops under saline-alkali stress.

In this context, this study combined low-field nuclear magnetic resonance (LF-NMR) and multispectral imaging (MSI) technology to achieve the nondestructive detection of wheat seedling moisture signal and prediction of saline-alkali stress. LF-NMR is a fast, accurate and nondestructive method, which uses the spin relaxation characteristics of hydrogen nuclei in the magnetic field to explain the distribution and migration of water in the sample [ 6 , 26 , 30 ]. Moreover, MSI is a newly developed technology that combines spectroscopy and traditional imaging to simultaneously gather spectral and spatial information [ 46 ]. They were developed to measure the morphological characteristics of the inspected objects initially and have been widely used for crop visualization [ 31 ]. The fast, accurate, and nondestructive characteristics of LF-NMR and MSI technologies have all crop stakeholders eagerly awaiting the introduction of these novel detection techniques to boost crop productivity overall. LF-NMR provides the moisture content and distribution state of crops from the microscopic perspective, while MSI technology provides a wider range of spectral information, which can reflect the overall growth state of crops. By combining the two technologies, we aim to fill the existing technical gap in the field of crop growth state assessment under saline-alkali stress, and provide a new and efficient monitoring method for crop growth status.

The study had the following three purposes: (1) To analyze the moisture phase state and multispectral information of wheat seedlings under saline-alkali stress using LF-NMR and MSI technology, and to explore the application potential of the two nondestructive detection methods for wheat under saline-alkali stress. (2) To carry out regression prediction of the moisture signal quantity of wheat seedlings under saline-alkali stress based on MSI data, and to compare the performance of different regression models in predicting moisture signal amplitude. (3) To classify wheat seedlings under different saline-alkali stress with different models, and to evaluate the classification accuracy of the models when the LF-NMR and MSI datasets are fused. This work provides new insight into monitoring wheat growth status under saline-alkali stress, and the combination of LF-NMR and multispectral imaging opens up new possibilities for improving crop resistance to multiple environmental stresses.

Materials and methods

Plant material and experimental design.

The experimental wheat material used in this research was Jimai 22, which is widely planted in China. The experimental samples were screened for similar size and quality and for the absence of surface damage. Before the experiment, the wheat was disinfected with 75% alcohol for 5 min and rinsed 3 times with distilled water. Wheat seeds were subjected to the different stress treatments from the germination stage. After 5 days of germination, the wheat was transplanted into a black hydroponics box containing the corresponding stress treatment solution for culture. On the basis of previous studies, a concentration with high stress and low mortality was selected; the concentration suitable for saline-alkali stress identification at the wheat seedling stage was 100 mmol/L [ 20 ]. Three experimental treatments were set: control (CK, distilled water), A [100 mmol/L neutral salt (NaCl:Na₂SO₄ = 9:1, pH = 6.68)], and B [100 mmol/L alkaline salt (NaHCO₃:Na₂CO₃ = 9:1, pH = 8.9)], with four replications for each treatment. The wheat seedlings were incubated in an MGC Series Intelligent Light incubator (MGC-450BP-2L, Shanghai Yiheng Scientific Instruments Co., Ltd., Shanghai, China) at 25 ± 1 °C with 80% relative humidity and a 12 h/12 h alternating light and dark period. Relevant indicators of the different treatment groups were collected every 2 days.

T2 acquisition and processing

The LF-NMR instrument (AniMR, Shanghai Newmark Electronic Technology Co., Ltd., Shanghai, China) was used to collect the transverse relaxation time (T2). The basic parameters of the instrument were as follows: magnetic field intensity: (0.25 ± 0.05) T; resonance frequency: 8.5–12.8 MHz; magnetic field uniformity: less than 10 ppm (φ60 mm × 100 mm); magnet temperature: 32 °C; probe coil diameter: 15 mm. The Carr–Purcell–Meiboom–Gill (CPMG) pulse sequence in the NMR spectrum analysis software was used to determine the T2 of the sample. According to previous test results [ 7 , 15 ], the main sampling parameters were set as follows: 90° hard pulse width (P1) = 8 μs; 180° hard pulse width (P2) = 12 μs; sampling frequency (SW) = 200 kHz; analog gain (RG1) = 43.5 dB; digital gain (DRG1) = 3; number of signal sampling points (TD) = 907,218; repeated sampling times (NS) = 16; waiting time for repeated sampling (TW) = 5000 μs; echo number (NECH) = 18,000.

The LF-NMR spectrum analysis is a quantitative analysis and detection method using the spectral signal obtained by the Fourier transform of NMR signal. The CPMG collected from the experiment was imported into NMR spectrum inversion software, and the Simultaneous Iterative Reconstruction Technique (SIRT) was used for inversion operation. T2 was obtained after inversion. The inversion parameters were set as follows: minimum relaxation time: 0.01 ms; Maximum relaxation time: 10,000; Number of participating inversion points: 200; Number of iterations: 10,000. In order to eliminate the influence of inconsistent initial quality of test samples on the test results, all signal amplitude data were normalized, and then the data were imported into SPSS 23 for one-way analysis of variance. The OriginPro 2022 was used for drawing in this paper. During the test, the collection of CPMG was repeated three times, and the average value was taken. CPMG of wheat seedlings were collected on the 5th, 7th, 9th, 11th and 15th day of seedling growth. The instrument would be calibrated before each collection, and the water on the sample surface would be wiped gently with absorbent paper to avoid moisture affecting the results (Fig.  1 ).

Fig. 1 T2 relaxation data acquisition and analysis process

Multispectral image acquisition and processing

The multispectral images of all wheat seedling samples were taken with a VideometerLab 4 instrument (Videometer A/S, DK-2700 Herlev, Hørsholm 12B, 3.sal, Denmark). The instrument consists of a sphere containing 19 light emitting diodes at the wavelengths 375, 405, 435, 450, 470, 505, 525, 570, 590, 630, 645, 660, 700, 780, 850, 870, 890, 940 and 970 nm. All images were acquired in one sequence, with a resolution of 4096 × 3000 pixels and a pixel size of 0.03 mm per pixel. Surface reflectance was recorded by a standard monochrome charge coupled device chip [ 45 ]. Before multispectral images were acquired, the system was fully calibrated radiometrically and geometrically using three successive plates: a white one for reflectance correction, a dark one for background correction and a dotted one for geometric pixel position alignment, followed by a light setup calibration [ 15 , 23 ].

The multispectral images obtained contained not only wheat seedlings but also some other interference, such as the background board and surrounding debris (Fig.  2 ). Therefore, it was necessary to remove these objects before extracting the spectral information of individual wheat seedlings. The images were processed using Videometerlab software version 3.22. Background removal in images of complete wheat seedlings was achieved through Normalized Canonical Discriminant Analysis (nCDA), and the seedlings were segmented using a simple threshold. Morphological traits and main spectral features were then extracted from the segmented wheat seedling images. The morphological traits were divided into shape features, color features, and binary features. The shape features included BetaShape_a, BetaShape_b, Compactness Circle, Compactness Ellipse, Vertical Orientation, and Vertical Skewness; color features included CIELab_A, CIELab_B, and CIELab_L; binary features included Area, Length, and Width. The interpretation of morphological shapes was listed in the supplementary file: Supplementary material.

Fig. 2 Multispectral data acquisition and analysis process

The extracted spectral features represented the average intensity of reflected light at each single wavelength, calculated from all the wheat seedling pixels in the images.
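As a rough illustration of the spectral feature described above, the following sketch averages the reflectance of each band over the pixels kept by a segmentation mask. The function name, the dictionary layout and the toy values are assumptions made for this example; the study itself used the Videometerlab software for this step.

```python
def mean_band_intensity(band_images, mask):
    """Average each band over the pixels marked True in the mask.

    band_images: {wavelength_nm: 2-D list of reflectance values} (hypothetical layout)
    mask:        2-D list of booleans, True where a pixel belongs to the seedling
    """
    features = {}
    for wavelength, image in band_images.items():
        # Keep only the pixels that the segmentation assigned to the seedling.
        kept = [
            pixel
            for row, mask_row in zip(image, mask)
            for pixel, keep in zip(row, mask_row)
            if keep
        ]
        if not kept:
            raise ValueError("mask selects no pixels")
        features[wavelength] = sum(kept) / len(kept)
    return features
```

Each wavelength then contributes one number per seedling image, which is the form in which the spectral features enter the later correlation analysis and models.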

Determination of germination index

The 450 wheat seeds were selected and evenly divided into 9 groups. Each kind of solution (CK, A, and B) was used to cultivate 3 groups of seeds individually. Before the experiment, the wheat was disinfected with 75% alcohol for 5 min and then rinsed three times with distilled water. The wheat seeds were placed on germination paper that had been moistened with an adequate amount of the corresponding solution, ensuring that both the paper and the seeds were sufficiently dampened. The germination paper was changed daily, and the cultivation environment was maintained as described in “ Plant material and experimental design ”. The number of germinated seeds in each group was recorded daily until the 7th day. The germination rate, germination potential, germination index, and average germination time of the wheat were calculated, with the respective formulas shown below [ 28 ].
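The formulas referred to above are not reproduced in this copy of the text. The sketch below uses definitions that are common in the germination literature; they are an assumption and may differ in detail from those in [ 28 ]. In particular, the day used for germination potential (`potential_day`) is a hypothetical parameter.

```python
def germination_metrics(daily_counts, total_seeds, potential_day=3):
    """Compute common germination indicators from daily counts.

    daily_counts[t - 1] is the number of seeds that newly germinated on day t.
    Assumed definitions (not taken from the source):
      rate      = germinated seeds / total seeds * 100
      potential = seeds germinated by `potential_day` / total seeds * 100
      index     = sum(G_t / D_t)
      mean_time = sum(G_t * D_t) / sum(G_t)
    """
    germinated = sum(daily_counts)
    days = range(1, len(daily_counts) + 1)
    rate = 100.0 * germinated / total_seeds
    potential = 100.0 * sum(daily_counts[:potential_day]) / total_seeds
    index = sum(g / d for d, g in zip(days, daily_counts))
    mean_time = (
        sum(g * d for d, g in zip(days, daily_counts)) / germinated
        if germinated
        else float("nan")
    )
    return {"rate": rate, "potential": potential, "index": index, "mean_time": mean_time}
```

For one group of 50 seeds counted over 7 days, a call would look like `germination_metrics([0, 10, 20, 10, 5, 2, 0], total_seeds=50)`.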

Detection model and performance evaluation

Prediction model of moisture signal quantity.

Four regression models including Gradient Boosting Regression Tree (GBRT), Support Vector Machine (SVM), Kernel Partial Least Squares Regression (KPLSR) and Back Propagation Neural Network (BPNN) were established to predict the moisture signal of wheat seedlings under saline-alkali stress using MSI data. Before establishing the quantitative model analysis, the correlation analysis between MSI data and signal amplitude A was established by using Spearman and Kendall algorithm, and the multispectral data with high correlation was selected as the input variable of the model.

For the performance evaluation of the regression models, six evaluation criteria were selected: determination coefficient of the training set (R2c), root mean square error of calibration (RMSEC), prediction determination coefficient (R2p), prediction root mean square error (RMSEP), training time (s), and prediction speed (obs/s). R2 measures the proportion of the total variation that is explained by the model and ranges from 0 to 1; the closer R2 is to 1, the more variation in the data the model explains, indicating a better fit. RMSE measures the error of the model on the corresponding dataset (the training set for RMSEC, the prediction set for RMSEP); the smaller the RMSE, the better the model performs on that set. The training time and prediction speed reflect the efficiency of the model in practical application. R2 and RMSE were calculated as follows.

where i is the data point, n is the number of data points, \(y_{i}\) is the actual value, \(\hat{y}_{i}\) is the predicted value, and \(\overline{y}\) the average value of the actual value.
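With the symbols defined above, the standard definitions are \(R^{2} = 1 - \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2} / \sum_{i=1}^{n} (y_{i} - \overline{y})^{2}\) and \(\mathrm{RMSE} = \sqrt{\tfrac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}\). The equations are not reproduced in this copy, so these forms are assumed to match the authors' usage. A minimal sketch of both, with illustrative function names:

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)
```

Applying the same two functions to the training set gives R2c and RMSEC, and applying them to the held-out prediction set gives R2p and RMSEP.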

Classification prediction model

In this study, we discussed the application effect of K-Nearest Neighbor (KNN) and Gaussian-Naïve Bayes (GNB) combined with fivefold cross validation method in the classification and prediction model of salt and alkali stress in wheat seedlings. We used three different datasets: MSI datasets, LF-NMR datasets, and fusion datasets of MSI and LF-NMR. At present, KNN and GNB have shown satisfactory results in the field of classification [ 10 ].
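To make the KNN-with-cross-validation procedure concrete, here is a minimal standard-library sketch. It covers only KNN, not GNB, and it is not the authors' implementation: the Euclidean distance, the value of k, the unstratified fold assignment and the function names are all assumptions.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fivefold_accuracy(X, y, k=3, folds=5, seed=0):
    """Pooled accuracy over `folds` cross-validation splits (not stratified)."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Every sample lands in exactly one held-out fold.
    fold_indices = [indices[i::folds] for i in range(folds)]
    correct = 0
    for test_idx in fold_indices:
        held_out = set(test_idx)
        train_X = [X[i] for i in indices if i not in held_out]
        train_y = [y[i] for i in indices if i not in held_out]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)
```

In this setting, `X` would hold one feature vector per seedling (MSI features, LF-NMR features, or both concatenated for the fused dataset) and `y` the treatment label (CK, A or B).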

The prediction performance of the classification model was evaluated by four key indicators: Precision, Recall, Accuracy and F1-score. Precision is the ratio of true positive data to all predicted positive data, indicating the classifier’s ability to avoid labeling negative cases as positive. Recall is the ratio of true positive predictions to all actual positive data, indicating the classifier’s ability to identify all positive samples. Accuracy is the percentage of samples that are correctly classified by the model, reflecting the overall effectiveness of the classifier on the given dataset. The F1-score is a metric that combines the trade-off between Precision and Recall, providing a single number that reflects the effectiveness of a classifier, particularly in the presence of rare categories. It is calculated as the harmonic mean of Precision and Recall [ 7 , 21 ]. The four equations of the evaluating indicators were:

where TP, TN, FN, and FP are for true positive, true negative, false negative, and false positive, respectively.
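The four equations are not reproduced in this copy. The sketch below uses the standard definitions in terms of TP, TN, FP and FN, which match the verbal descriptions above; the function name is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, Recall, Accuracy and F1-score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}
```

For a three-class problem such as CK, A and B, these counts are typically computed per class in a one-vs-rest fashion and then averaged; how the averaging was done in the study is not stated here.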

Phenotypic analysis of wheat seedlings under saline-alkali stress

With increasing culture time, the phenotypic characteristics of wheat seedlings differed significantly among the CK, A and B groups. The germination rate of the CK group reached 95.23% on the 7th day, and its germination potential, germination index and average germination time differed significantly from those of groups A and B (Table 1). In the images, wheat leaves became thin and short under saline-alkali stress and gradually curled (Fig. 3 I). However, when the seedlings had been cultured to the 9th day, the differences between the groups could not be identified by the human eye. Multispectral images were therefore needed to identify in advance whether wheat seedlings were under stress. Figure 3 II shows the multispectral images of wheat seedlings in the 365, 405, 430, 515 and 630 bands on the 9th day of culture. The differences among wheat seedlings under different culture conditions could be judged from these images.

Fig. 3 Effect of saline-alkali stress on phenotypes of wheat seedlings. I Phenotypes of wheat seedlings cultured in different environments. II Multispectral images of wheat seedlings in various bands at the 9th day of cultivation. CK: control group, A: salt stress, B: alkali stress

T2 analysis of wheat seedlings under saline-alkali stress

T2 relaxation analysis.

The T2 signal amplitude is directly proportional to the water content of living tissues [ 44 ]. The T2 relaxation spectra of water in different phases within living crop organs exhibit significant differences, demonstrating the multicomponent nature of the T2 relaxation spectra [ 6 , 47 ]. Figure 4 compares the T2 relaxation spectra of wheat in the control group (Fig. 4 CK), salt stress group (Fig. 4 A) and alkali stress group (Fig. 4 B) from 5 to 15 days. In Fig. 4, the T2 spectrum of wheat seedlings had three obvious peaks. Because the peak positions in the T2 spectrum reflect the binding energy intensity, the internal water of wheat seedlings was divided into three binding types [ 12 ]. From left to right, the water phases were bound water T21 (0.1 ms < T21 < 1 ms) with signal amplitude A21; semi-bound water T22 (1 ms < T22 < 10 ms) with signal amplitude A22; and free water T23 (10 ms < T23 < 1000 ms) with signal amplitude A23. The total signal amplitude was represented by A, i.e., A = A21 + A22 + A23. A21, A22 and A23 increased as the seedlings matured (Fig. 4). Under stress conditions, the distinction between A22 and A23 became blurred from the 9th day (Fig. 4 A) or the 11th day (Fig. 4 B). In addition, compared with the control group, the stress groups showed a significant increase in A21.
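The partition of the inverted spectrum into the three water phases can be sketched as follows. The window boundaries come from the text; summing the discrete spectrum amplitudes inside each window as the peak area, and assigning a point that falls exactly on a boundary to the upper window, are assumptions of this example.

```python
# T2 windows in milliseconds, as given in the text.
WINDOWS_MS = {"A21": (0.1, 1.0), "A22": (1.0, 10.0), "A23": (10.0, 1000.0)}


def water_phase_amplitudes(t2_ms, amplitude):
    """Sum spectrum amplitudes per water phase and return them with the total A."""
    result = {name: 0.0 for name in WINDOWS_MS}
    for t2, amp in zip(t2_ms, amplitude):
        for name, (low, high) in WINDOWS_MS.items():
            # Half-open interval so each point is counted at most once.
            if low <= t2 < high:
                result[name] += amp
    result["A"] = result["A21"] + result["A22"] + result["A23"]
    return result
```

Points outside 0.1 ms to 1000 ms are ignored, so A here is the sum of the three phase amplitudes only, in line with A = A21 + A22 + A23.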

Fig. 4 T2 inversion spectra of wheat seedlings in the three experimental groups

In this study, the T2 relaxation peak areas of wheat seedlings cultured under the three conditions for 15 days were counted and analyzed (Table 2). Within 15 days, the water content of the three phases increased in all groups, but at different rates. From the 5th day to the 7th day, the total water signal of the CK, A and B groups increased by 312.58%, 221.05% and 149.50%, respectively. From the 5th day to the 15th day, A21 increased by 919.92% in the CK group, 621.64% in group A and 494.71% in group B; A22 increased by 1274.35%, 982.94% and 935.34% in the CK, A and B groups, respectively; and A23 increased by 519.91%, 387.49% and 274.84%, respectively. The average growth rates were as follows: for A21, B was highest at 171.63%, followed by A at 170.02% and CK at 168.83%; for A22, CK was highest at 182.27%, followed by A at 180.00% and B at 164.17%; for A23, CK was highest at 146.71%, followed by A at 131.72% and B at 123.32%. Thus the average growth rate of bound water was largest in group B, followed by group A and then CK, whereas the average growth rates of A22 and A23 followed the same order as each other: largest in CK, then group A, and finally group B.

Ratio analysis of bound water and free water

From the 5th day, the ratio of free water to bound water of wheat seedlings under CK, A and B was tracked (Table 3). The CK group maintained a relatively stable ratio, decreasing slightly from 15.03% on the 5th day to 13.05% on the 15th day. Over the same period, group A decreased from 16.01% on the 5th day to 10.28% on the 15th day, while group B showed a larger decrease, from 15.67% to 8.70%. On the 7th day, the ratio of free water to bound water in the CK group decreased because the seedlings had been moved from the Petri dish to the incubator for culture and needed to adapt to the change in environment. From the 7th day to the 15th day, the ratio of free water to bound water in the control group and the salt stress group gradually increased. However, because of salt stress, the ratio in group A was always lower than that in the CK group. In contrast, the ratio in group B decreased continuously from the 7th day to the 11th day and increased only from the 13th day, indicating that alkali stress delayed the adaptation mechanism of wheat seedlings.

MSI analysis of wheat seedlings under saline-alkali stress

Morphological characteristics analysis.

Twelve morphological features were extracted from multispectral image. In terms of the average value of shape features, color features and binary features, wheat seedlings growing in different environments were different. For the shape characteristics, the length, width and area of wheat seedlings under saline-alkali stress were significantly different from those of the control group (Fig.  5 I, II, III). The length of wheat seedlings under salt stress and alkali stress had significant difference (Fig.  5 I), but the width and area had no significant difference (Fig.  5 I, II).

Fig. 5 Shape features of wheat seedlings in the three experimental groups. All statistical data are presented as mean ± standard deviation ( SD ), with different letters indicating significant differences between groups ( P < 0.05, Student’s t-test). The same applies to the following figures

In terms of color characteristics, the CIELab_A color characteristics of wheat seedlings were significantly different between group CK and group B (Fig.  6 I). There was no significant difference in the color characteristics of CIELab_B and CIELab_L among the three groups of wheat seedlings (Fig.  6 I, II, III).

Fig. 6 Color features of wheat seedlings in the three experimental groups

In terms of binary characteristics, there was no significant difference in Vertical Orientation and Vertical Skewness among the three groups (Fig.  7 I, II). There were significant differences in the characteristics of Compactness Circle between B and the other groups (Fig.  7 III). However, in terms of Compactness Ellipse, Beta Shape_a and Beta Shape_b, there were significant differences between CK and the other groups, but no significant differences between A and B (Fig.  7 IV, V, VI).

Fig. 7 Binary features of wheat seedlings in three experimental groups

Multispectral analysis

The average spectral intensity of wheat seedlings was compared to that of a white board to calculate the relative reflection spectrum. Observations of the wheat seedlings began on the 5th day and were conducted every two days, concluding on the 15th day. In general, the average reflectance spectra of CK, A and B groups of wheat seedlings showed a similar trend (Fig.  8 ). However, with the increase of culture time, the difference between the average reflectance spectra of seedlings in groups A and B and CK gradually increased. When the wavelength was 365 nm (UVA region), the average reflectance spectra of the three groups of wheat seedlings were the minimum in the whole band, and decreased with the growth of seedlings. In the visible region (365–645 nm), the average spectrum showed an “S” type growth. The average spectral value was steep and almost linear in the range from red light to early near infrared (700–780 nm). In the near-infrared region (780–970 nm), the average spectral value tended to be flat. From the 11th day, the average reflectance spectrum of group B was significantly higher than that of the other two groups in 700–780 nm.
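The white-board normalisation described above can be sketched as follows. This is a minimal illustration, not the authors' processing pipeline: the function name `relative_reflectance`, the three bands and all intensity values are hypothetical.

```python
def relative_reflectance(sample, white):
    """Divide each band's mean sample intensity by the white-reference intensity."""
    if len(sample) != len(white):
        raise ValueError("sample and white reference must cover the same bands")
    return [s / w for s, w in zip(sample, white)]


# Hypothetical mean intensities at three bands (values invented for illustration).
bands_nm = [365, 780, 970]
seedling = [12.0, 150.0, 160.0]
white_board = [240.0, 250.0, 200.0]

reflectance = relative_reflectance(seedling, white_board)
for nm, r in zip(bands_nm, reflectance):
    print(f"{nm} nm: {r:.2f}")
```

Dividing by a white reference measured under the same illumination removes the lamp and sensor response from each band, so spectra from different days can be compared.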

Fig. 8 Average reflectance spectra of wheat seedlings under different treatments, obtained using the multispectral imaging system. I – VI represent wheat seedlings cultured for 5, 7, 9, 11, 13 and 15 days, respectively. (CK) control group, (A) salt stress, and (B) alkali stress. Error bars indicate mean ± standard deviation (SD)

Estimating moisture signal with MSI

Feature parameter selection.

Spearman and Kendall correlation analyses were conducted to assess the relationship between the signal quantity of T2 relaxation peak A and the spectral reflection intensity of each band, as detailed in Table 4. At a significance level of P < 0.05, 14 characteristic wavelengths were selected by both methods. At the more stringent level of P < 0.01, 5 characteristic wavelengths (780 nm, 850 nm, 880 nm, 940 nm, and 970 nm) were identified (Fig. 9). These 5 wavelengths, whose correlation with signal amplitude A was significant at P < 0.01, were used in the modeling process.
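A rank-correlation screen of this kind might look like the sketch below. It is written with the standard library only and makes two loud assumptions: bands are kept by a coefficient threshold (`threshold=0.8`, a made-up value) rather than by the P-values used in the paper, and the data are invented. In practice `scipy.stats.spearmanr` and `scipy.stats.kendalltau` return both the coefficient and a P-value.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Tau-a: (concordant - discordant) / number of pairs, no tie correction."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
    return score / len(pairs)


# Hypothetical data: signal amplitude A and reflectance at two bands.
signal_a = [100, 120, 140, 160, 180]
bands = {
    "850 nm": [0.50, 0.54, 0.57, 0.61, 0.66],
    "450 nm": [0.10, 0.08, 0.11, 0.09, 0.10],
}

threshold = 0.8  # assumed cut-off standing in for the significance test
selected = [
    name
    for name, refl in bands.items()
    if abs(spearman(signal_a, refl)) >= threshold
    and abs(kendall_tau_a(signal_a, refl)) >= threshold
]
print(selected)
```

Requiring agreement between both coefficients mirrors the text's use of two methods: a band is kept only when it passes both screens.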

Fig. 9 Selection of characteristic wavelengths related to signal quantity A through correlation analysis

The characteristic changes in the electromagnetic radiation absorbed by crops in the near-infrared region (780–2526 nm) were primarily attributed to the stretching and bending vibrations of O–H bonds in water molecules and other molecules. Consequently, alterations in leaf water status could induce corresponding spectral changes in these regions [ 11 , 24 , 38 , 46 ].

Prediction model analysis

Four regression models (BPNN, SVM, KPLSR, and GBRT) were established to predict the moisture signal A of wheat seedlings under saline-alkali stress using MSI data. The R²P values of all models on the prediction set were above 0.75, as detailed in Table 5. Among these models, GBRT demonstrated the best predictive performance, with an R²P of 0.98 and an RMSEP of 109.60. Additionally, GBRT had the shortest training time (1.48 s) and the fastest prediction speed (1300 obs/s). Fig. 10 shows the results of the four models on the prediction dataset. These results suggested that the combination of MSI and chemometrics could be an effective non-destructive method for investigating the moisture signal amplitude in wheat seedlings.
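The two evaluation metrics quoted above, the coefficient of determination and the root mean square error on the prediction set, can be computed as in this sketch. The measured and predicted values are invented for illustration and are unrelated to Table 5.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical measured vs. predicted signal amplitudes.
measured = [1000.0, 1200.0, 1400.0, 1600.0]
predicted = [1010.0, 1190.0, 1420.0, 1580.0]

r2_p = r2_score(measured, predicted)
rmse_p = rmse(measured, predicted)
print(f"R2P = {r2_p:.3f}, RMSEP = {rmse_p:.2f}")
```

Because RMSEP carries the units of the signal amplitude, it is only meaningful relative to the range of the measured values, whereas R²P is unitless.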

Fig. 10 Prediction of the moisture signal amplitude A using MSI. I BPNN, II SVM, III KPLSR, IV GBRT

Stress prediction using MSI and LF-NMR datasets

Principal Component Analysis (PCA) was used to reduce the dimension of the 9 T2 relaxation parameters, and Random Forest (RF) was used to reduce the dimension of the multispectral image features. The characteristic parameters were selected as the input variables of the model according to their importance ranking (Fig. 11). PCA selected the relaxation parameters (Tp1, Tp3, A21, A23) with scores greater than 0.6 (Fig. 11 I); RF selected the characteristic parameters (Width, CIELab_L, Vertical Skewness and Compactness Ellipse) with scores greater than 0.6 (Fig. 11 II).

Fig. 11 Selection of relaxation parameters by Principal Component Analysis (PCA) ( I ) and of image features by Random Forest (RF) ( II ). Ts1, Ts2 and Ts3 are the peak start times; Tp1, Tp2 and Tp3 are the peak-point times; Te1, Te2 and Te3 are the peak end times; A21, A22 and A23 are signal amplitudes

Classification model analysis

In this study, we discussed the application effect of KNN and GNB machine learning models in the classification and prediction of saline-alkali stress in wheat seedlings. We used three different datasets: MSI datasets, LF-NMR datasets, and fusion datasets of MSI and NMR. It could be seen from the confusion matrix that both models could classify wheat seedlings under saline-alkali stress, and the classification accuracy of the fusion dataset of CK group was 100% (Fig.  12 ). The predicted Recall, Precision, Accuracy and F1-score of the two models for the three test datasets were all above 75.00% (Table  6 ). In all datasets, GNB model was superior to KNN model in all evaluation indexes, which might be attributed to the advantages of GNB model in processing data with high-dimensional feature space. In addition, the fusion dataset showed better prediction performance on both models, emphasizing the importance of using multi-source data in crop stress prediction.
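The four evaluation indexes named above can all be derived from a confusion matrix, as sketched below. The matrix is invented, and macro-averaging over the three classes is an assumption, since the text does not state which averaging was used.

```python
def macro_metrics(cm):
    """Accuracy plus macro-averaged precision, recall and F1.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted_as_i = sum(cm[r][i] for r in range(n))
        actually_i = sum(cm[i])
        p = tp / predicted_as_i if predicted_as_i else 0.0
        r = tp / actually_i if actually_i else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical 3-class confusion matrix (rows: true CK, A, B).
confusion = [
    [9, 0, 0],
    [1, 7, 1],
    [0, 1, 8],
]

scores = macro_metrics(confusion)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With equally sized classes, as in this toy matrix, macro recall equals accuracy; the two diverge when the classes are imbalanced.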

Fig. 12 Confusion matrices of the two models on different datasets. I – III Confusion matrices of the K-Nearest Neighbor (KNN) model for the MSI dataset, LF-NMR dataset, and fusion dataset, respectively; IV – VI Confusion matrices of the Gaussian Naïve Bayes (GNB) model for the MSI dataset, LF-NMR dataset, and fusion dataset, respectively

Soil salinization, a prevalent issue in agricultural production, poses a significant challenge to crop cultivation. The saline-alkali stress resulting from soil salinization not only hampers wheat growth but also adversely affects its yield [ 8 ]. Traditional research methods, which can damage crops and are time-consuming, often fail to provide continuous monitoring of crops [ 4 , 19 ]. In this study, we applied salt stress and alkali stress to wheat seedlings and used LF-NMR and MSI technology to analyze their responses to saline-alkali stress. This approach demonstrated the potential for accurate and nondestructive detection of crop water status. Furthermore, by employing various regression and classification models, our study not only predicted the moisture signal amplitude quantitatively but also classified wheat seedlings under saline-alkali stress qualitatively.

The germination of wheat seeds was primarily affected by osmotic stress and ion effects caused by salt [ 27 ]. Our results showed that, compared to the control group, saline-alkali stress significantly reduced the germination rate, germination potential, and germination index of wheat seeds (Table 1). Analysis of T2 relaxation times revealed that the content of bound water was the lowest among the different types of moisture present. Bound water resided inside wheat cells, combining with proteins through hydrogen bonds. Because these hydrogen bonds were strong, bound water could not move freely within cells and did not participate in metabolic processes.

Semi-bound water could be adsorbed on other tissues through hydrogen bonding or Coulomb force. Free water, which existed in the internal space of wheat by capillary action, had strong fluidity [ 15 , 16 ]. As a good solvent, free water could dissolve many substances and compounds. The higher the ratio of free water to bound water, the stronger the metabolic activity of seedlings. Under different stress conditions, the ratio of free water to bound water fluctuated, highlighting the complex interaction between environmental stressors and the physiological responses of wheat seedlings. During the whole culture period, compared with the control group, seedlings under alkali stress had the lowest ratio of free water to bound water, followed by the salt stress group. Both salt stress and alkali stress hindered the increase of the water content signal amplitude, with alkali stress having the stronger effect. This indicated that alkali stress had a significant effect on the water holding capacity of seedlings (Fig. 4). The results indicated that wheat seedlings could increase their bound water content through specific mechanisms, thereby enhancing their tolerance to saline-alkali stress (Table 2). Although all seedlings showed adaptive ability, the efficiency and timing of these responses varied with stress type. Compared with alkali-stressed seedlings, salt-stressed seedlings showed faster recovery in water management (Table 3). Liu et al. [ 22 ] concluded that alkali stress inhibited the growth of wheat more than salt stress at the same Na concentration, which was consistent with the results of this study.

Changes in the water and ion content of wheat exposed to saline-alkali stress could significantly affect its spectral reflectance [ 5 ]. This study found that at 365 nm the average reflectance was the lowest of the whole band and decreased as the wheat seedlings grew. This phenomenon might be related to the increase of phenolic compounds in wheat seedlings during growth [ 17 ]. These phenolic compounds (such as flavonoids) absorb strongly in the ultraviolet region [ 35 ]. The spectrum at 365–645 nm showed an “S” type growth; this region is mainly related to the absorption peak of crop chlorophyll, which reflects the absorption capacity of crops for photosynthetically active radiation. In the spectral range of 700–780 nm, the average spectral value was steep and almost linear, which is a typical feature of seedlings [ 34 ]. The linear growth meant that, as the crops grew, the leaf structure became more mature and thicker, and the light scattering ability was enhanced. In the spectral range of 780–970 nm, the changes of spectral reflectance were mainly related to the internal structure and water content of leaves. The gentle trend showed that the structure and water state of crop leaves had reached a balance to a certain extent. In addition, from the 11th day (Fig. 8 IV), the average reflectance spectrum of group B in this region was significantly higher than that of the other two groups. This could be due to alkali stress altering the internal structure and water regulation mechanisms of crop leaves, which in turn affected the reflectance spectrum. Alkali stress might promote the activation of some protective mechanisms, such as the accumulation of osmotic adjustment substances, which helped to maintain the water state of cells and thereby affected the spectral reflectance.
The photosynthetic characteristics of seedlings under abiotic stress could be used as the best index to determine the ability of crops to deal with saline alkali stress [ 2 ]. This study found that under the same Na concentration, the impact of alkali stress on the wheat’s spectral characteristics was more pronounced than that of salt stress. As Zhang et al. [ 48 ] proposed, multispectral technology could effectively improve the accuracy of stress monitoring.

Long-term continuous monitoring of water status of wheat plants can not only enrich the water transport theory of Soil Plant Atmosphere Continuum (SPAC), but also have important significance in clarifying the adaptation mechanism of crops to the environment, efficient water use and water-saving regulation [ 42 ]. However, the traditional moisture detection has the disadvantages of complex operation, harmful chemical reagents to human body, destructive to samples and so on, which is difficult to be widely used [ 3 ]. With the rapid development of nondestructive testing technology, researchers began to explore the nondestructive detection of crop moisture. Yang et al. [ 41 ] found that there was a consistent linear relationship between nuclear magnetic signal amplitude and moisture content on wet basis during rice seed germination (R 2  = 0.98). Similarly, Yao et al. [ 43 ] found that there was a linear relationship between the pure water content of each organ of wheat and the total signal amplitude A of T2 relaxation spectrum (R 2  = 0.99). Therefore, this study predicted the water signal of wheat based on multispectral data. This study selected the multispectral band with high correlation with water to predict the wheat water signal amplitude A. The results showed that GBRT model performed best in quantitative prediction of water signal, with high accuracy and rapid response ability (Fig.  5 ), which was of great significance for real-time monitoring of crop water status.

Compared with single data source, data fusion significantly improves the performance of prediction model [ 1 ]. For example, the collaborative retrieval model of hyperspectral and multispectral images based on double branch convolution network can effectively use the characteristics of data [ 37 ]. Compared with the yield estimation model based on single sensor data, multi-source data fusion can effectively improve the estimation accuracy of winter wheat yield [ 29 ]. The prediction accuracy of the winter wheat yield estimation model based on multispectral and thermal infrared data fusion was 8% higher than that based on multispectral data alone [ 18 ]. Therefore, in terms of qualitative prediction, we compared the performance of a single LF-NMR or MSI data source with the model fused with LF-NMR and MSI data source, and found that the Precision, Recall, Accuracy and F1-score of the model after data fusion were excellent (Fig.  12 , Table  6 ). It confirmed the effectiveness of information fusion in improving the application of precision agriculture, which meant that data fusion could be used to improve the classification and prediction ability of wheat seedlings under different saline alkali stress levels.
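One simple way to build a fused dataset is feature-level fusion: standardise each source and concatenate the feature vectors per sample. The sketch below illustrates that idea only. The text does not describe how the MSI and LF-NMR features were actually combined, so the z-score scaling, the function names and the sample values are all assumptions.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardise each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    # Fall back to a divisor of 1.0 for constant columns to avoid division by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def fuse(msi_rows, nmr_rows):
    """Concatenate standardised MSI and LF-NMR features sample by sample."""
    if len(msi_rows) != len(nmr_rows):
        raise ValueError("both sources must describe the same samples")
    msi_z = zscore_columns(msi_rows)
    nmr_z = zscore_columns(nmr_rows)
    return [a + b for a, b in zip(msi_z, nmr_z)]


# Hypothetical features for three seedlings.
msi_features = [[2.0, 40.0], [4.0, 60.0], [6.0, 80.0]]  # e.g. width, CIELab_L
nmr_features = [[100.0], [300.0], [500.0]]              # e.g. amplitude A21

fused = fuse(msi_features, nmr_features)
print(len(fused), "samples x", len(fused[0]), "features")
```

Scaling before concatenation keeps a source with large raw values, such as signal amplitudes, from dominating distance-based classifiers like KNN.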

By fusing LF-NMR and MSI technology, this study provided a new perspective for the nondestructive detection and evaluation of wheat seedlings under saline-alkali stress, and also pointed out directions for future research. Although this study achieved positive results, it had limitations. The selected saline-alkali stress levels might not fully cover actual field conditions, which might limit the generality of the model in wider applications. Future research could explore more saline-alkali stress levels and more varieties to increase the robustness and generalization ability of the model. In addition, the predicted NMR moisture signal amplitude might differ from the actual moisture content. In this paper, only the NMR signal was measured, and the drying method was not used for calibration. In future work, a variety of methods would be used to calibrate the predicted data. The relationship between other biological parameters (such as ion absorption and chlorophyll content) and water status could also be explored further to comprehensively evaluate the response of crops to stress. Finally, combining machine learning models with traditional crop growth models might provide a deeper understanding for predicting crop performance in changing environments.

In this study, we combined LF-NMR and MSI technology to achieve nondestructive detection of wheat seedlings under saline-alkali stress. Under stress, wheat seedlings would increase bound water content through specific mechanisms to enhance their saline-alkali stress tolerance. However, the efficiency and timing of these responses vary with stress types. Compared to alkali stress, salt stress endowed seedlings with a stronger recovery ability in water management. Stress can induce changes in the internal structure and water regulation mechanisms of wheat leaves. The impact of alkali stress on wheat spectral characteristics was more pronounced than that of salt stress. At the same Na concentration, alkaline stress inhibited wheat growth more than salt stress. Model comparison revealed that the GBRT model excelled in predicting wheat moisture signals, with the R 2 P of 0.98 and the RMSEP of 109.60. It also featured a short training time of 1.48 s and a high prediction speed of 1300 obs/s. For qualitative prediction, the KNN and GNB models demonstrated significantly better classification abilities on the fused datasets compared to using only MSI or LF-NMR datasets alone. Notably, the GNB model showed the most outstanding classification prediction effect on the fused dataset, with Precision, Recall, Accuracy, and F1-score of its test set reaching 90.30%, 88.89%, 88.90%, and 0.90, respectively. These findings not only demonstrated the application potential of LF-NMR and MSI information fusion technology in agriculture but also provided an effective method for predicting the moisture signal quantity in wheat seedlings and accurately classifying saline-alkali stress effects.

Data availability

No datasets were generated or analysed during the current study.

Alparone L, Arienzo A, Garzelli A. Spatial resolution enhancement of vegetation indexes via fusion of hyperspectral and multispectral satellite data. Remote Sens. 2024;16:875. https://doi.org/10.3390/rs16050875 .

An Y, Gao Y, Tong SZ, et al. Morphological and physiological traits related to the response and adaption of bolboschoenus planiculmis seedlings grown under salt-alkaline stress conditions. Front Plant Sci. 2021;12: 567782. https://doi.org/10.3389/fpls.2021.567782 .

Chen M, Li JL, Li W, et al. Dynamic testing and imaging of living maize kernel moisture using low-field nuclear magnetic resonance (LF-NMR). Trans Chin Soc Agric Eng. 2020;36:285–92. https://doi.org/10.11975/j.issn.1002-6819.2020.23.033 .

Cui M-H, Chen X-Y, Yin F-X, et al. Hybridization affects the structure and function of root microbiome by altering gene expression in roots of wheat introgression line under saline-alkali stress. Sci Total Environ. 2022;835: 155467. https://doi.org/10.1016/j.scitotenv.2022.155467 .

El-Hendawy S, Al-Suhaibani N, Alotaibi M, et al. Estimating growth and photosynthetic properties of wheat grown in simulated saline field conditions using hyperspectral reflectance sensing and multivariate analysis. Sci Rep. 2019;9:16473. https://doi.org/10.1038/s41598-019-52802-5 .

Gu Y, Chen Y, Yue X, et al. Effects of 6-Benzylaminopurine on internal water distribution and growth state of soybean. Trans Chin Soc Agric Eng. 2022;38:303–8. https://doi.org/10.11975/j.issn.1002-6819.2022.05.036 .

Gu Y, Li J, Zhang H, et al. Effect of 6-benzyladenine on soybean seed germination under salt stress and establishment of stress grade prediction model. Plant Stress. 2024;11: 100388. https://doi.org/10.1016/j.stress.2024.100388 .

Guo B, Lu M, Fan Y, et al. A novel remote sensing monitoring index of salinization based on three-dimensional feature space model and its application in the Yellow River Delta of China. Geomat Nat Haz Risk. 2023;14:95–116. https://doi.org/10.1080/19475705.2022.2156820 .

Guo R, Yang ZZ, Li F, et al. Comparative metabolic responses and adaptive strategies of wheat ( Triticum aestivum ) to salt and alkali stress. BMC Plant Biol. 2015;15:170. https://doi.org/10.1186/s12870-015-0546-x .

Guo Y, Cao H, Han S, et al. Spectral-spatial hyperspectralImage classification with k-nearest neighbor and guided filter. IEEE Access. 2018;6:18582–91. https://doi.org/10.1109/ACCESS.2018.2820043 .

Guo Z, Zhai L, Zou Y, et al. Comparative study of Vis/NIR reflectance and transmittance method for on-line detection of strawberry SSC. Comput Electron Agric. 2024;218: 108744. https://doi.org/10.1016/j.compag.2024.108744 .

Hu X, Wu P, Zhang S, et al. Moisture conversion and migration in single-wheat kernel during isothermal drying process by LF-NMR. Drying Technol. 2019;37:803–12. https://doi.org/10.1080/07373937.2018.1459681 .

Izadi MH, Rabbani J, Emam Y, et al. Effects of salinity stress on physiological performance of various wheat and barley cultivars. J Plant Nutr. 2014;37:520–31. https://doi.org/10.1080/01904167.2013.867980 .

Ji J, Zhang J, Wang X, et al. The alleviation of salt stress on rice through increasing photosynthetic capacity, maintaining redox homeostasis and regulating soil enzyme activities by Enterobacter sp. JIV1 assisted with putrescine. Microbiol Res. 2024;280:127590. https://doi.org/10.1016/j.micres.2023.127590 .

Jia C, Wang L, Yin S, et al. Low-field nuclear magnetic resonance for the determination of water diffusion characteristics and activation energy of wheat drying. Drying Technol. 2020;38:917–27. https://doi.org/10.1080/07373937.2019.1599903 .

Jiang M, Wu P, Xing H, et al. Water migration and diffusion mechanism in the wheat drying. Drying Technol. 2021;39:738–51. https://doi.org/10.1080/07373937.2020.1716001 .

Jin Z, Xu Y, Wang M, et al. Changes of phenolic compounds and their antioxidant activities during wheat germination. Food Ferment Indus. 2019;45:199–202. https://doi.org/10.13995/j.cnki.11-1802/ts.017617 .

Lan M, Fei SP, Yu XL, et al. Application of multispectral and thermal infrared data fusion in estimation of winter wheat yield. J Triticeae Crops. 2021;41:1564–72. https://doi.org/10.7606/j.issn.1009-1041.2021.12.15 .

Li X, Li S, Wang J, et al. Exogenous abscisic acid alleviates harmful effect of salt and alkali stresses on wheat seedlings. Int J Environ Res Public Health. 2020;17:3770. https://doi.org/10.3390/ijerph17113770 .

Li YY, Chen B, Yao LR, et al. Evaluation of salt and alkali tolerance and germplasm screening of 283 wheat varieties (lines) during germination. J Agric Sci Technol. 2021;23:25–33. https://doi.org/10.13304/j.nykjdb.2020.0203 .

Liao FB, Feng XQ, Li ZQ, et al. A hybrid CNN-LSTM model for diagnosing rice nutrient levels at the rice panicle initiation stage. J Integr Agric. 2023;23:711. https://doi.org/10.1016/j.jia.2023.05.032 .

Liu D, Ma Y, Rui M, et al. Is high pH the key factor of alkali stress on plant growth and physiology? A case study with wheat ( Triticum aestivum L.) seedlings. Agronomy. 2022;12:12081802. https://doi.org/10.3390/agronomy12081820 .

Liu W, Xu X, Liu C, et al. Rapid discrimination of high-quality watermelon seeds by multispectral imaging combined with chemometric methods. J Appl Spectrosc. 2019;85:1044–9. https://doi.org/10.1007/s10812-019-00757-w .

Liu C, Sun PS, Liu SR. A comparison of spectral reflectance indices in response to water: a case study of Quercus aliena var. acuteserrata. Chinese Journal of Plant Ecology. 2017; 41(08): 850-861. https://doi.org/10.17521/cjpe.2016.0095 .

Lu P, Dai S, Yong L, et al. A soybean sucrose non-fermenting protein kinase 1 gene, GmSNF1, positively regulates plant response to salt and salt-alkali stress in transgenic plants. Int J Mol Sci. 2023;24:12482. https://doi.org/10.3390/ijms241512482 .

Lyndgaard CH, Kistrup AT, Christine. HB. Determination of dry matter content in potato tubers by low-field nuclear magnetic resonance (LF-NMR). Journal of Agricultural and Food Chemistry. 2010; 58(19): 10300-4. https://doi.org/10.1021/jf101319q .

Mourad AMI, Farghly KA, Börner A, et al. Candidate genes controlling alkaline-saline tolerance in two different growing stages of wheat life cycle. Plant Soil. 2023;493:283–307. https://doi.org/10.1007/S11104-023-06232-Y .

Nidal F, Ismail M, Abdelhalem M, et al. Phosphate solubilizing rhizobacteria isolated from jujube ziziphus lotus plant stimulate wheat germination rate and seedlings growth. PeerJ. 2021;9:e11583–e11583. https://doi.org/10.7717/PEERJ.11583 .

Song CY, Geng HW, Fei SP, et al. Study on yield estimation of wheat varieties based on multi-source data. Spectrosc Spectral Anal. 2023;43:2210–9. https://doi.org/10.3964/j.issn.1000-0593(2023)07-2210-10 .

Song P, Yue X, Gu Y, et al. Assessment of maize seed vigor under saline-alkali and drought stress based on low field nuclear magnetic resonance. Biosys Eng. 2022;220:135–45. https://doi.org/10.1016/j.biosystemseng.2022.05.018 .

Takhtkeshha N, Mandlburger G, Remondino F, et al. Multispectral light detection and ranging technology and applications: a review. Sensors. 2024;24:1669. https://doi.org/10.3390/s24051669 .

Tarolli P, Luo J, Park E, et al. Soil salinization in agriculture: mitigation and adaptation strategies combining nature-based solutions and bioengineering. iScience. 2024;27:108830. https://doi.org/10.1016/j.isci.2024.108830 .

Thakur R, Yadav S. Biofilm forming, exopolysaccharide producing and halotolerant, bacterial consortium mitigates salinity stress in Triticum aestivum . Int J Biol Macromol. 2024;262: 130049. https://doi.org/10.1016/j.ijbiomac.2024.130049 .

Ustin SL, Gitelson AA, Jacquemoud S, et al. Retrieval of foliar information about plant pigment systems from high resolution spectroscopy. Remote Sens Environ. 2009;113:S67–77. https://doi.org/10.1016/j.rse.2008.10.019 .

Vodnik D, Vogrin Ž, Šircelj H, et al. Phenotyping of basil ( Ocimum basilicum L.) illuminated with UV-A light of different wavelengths and intensities. Sci Horticult. 2023;309:111638. https://doi.org/10.1016/j.scienta.2022.111638 .

Wang J, Xie H, Han J, et al. Effect of graphene oxide-glyphosate nanocomposite on wheat and rape seedlings: growth, photosynthesis performance, and oxidative stress response. Environ Technol Innov. 2022;27: 102527. https://doi.org/10.1016/j.eti.2022.102527 .

Wang YZ, Xiao ZY. Hyperspectral and multispectral co-inversion of chlorophyll content in maize leaves based on two-branch convolutional network. Trans Chinese Soc Agric Mach. 2024;55:196–202. https://doi.org/10.6041/j.issn.1000-1298.2024.01.018 .

Wu D, He Y, Feng S. Short-wave near-infrared spectroscopy analysis of major compounds in milk powder and wavelength assignment. Anal Chim Acta. 2008;610:232–42. https://doi.org/10.1016/j.aca.2008.01.056 .

Wu D, Zhang F, Sui CY, et al. Exogenous active substances: effect on stress resistance of wheat seedling. Chinese Agric Sci Bull. 2022;38:14–9. https://doi.org/10.11924/j.issn.1000-6850.casb2021-0871 .

Xiao G, Wang M, Li X, et al. TaCHP encoding C1-domain protein stably enhances wheat yield in saline-alkaline field. J Integr Plant Biol. 2023;66:169. https://doi.org/10.1111/jipb.13605 .

Yang H, Zhang L, Ji J, et al. Analysis on water absorption of rice seeds during germination process under polyethylene glycol solution using low-field nuclear magnetic resonance. Trans Chinese Soc Agric Eng. 2018;34:276–83. https://doi.org/10.11975/j.issn.1002-6819.2018.17.036 .

Yang Q, Zhang F, Liu X. Search progress on regulation mechanism for the process of water transport in plants. Acta Ecol Sin. 2011;70:129–46.

Yao S, Du G, Mou H, et al. Detection of water distribution and dynamics in body of winter wheat based on nuclear magnetic resonance. Trans Chinese Soc Agric Eng. 2014;30:177–86. https://doi.org/10.3969/j.issn.1002-6819.2014.24.021 .

Yao S, Mou H, Du G, et al. Water imbibition and germination of wheat seed with nuclear magnetic resonance. Trans Chinese Soc Agric Mach. 2015;46:266–74. https://doi.org/10.6041/j.issn.1000-1298.2015.11.036 .

Younas S, Mao Y, Liu C, et al. Measurement of water fractions in freeze-dried shiitake mushroom by means of multispectral imaging (MSI) and low-field nuclear magnetic resonance (LF-NMR). J Food Compos Anal. 2020;96:103694. https://doi.org/10.1016/j.jfca.2020.103694 .

Younas S, Mao Y, Liu C, et al. Measurement of water fractions in freeze-dried shiitake mushroom by means of multispectral imaging (MSI) and low-field nuclear magnetic resonance (LF-NMR). J Food Compos Anal. 2021;96:103694. https://doi.org/10.1016/j.jfca.2020.103694 .

Yue X, Bai Y, Wang Z, et al. Low-field nuclear magnetic resonance of maize seed germination process under salt stress. Trans Chinese Soc Agric Eng. 2020;36:292–300. https://doi.org/10.11975/j.issn.1002-6819.2020.24.034 .

Zhang J, Yu H, Dang J. Research on inversion model of wheat polysaccharide under high temperature and ultraviolet stress based on dual-spectral technique. Spectrosc Spectral Anal. 2023;43:2705–9.

Zhang K, Tang J, Wang Y, et al. The tolerance to saline-alkaline stress was dependent on the roots in wheat. Physiol Mol Biol Plants. 2020;26:947–54. https://doi.org/10.1007/s12298-020-00799-x .

Zhao X, Xi H, Yu T, et al. Spatio-temporal variation in soil salinity and its influencing factors in desert natural protected forest areas. Remote Sens. 2023;15:5054. https://doi.org/10.3390/rs15205054 .

Acknowledgements

The authors would like to acknowledge the generous guidance provided by the National Key Research and Development Program of China and the Reform and Development Project of Beijing Academy of Agriculture and Forestry (Research and development of non-destructive testing technology and equipment for wheat and millet spikelet number based on deep learning). The authors would like to thank the editor and anonymous reviewers for their helpful suggestions on the quality improvement in this paper.

This work was supported by the National Key Research and Development Program of China (No. 2022YFD2002301) and the Reform and Development Project of Beijing Academy of Agriculture and Forestry (Research and development of non-destructive testing technology and equipment for wheat and millet spikelet number based on deep learning).

Author information

Authors and affiliations.

College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang, 110866, China

Ying Gu & Liping Chen

Intelligent Equipment Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100089, China

Ying Gu, Guoqing Feng, Peichen Hou, Yanan Zhou, He Zhang, Xiaodong Wang, Bin Luo & Liping Chen

College of Agriculture, Northeast Agricultural University, Harbin, 150006, China

Contributions

Ying Gu: Conceptualization, Investigation, Methodology, Supervision, Software, Data curation, Visualization, Writing—original draft, Writing—review and editing. Guoqing Feng: Methodology, Resources, Data curation, Software. Peichen Hou: Methodology, Resources. Yanan Zhou: Writing—review and editing, Funding acquisition. He Zhang: Visualization, Writing—review and editing. Xiaodong Wang: Methodology, Resources. Bin Luo: Project administration, Funding acquisition, Supervision, Writing—review and editing. Liping Chen: Methodology, Supervision, Writing—review and editing.

Corresponding authors

Correspondence to Bin Luo or Liping Chen .

Ethics declarations

Ethics approval and consent to participate.

All authors agreed to publish this manuscript.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article.

Gu, Y., Feng, G., Hou, P. et al. Nondestructive detection of saline-alkali stress in wheat (Triticum aestivum L.) seedlings via fusion technology. Plant Methods 20, 136 (2024). https://doi.org/10.1186/s13007-024-01248-6

Received: 06 June 2024

Accepted: 27 July 2024

Published: 05 September 2024

DOI: https://doi.org/10.1186/s13007-024-01248-6

  • Low-field nuclear magnetic resonance
  • Multispectral imaging
  • Wheat seedlings
  • Saline-alkali stress
  • Nondestructive testing

Plant Methods

ISSN: 1746-4811

best research papers on nearest neighbor search

Title: k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

Abstract: Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because poor run-time performance is less of a problem these days, given the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.
Comments: 22 pages, 15 figures: An updated edition of an older tutorial on kNN
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: [cs.LG]
  (or [cs.LG] for this version)

IMAGES

  1. Improved nearest neighbor search using auxiliary information and

  2. (PDF) A Comprehensive Survey and Experimental Comparison of Graph-Based

  3. (PDF) Winner-Update Algorithm for Nearest Neighbor Search

  4. Labeled Nearest Neighbor Search and Metric Spanners via Locality

  5. (PDF) A Survey on Nearest Neighbor Search Methods

  6. The Best Nearest Neighbor Search Algorithm YATSI TPR FPR Precision

VIDEO

  1. The game that inspired THAT'S NOT MY NEIGHBOR!

  2. Dr. Sinert

  3. A Fast Nearest Neighbor Search Scheme over Outsourced Encrypted Medical Images

  4. Improved Approximate Nearest Neighbor Search

  5. Liudmila Prokhorenkova: Graph-based nearest neighbor search: practice and theory

  6. Paths Beyond Local Search: A Tight Bound for Randomized Fixed-Point Computation

COMMENTS

  1. [2101.12631] A Comprehensive Survey and Experimental Comparison of

    Approximate nearest neighbor search (ANNS) constitutes an important operation in a multitude of applications, including recommendation systems, information retrieval, and pattern recognition. In the past decade, graph-based ANNS algorithms have been the leading paradigm in this domain, with dozens of graph-based ANNS algorithms proposed. Such algorithms aim to provide effective, efficient ...

  2. Deep Learning for Approximate Nearest Neighbour Search: A Survey and

    Approximate nearest neighbour search (ANNS) in high-dimensional space is an essential and fundamental operation in many applications from many domains such as multimedia database, information retrieval and computer vision. With the rapidly growing volume of data and the dramatically increasing demands of users, traditional heuristic-based ANNS solutions have been facing great challenges in ...

  3. A Comprehensive Survey and Experimental Comparison of Graph-Based

    Nearest Neighbor Search (NNS) is a fundamental building block in various application domains [7, 8, 38, 67, 70, 80, 108, 117], such as ... major contribution (optimization on one component) in the paper, but instead by another small optimization for another component (e.g., NSSG [37]). ... It is worth noting that we try our best to reimplement ...

  4. Recent Approaches and Trends in Approximate Nearest Neighbor Search

    This overview paper reviews recent advances of the state of the art of nearest neighbor search and discusses some trends, and provides advice on the benchmarking pipeline. Nearest neighbor search is a computational primitive whose efficiency is paramount to many applications. As such, the literature recently blossomed with many works focusing on improving its effectiveness in an approximate ...

  5. PDF An Investigation of Practical Approximate Nearest Neighbor Algorithms

    This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimen- ... The metric tree [29, 25, 5] is a data structure that supports efficient nearest neighbor search. We briefly A metric tree organizes a set of points in a spatial hierarchical manner. It is a

  6. New Directions in Approximate Nearest-Neighbor Searching

    distance from q is within a factor of 1 + ε of the distance to the true nearest neighbor. This is called ε-approximate nearest-neighbor searching (ε-ANN). This problem has been the subject of many research papers, and it remains a topic of active study. In this paper, we will survey some recent techniques on efficient approximate nearest ...

  7. Efficient Approximate Nearest Neighbor Search in Multi-dimensional

    Approximate nearest neighbor (ANN) search is a fundamental search in multi-dimensional databases, which has numerous real-world applications, such as image retrieval, recommendation, entity resolution, and sequence matching. Proximity graph (PG) has been ...

  8. Quantitative Comparison of Nearest Neighbor Search Algorithms

    We compare the performance of three nearest neighbor search algorithms: the Orchard, ball tree, and VP-tree algorithms. These algorithms are commonly used for nearest-neighbor searches and are known for their efficiency in large datasets. We analyze the fraction of distances computed in relation to the size of the dataset and its dimension. For each algorithm we derive a fitting function for ...

  9. Approximate Nearest Neighbor Search on High Dimensional Data

    A comprehensive experimental evaluation of many state-of-the-art methods for approximate nearest neighbor search, together with a proposed new method that empirically achieves both high query efficiency and high recall on the majority of the datasets under a wide range of settings. Nearest neighbor search is a fundamental and essential operation in applications from many domains, such as databases, machine ...

  10. Approximate Nearest Neighbor Search in High Dimensions

    2006. TLDR. The problem of finding the approximate nearest neighbor of a query point in the high dimensional space is studied, focusing on the Euclidean space, and it is shown that the c nearest neighbor can be computed in time and near linear space where p ≈ 2.06/c as c becomes large.

  11. PDF Revisiting kd-tree for Nearest Neighbor Search

    MAKING kd-TREE COMPETITIVE. We focus on the ubiquitous and well-studied problem of Euclidean nearest-neighbor search: for any set of points S ⊂ R^d and any query q ∈ R^d, find the point in S closest to q with respect to the l2 metric. The brute-force solution of scanning the complete set S for a single q becomes infeasible for sets with ...

  12. Comprehensive Guide To Approximate Nearest Neighbors Algorithms

    This is why "Nearest Neighbor" has become a hot research topic, in order to increase the chance of users to find the information they are looking for in reasonable time. The use cases for "Nearest Neighbor" are endless, and it is in use in many computer-science areas, such as image recognition, machine learning, and computational ...

  13. More-efficient approximate nearest-neighbor search

    In a paper we presented at this year's Web Conference, we describe a new technique that makes graph-based nearest-neighbor search much more efficient. The technique is based on the observation that, when calculating the distance between the query and points that are farther away than any of the candidates currently on the list, an approximate distance measure will usually suffice.

  14. A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

    reproducibility, all the codes used in the paper are released on GitHub, which can be used as a testing platform for a fair comparison between various hashing algorithms. Index Terms: approximate nearest neighbor search, hashing. 1 INTRODUCTION. Nearest neighbor search plays an important role in many applications of machine learning and data ...

  15. Efficient and robust approximate nearest neighbor search using

    We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). ...

  16. PDF An Improved Algorithm Finding Nearest Neighbor Using Kd-trees

    An Improved Algorithm Finding Nearest Neighbor Using Kd-trees. Rina Panigrahy, Microsoft Research, Mountain View CA, USA. Abstract. We suggest a simple modification to the Kd-tree search algorithm for nearest neighbor search resulting in an improved performance. The Kd-tree data structure seems to work well in finding nearest ...

  17. [PDF] Improved Space-Efficient Approximate Nearest Neighbor Search

    A data structure is given that leverages function inversion to improve the query time of the best known near-linear space data structure for approximate nearest neighbor search under Euclidean distance: the ALRW data structure of (Andoni, Laarhoven, Razenshteyn, and Waingarten 2017). Approximate nearest neighbor search (ANN) data structures have widespread applications in machine learning ...

  18. Vector Search, Approximate Nearest Neighbor Search Papers

    📚 Awesome papers and technical blogs on vector DB (database), semantic-based vector search or approximate nearest neighbor search (ANN Search, ANNS). Vector search is the key component of large-scale information retrieval, cross-modal retrieval, LLMs-based RAG, vector databases. - matchyc/vector-search-papers

  19. SOAR: New algorithms for even faster vector search with ScaNN

    SOAR provides ScaNN a robust "backup" route to identify nearest neighbors when ScaNN's traditional clustering-based approach has the most difficulty. This allows ScaNN to perform even faster vector search, all while maintaining low index size and indexing time, leading to the all-around best set of tradeoffs among vector search algorithms.

  20. A Comprehensive Survey on Vector Database: Storage and Retrieval

    to the section on Nearest Neighbor Search. A. Nearest Neighbor Search 1) Brute Force Approach: A brute force algorithm for NNS problem is a very simple and naive algorithm, which scans through all the points in the dataset and computes the distance to the query point, keeping track of the best so far. This algorithm guarantees to find the true ...

  21. Fast Nearest Neighbor Search with Keywords

    Conventional spatial queries, such as range search and nearest neighbor retrieval, involve only conditions on objects' geometric properties. Today, many modern applications call for novel forms of queries that aim to find objects satisfying both a spatial predicate, and a predicate on their associated texts. For example, instead of considering all the restaurants, a nearest neighbor query ...

  22. Comparative Analysis Between K-Nearest Neighbor (KNN) and ...

    The objective of this research is to analyze and compare the results generated using Deep learning and K-Nearest Neighbor (KNN) classifier to find which classifier performed the best concerning emotion classification using HR and EDG signals where previous research used Electroencephalogram (EEG) signals in their experiment.

  23. An Approach to Improving Nearest Neighbor Search

    This paper studies the distances between data points and a given query point Q, and designs a process to update the distance values based on past nearest neighbor searches, which can be used to efficiently and effectively select the data points closest to Q. In this paper, we present our research on data analysis and nearest neighbor search problems. A nearest neighbor search ...

  24. Machine learning-assisted design of high-entropy alloys with superior

    The K-nearest neighbor (KNN) [71] algorithm determines the group of a new input instance according to the categories of its k nearest neighbor instances. The three elements of the k-nearest neighbor algorithm are the k value selection, the distance measurement method and the classification decision rules.

  25. Nondestructive detection of saline-alkali stress in wheat (Triticum

    Background: Wheat (Triticum aestivum L.) is one of the world's important grain crops, and its growth and development at different stages are seriously affected by saline-alkali stress, especially at the seedling stage. Therefore, nondestructive detection of wheat seedlings under saline-alkali stress can provide more comprehensive technical support for wheat breeding, cultivation and management. Results: ...

  26. k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    This paper presents an overview of techniques for Nearest Neighbour classification focusing on; mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report.
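
Several snippets above describe the same basic machinery: the brute-force scan that tracks the closest points seen so far (item 20), the ε-approximate criterion in which a returned point is acceptable if its distance is within a factor of 1 + ε of the true nearest distance (item 6), and the kNN classification rule built from a choice of k, a distance measure and a decision rule (item 24). The following is a minimal sketch of those three ideas using only the Python standard library. The function names, the Euclidean distance, the majority-vote rule and the sample data are illustrative assumptions, not code taken from any of the listed papers.

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean (l2) distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(points, query, k=1):
    """Scan every point and return the indices of the k closest, nearest first."""
    scored = ((euclidean(p, query), i) for i, p in enumerate(points))
    return [i for _, i in heapq.nsmallest(k, scored)]


def is_eps_approximate(points, query, candidate, eps):
    """Check the (1 + eps) criterion for a candidate index against the exact answer."""
    exact = brute_force_knn(points, query, 1)[0]
    best = euclidean(points[exact], query)
    return euclidean(points[candidate], query) <= (1 + eps) * best


def knn_classify(points, labels, query, k=3):
    """Assumed decision rule: majority vote among the k nearest neighbours."""
    neighbours = brute_force_knn(points, query, k)
    votes = Counter(labels[i] for i in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters with labels "a" and "b".
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b"]

if __name__ == "__main__":
    print(brute_force_knn(points, (0.9, 0.1), k=1))
    print(knn_classify(points, labels, (5.5, 5.0), k=3))
```

The scan costs one distance computation per stored point for every query, which is why the approximate methods surveyed in the listed papers trade exactness for speed on large, high-dimensional data.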