10 апреля 2009 | Автор: Admin | Рубрика: Компьютерная литература » Програм-ние и разработка » Databases and SQL | Комментариев: 0
Similarity Search: The Metric Space Approach (Advances in Database Systems)
9780387291468 | (0387291466) | Springer | 2005 | 11 MB | RS | FF
The area of similarity searching is a very hot topic for both research and commercial applications. Current data processing applications use data with considerably less structure and much less precise queries than traditional database systems. Examples are multimedia data like images or videos that offer queryby-example search, product catalogs that provide users with preference-based search, scientific data records from observations or experimental analyses such as biochemical and medical data, or XML documents that come from heterogeneous data sources on the Web or in intranets and thus does not exhibit a global schema. Such data can neither be ordered in a canonical manner nor meaningfully searched by precise database queries that would return exact matches.
This novel situation is what has given rise to similarity searching, also referred to as content-based or similarity retrieval. The most general approach to similarity search, still allowing construction of index structures, is modeled in metric space. In this book. Prof. Zezula and his co-authors provide the first monograph on this topic, describing its theoretical background as well as the practical search tools of this innovative technology.
The proliferation of information housed in computerized domains makes it vital to find tools to search these resources efficiently and effectively. Ordinary retrieval techniques are inadequate because sorting is simply impossible. Consequently, proximity searching has become a fundamental computation task in a variety of application areas.
Similarity Search focuses on the state of the art in developing index structures for searching the metric space. Part I of the text describes major theoretical principles, and provides an extensive survey of specific techniques for a large range of applications. Part II concentrates on approaches particularly designed for searching in large collections of data. After describing the most popular centralized disk-based metric indexes, approximation techniques are presented as a way to significantly speed up search time at the cost of some imprecision in query results. Finally, the scalable and distributed metric structures are discussed.