Survey of Image Retrieval Systems on Internet
 

1. Introduction

This survey studies and evaluates three image retrieval systems available on the Internet: Netra, Excalibur and ImageMiner. They are compared with the IBM QBIC system.

Netra (W.Y. Ma, 1997) provides content-based image retrieval by color, texture, shape and spatial location. Attributes are represented as feature vectors. The system uses a Euclidean distance measure to calculate the difference between two attribute feature vectors. Users can search for an image using multiple attributes and vary the relative importance of each attribute. Search results are ranked by relevance.

The Excalibur system provides search functionality similar to Netra's. Attributes are also represented as feature vectors. It uses the Shadow Play algorithm to match images.

ImageMiner (T. Hermes, et.al., 1995) provides content-based image retrieval by color, texture, shape and objects. The system aims to support conceptual queries, so the attributes are represented as textual descriptions. Text matching techniques are used to compute the difference between two descriptions.

Compared with QBIC (W. Niblack, et.al., 1993) in terms of presentation, Netra offers users more advanced query facilities, e.g. drawing the shapes of interest on the fly; Excalibur does not have a keyword search facility; ImageMiner offers queries at a higher level of abstraction, i.e. on objects in the image. However, search refinement is not available on any of these systems.

Compared with QBIC in terms of representation, Netra and Excalibur use a similar attribute representation scheme, namely feature vectors, whereas ImageMiner uses textual descriptions.

The ideal system should require the least amount of knowledge about a query language from users. Refinement search and query by example should always be available. Advanced queries such as drawing shapes of interest are preferred. Response time is another consideration. Last but not least, the system should provide a relevance feedback mechanism so that it can learn from users about the accuracy of its searches.
 

2. System Descriptions
 
    ImageMiner: developed jointly by University of Bremen and IBM
Home Page:  http://www.informatik.uni-bremen.de/grp/ag-ki/projects/iris/iris-start.html   OR
                     http://www.software.ibm.com/data/mediaminer/immn0b15.html

Demo:          http://www.tzi.uni-bremen.de/BV/imageminer/Gui/queryland.html
 
ImageMiner automatically analyzes images and generates textual content descriptions of them. The analysis module processes images with respect to color, texture and contour features. This information triggers the recognition of objects, which can in turn be composed into complex objects. The retrieval module uses the IBM SearchManager text retrieval product to manage the integrated retrieval of images.
 
Color attributes are represented as color rectangles. The system uses a grid of arbitrary size to subdivide the image into so-called grid elements. For every grid element, a color histogram is computed. The color that appears most frequently defines the color of the grid element. Grid elements with the same color are then grouped and their circumscribing rectangles are determined. The results of this color-based segmentation are called color rectangles, with attributes such as size relative to the grid size, position relative to the underlying grid, and the classified color. Example:

Color2 HOR=mid, VER=up, SIZ=XL, SHP=RECT
           COL=BLUE, UL=0--1, LR=44-11, DEN=415-495
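
The color-based segmentation described above can be sketched roughly as follows. This is only an illustration under several assumptions that are not taken from ImageMiner itself: pixels are already classified into color names, grid elements are grouped only when they are 4-connected, and the function names and the output dictionary layout are hypothetical.

```python
from collections import Counter


def dominant_colors(image, cell):
    """Return a grid holding the most frequent color label of each grid element.

    `image` is a list of rows of color labels; `cell` is the grid element size
    in pixels (an assumed parameter name).
    """
    rows, cols = len(image), len(image[0])
    grid = []
    for top in range(0, rows, cell):
        grid_row = []
        for left in range(0, cols, cell):
            # Color histogram of one grid element.
            counts = Counter(
                image[y][x]
                for y in range(top, min(top + cell, rows))
                for x in range(left, min(left + cell, cols))
            )
            grid_row.append(counts.most_common(1)[0][0])
        grid.append(grid_row)
    return grid


def color_rectangles(grid):
    """Group 4-connected grid elements of equal color and return the
    circumscribing rectangle of each group in grid coordinates (column, row)."""
    height, width = len(grid), len(grid[0])
    seen = set()
    rectangles = []
    for r in range(height):
        for c in range(width):
            if (r, c) in seen:
                continue
            color = grid[r][c]
            stack = [(r, c)]
            seen.add((r, c))
            cells = []
            while stack:
                y, x = stack.pop()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and (ny, nx) not in seen and grid[ny][nx] == color:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            ys = [y for y, _ in cells]
            xs = [x for _, x in cells]
            rectangles.append({
                "color": color,
                "ul": (min(xs), min(ys)),   # upper-left grid element
                "lr": (max(xs), max(ys)),   # lower-right grid element
                "cells": len(cells),        # number of grid elements in the group
            })
    return rectangles
```

Calling dominant_colors on a labeled image and passing the result to color_rectangles yields one record per group, loosely comparable to the COL, UL and LR fields in the example annotation.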
 
Texture attributes are represented as texture rectangles. The image is segmented by an arbitrary grid. For every grid element, the system computes a set of statistical measures through matrix calculations. The system is trained to learn a mapping between these statistical values and the texture. Grid elements with the same texture are grouped together and the circumscribing rectangle is determined. The results of the texture-based segmentation are called texture rectangles. Example:

Texture3 HOR=left, VER=mid, SIZ=S, SHP=Path
              TEX=clouds, UL=0-3,LR=3-3,DEN=4-4
 
Shape attributes are represented through contour-based shape descriptions. The description specifies the coordinates of the midpoint, the size, and the bounding coordinates of each region. Example:

Contour0 MID=24-7, SOP=45, UL=0-0,
               LR=44-17, SHP=UND
 
Objects are recognized based on the generated annotations of the three low-level features: color, texture and shape. Object annotations could be
 ...
 OBJ=sky
 OBJ=stone
 ...
 
ImageMiner uses a weighted correlation measure to compute the similarity between two feature vectors.

ImageMiner offers a user four levels of detail to combine a query in respect to

Users can query on multiple attributes. Search results are ranked by relevance. The search engine supports hybrid queries, which mix Boolean terms with free text. Users can look for images using conceptual queries such as "forest scene", "mountain scene" or "ice". According to IBM, users can query in 17 languages, including Japanese and simplified Chinese.

 
  Netra: developed at University of California
 
Home Page and Demo: http://vivaldi.ece.ucsb.edu/Netra/
 
The Netra system supports image retrieval by color, contour (shape), texture and location.

Color: The system uses a compact color feature which represents each image region by a subset of colors from a color codebook (a 256-color table). The color feature is represented as the set of pairs

$$ \{ (I_i, P_i) \mid 1 \le i \le N \} $$

where I_i is the index into the color codebook, P_i is the corresponding percentage, and N is the total number of colors in the region. This representation supports advanced color queries such as "Find all image regions that have 50 % red and 30 % green".
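
As a small illustration of this representation, the sketch below builds the list of (codebook index, percentage) pairs for a region and checks a percentage query against it. It assumes that each pixel has already been mapped to a codebook index; the function names, the tolerance parameter and the codebook indices used for "red" and "green" are hypothetical and not taken from Netra.

```python
from collections import Counter


def color_feature(pixel_indices):
    """Return [(codebook_index, fraction), ...] for one region,
    ordered from the most to the least frequent color."""
    counts = Counter(pixel_indices)
    total = len(pixel_indices)
    pairs = [(index, n / total) for index, n in counts.items()]
    return sorted(pairs, key=lambda pair: -pair[1])


def matches_percentages(feature, wanted, tolerance=0.05):
    """Check a query such as {RED: 0.5, GREEN: 0.3} against a region feature.

    A color matches when its fraction is within `tolerance` of the requested
    fraction (the tolerance rule is an assumption made for this sketch).
    """
    have = dict(feature)
    return all(
        abs(have.get(index, 0.0) - fraction) <= tolerance
        for index, fraction in wanted.items()
    )


# Hypothetical codebook indices for three colors.
RED, GREEN, BLUE = 12, 40, 7
region = [RED] * 5 + [GREEN] * 3 + [BLUE] * 2
feature = color_feature(region)
print(matches_percentages(feature, {RED: 0.5, GREEN: 0.3}))
```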

Netra applies a distance-based similarity measure. To compute the color distance between image region A and image region B, Netra considers each color a in region A, finds the color k in region B that is closest to it, and computes the distance between the two; this is done for all Na colors in region A, where Na is the number of colors in region A. The whole process for computing color similarity is then expressed over both regions, in terms of Na, the number of colors in region A, and Nb, the number of colors in region B.
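
The closest-color matching outlined above can be sketched as follows. The overall structure (match each color of one region to its closest color in the other region, then accumulate over both regions) follows the description; the per-pair cost in pair_cost is purely an assumption made for this sketch and should not be read as Netra's actual formula. The codebook contents and function names are likewise hypothetical.

```python
import math

# Largest possible distance between two RGB colors, used for normalization.
MAX_RGB_DISTANCE = math.dist((0, 0, 0), (255, 255, 255))


def pair_cost(color_a, p_a, color_b, p_b):
    """Assumed cost of matching two colors: normalized RGB distance plus the
    gap between their percentages. Not taken from the Netra description."""
    return math.dist(color_a, color_b) / MAX_RGB_DISTANCE + abs(p_a - p_b)


def directed_distance(region_a, region_b, codebook):
    """Sum, over the colors of region A, of the cost of matching each color
    to its closest color in region B."""
    total = 0.0
    for index_a, p_a in region_a:
        color_a = codebook[index_a]
        # k is the closest color in region B for the color a in region A.
        index_k, p_k = min(
            region_b, key=lambda pair: math.dist(color_a, codebook[pair[0]])
        )
        total += pair_cost(color_a, p_a, codebook[index_k], p_k)
    return total


def color_distance(region_a, region_b, codebook):
    """Symmetric distance: accumulate over the Na colors of A and the Nb colors of B."""
    return (directed_distance(region_a, region_b, codebook)
            + directed_distance(region_b, region_a, codebook))
```

Each region is a list of (codebook index, percentage) pairs and the codebook maps an index to an RGB triple. With this assumed cost, two identical regions have distance zero and the measure is symmetric in A and B.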
 

Contour:  Netra supports three types of contour representation: the curvature function, the centroid distance, and the complex coordinate function.

Fourier-based shape description is used in the system. The shape feature for curvature is:
    where F_i denotes the i-th Fourier transform coefficient and M is the number of samples in the Fourier transform.
    Similarly, the shape feature for centroid is:
    The shape feature for complex coordinate is:
 
A Euclidean metric is used to compute the distance between two shape feature vectors.
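
To make the idea of a Fourier-based shape feature compared with a Euclidean metric more concrete, here is a minimal sketch for the centroid distance representation. It is not Netra's exact feature: using only coefficient magnitudes, keeping the first M/2 of them and dividing by the magnitude of the zeroth coefficient are assumptions chosen so that the feature does not depend on translation, scale or starting point. All function names are hypothetical.

```python
import cmath
import math


def centroid_distance(contour):
    """Distance of each contour sample (x, y) from the mean of the samples."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    return [math.hypot(x - cx, y - cy) for x, y in contour]


def dft_magnitudes(signal):
    """Magnitudes |F_0| .. |F_{M-1}| of the discrete Fourier transform of an M-sample signal."""
    m = len(signal)
    return [
        abs(sum(s * cmath.exp(-2j * cmath.pi * k * n / m)
                for n, s in enumerate(signal)))
        for k in range(m)
    ]


def centroid_shape_feature(contour):
    """Assumed feature: |F_1|/|F_0| .. |F_{M/2}|/|F_0| of the centroid distance."""
    magnitudes = dft_magnitudes(centroid_distance(contour))
    half = len(magnitudes) // 2
    return [magnitudes[k] / magnitudes[0] for k in range(1, half + 1)]


def shape_distance(feature_a, feature_b):
    """Euclidean distance between two shape feature vectors of equal length."""
    return math.dist(feature_a, feature_b)
```

Under these assumptions, a contour and a scaled, shifted copy of it give the same feature vector, so their shape distance is zero up to rounding error.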
 

Texture: Netra represents texture features using Gabor filters. Gabor filters can be considered orientation- and scale-tunable edge and line (bar) detectors, and the statistics of these micro-features in a given region are often used to characterize the underlying texture information. The mean character difference measure is used to compute the difference between two texture vectors.
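
In its usual definition, the mean character difference between two vectors is the average of the absolute differences of their components. The sketch below shows only that measure; the texture vectors themselves (the Gabor filter statistics) are assumed to be given, and the function name is hypothetical.

```python
def mean_character_difference(u, v):
    """Average absolute component-wise difference between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("texture vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


# Two made-up texture vectors for illustration.
print(mean_character_difference([1.0, 2.0, 3.0], [1.0, 4.0, 7.0]))
```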

For a query consisting of more than one image feature, the intersection of the results of the searches using the individual features is computed and then sorted based on a weighted similarity measure.  Netra uses an implicit ordering of the image features to prune the search space. The first feature that the user specifies is used to narrow down the space, within which a more detailed search is performed to order the retrieval results by similarity.
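
The intersect-then-rank step for multi-feature queries can be illustrated with the following sketch. It assumes each single-feature search returns a mapping from image identifier to distance, and that the weighted measure is a weighted sum of those distances; both the data layout and the weighted-sum form are assumptions, and the names are hypothetical.

```python
def combined_search(per_feature_hits, weights):
    """Intersect the per-feature result sets and rank the common images.

    per_feature_hits: {feature_name: {image_id: distance}}, at least one feature
    weights:          {feature_name: weight}
    Returns image ids ordered from most to least similar.
    """
    # Keep only images returned by every single-feature search.
    common = set.intersection(*(set(hits) for hits in per_feature_hits.values()))

    def score(image_id):
        # Assumed weighted similarity measure: weighted sum of distances.
        return sum(
            weights[feature] * hits[image_id]
            for feature, hits in per_feature_hits.items()
        )

    # The image id is used as a tie-breaker to keep the order deterministic.
    return sorted(common, key=lambda image_id: (score(image_id), image_id))


hits = {
    "color": {"a": 0.1, "b": 0.4, "c": 0.2},
    "texture": {"b": 0.1, "c": 0.5, "d": 0.0},
}
print(combined_search(hits, {"color": 0.7, "texture": 0.3}))
```

Only images "b" and "c" appear in both result sets, so only they are ranked.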

Netra also allows users to specify spatial location to further disambiguate the retrievals. For example, consider a search for snow covered mountain pictures using color and texture. A spatially unconstrained search often finds regions of ocean surf as such regions also have similar texture and color. A search constrained to look for solutions only in the upper half of the image would eliminate or reduce such retrieval instances.
 
 

    Excalibur: developed at Excalibur Technologies

Home Page:    http://www.excalib.com/rev2/products/vrw/vrw.html
Demo:            http://vrw.excalib.com/cgi-bin/sdk/cst/cst2.bat
 
The image retrieval system is used as an image search engine by Yahoo, Infoseek and Image Surfer. The search system supports search by color, shape and texture. Each attribute is represented as a feature vector. The feature vector is constructed with a set of Gaussian derivative filters.
 
The system uses the Shadow Play algorithm to match images. The algorithm takes an image imprint (by extracting a feature vector) and then searches a database to find other images with similar imprint patterns (by comparing distances).

Users can alter the relative importance of the image attributes. Users can also specify the brightness at each point in the image. Other query features include measures of the hue, saturation and brightness at each point in the image, and of the ratio of the image's width to its height.
 

QBIC: developed jointly by University of Davis and IBM

Home Page and Demo:     http://wwwqbic.almaden.ibm.com/~qbic/qbic.html
          
The QBIC (Query By Image Content) system supports queries on image content, including the colors, textures, shapes and locations of user-specified objects.

Color:  QBIC computes the average Munsell (Miyahara, et.al., 1988) coordinates of each object and image, plus a k element color histogram (k is typically 64 or 256) that gives the percentage of the pixels in each image in each of the k colors.

Texture:  QBIC's texture features are based on modified versions of the coarseness, contrast, and directionality features proposed in (H. Tamura, et.al., 1978). Coarseness measures the scale of the texture (pebbles vs. boulders), contrast describes the vividness of the pattern, and directionality describes whether the image has a favored direction or is isotropic (grass versus a smooth object).

Shape:  QBIC has used several different sets of shape features. One is based on a combination of area, circularity, eccentricity, major axis orientation and a set of algebraic moment invariants. A second is the turning angles or tangent vectors around the perimeter of an object, computed from smooth splines fit to the perimeter. The result is a list of 64 values of turning angle.

Location:  The location features are the x and y centroid of the object.

A weighted Euclidean distance measure is used for similarity matching of two feature vectors. Special similarity measures are adopted for histogram color and turning angle shape.
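
For reference, a weighted Euclidean distance between two feature vectors can be written as in the sketch below. How QBIC actually chooses the weights is not described here, so the weights in the example are made up and the function name is hypothetical.

```python
import math


def weighted_euclidean(u, v, weights):
    """Square root of the weighted sum of squared component differences."""
    if not (len(u) == len(v) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))


# With unit weights this reduces to the ordinary Euclidean distance.
print(weighted_euclidean([0.0, 0.0], [3.0, 4.0], [1.0, 1.0]))
```

Raising a weight makes differences in that component count for more, and a weight of zero removes the component from the comparison.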

In QBIC, returned results are ranked and shown in order, with the best result in the leftmost position, the next best in the next position, and so on. Each image returned is displayed as a reduced "thumbnail". The thumbnails are active menu buttons that can be clicked to initiate the query "Find images like this one". Each thumbnail image is also linked to its full-size image.
 

3. Comparisons

We have studied four on-line image retrieval systems, namely ImageMiner, Netra, Excalibur and QBIC. These systems have some common features.

In terms of query types and interface, all systems provide search on color, shape, texture or a combination of them. Retrieved images are ranked by relevance. Query by graphical example is available in all systems. However, none of these systems supports refinement search: users cannot search within a subset of the database images. This is partially because refinement search requires a large amount of temporary storage for each search session, which seems infeasible for Internet applications since the number of search sessions is unpredictable. Moreover, none of the systems has a facility that allows users to provide relevance feedback. Especially for textual description-based image retrieval systems like ImageMiner, user relevance feedback is important for updating the image descriptions to improve subsequent retrieval.
 
In terms of attribute representation and similarity matching, Netra, Excalibur and QBIC present a similar scheme which is very different from ImageMiner's. The former three systems use feature vectors to represent the attributes, and a distance-based measure is applied to similarity matching. In ImageMiner, attributes are represented as textual content descriptions and similarity is computed with a correlation measure.

ImageMiner, Netra, Excalibur and QBIC also have some distinct features.

In terms of attribute representation and similarity matching, Netra uses a Euclidean metric to compute distance for the color and shape attributes and adopts the mean character difference measure for texture. Excalibur and QBIC use a Euclidean metric for all attributes.
 
Netra and Excalibur allow users to vary the importance of each search attribute, but they do so differently. Netra implicitly uses the order in which the user specifies the attributes, whereas in Excalibur users can explicitly specify the weight percentage of each attribute.

In QBIC, search results are displayed as thumbnails which can be clicked to see the full-size image. It also supports search by keywords and URL. The ImageMiner on-line demo does not provide text search, but the vendor promises support for this functionality. QBIC also supports search by color layout and by color percentage, such as "find all image regions that have 70 % blue and 20 % green".

Netra supports search by location to further disambiguate the retrievals. Netra also supports search by color percentage.

In Excalibur, users can search images by brightness, hue and saturation, and ratio of image's width to height.

ImageMiner supports conceptual queries (queries on objects in the image), so the semantics of the image can be queried to some degree. ImageMiner also supports search in 17 languages and can therefore be used widely across countries.
 
In terms of implementation, the ImageMiner and Netra demos are Java applet-based whereas QBIC and Excalibur use CGI programs. CGI-based client-server communication forces a browser screen refresh at each client request, whereas an applet-based program keeps the page unchanged during interaction with the user. Refreshing may disturb the user's attention. The problem is especially serious in the QBIC system.
 
 

4.  Conclusion

This survey studies and evaluates four image retrieval systems on the Internet: QBIC, ImageMiner, Netra and Excalibur. Image retrieval on these systems is content-based and similarity-based. Common content attributes including color, texture and shape are supported by all systems.  The survey examines the attribute representation and similarity matching formula in each system, and also studies the query types and presentation interface offered by these systems. The systems are compared with each other to illustrate the state of the art in current image retrieval systems on the Internet.

Based on these studies, I think an ideal system should provide, or in other words, future image retrieval system development should consider, the following features:

Query Based on Spatial Relationship

Current systems have focused on the representation of individual image regions and search strategies for single region queries. The next immediate goal should be integrating spatial relationships between regions into the image retrieval system. ImageMiner seems to provide this functionality, but it assumes that the individual objects can be matched exactly and concentrates only on the matching of their spatial relationships. This is usually not the case for most image database applications, where each region is represented by a set of image features and an exact match between features is not a reasonable assumption.  An ideal system should integrate the region features and their spatial relationships into a unified representation. Such a strategy could support high-level object-based queries.

User's Relevance Feedback for Query Refinement

Modifying the retrieval results based on user feedback is another interesting consideration. An image retrieval system should be able to incorporate relevance feedback from users. One idea (W. Y. Ma, 1997) is to utilize a nonlinear mapping to learn the appropriate transformation from the original feature space to the new space where visually similar patterns will cluster together. This mapping function can be designed such that it can be dynamically adjusted or refined based on the user relevance feedback to improve retrieval performance.

Visual Thesaurus

Various low-level image attributes (color, texture, shape, spatial relationship, etc.) can be combined to construct a visual thesaurus which can index an image database based on the co-occurrence properties among these image attributes.

Besides these considerations, other factors that make an image retrieval system perfect could be:

5. Acknowledgements

Special thanks to Zhang Yi and Zhao Xiangpeng who both helped me locate the available on-line image retrieval systems and their reference sources.
 

6.  References