Keyword Searching -- Overview
The Image database documents support boolean and phrase keyword searching, and subject heading searching (not available in this version).
For keyword searching, phrase searching is the default, and all boolean operators must be explicit. Thus
apple tree
is NOT the same as
apple AND tree
but is a search for the phrase "apple tree" and is satisfied only by those words occurring in exactly that order with no intervening words.
A boolean search request consists of a group of words or phrases linked by connectors such as AND and OR that indicate the relationship between them. Examples:
apple and pear              Both words must be present
apple or pear               Either word can be present
apple w/5 pear              Apple must occur within 5 words of pear
apple not w/5 pear          Apple must not occur within 5 words of pear
apple and not pear          Only apple must be present
subjects contains architecture    The field subjects must contain the word architecture
If you use more than one connector, you should use parentheses to indicate precisely what you want to search for. For example, apple and pear or orange could mean (apple and pear) or orange, or it could mean apple and (pear or orange).
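The difference between the two groupings can be seen with a small Python sketch. The three sets of document numbers are invented for illustration and stand in for whatever the search engine would actually find for each word; set intersection plays the role of AND and set union the role of OR.

```python
# Hypothetical document numbers containing each word (illustrative data only).
apple = {1, 2}
pear = {2, 3}
orange = {3, 4}

# (apple and pear) or orange
grouped_left = (apple & pear) | orange

# apple and (pear or orange)
grouped_right = apple & (pear | orange)

print(sorted(grouped_left))   # [2, 3, 4]
print(sorted(grouped_right))  # [2]
```

The same three words give different results depending on the grouping, which is why explicit parentheses are recommended.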
Noise words, such as if and the, are ignored in searches.
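Search terms may include the following special characters:
?     Matches any single character
*     Matches any number of characters
%     Fuzzy search
#     Phonic search
~     Stemming
&     Synonym search
~~    Numeric range
:     Variable term weighting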
Words and Phrases
You do not need to use any special punctuation or commands to search for a phrase. Simply enter the phrase the way it ordinarily appears. You can use a phrase anywhere in a search request. Example:
apple w/5 fruit salad
If a phrase contains a noise word, the search engine will skip over the noise word when searching for it. For example, a search for statue of liberty would retrieve any document containing the word statue, any intervening word, and the word liberty.
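As a rough illustration of the noise-word behavior described above, the Python sketch below treats each noise word in a phrase as a slot that any single word can fill. The noise-word list and the function name are assumptions made for this example; the actual list used by the search engine is not given here.

```python
# Assumed sample of noise words; the real list is not documented here.
NOISE_WORDS = {"if", "the", "of", "a", "an"}

def phrase_matches(phrase, text):
    """Return True if the phrase occurs in text, letting any single word
    stand in for each noise word in the phrase."""
    pattern = phrase.lower().split()
    words = text.lower().split()
    for start in range(len(words) - len(pattern) + 1):
        window = words[start:start + len(pattern)]
        # A noise word in the phrase matches whatever word is in that slot.
        if all(p in NOISE_WORDS or p == w for p, w in zip(pattern, window)):
            return True
    return False

print(phrase_matches("statue of liberty", "a statue for liberty"))  # True
print(phrase_matches("statue of liberty", "statue liberty"))        # False
```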
Punctuation inside of a search word is treated as a space. Thus, can't would be treated as a phrase consisting of two words: can and t. 1843(c)(8)(ii) would become 1843 c 8 ii (four words).
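The punctuation rule can be sketched in Python as follows. This is a simplified model that assumes every character other than a letter or digit counts as punctuation; it ignores the special search characters (such as * and ?) described elsewhere on this page, and the function name is made up for the example.

```python
import re

def split_search_word(word):
    # Treat every run of non-alphanumeric characters as a space (assumed rule),
    # then drop the empty pieces left at the edges.
    return [part for part in re.split(r"[^A-Za-z0-9]+", word) if part]

print(split_search_word("can't"))           # ['can', 't']
print(split_search_word("1843(c)(8)(ii)"))  # ['1843', 'c', '8', 'ii']
```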
Wildcards (* and ?)
A search word can contain the wildcard characters * and ?. A ? in a word matches any single character, and an * matches any number of characters. The wildcard characters can be in any position in a word. For example:
appl* would match apple, application, etc.
*cipl* would match principle, participle, etc. (this is a much slower search)
appl? would match apply and apple but not apples.
ap*ed would match applied, approved, etc.
Use of the * wildcard character near the beginning of a word will slow searches somewhat, depending particularly on the number of matching records.
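One way to picture wildcard matching is to translate the pattern into a regular expression. The Python sketch below does this; it illustrates the matching rules only, not how the search engine implements them, and the function names are invented for the example.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a search word containing * and ? into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any number of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def wildcard_match(pattern, word):
    # The whole word must match, not just part of it.
    return wildcard_to_regex(pattern).fullmatch(word) is not None

print(wildcard_match("appl?", "apple"))   # True
print(wildcard_match("appl?", "apples"))  # False
```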
Synonym Searching
Synonym searching finds synonyms of a word in a search request. For example, a search for fast would also find quick. You can enable synonym searching for all words in a request or you can enable synonym searching selectively by adding the & character after certain words in a request. Example: fast& w/5 search.
The search engine can expand a word using only synonyms from its built-in thesaurus, or using synonyms together with related words (such as antonyms, related categories, etc.) from the same thesaurus. The Library uses the WordNet (tm) thesaurus and synonym lists.
Fuzzy Searching
Fuzzy searching will find a word even if it is misspelled. For example, a fuzzy search for apple will find appple. Fuzzy searching can be useful when you are searching text that may contain typographical errors, or for text that has been scanned using optical character recognition (OCR). There are two ways to add fuzziness to searches:
- Enable fuzziness for all of the words in your search request using the check box. The level of fuzziness is set to 5 for this database, although it normally can be set from 1 to 10.
- You can also add fuzziness selectively using the % character. The number of % characters you add determines the number of differences the search engine will ignore when searching for a word. The position of the % characters determines how many letters at the start of the word have to match exactly. Examples:
ba%nana     Word must begin with ba and have at most one difference between it and banana.
b%%anana    Word must begin with b and have at most two differences between it and banana.
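The role of the % characters can be sketched in Python. This example assumes that a "difference" means one single-character insertion, deletion, or substitution (Levenshtein edit distance); the search engine's own measure is not specified here and may differ. The function names are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed difference measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def fuzzy_match(pattern, candidate):
    prefix = pattern.split("%", 1)[0]   # letters before the first % must match exactly
    max_diff = pattern.count("%")       # number of % = differences tolerated
    target = pattern.replace("%", "")
    candidate = candidate.lower()
    return (candidate.startswith(prefix.lower())
            and edit_distance(target.lower(), candidate) <= max_diff)

print(fuzzy_match("ba%nana", "bananna"))   # True: one extra letter
print(fuzzy_match("ba%nana", "bonana"))    # False: does not begin with ba
print(fuzzy_match("b%%anana", "bonanza"))  # True: two differences
```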
Phonic Searching
Phonic searching looks for a word that sounds like the word you are searching for and begins with the same letter. For example, a phonic search for Smith will also find Smithe and Smythe.
To ask the search engine to search for a specific word phonically, put a # in front of the word in your search request. Examples: #smith, #johnson
You can also check the Phonic searching box in the search form to enable phonic searching for all words in your search request. Phonic searching is somewhat slower than other types of searching and tends to make searches over-inclusive, so it is usually better to use the # symbol to do phonic searches selectively.
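The phonic algorithm used by the search engine is not described here. To give a feel for how sound-alike matching can work, the Python sketch below substitutes classic Soundex, a well-known phonetic code that also keeps the first letter of the word. Treat it as an illustration of the idea only; the engine's actual matches may differ.

```python
# Soundex digit for each consonant group.
CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in group:
        CODES[ch] = digit

def soundex(word):
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as-is
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h and w do not separate equal codes
        code = CODES.get(ch, "")         # vowels have no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]          # pad or trim to four characters

def sounds_like(a, b):
    return soundex(a) == soundex(b)

print(soundex("Smith"), soundex("Smithe"), soundex("Smythe"))  # S530 S530 S530
```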
Stemming
Stemming extends a search to cover grammatical variations on a word. For example, a search for fish would also find fishing. A search for applied would also find applying, applies, and apply. There are two ways to add stemming to your searches:
- Check the Stemming box in the search form to enable stemming for all of the words in your search request. Stemming does not slow searches noticeably and is almost always helpful in making sure you find what you want. Stemming is enabled by default for this database.
- If you want to add stemming selectively, add a ~ at the end of words that you want stemmed in a search. Example: apply~
Variable Term Weighting
When the search engine sorts search results after a search, by default all words in a request count equally in counting hits. However, you can change this by specifying the relative weights for each term in your search request, like this:
apple:5 and pear:1
This request would retrieve the same documents as apple and pear, but the search engine would weight each appearance of the word apple five times as heavily as pear when sorting the results.
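The effect of term weights on sorting can be sketched in Python. The scoring rule used here (weight multiplied by the number of appearances, summed over terms) is an assumption chosen to mirror the description above; the search engine's real ranking formula is not documented on this page. The sample documents are invented.

```python
def weighted_score(text, weights):
    """Sum of (weight x number of appearances) over all weighted terms."""
    words = text.lower().split()
    return sum(weight * words.count(term) for term, weight in weights.items())

# Weights corresponding to the request: apple:5 and pear:1
weights = {"apple": 5, "pear": 1}

doc_a = "apple pear pear pear"   # 1 apple, 3 pears
doc_b = "apple apple pear"       # 2 apples, 1 pear

print(weighted_score(doc_a, weights))  # 8
print(weighted_score(doc_b, weights))  # 11
```

With equal weights doc_a would have more hits (4 against 3), but with apple weighted at 5 doc_b sorts first.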
Field Searching
Field information is available so that you can perform searches limited to a particular field. For example, you could search for apple in the Subjects field like this:
subjects contains apple
address contains pacific ave
Useful fields available in this database include: description, subjects, date, address.
AND Connector
Use the AND connector in a search request to connect two expressions, both of which must be found in any document retrieved. For example:
apple pie and poached pear would retrieve any document that contained both phrases.
(apple or banana) and (pear w/5 grape) would retrieve any document that (1) contained either apple OR banana, AND (2) contained pear within 5 words of grape.
OR Connector
Use the OR connector in a search request to connect two expressions, at least one of which must be found in any document retrieved. For example, apple pie or poached pear would retrieve any document that contained apple pie, poached pear, or both.
W/N Connector
Use the W/N connector in a search request to specify that one word or phrase must occur within N words of the other. For example, apple w/5 pear would retrieve any document that contained apple within 5 words of pear. The following are examples of search requests using W/N:
(apple or pear) w/5 banana
(apple w/5 banana) w/10 pear
(apple and banana) w/10 pear
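For the simplest case, a single word within N words of another single word, the W/N test can be sketched in Python. The sketch assumes that "within N words" means the two word positions differ by at most N; how the search engine counts exactly is not stated here. The function names are invented for the example.

```python
def positions(words, term):
    """Indexes at which term appears in the list of words."""
    return [i for i, w in enumerate(words) if w == term]

def within(text, left, right, n):
    """True if some occurrence of left is at most n words from right."""
    words = text.lower().split()
    return any(abs(i - j) <= n
               for i in positions(words, left)
               for j in positions(words, right))

print(within("apple and then a pear", "apple", "pear", 5))              # True
print(within("apple one two three four five six pear", "apple", "pear", 5))  # False
```

Because the test compares absolute distances, swapping the two words gives the same answer.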
Some types of complex expressions using the W/N connector will produce ambiguous results and should not be used. The following are examples of ambiguous search requests:
(apple and banana) w/10 (pear and grape)
(apple w/10 banana) w/10 (pear and grape)
In general, at least one of the two expressions connected by W/N must be a single word or phrase or a group of words and phrases connected by OR. Examples:
(apple and banana) w/10 (pear or grape)
(apple and banana) w/10 orange tree
The search engine uses two built in search words to mark the beginning and end of a file: xfirstword and xlastword. The terms are useful if you want to limit a search to the beginning or end of a file. For example, apple w/10 xlastword would search for apple within 10 words of the end of a document.
NOT and NOT W/N
Use NOT in front of any search expression to reverse its meaning. This allows you to exclude documents from a search. Example:
apple sauce and not pear
NOT standing alone can be the start of a search request. For example, not pear would retrieve all documents that did not contain pear.
If NOT is not the first connector in a request, you need to use either AND or OR with NOT:
apple or not pear
not (apple w/5 pear)
The NOT W/ ("not within") operator allows you to search for a word or phrase not in association with another word or phrase. Example:
apple not w/20 pear
Unlike the W/ operator, NOT W/ is not symmetrical. That is, apple not w/20 pear is not the same as pear not w/20 apple. In the apple not w/20 pear request, the search engine searches for apple and excludes cases where apple is too close to pear. In the pear not w/20 apple request, the search engine searches for pear and excludes cases where pear is too close to apple.
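The asymmetry of NOT W/ can be seen in a short Python sketch. As before, it assumes that distance is measured as the difference between word positions, and it handles single words only; the function name and sample text are invented for the example.

```python
def not_within(text, left, right, n):
    """Positions of left that have no occurrence of right within n words."""
    words = text.lower().split()
    right_positions = [i for i, w in enumerate(words) if w == right]
    return [i for i, w in enumerate(words)
            if w == left and all(abs(i - j) > n for j in right_positions)]

# apple at position 0, pear at position 1, a second apple at position 32.
text = "apple pear " + "filler " * 30 + "apple"

print(not_within(text, "apple", "pear", 20))  # [32]
print(not_within(text, "pear", "apple", 20))  # []
```

The first request keeps the apple that is far from any pear; the second finds nothing, because the only pear sits right next to an apple.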
Numeric Range Searching
A numeric range search is a search for any numbers that fall within a range. To add a numeric range component to a search request, enter the upper and lower bounds of the search separated by ~~ like this:
address contains (950~~1000 pacific)
This request would find any document containing a phrase in the address field which consists of a number from 950 to 1000 followed by the word pacific.
To cover NO and SO Pacific might look like
Numeric range searches only work with positive integers. A numeric range search includes the upper and lower bounds (so 950 and 1000 would be retrieved by the 950~~1000 example above).
For purposes of numeric range searching, decimal points and commas are treated as spaces and minus signs are ignored. For example, -123,456.78 would be interpreted as: 123 456 78 (three numbers). Using alphabet customization, the interpretation of punctuation characters can be changed. For example, if you change the comma and period from space to ignore, then 123,456.78 would be interpreted as 12345678.
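The default number-handling rules above can be sketched in Python: minus signs are dropped, decimal points and commas act as spaces, and both bounds of the range are included. This models the default behavior only (no alphabet customization), and the function names are invented for the example.

```python
import re

def numbers_in(text):
    """Extract the positive integers the range search would see."""
    cleaned = text.replace("-", "")              # minus signs are ignored
    tokens = re.split(r"[.,\s]+", cleaned)       # '.', ',' and whitespace separate words
    return [int(tok) for tok in tokens if tok.isascii() and tok.isdigit()]

def in_range(text, low, high):
    """True if any number in text falls within low..high, bounds included."""
    return any(low <= n <= high for n in numbers_in(text))

print(numbers_in("-123,456.78"))             # [123, 456, 78]
print(in_range("1000 pacific", 950, 1000))   # True
print(in_range("1001 pacific", 950, 1000))   # False
```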