Monday, May 29, 2006

Search Engines

A search engine is a program that searches documents for specified keywords and returns a list of the documents in which the keywords were found. Although "search engine" is really a general class of programs, the term is often used specifically for systems like Google, Alta Vista and Excite that enable users to search for documents on the World Wide Web and USENET newsgroups.

Typically, a search engine works by sending out a robot, spider or crawler program to fetch as many documents as possible. A robot is a piece of software that automatically follows hyperlinks from one document to the next around the Web. Another program, called an indexer, then reads these documents and creates an index based on the words contained in each document. Each search engine uses a proprietary algorithm to create its indices so that, ideally, only meaningful results are returned for each query. Broadly, there are two types of search engines:

1. Individual: Individual search engines compile their own searchable databases of web pages.
2. Meta: Metasearch engines do not compile databases of their own. Instead, they search the databases of several individual engines simultaneously.
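The crawl-and-index process described earlier can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the PAGES dictionary is a made-up, in-memory stand-in for the Web, the URLs are hypothetical, and no real search engine's proprietary algorithm is represented here.

```python
from collections import defaultdict
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("search engines index the web", ["a.example/about"]),
    "a.example/about": ("a crawler follows links on the web",
                        ["a.example/home", "a.example/faq"]),
    "a.example/faq": ("an indexer builds the index", []),
    "a.example/orphan": ("no page links here", []),
}

def crawl(start):
    """Follow hyperlinks from start; return (url, text) pairs in visit order."""
    seen, queue, fetched = {start}, [start], []
    while queue:
        url = queue.pop(0)
        text, links = PAGES[url]
        fetched.append((url, text))
        for link in links:
            # Queue each known page once, the first time it is discovered.
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

def build_index(documents):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

index = build_index(crawl("a.example/home"))
print(sorted(index["web"]))
```

Note that the page nobody links to is never fetched, so its words never reach the index. A crawler can only find what it can reach by following links.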

Searching Techniques
Keywords are the words or phrases that a search engine uses to find relevant sites. In addition to keywords, the user can apply various techniques or operators to get more accurate results.

1. Use quotes (“ ”): When more than one keyword is entered, the search engine treats them as separate, unrelated words, so the results may not be accurate. The quotes operator can be used to search for the exact phrase, with the words in the same order, and thereby filter the information sought.

2. Use (+) or AND: The (+) or AND operators can be used to find webpages on which more than one word appears, not necessarily in the same order. As a matter of syntax, however, a space must follow the first word.

3. Use (-) or NOT: The (-) or NOT operators can be used to find webpages on which the first word appears without the second. As a matter of syntax, however, a space must follow the first word.

4. Use OR: The OR operator, placed between keywords, displays sites containing either of the words. As a matter of syntax, however, a space must follow the first word.

5. Use Wildcards (*): The wildcard (*) operator, when placed after at least four characters, displays sites containing words that begin with the characters entered before the wildcard.

6. Use (~): The (~) operator can be used to search not only for a particular keyword, but also for its synonyms.

7. Use (Site: Site name): If one knows which website to search but is not sure where the information is located within that site, one can use a search engine to search only that domain. This is done by typing the keyword, followed by the word "site", a colon, and the domain name in which the search is to be performed.
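One way to picture what the AND, OR, NOT and wildcard operators do is as set operations over a word index. The sketch below is only an illustration: the INDEX dictionary and page names are made up, and the helper functions pages and prefix are hypothetical, not part of any search engine's interface.

```python
# Hypothetical index: word -> set of page names that contain the word.
INDEX = {
    "search": {"p1", "p2", "p3"},
    "engine": {"p1", "p3"},
    "engines": {"p2"},
    "spider": {"p2", "p4"},
    "google": {"p3", "p4"},
}

def pages(word):
    """Pages containing the word (empty set if the word is not indexed)."""
    return INDEX.get(word, set())

def prefix(stem):
    """Wildcard: pages containing any indexed word that starts with stem."""
    result = set()
    for word, hits in INDEX.items():
        if word.startswith(stem):
            result |= hits
    return result

both = pages("search") & pages("engine")     # search AND engine
either = pages("spider") | pages("google")   # spider OR google
without = pages("search") - pages("google")  # search NOT google
wild = prefix("engi")                        # engi*

print(sorted(both), sorted(either), sorted(without), sorted(wild))
```

The quotes operator is left out of this sketch because matching an exact phrase also requires knowing the position of each word on the page, which this simple index does not record.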

The most popular search engine today is www.google.com. It has a database of over 8 billion webpages. Some of the facilities it provides are described below:

1. Cached Links: Google takes a snapshot of each page examined as it crawls the web and caches these as a backup in case the original page is unavailable.

2. Calculator: Google offers a built-in calculator function that can be used to solve math problems involving basic arithmetic, more complicated math, units of measure and conversions, and physical constants.

3. Definitions: Google can also show the definition of a word or phrase. This is done by typing the word "define", then a space, and then the word(s) to be defined.

4. File Types: Google provides file type search in 12 file formats other than the HTML file format. Google now searches Microsoft Office, PostScript, Corel WordPerfect, Lotus 1-2-3, and other file formats. The new file types will simply appear in Google search results whenever they are relevant to the user query. Google also offers the user the facility to "View as HTML", allowing users to examine the contents of these file formats even if the corresponding application is not installed. The "View as HTML" option also allows users to avoid viruses, which are sometimes carried in certain file formats.

5. Froogle: Froogle is the product search service provided by Google for finding information about particular products. The product search results are linked to the sites of merchants who participate in Froogle.

6. Local Search: Google Local enables one to search the entire web for just those stores and businesses in a specific neighborhood. When a city or zip code is included in the search, Google displays relevant results from that region at the top of the search results.

7. News Headlines: When searching on Google, one may see links at the top of the results marked "News". These links connect one to reports culled from the numerous news services that Google continuously monitors. The links appear if the terms one enters are words currently in the news, and clicking on them takes one directly to the news service provider's website.

8. Spell Checker: The Google spell checker software automatically analyses the keyword(s) entered and suggests common spellings for the keyword(s).

9. Webpage Translation: Google breaks the language barrier with this translation feature. Using machine translation technology, Google gives English speakers access to a variety of non-English web pages. This feature is currently available for pages published in Italian, French, Spanish, German, and Portuguese. If the search has non-English results, there will be a link to a version of that page translated into English.

10. Submit your site: Google allows users to submit their websites to Google's index. One may also add comments or keywords that describe the content of that particular page or site.
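The idea behind a spell checker (item 8 above) can be sketched by comparing a mistyped keyword against a list of known words. This is not how Google's spell checker works, which the text does not describe. It is a minimal illustration using Python's standard difflib module, with a made-up VOCABULARY list and a hypothetical suggest function.

```python
from difflib import get_close_matches

# Hypothetical list of known, correctly spelled keywords.
VOCABULARY = ["search", "engine", "google", "index", "crawler"]

def suggest(keyword, n=3):
    """Return up to n known words that closely resemble the keyword."""
    keyword = keyword.lower()
    if keyword in VOCABULARY:
        return []  # already a known spelling, nothing to suggest
    # cutoff=0.75 is an arbitrary similarity threshold chosen for this sketch.
    return get_close_matches(keyword, VOCABULARY, n=n, cutoff=0.75)

print(suggest("serch"))
print(suggest("crawlr"))
```

A correctly spelled keyword returns an empty list, and a keyword that resembles nothing in the vocabulary also returns an empty list rather than a poor guess.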

Microsoft has also recently entered the search engine market, launching its own web search technology to challenge Google's long dominance of the field with results tailored to a user's location and answers from its Encarta Encyclopedia. The Microsoft search engine, offered in 11 languages, is available on the "test" site (http://search.msn.com). Redmond-based Microsoft has long offered a search engine on its MSN website, but the technology behind it was powered by subsidiaries of Yahoo. The company has admitted that it erred by not developing its own search technology earlier, and it has now devoted $100 million to an aggressive catch-up effort. The company is also committed to clearly separating paid search results from those based purely on relevancy. Microsoft's site has more than five billion web pages at present.

With two giants fighting it out for market leadership, netizens are sure to emerge as the ultimate beneficiaries.