ENGL 102
Writing and Research
Using a Search Engine

Everyone knows that finding information on the Internet is easy. You go to the webpage, whatever it might be, put your search term in the little box, and click the button. Then, miraculously, your results show up on your screen in a matter of seconds. However, while the surface appearance of these information sources may look the same, the inner workings can be very different, and smart researchers need to know these differences, so that they can use them to their advantage.

Most people are very confused about the difference between a search engine and a directory. Heck, most people don't care, but they should. Understanding the difference can help explain why the results from one "search engine" can be so different from the results of another. Directories are selected groups of webpages that are presented in an alphabetical listing, usually grouped by subject. Somebody, usually a newly hired worker, is paid to surf to various webpages and determine which ones should be included in the directory. Sometimes the decision to include a page is based on its content, but it's just as often decided on the basis of flashy graphics or other questionable criteria. After all, these surfers are NOT experts in these areas and do not necessarily have the knowledge to decide what should be included and what should not be included.

Yahoo is the best known of the directories. It is not really a search engine. When you search Yahoo, you do not search the whole Internet. You only get the results that Yahoo the company has decided to include and show you, for whatever reason. Unfortunately, more and more often, the choice rests on which company has paid the most money to be included. Most directories today are extremely commercialized. That's a good thing if you are searching for something to purchase. However, if you are researching for a paper and need information, it's not very helpful at all. Here's a hint: when you are on the homepage, if there's a link for "submit a site" and that link leads you to a page that asks for money to be included in the directory, it's not a good source for researching a paper.

There are some directories that are helpful. The Open Directory (dmoz.org) and About (about.com) are both directories that were set up with the notion that experts in the areas should compile their lists. In other words, it's like having the PhD in entomology give you a list of his/her favorite websites about insects. These directories tend to de-emphasize the commerce and re-emphasize the scholarship, but they are still limited to the preferences of one person. These directories can be a nice starting point, but they aren't a substitute for a good complete search engine and an academic database.

Search Engine:
This is a program that searches a group of documents or webpages to find matches for its user. Some websites use search engines to search only their own website. However, when researchers discuss search engines, they mean search engines that try to index the entire Internet. An Internet search engine sends out little programs called "spiders" that go from webpage to webpage, following the links on those pages. Each page that is found is sent back to the search engine's main database. Thus, a search engine is a form of database, but its contents are constantly changing as new webpages are found and read, and the maintainer/owner of the database does not have control over what content is included the way a traditional database owner does. Most Internet search engines have grown to the point where they've read billions of pages of information, and have even begun reading and including not only HTML webpages, but also documents created and saved as word processing files, PowerPoint presentations, etc.
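
If it helps to see the "spider" idea in action, here is a minimal sketch in Python. It does not touch the real Internet: the page names and links in TOY_WEB are made up for illustration, and real spiders are far more complicated. It only shows the basic habit of following links from page to page and recording each new page that turns up.

```python
from collections import deque

# A made-up miniature "web": each page lists the pages it links to.
# These page names are hypothetical and exist only for this example.
TOY_WEB = {
    "home.html": ["insects.html", "about.html"],
    "insects.html": ["spiders.html", "home.html"],
    "about.html": [],
    "spiders.html": ["insects.html"],
    "orphan.html": ["home.html"],  # no page links here
}

def crawl(start, web):
    """Follow links from page to page; return pages in the order found."""
    found = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if link not in found:  # skip pages already recorded
                found.append(link)
                queue.append(link)
    return found

index = crawl("home.html", TOY_WEB)
print(index)
```

Notice that "orphan.html" never makes it into the results. No other page links to it, so a spider that only follows links has no way to find it.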

Internet search engines are typically public tools, available to anyone who knows the URL (Uniform Resource Locator), or web address, of the search engine. The webpages that they "spider" and put into their database are public documents, available for any researcher to use free of charge. Unlike a database, where a choice is made about what to include, the Internet search engine includes everything. That's both good and bad. Not all webpages are equal. Somebody who has a personal interest in spiders can post their own webpages talking about their favorite spiders. A PhD in entomology can also have his/her pages posted on the web, full of the information that s/he has researched and learned. Of course you would want to use the scientist's webpages for your research paper, but the typical search engine is going to rank both of these pages about the same, because they are both about the same topic.

The search engine can't judge by quality, at least not usually. However, the new generation of search engines like Google and Teoma are doing a much better job of processing and ranking pages. Search engines rank pages based on formulas called algorithms. These algorithms used to be fairly easy to beat: they mainly counted how soon and how often you used the search term in the webpage. The sooner you used it, and the more often you used it, the higher your page would rank on the list of results (this is important because most people don't go beyond the first page of results). Google and Teoma have both begun using different algorithms that rank a page based on other, more quality-focused criteria.
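
To see why the older algorithms were so easy to beat, here is a rough sketch in Python of a "how soon and how often" score. The formula, the function name old_style_score, and the two sample pages are my own assumptions for illustration; no real search engine published or used exactly this formula.

```python
def old_style_score(page_text, term):
    """Score a page by how often and how early the search term appears.

    Hypothetical formula: one point per occurrence, plus a bonus of up
    to one point that shrinks the later the first occurrence comes.
    """
    words = [w.strip('.,!?";:').lower() for w in page_text.split()]
    positions = [i for i, w in enumerate(words) if w == term.lower()]
    if not positions:
        return 0.0
    frequency = len(positions)
    earliness = 1.0 / (positions[0] + 1)
    return frequency + earliness

# Two made-up pages about the same topic.
pages = {
    "scientist": "Research notes on the orb-weaving spiders of North America.",
    "stuffed": "Spiders spiders spiders! Buy spiders posters here.",
}

ranked = sorted(pages, key=lambda name: old_style_score(pages[name], "spiders"),
                reverse=True)
print(ranked)
```

The page that simply repeats the word over and over lands above the scientist's page, which is exactly the weakness described above.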

Specialized Search Engines function like a regular search engine, but limit the sites that they "spider" in order to pull up more focused, specialized results. If you are researching a medical topic, you can use a general search engine and get 60,000 matches, or use a specialized medical search engine and get 600. At first glance you might think that you want the bigger number, but the smaller number of matches that are more likely to be what you need/want is actually the better choice. The best source that I've found yet for finding specialized search engines is Beaucoup (www.beaucoup.com). See if it has any search engines focused on your subject area. You will not only save yourself time, but you will usually get better results as well.

Search Engine "Tricks":
People think that they know how to search in a search engine, but there are some important "tricks" that will help you do even better.
Google Scholar:
Google Scholar is similar to Google, but it's also a lot like a database, and the differences often confuse people who try to use it. Then, because it doesn't work like regular Google, they decide that it's bad and don't use it again.

Google Scholar is a search engine that finds references to ACADEMIC quality material only. This means that it limits its search to materials published in Academic Journals or sponsored by Academic Institutions. Personal websites are not going to be part of these results. For example:

There are several elements of this entry that I'd like to point out:
As I indicated above, whenever possible, Google Scholar tries to take you to the full text of the article. Many times, however, the full text is not available for free. Even so, you should have enough information to go to the databases that you have access to, or to request an interlibrary loan to get that article. Yes, it takes a bit longer, but at least you know that you want/need the information.

Remember that Google ranks webpages in terms of how popular they are. In the same way, the results of a Google Scholar search are ranked in terms of how often the source has been cited by other sources. The more often it is cited, the higher it ranks. This has the unfortunate effect of making older material rank higher than newer material, so if your instructor has limited you to only newer material, you might not be able to use the results on the first few pages. On the other hand, if a paper has been cited by 710 other academic writers, it's most likely a pretty solid source! (And if you click on the "Cited by xxx" link, those citing sources are often listed with newer information closer to the top.)
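
Here is a small Python sketch of that ranking idea. It is only the simplified picture described above, not Google Scholar's actual formula, and the titles, years, and citation counts are invented for the example (apart from the 710 figure, which I borrowed from the paragraph above).

```python
from dataclasses import dataclass

@dataclass
class Result:
    title: str
    year: int
    cited_by: int

# Invented sample results; none of these are real papers.
results = [
    Result("Recent follow-up", 2021, 12),
    Result("Older landmark study", 1998, 710),
    Result("Mid-career review", 2009, 143),
]

def rank_by_citations(items):
    """Most-cited first, as in the simplified picture of the ranking."""
    return sorted(items, key=lambda r: r.cited_by, reverse=True)

def only_recent(items, earliest_year):
    """Keep results published in or after earliest_year, order unchanged."""
    return [r for r in items if r.year >= earliest_year]

ranked = rank_by_citations(results)
print([r.title for r in ranked])
print([r.title for r in only_recent(ranked, 2015)])
```

The oldest paper comes out on top because it has had the most time to collect citations. If your instructor wants only newer material, filtering by year leaves you with the result that was sitting at the bottom of the list.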

Where you find your information will make a difference in determining how good that information is and how to cite it. A good research plan for electronic information would look like this:

Remember to go to an academic library as well. Not everything is found on the Internet, you know. An academic library will have current, appropriate material for a college/professional paper. A public library most likely will not have the appropriate level of material, since it isn't focused on academic research.