Writing and Research
Using a Search Engine
Everyone knows that finding information on the Internet is easy. You go to the webpage, whatever it might be, put your search term in the little box, and click the button. Then, miraculously, your results show up on your screen in a matter of seconds. However, while these information sources may all look the same on the surface, their inner workings can be very different, and smart researchers need to know these differences so that they can use them to their advantage.
- Most people are very confused about the difference between a search engine and a directory. Heck, most people don't care, but they should. Understanding the difference can help explain why the results from one "search engine" can be so different from the results of another. Directories are selected groups of webpages that are presented in an alphabetical listing, usually grouped by subject. Somebody, usually a newly hired worker, is paid to surf to various webpages and determine which ones should be included in the directory. Sometimes the decision to include a page is based on its content, but it's just as often decided on the basis of flashy graphics or other questionable criteria. After all, these surfers are NOT experts in these areas and do not necessarily have the knowledge to decide what should be included and what should not be included.
- Yahoo is the best known of the directories. It is not really a search engine. When you search Yahoo, you do not search the whole Internet. You only get the results that Yahoo, the company, has decided to include and show you, for whatever reason. Unfortunately, more and more often, the choice rests on which company has paid the most money to be included. Most directories today are extremely commercialized. That's a good thing if you are searching for something to purchase. However, if you are researching a paper and need information, it's not very helpful at all. Here's a hint: if the homepage has a link for "submit a site" and that link leads to a page that asks for money to be included in the directory, it's not a good source for researching a paper.
- Some directories are helpful. The Open Directory (dmoz.org) and About (about.com) were both set up on the idea that experts in each subject area should compile the lists. In other words, it's like having a PhD in entomology give you a list of his/her favorite websites about insects. These directories tend to emphasize scholarship over commerce, but each list is still limited to the preferences of one person. These directories can be a nice starting point, but they aren't a substitute for a good, comprehensive search engine and an academic database.
- Search Engine:
- This is a program that searches a group of documents or webpages to find matches for its user. Some websites use search engines to search only their own pages. However, when researchers discuss search engines, they mean search engines that try to index the entire Internet. An Internet search engine sends out little programs called "spiders" (also called crawlers) that go from webpage to webpage, following the links on those pages. Each page that is found is sent back to the search engine's main database. Thus, a search engine is a form of database, but its contents are constantly changing as new webpages are found and read, and the owner of the database does not control what content is included the way a traditional database owner does. The major Internet search engines have now read billions of pages, and they include not only HTML webpages but also documents saved as word processing files, PowerPoint presentations, etc.
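To make the "spider" idea concrete, here is a minimal Python sketch. It is not how any real search engine is built: the tiny web, its addresses, and the `crawl` and `build_index` functions are all hypothetical. It follows links breadth-first from one starting page and records which pages use each word, which is roughly what "sending each page back to the database" means.

```python
from collections import deque

# A tiny, made-up web: each hypothetical address maps to (page text, links on that page).
WEB = {
    "a.example/home": ("insects and spiders", ["a.example/spiders", "b.example/bugs"]),
    "a.example/spiders": ("my favorite spiders", ["a.example/home"]),
    "b.example/bugs": ("insect research notes", ["c.example/beetles", "z.example/missing"]),
    "c.example/beetles": ("beetles of north america", []),
    "d.example/orphan": ("no page links here", []),
}

def crawl(start, web):
    """Breadth-first 'spider': follow links from start, return the pages it reached."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        url = queue.popleft()
        if url not in web:        # broken link: nothing to read
            continue
        found.append(url)
        _text, links = web[url]
        for link in links:
            if link not in seen:  # don't visit the same page twice
                seen.add(link)
                queue.append(link)
    return found

def build_index(urls, web):
    """Map every word to the set of crawled pages that use it."""
    index = {}
    for url in urls:
        text, _links = web[url]
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

pages = crawl("a.example/home", WEB)
index = build_index(pages, WEB)
print(pages)                     # d.example/orphan is missing: no link leads to it
print(sorted(index["spiders"]))
```

Notice that the page nobody links to is never found, which is one reason even a huge search engine does not literally have "everything."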
- Internet search engines are typically public tools, available to anyone who knows the URL (Uniform Resource Locator), or web address, of the search engine. The webpages that they "spider" and put into their database are public documents, available for any researcher to use free of charge. Unlike a database, where someone chooses what to include, an Internet search engine includes everything its spiders find. That's both good and bad, because all webpages are not equal. Somebody with a personal interest in spiders can post webpages about his/her favorite spiders. A PhD in entomology can also post webpages full of the information that s/he has researched and learned. Of course you would want to use the scientist's webpages for your research paper, but a typical search engine is going to rank both of these pages about the same, because they are both about the same topic.
- The search engine can't judge by quality, at least not usually. However, a newer generation of search engines, such as Google and Teoma, is doing much better at processing and ranking pages. Search engines rank pages using formulas called algorithms. These algorithms used to be fairly easy to manipulate: they mainly counted how soon and how often the search term appeared on the webpage. The sooner and more often a page used the term, the higher it ranked in the list of results (this matters because most people don't go beyond the first page of results). Google and Teoma have both begun using different algorithms that rank a page on other, more quality-focused criteria, such as how many other pages link to it.
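Here is a rough sketch of the older "how soon and how often" style of ranking, assuming a page's score is simply the number of times the term appears plus a small bonus for appearing early. The scoring formula and the two sample pages are made up for illustration; no real engine is claimed to use exactly this. It shows why such algorithms were easy to beat: a page that just repeats the word outranks the scientist's page.

```python
def naive_score(term, text):
    """Old-style ranking sketch: more uses and an earlier first use score higher."""
    words = text.lower().split()
    term = term.lower()
    count = words.count(term)
    if count == 0:
        return 0.0
    first = words.index(term)          # 0 means the term is the very first word
    return count + 1.0 / (1 + first)   # frequency plus an "earliness" bonus

# Two hypothetical pages about spiders.
sample_pages = {
    "scientist": "An entomologist's field notes on spider anatomy and silk",
    "stuffed": "spider spider spider spider buy spider toys now",
}
ranking = sorted(sample_pages,
                 key=lambda name: naive_score("spider", sample_pages[name]),
                 reverse=True)
print(ranking)  # ['stuffed', 'scientist']
```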
- Specialized search engines work like regular search engines, but they limit the sites that they "spider" in order to return more focused, specialized results. If you are researching a medical topic, you could use a general search engine and get 60,000 matches, or use a specialized medical search engine and get 600. At first glance you might think you want the bigger number, but a smaller set of matches that are more likely to be what you need is actually the better choice. The best source that I've found for locating specialized search engines is Beaucoup (www.beaucoup.com). See if it has any search engines focused on your subject area. You will not only save yourself time, but you will usually get better results as well.
- Search Engine "Tricks":
- Most people think they know how to use a search engine, but there are some important "tricks" that will help you do even better.
- Use "phrase searches" whenever possible.
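A "phrase search" simply means putting your topic (if it's more than one word) inside quotation marks. The search engine then looks for pages where those words appear together, in that exact order, instead of pages where they merely appear somewhere.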
A Google search for death penalty (no quotation marks) finds 27,900,000 hits.
A phrase search for "death penalty" finds 18,200,000 hits.
At first it seems like the phrase search works worse, but not if you remember how search engines work. You aren't going to look at 27 OR 18 million matches, so more is not better when it comes to search engine results. The phrase search dropped millions of pages that only mention "death" and "penalty" separately. What you want are search terms that keep narrowing the matches down to a number you can actually review.
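The difference between a plain search and a phrase search can be sketched in a few lines of Python. This is a simplified model, not how Google actually matches pages: `words_match` accepts a page if every word appears anywhere on it, while the hypothetical `phrase_match` accepts it only if the words appear side by side, in order. The three sample pages are invented.

```python
def words_match(query_words, text):
    """Plain search: every word appears somewhere on the page, in any order."""
    page_words = text.lower().split()
    return all(w in page_words for w in query_words)

def phrase_match(phrase_words, text):
    """Phrase search: the words appear next to each other, in this exact order."""
    page_words = text.lower().split()
    n = len(phrase_words)
    return any(page_words[i:i + n] == phrase_words
               for i in range(len(page_words) - n + 1))

docs = [
    "the death penalty debate in the united states",
    "a penalty kick decided the game after a sudden death overtime",
    "death of a salesman ends with no penalty",
]
query = ["death", "penalty"]
plain = [d for d in docs if words_match(query, d)]
phrase = [d for d in docs if phrase_match(query, d)]
print(len(plain), len(phrase))  # 3 1
```

Only the first page is really about the death penalty, and it is the only one the phrase search keeps.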
- Use multiple-word searches.
A search engine is NOT a database or card catalog. Those tools require short, simple terms to find information on your topic. Search engines actually work best when you use MORE terms, because each additional term or phrase is another condition that a page usually has to meet.
As you saw above, a phrase search for "death penalty" finds 18,200,000 hits.
Suppose you are actually interested in researching the connection between the death penalty and the Eighth Amendment's prohibition of cruel and unusual punishment.
If you do a Google search for "death penalty" "cruel and unusual punishment", you get 178,000 hits. THAT'S closer to a workable number of sources, and they are almost all specifically relevant to what you want to research.
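Here is a minimal sketch of why adding phrases narrows the results, assuming every quoted phrase is a requirement the page must meet (real search engines are somewhat looser than this). The `search` function, its helpers, and the sample pages are hypothetical.

```python
import re

def parse_phrases(query):
    """Pull out each "quoted phrase" from a query string."""
    return [p.lower() for p in re.findall(r'"([^"]+)"', query)]

def contains_phrase(text, phrase):
    """Whole-word phrase match, ignoring upper/lower case."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text.lower()) is not None

def search(query, docs):
    """Keep only the pages that contain every quoted phrase."""
    phrases = parse_phrases(query)
    return [d for d in docs if all(contains_phrase(d, p) for p in phrases)]

docs = [
    "Is the death penalty cruel and unusual punishment under the Eighth Amendment?",
    "Public opinion polls on the death penalty",
    "Cruel and unusual punishment in prisons",
]
one = search('"death penalty"', docs)
two = search('"death penalty" "cruel and unusual punishment"', docs)
print(len(one), len(two))  # 2 1
```

Each phrase you add can only keep the result list the same size or make it smaller.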
- Use Advanced Search Options:
- Most major search engines offer Advanced Search options. These let you limit the search by language (example: English only), by how recently the page was updated, by file format (only PDFs, for example), or even by the domain the website is in.
Domains are the last part of a website's address: .com, .org, .edu, .mil, etc. For academic research, students are usually advised to limit their searches to .org (non-profit organizations, although anyone can register a .org address), .edu (educational institutions) and .gov (government sites).
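Limiting by domain amounts to checking the ending of each result's host name. In Google you can do this yourself by adding the `site:` operator to your search (for example, `site:.edu`). The Python sketch below shows the same filtering idea on a made-up list of result addresses; `in_domains` is a hypothetical helper.

```python
from urllib.parse import urlparse

ACADEMIC_DOMAINS = (".edu", ".gov", ".org")  # the domains suggested above

def in_domains(url, domains=ACADEMIC_DOMAINS):
    """True if the URL's host name ends in one of the given domains."""
    host = urlparse(url).hostname or ""
    return host.endswith(domains)  # str.endswith accepts a tuple of endings

# Hypothetical search results.
results = [
    "https://www.example.edu/law/eighth-amendment.html",
    "https://shop.example.com/true-crime-books",
    "https://www.example.gov/statistics/capital-punishment",
    "https://example.org/death-penalty-faq",
]
filtered = [u for u in results if in_domains(u)]
print(filtered)  # the .com shopping page is dropped
```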
- One "hidden" feature of most search engines is what are called "qualifiers" (also known as search operators). These are symbols that you can use to either exclude or emphasize a term in your results.
The - symbol put directly before a term (with no space in between) tells the search engine to leave out any page that uses this term. Using our previous search as an example, suppose you find that many of your sources also discuss the death penalty as a form of torture. You don't want to discuss torture, so you could search for "death penalty" "cruel and unusual punishment" -torture, which finds 133,000 hits.
The + symbol put before a term tells search engines that support it to return only pages that contain that term. Google does not support the + symbol (putting a single word in quotation marks has a similar effect there), but some other search engines do.
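Here is a sketch of how a search tool might read quoted phrases and the - symbol, assuming a simplified query syntax (quoted phrases, plain words, and -word for exclusion). The `parse_query`, `has`, and `search` functions and the sample pages are hypothetical, not any search engine's real code.

```python
import re

def parse_query(query):
    """Split a query into required terms and excluded (-) words."""
    required, excluded = [], []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            required.append(phrase.lower())
        elif word.startswith("-"):
            if len(word) > 1:             # "-torture" excludes "torture"
                excluded.append(word[1:].lower())
            # a lone "-" (with a space after it) is ignored in this sketch
        else:
            required.append(word.lower())
    return required, excluded

def has(text, term):
    """Whole-word match, ignoring upper/lower case."""
    return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None

def search(query, docs):
    required, excluded = parse_query(query)
    return [d for d in docs
            if all(has(d, t) for t in required)
            and not any(has(d, t) for t in excluded)]

docs = [
    "The death penalty as cruel and unusual punishment: a legal history",
    "The death penalty, cruel and unusual punishment, and torture",
]
hits = search('"death penalty" "cruel and unusual punishment" -torture', docs)
print(hits)  # only the first page survives
```

In this sketch, typing "- torture" with a space turns "torture" into a required word instead of an excluded one, which is why the symbol has to touch the word.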
- Google Scholar:
- Google Scholar is similar to Google, but it's also a lot like a database, and the differences often confuse people who try to use it. Then, because it doesn't work like regular Google, they decide that it's bad and don't use it again.
- Google Scholar is a search engine that finds references to ACADEMIC-quality material only. This means that it limits its search to materials published in academic journals or sponsored by academic institutions. Personal websites are not going to be part of these results. Each result is listed as a short entry.
- There are several elements of each entry that I'd like to point out:
- The title takes you to either the abstract or the full text of the article.
- The green line under the title gives you the author(s) and the basic URL of where Google found the information.
- The black text under the author's name gives you the basic publication information (journal or other academic source) and the first few words of the abstract or text.
- The light blue line at the very bottom tells you how many other sources have cited the article, and it offers a link to search for related articles and a secondary source for the abstract or text. All of these bottom elements are clickable links.
- As I indicated above, Google Scholar tries to take you to the full text of the article whenever possible. Many times, however, the full text is not available for free. Even so, you should have enough information to find the article in the databases that you have access to, or to request it through interlibrary loan. Yes, it takes a bit longer, but at least you know that you want/need the information.
Remember that Google ranks webpages largely by how popular they are. In the same way, Google Scholar ranks its results largely by how often each source has been cited by other sources: the more often a source is cited, the higher it ranks. Because older articles have had more time to collect citations, this tends to push older material above newer material, so if your instructor has limited you to recent sources, you might not be able to use the results on the first few pages. On the other hand, if a paper has been cited by 710 other academic writers, it's most likely a pretty solid source! (And if you click on the "Cited by xxx" link, the citing papers are often listed with newer ones closer to the top.)
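Here is a simplified sketch of citation-count ranking, and of the workaround when you need recent sources. The titles, years, and citation counts are invented, the 2015 cutoff is an arbitrary example, and Google Scholar's real ranking weighs more than citations alone. Scholar's own date-range filter on the results page plays the role of the `year >= 2015` line here.

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str     # made-up titles for illustration
    year: int
    cited_by: int

papers = [
    Paper("Recent study", 2021, 12),
    Paper("Classic study", 1998, 710),
    Paper("Mid-career review", 2010, 150),
]

# Rank purely by citation count, highest first: older papers tend to win.
by_citations = sorted(papers, key=lambda p: p.cited_by, reverse=True)

# If only recent sources are allowed, filter by year after ranking.
recent = [p for p in by_citations if p.year >= 2015]

print([p.title for p in by_citations])  # ['Classic study', 'Mid-career review', 'Recent study']
print([p.title for p in recent])        # ['Recent study']
```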
- Where you find your information makes a difference in determining how good the information is and how to cite it. A good research plan for electronic information would look like this:
- Search an academic database like WilsonWeb to find out what professionals are writing about your topic
- Search an expert-based directory like About.com or dmoz.org to see what an expert has found on the Internet
- Search a good search engine like Google or Teoma to get an idea of what else is out there on the Internet beyond what the expert listed in the directory.
- Don't forget to use Google Scholar as well.
Remember to go to an academic library as well. Not everything is found on the Internet, you know. An academic library will have current, appropriate material for a college/professional paper. A public library most likely will not have the appropriate level of material, since it isn't focused on academic research.