Full Circle Resource Kits


Time Traveling on the Internet

Carl Heine and Dennis O'Connor

Information on the Internet is either live or archived. It is very common to encounter both types of information while searching and browsing. The step from the Internet Present into the Internet Past is sometimes difficult to recognize. Fortunately, in the majority of cases this type of time travel has no significant consequences.

Occasionally, however, those who time travel without knowing it encounter problems. These problems stem from not recognizing whether a given piece of information is live or archived.

For example, the three links below show the BBC Home Web page. The first link leads to the live BBC news page; the other two links display archived copies of earlier editions. Depending on the precise moment these links are clicked, the information retrieved may look nearly identical or be considerably different.

  • Live information: BBC
  • Archived information: Google
  • Archived information: Ask

The archived information block at the top of the page should tip the user off that they are viewing news from the "past."

Compare the URL of the live BBC page: http://www.bbc.co.uk/

  • ...to the Google archived URL: http://64.233.167.104/search?q=cache:bAe-5jBVrukJ:www.bbc.co.uk/
  • ...to the Ask archived URL: http://www.askcache.com/webcp?q=bbc+news&t=bbc%2Bnews&r=bbc%2Bnews&cache=00*23bat1bsdflzd&qlang=3&url=http%3A%2F%2Fnews.bbc.co.uk%2Fdefault.stm&page=1&o=0&l=dir&ws=1&ax=1
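The difference between these addresses can also be checked mechanically. The following is a minimal Python sketch, not a description of how either search engine works internally. It assumes that a Google-style cache link carries the original address after `cache:<id>:` in its `q` parameter, and that an Ask-style cache link carries a `cache` parameter plus the original address in a `url` parameter, as in the two examples above (with the Ask address read as one continuous string). The function name `original_url` is made up for this illustration.

```python
from urllib.parse import urlsplit, parse_qs


def original_url(url):
    """Return (kind, target), where kind is 'live' or 'archived'.

    Only the two cache-link shapes shown in this article are recognized;
    every other address is reported as live.
    """
    query = parse_qs(urlsplit(url).query)

    # Google-style cache link: q=cache:<id>:<original address>
    q = query.get("q", [""])[0]
    if q.startswith("cache:"):
        parts = q.split(":", 2)
        if len(parts) == 3:
            return "archived", parts[2]

    # Ask-style cache link: a "cache" parameter plus the original
    # address, percent-encoded, in the "url" parameter
    if "cache" in query and "url" in query:
        return "archived", query["url"][0]

    return "live", url


LIVE = "http://www.bbc.co.uk/"
GOOGLE = "http://64.233.167.104/search?q=cache:bAe-5jBVrukJ:www.bbc.co.uk/"
ASK = (
    "http://www.askcache.com/webcp?q=bbc+news&t=bbc%2Bnews&r=bbc%2Bnews"
    "&cache=00*23bat1bsdflzd&qlang=3"
    "&url=http%3A%2F%2Fnews.bbc.co.uk%2Fdefault.stm"
    "&page=1&o=0&l=dir&ws=1&ax=1"
)

for address in (LIVE, GOOGLE, ASK):
    print(original_url(address))
```

In both archived cases the address of the original page is embedded inside the longer URL, which is a quick visual clue that the page on screen is a saved copy.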

 

The longer URLs point to saved copies that will not change until the search engine's crawlers revisit the BBC home page to copy updated information and refresh the search engine's database. Google and Ask crawl the BBC page on different schedules, so each system's cache yields different results. Most fast-changing pages, like the BBC's, are crawled frequently but not necessarily daily. It is possible to find cached pages that have not been re-crawled for weeks.

Tripped up by Time Travel

Searching Google (or any other search engine) is really about searching its cache

Searching Google (or any other search engine) is really about retrieving the most recent copy of a page that the search engine has saved. That saved copy is called the cache. The results snippet, which is an abstract of the cached page, is from the Internet past. Everything about the snippet is cached. The only way to the present is to click the title link. This takes the search from the past into the present: you leave the cache and jump to the current edition of the live page.

This also explains why clicking a link from a search engine can return a PAGE NOT FOUND error message. This usually means the URL has changed since the last time the search engine made a copy. (Hint: Truncate the URL and look for the site's own search tools, and you may find what you were looking for!)
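Truncating a URL means removing one path segment at a time from the right until a working page appears. A minimal Python sketch of that idea follows; the function name `truncations` and the example.org address are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def truncations(url):
    """List progressively shorter versions of a URL, ending at the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    while segments:
        segments.pop()  # drop the rightmost path segment
        path = "/" + "/".join(segments)
        if segments:
            path += "/"
        # Query string and fragment are discarded on purpose
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


for candidate in truncations("http://www.example.org/news/2007/story.html"):
    print(candidate)
```

Each address in the list is worth trying in turn; the first one that loads is a good place to look for the site's search box or directory.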

Google Cache

Everything in this snippet goes back to the cache: the only way to the present is to click the first link!


 

Troubleshooting

Most of the time, collecting information from the past is not a problem. Unless up-to-the-minute information or statistics are needed, a search engine will return the greatest number of documents and sources. But there are cases in which searching a cache becomes problematic.

  1. Should a Web page's URL change, the live link attached to its cached copy will no longer work. The search engine retains the "broken" copy in its cache only until its crawlers try to revisit the page. If they cannot find the page, the cache is vacated, wiping out all traces of the former information. A different database may still have the information as long as its own crawlers have not tried to revisit the live page. Broad-topic databases like archive.org and subject-specific ones like mathforum.org will retain archived copies long after Google and others have deleted theirs.

  2. If a "Page Not Found" is encountered when clicking on a snippet's live title link, go back and click the Cached copy instead, if available.

  3. Not all archived information is accessible with a search engine. In the BBC example above, for instance, when we were developing this article, Yahoo did not link to its cache for the BBC home page. MSN and Ask provided a link only intermittently; it is entirely possible that when you read this, the link to the Ask cache won't work. That means a search can retrieve a result based on what's stored in the cache, only for the live page to no longer contain that information. This happens often. In that case, the only way to find the information cited in the snippet is to search the live site, hoping it has a search engine, subject directory or links to previous versions of the page.
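The troubleshooting steps above amount to trying sources in a fixed order: the live page first, then the search engine's cached copy, then a longer-term archive. The Python sketch below illustrates only that ordering. It does not contact any real service: the three dictionaries are stand-ins for real lookups, and the example.org addresses and the name `first_available` are hypothetical.

```python
def first_available(url, sources):
    """Try each (name, fetch) pair in order.

    fetch(url) is expected to return the page text, or None when that
    source has no copy. Returns (name, text) for the first hit, or
    (None, None) when every source comes up empty.
    """
    for name, fetch in sources:
        page = fetch(url)
        if page is not None:
            return name, page
    return None, None


# Stand-in data: the live page has moved, but two saved copies remain
live = {}
search_cache = {"http://www.example.org/report.html": "cached copy"}
long_term_archive = {"http://www.example.org/report.html": "archived copy"}

sources = [
    ("live", live.get),
    ("search engine cache", search_cache.get),
    ("long-term archive", long_term_archive.get),
]

result = first_available("http://www.example.org/report.html", sources)
print(result)
```

Because the live page is missing in this stand-in data, the search engine's cached copy is returned before the long-term archive is ever consulted.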

Other Resources that can help

To help students grasp the methods, advantages and disadvantages of live and archived searching, check out these resources that help to improve strategic choices:

Article > Live Searching: About Browsing
  There is only one way to search live pages: browsing. Browsing requires knowing what keywords may lead in the direction of the information needed. Browsing also depends heavily on luck, which makes it a more difficult search method than using a search engine.
Curriculum > Searching the Cache: Three Choices
  There are three ways to search a cache: search engine, subject directory and browsing. Each has advantages in certain situations.
