Chapter 5
Searching and Selecting Electronic Commerce Information
In physical markets, consumer search activities include reading advertisements, calling vendors, and visiting stores. In a virtual marketplace, all these activities seem to converge into Web searches and Web browsing. Not surprisingly, search services are the first market infrastructure to be built in the electronic marketplace.
5.1 Consumer Searches and Electronic Commerce
Like searches in physical markets, online searches can be carried out either sequentially or simultaneously. Surfing through different Web stores is a sequential search, while a price search against a price database is a simultaneous search. In either case, online search offers a tremendous advantage over physical search. Besides lowering the costs of time and transportation, computer-based search allows consumers to remember and compare information gathered from many stores. Furthermore, online searches enable consumers to process a wide range of information other than price, such as the location and name of vendors, terms of sale, quality and performance variables, brand names, sizes, and other product characteristics. Comparing prices alone will strain the capacity to process information in physical markets, especially if shopping involves many products. Online search technologies will automate this process and allow consumers to engage in more sophisticated and efficient searches.
The search and information transmission mechanisms used in the electronic marketplace are too new for researchers to have determined their efficiency. In fact, there are contradicting predictions about what that will be. One view is that by using computer technologies such as search engines and intelligent software agents, consumers may be able to search the whole information space at no cost. For example, suppose you want to buy a product. Using a computer program, you initiate a search mechanism that searches all the Web pages on the Internet for a product that matches your needs. The search generates a table of names of sellers, prices, locations and product specifications as well as other relevant information such as seller reputation, past sales records, etc. You then choose a seller among the candidates, and initiate a purchase order. While this scenario is close to one of zero search cost, which would produce an efficient market, there are many reasons why the electronic marketplace may not actually be so efficient. In the first place, sellers may not provide relevant information. Secondly, search algorithms or techniques may not be sufficient to gather all the relevant information. This may be because of access difficulties—as some Web sites do not allow access—or because all searches inevitably select and process information based on prescribed criteria which may have non-technical problems. Lastly, economic analyses indicate that a non-zero search cost, however small it may be, results in noncompetitive pricing. Using electronic media may reduce search costs to an arbitrarily small amount, but the cost will still be non-zero. In mathematical models, a reduction in search costs is quite different from an elimination of search costs. In this regard, it may be reasonable to assume that the problems associated with information will persist in electronic commerce as they do in physical markets.
Consumers may behave differently in the electronic marketplace than in physical markets, where search costs are in general positive. A positive search cost, however small, results in higher than competitive prices, a result popularly known as the 'Diamond paradox' (Diamond 1971). Must search costs always be positive? Admittedly, there are shoppers for whom searching seems to be enjoyable rather than costly. On the Internet, 'surfers' often resemble those shoppers who happily visit stores and look at various merchandise. Armed with ever-present, powerful archiving programs, online surfers will be able to gather information while enjoying themselves. When they process this information for a purchasing decision, the net cost of search may indeed be zero, or certainly not positive, which would undo the paradoxical result of a monopoly price equilibrium under positive search costs (Stahl 1996).
5.2 Feature: Efficiency
5.2.1 Search Market Efficiency
A search market consists of three components: content providers, selection process, and access. By separating these components, which may occur almost simultaneously in a typical search process, we can compare different types of search activities and evaluate their efficiencies.
(1) Content providers. The contents provided by sellers largely define the informational space a search can occupy. Understandably, some product information may not yet be available in digital format. Information that does already exist includes primary sources such as company Web pages and secondary sources such as bot-generated indexes and evaluation databases. Secondary sources often filter and reduce the amount of information but add the expertise of the information brokers.
(2) Selection. The process of electronic selection consists of various forms of information query based on keywords or subjects. Interactive queries result in individualized sorts. A non-interactive selection process includes classified ads, directories, or other types of information brokers, where entries are organized by some preselected criteria and presented as such. Internet searches on Lycos or Yahoo are examples of this selection process.
(3) Access. Through selection, consumers have a list of information sites that fit their search criteria. But, to actually view these documents, selected information must be downloaded or accessed by visiting the Web sites. The access occurs in two stages: connecting and retrieving processes.
An ideal search market, therefore, allows consumers a series of filtering processes by which they may reduce the universe of available information to a manageable and meaningful size. An efficient Internet search market can be depicted, as in Figure 5.1, as the content space available on the Internet containing the set of selected information, which also contains accessed information space. In this case, even though some product information is only available offline, the online search market is efficient because all contents that are relevant (the area of the pentagon) exist online (the rounded rectangle). In other words, one is a proper subset of the other in the order of (1) contents, (2) selection, and (3) access. If any or some of them are not a proper subset, the search market is not efficient. For example, if some contents, which are needed in (2) selection process, are not available online, the search process cannot be efficient.
Figure 5-1 An efficient Internet search market
In Figure 5.2, (a) shows a case where some information, although relevant, is not available online. As a result, only the contents accessible online are retrieved. Even when contents are available online, the search market may fail if these contents are not accessible, for example due to access restriction or congestion (see (b) and (c) in Figure 5.2). Finally, consumers may have to rely on both online and off-line information channels to complete a search (see (d) in Figure 5.2), as is the case with today's market. The obvious implication is that the information available in physical markets must also be available online to prevent search problems such as (a) and (d) in Figure 5.2.
Figure 5-2 Examples of an inefficient search market
(a) Some information relevant to selection is not available online.
(b) Access problem, where relevant information is not accessible.
(c) Access problem, where only some relevant information is accessible.
(d) Traditional information access, where both online and off-line methods have to be used.
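The set relations behind Figures 5.1 and 5.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name classify_search_market and the example sets are hypothetical, and each piece of information is reduced to a simple label. The market is treated as efficient when everything relevant to the selection is both online and accessible.

```python
def classify_search_market(relevant, online, accessible):
    """Classify a search market using the set relations of Figures 5.1 and 5.2.

    relevant   -- information the consumer needs for the selection
    online     -- information that exists on the Internet
    accessible -- online information that can actually be retrieved
    (All three arguments are hypothetical collections of labels.)
    """
    relevant, online, accessible = set(relevant), set(online), set(accessible)
    missing_online = relevant - online          # cases (a) and (d): offline only
    blocked = (relevant & online) - accessible  # cases (b) and (c): access problem
    if missing_online:
        return "inefficient: some relevant content is not online"
    if blocked:
        return "inefficient: some relevant content is not accessible"
    return "efficient"


# Hypothetical example: the warranty terms exist only in a printed brochure.
print(classify_search_market(
    relevant={"price", "warranty"},
    online={"price", "reviews"},
    accessible={"price", "reviews"},
))
```

The check mirrors the nesting described above: relevant content must sit inside the online content space, and what is selected must in turn be retrievable.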
Despite some reservations, search services play an important role by aiding consumers in the selection process. In this way, search services are in fact intermediaries who broker product information between sellers and buyers. According to the theory of disintermediation, electronic commerce represents a market where intermediaries will disappear because consumers can interact directly with producers. In such a market, consumers will not need search intermediaries since, for example, consumers will be able to use a powerful search program of their own. Today's search services in fact send out intelligent programs or automated robots to gather information about Web documents. Consumers, in theory, can employ their own agents who roam the cyberspace with a predetermined mission and report back to their owners. On the other hand, search intermediaries may continue to serve in the electronic marketplace for several reasons.
5.2.2 Search Efficiency in Intermediaries
In terms of network traffic, individual agent-based searches will generate much duplication of accessing and downloading information since each consumer must send their own query over the network. This duplicative traffic can be minimized by using intermediaries who collect, process, and store the information.
The efficiency in intermediating potentially duplicative and wasteful information access on the Internet resembles that of wholesaling and retailing in physical markets. By handling products in bulk, wholesalers and retailers in physical markets minimize transportation costs in distributing these products to geographically dispersed end users. For digital products, however, a producer needs only to send one copy to a wholesaler or a retailer, and thus there is no reason to be concerned with minimizing distribution costs. And since no online retailer is closer to consumers than their suppliers, we need not consider the distributive efficiency. Nevertheless, an online intermediary minimizes distribution costs in its own way by reducing costs associated with network traffic. If we compare intermediated and disintermediated markets (see Figure 5.3), the similarity is striking.
The stylized diagram, Figure 5.3, shows how consumers access product information. In (a), each buyer sends a query to all sellers to get product information, whereas in (b), buyers can get information from the intermediary who receives information packages from all these sellers. In a similar delivery scheme in physical markets, such an intermediated structure may not be efficient if some sellers are located closer to buyers than is the intermediary. A significant inefficiency can occur in the hub-and-spoke system used by airlines if some passengers (buyers) are forced to go through the hub (intermediary) regardless of the extra distance involved. In the virtual environment of the electronic marketplace, however, an intermediated search market dramatically reduces duplicated traffic and enhances network efficiency.
This network efficiency has little to do with the intermediary's role in assisting consumers' selection process; the efficiency results simply from providing a centralized outlet for all sellers. But this centralization need not require the same contents to be stored on both the producers' and the intermediary's Web sites, which would be a wasteful duplication. Instead, the product information at the intermediary's Web site will have only the information buyers need to make purchase decisions. In a way, the intermediary also acts as an information filtering agent, which is the second type of efficiency in intermediation. Besides intermediaries, consumers have many tools to filter information, and for this reason, we discuss information filtering in the next section.
Figure 5-3 Information access with and without an intermediary
(a) Disintermediation (b) Intermediation
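The traffic comparison in Figure 5.3 can be made concrete with a rough count. The sketch below rests on simplifying assumptions that are not in the text: every buyer wants information from every seller, each transfer counts as one unit of traffic, and the function name query_counts is hypothetical.

```python
def query_counts(num_buyers, num_sellers):
    """Count information transfers in the two structures of Figure 5.3."""
    # (a) Disintermediation: every buyer queries every seller directly.
    direct = num_buyers * num_sellers
    # (b) Intermediation: each seller sends one package to the intermediary,
    # and each buyer sends one query to the intermediary.
    intermediated = num_sellers + num_buyers
    return direct, intermediated


direct, intermediated = query_counts(num_buyers=1000, num_sellers=50)
print(direct, intermediated)  # 50000 1050
```

Under these assumptions the direct structure grows with the product of buyers and sellers, while the intermediated structure grows only with their sum.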
In an extreme case, proper selection and evaluation of a product may require full information contained in the seller's Web site instead of a summary provided by an intermediary. In that case, face-to-face information exchanges can actually be more efficient than intermediation because of the latter's unnecessary duplication. But this will be more of an exception than the rule in electronic commerce because the quality of a digital product is difficult to evaluate even with full information or the product itself. More importantly, intermediaries also resolve the quality uncertainty problem. If buyers were to contact sellers directly, the accessed information might not be reliable unless the content providers were trustworthy. As we examined, by using a simple contract, intermediaries often become trusted third parties in electronic commerce even without verifying all products they broker.
5.3 Search Services on the Internet
In this and the next two sections, we examine various search services on the Internet in terms of the search market and information efficiencies we just discussed. In addition, we compare the network efficiency of information search channels and discuss some implications of this on market organization and advertising.
5.3.1 Search or Surf?
Searching on the Internet starts with a need or a motive to find something, in stark contrast to the popular Internet surfing, which implies a random, aimless hopping through hyperlinks for fun. Less than five years ago, 'surfing the net' was the main activity for many Internet users. Today, online users begin by visiting their bookmarked sites or by searching for specific sites. The growth in search activity on the Internet represents a new phase in the development of the virtual space. What used to be something equivalent to taking a stroll has become more of an organized mission of compiling a list of links, bookmarks, recommended sites, and ultimately an organized personal directory. Such a directory would be extremely useful in mapping out the virtual space. To avoid unnecessary visits, then, a directory should be complete, accurate, meaningful and objective. Current search services are lacking in these aspects.
5.3.2 Inadequacies of Search Services
A complete listing of Web sites and their documents currently does not exist. Instead, consumers need to visit different search sites or relevant Web sites which might have useful links. This lack of a complete directory is not in itself a new problem. In physical markets, a Yellow Pages directory lists only local businesses, and there are a number of specialized directories for different industries and markets. However, there is no reason why all information housed in a library's reference section cannot be combined into one database, especially on the Internet. Combining different Internet search databases will further alleviate the hassle of having to use several search services and the duplicative costs of collecting the same information. To recover the cost of compiling an Internet database, more and more search services are preoccupied with soliciting advertisers instead of improving data integrity and search efficiency. Search services may be among the few Internet services that are truly essential in enhancing the usability and usefulness of the Internet for commerce. An incomplete search service will be only as useful as a partial phone directory.
Internet search databases are also inaccurate and out-dated since Web sites are constantly changing. They often give consumers links that no longer exist. In such an environment, updating may require as much effort as compiling the initial database. An alternative may be to accept, or require, submissions by site owners about changes. Another inaccuracy stems from Web sites misrepresenting themselves and pretending to be something they are not. That possibility compels data compilers to verify each site manually, further increasing the cost of maintaining an accurate database. A more coordinated system of feedback between content providers, users and search services is needed.
A third inadequacy of current search services is the irrelevancy of some sites matching search keywords. One problem stems from the lack of sophisticated and complex search mechanisms to weed out irrelevant information. Equally lacking is a proper description for each Web site and its materials upon which to base a search. As a result, a simple search often produces tens of thousands of meaningless links. Digital document metadata standards need to be established and accepted by content providers, and become part of content creation.
Finally, search results need to be objective. Results can be skewed if the database itself consists of information which is pre-selected based on arbitrary criteria. Some search services do not include personal homepages or materials residing on university Web sites. Others reject Web sites which are considered offensive, indecent, or frivolous by their own standards. Also, with increasing commercialization, some search service providers may give preference to paying advertisers. Although all these are reasonable behaviors for private enterprises, what would be the use of a phone directory if it omitted all 'Smiths' or those living in an area with a particular zip code? An Internet search service is no longer just a springboard for Internet surfing. Rather, as an essential infrastructure, its database needs to be complete and accurate to foster an efficient information exchange.
5.3.3 Search Engine
1. Search Engine Math
Forget power searching. Don't worry about learning to do a "Boolean" search. All most people need to know is a little basic "search engine math" in order to improve their results. Come learn how to easily add, subtract and multiply your way into better searches at your favorite search engine. The information below works for nearly all of the major search engines.
Before learning math, it's a helpful reminder that the more specific your search is, the more likely you will find what you want. Don't be afraid to tell a search engine exactly what you are looking for.
① Using The + Symbol to Add
Sometimes, you want to make sure that a search engine finds pages that have all the words you enter, not just some of them. The + symbol lets you do this.
For example, imagine you want to find pages that have references to both President Clinton and Kenneth Starr on the same page. You could search this way:
+clinton +starr
Only pages that contain both words would appear in your results. Here are some other examples:
+windows +98 +bugs
That would find pages that have all three of the words on them, helpful if you wanted to narrow down a search to Windows 98 bugs, rather than on Windows 98 in general.
The + symbol is especially helpful when you do a search and then find yourself overwhelmed with information. Imagine that you wanted to reserve a camping space in California's Yosemite National Park. You might start out simply searching like this:
Yosemite
If so, chances are, you'll probably get too many off-target results. Instead, try searching for all the words you know must appear on the type of page you're looking for:
+yosemite +camping +reservations
② Using The - Symbol to Subtract
Sometimes, you want a search engine to find pages that have one word on them but not another word. The - symbol lets you do this.
For example, imagine you want information about President Clinton but don't want to be overwhelmed by pages relating to the Monica Lewinsky scandal. You could search this way:
clinton -lewinsky
That tells the search engine to find pages that mention "clinton" and then to remove any of them that also mention "lewinsky."
In general, the - symbol is helpful for focusing results when you get too many that are unrelated to your topic. Simply begin subtracting terms you know are not of interest, and you should get better results.
③ Using Quotation Marks To Multiply
Now that you know how to add and subtract terms, we can move on to multiplication. As in normal math, multiplying terms through a "phrase search" can be a much better way to get the answers you are looking for.
For example, remember above when we wanted pages about reserving a campsite in Yosemite? We entered all the terms like this:
+yosemite +camping +reservations
That brings back pages that have all those words on them, but there's no guarantee that the words may necessarily be near each other. You could get a page that mentions Yosemite in the opening paragraph but then later talks about getting camping reservations in the Grand Canyon. All the words you added together would appear on this page, but it still might not be what you are looking for.
Doing a phrase search avoids this problem. This is where you tell a search engine to give you pages where the terms appear in exactly the order you specify. You do this by putting quotation marks around the phrase, like this:
"yosemite camping reservations"
Now, only pages that have all the words and in the exact order shown above will be listed. The answers should be much more on target than with simple addition.
④ Combining Symbols
Once you've mastered adding, subtracting and multiplying, you can combine symbols to easily create targeted searches.
For example, imagine someone who wants pages only about Star Trek's original series. Using subtraction alone, the search might look like this:
star trek -voyager -deep -space -nine -next -generation
A better search might use subtraction and multiplication:
"star trek" -voyager -"deep space nine" -"next generation"
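The three operators described above can be sketched as a small matcher in Python. This is an illustration, not how any particular search engine works: the function names parse_query, contains, and matches are hypothetical, and the treatment of unsigned terms (at least one must appear) is a simplifying assumption, since real engines typically use such terms for ranking.

```python
import re


def parse_query(query):
    """Split a query into required (+), excluded (-), and unsigned terms.

    Text inside quotation marks is kept together as one phrase.
    """
    required, excluded, unsigned = [], [], []
    # Each token is an optional sign followed by a quoted phrase or a bare word.
    pattern = r'([+-]?)(?:"([^"]*)"|(\S+))'
    for sign, phrase, word in re.findall(pattern, query.lower()):
        term = phrase or word
        if not term:
            continue  # skip empty quotes
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            unsigned.append(term)
    return required, excluded, unsigned


def contains(text, term):
    """True if the word or exact phrase occurs in the text (case-insensitive)."""
    return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None


def matches(page_text, query):
    """Apply the add, subtract, and phrase rules to one page of text."""
    required, excluded, unsigned = parse_query(query)
    if any(contains(page_text, term) for term in excluded):
        return False
    if not all(contains(page_text, term) for term in required):
        return False
    # Assumption: when unsigned terms are given, at least one must appear.
    return not unsigned or any(contains(page_text, term) for term in unsigned)


page = "Yosemite is lovely in spring. Camping reservations for the Grand Canyon open soon."
print(matches(page, "+yosemite +camping +reservations"))  # True
print(matches(page, '"yosemite camping reservations"'))   # False
```

The example page reproduces the Yosemite problem: simple addition accepts it because all three words appear somewhere, while the phrase search rejects it because the words never appear together in that order.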
⑤ Wildcards (*)
You can search for plurals or variations of words using a wildcard character. It is also a great way to search if you don’t know the spelling of a word.
The * symbol is used as the wildcard symbol at several major search engines. The format looks like this:
sing* finds singing and sings
theat* finds theater and theatre
Some of the search engines offering wildcard search also support what is called "stemming." That means they will find terms like "singing" even if you only enter "sing." This also means you may not need to use a wildcard symbol.
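Wildcard matching of this kind can be tried out with Python's standard fnmatch module. The sketch below only illustrates the idea on a hypothetical word list; the helper name wildcard_hits is not taken from any search engine.

```python
from fnmatch import fnmatchcase


def wildcard_hits(pattern, words):
    """Return the words that match a * wildcard pattern, ignoring case."""
    pattern = pattern.lower()
    return [word for word in words if fnmatchcase(word.lower(), pattern)]


print(wildcard_hits("sing*", ["singing", "sings", "song"]))       # ['singing', 'sings']
print(wildcard_hits("theat*", ["theater", "theatre", "thread"]))  # ['theater', 'theatre']
```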
2. Example
(1) Google
Google's advanced search page uses the allinurl command for finding URLs that contain certain words. However, it is the undocumented "inurl" command that you should use if you want to match some words in the URL and other words within the pages themselves.
For example, let's say you want to find PDF files about mars exploration. Entering "mars exploration" isn't enough, because that could bring back both HTML and PDF pages. To solve this, you can use the inurl command to specify that URLs must have the word "pdf" in them, which will increase the chances of getting PDF files. Here are the search words and the command, combined:
mars exploration inurl:pdf
If you used the "allinurl" command rather than the "inurl" command, this search wouldn't work.
By the way, the "allinurl" command takes its name because when using it, you are requiring that ALL the words appear IN the URL. In contrast, the inurl command means that ANY of the words you specify should appear.
Google also has a command that lets you narrow your search to documents in particular formats, which works better than forcing the inurl command into this role. The command is filetype:, and you follow it with the extension you want to search for. For instance:
california power crisis filetype:pdf
brings back PDF files that contain the words "california power crisis." In contrast:
california power crisis filetype:asp
brings back Microsoft Active Server Pages (ASP) files, while
california power crisis filetype:html
brings back ordinary HTML files that end in .html and that contain the words. It will not bring back HTML files that end in .htm, however. Technically, Google considers those to be a different file type, simply because the ending is different.
(2) Related Searches
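The point that .html and .htm count as different file types follows naturally if the file type is read literally from the end of the URL. The Python sketch below illustrates that kind of extension comparison; it is not Google's implementation, and the helper name has_filetype and the example.com URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def has_filetype(url, extension):
    """True if the URL path ends with exactly the given extension."""
    path = urlparse(url).path  # drops the query string and fragment
    suffix = posixpath.splitext(path)[1].lstrip(".").lower()
    return suffix == extension.lstrip(".").lower()


print(has_filetype("http://example.com/power/report.pdf", "pdf"))   # True
print(has_filetype("http://example.com/power/index.html", "html"))  # True
print(has_filetype("http://example.com/power/index.htm", "html"))   # False
```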
A related searches feature is designed to help users narrow in on what they are looking for. For example, let's say you searched for "mars." When the results appeared, you might also be shown some related searches links, such as "mission to mars" or "life on mars." If you selected one of these links, a new search would be conducted, using the words you clicked on. This can help you be more specific in your query, which often leads to better results.
AltaVista
Displays related searches near the top of the results page, next to the words "Others searched for."
AllTheWeb.com
Displays related searches near the top of the results page, next to the words "Narrow your search."
MSN Search
Displays related searches in the "Popular Topics" area below the search box, on the results page.
Yahoo
At Yahoo, related searches appear at the bottom of its results page.
5.4 Using the Network to Collect New Product Development Information
5.4.1 Collecting New Product Ideas from Consumers
New product ideas come from many sources. The most important is users' questions. This approach asks users to raise the questions and demands they have when using a specific product or product line. The enterprise then assesses the importance and impact of those questions and demands and selects the ideas that are worth developing. Some large domestic enterprises have already taken the first step in this respect. For example, Haier aims to treat the customer's difficult problems as its own development subjects. It has set up a letters column and a customer feedback form (shown in Fig. 5.4) on its website to collect the product questions that arise during use, as well as customers' opinions and suggestions about the enterprise's products and services. Users' questions and suggestions in fact reflect potential defects in products and services and a demand for higher-level products and services. Studied conscientiously, this information may contain a wealth of new ideas. New products developed on this basis have proved to have the highest success rate. According to figures from relevant investigations, excluding military products, 60%-80% of successful technological innovations and new products in the U.S.A. come from suggestions or improvements put forward by users.
Figure 5-4 Customer feedback column on the Haier website
5.4.2 Collecting Patent Information for New Products
New product development unavoidably involves patent questions. To look up domestic patent information, one can generally visit websites such as www.patent.com.cn and http://www-trs.beic.gov.cn/patent, the China patent information retrieval system (http://search.cpo.cn.net), the China patent section of the Chinese periodical network database (www.cnki.net/zlindex.htm or http://chinajournal.net.cn/zlindex.htm), the network patent retrieval service for China's intellectual property rights (www.cnipr.com/patent/usearch.htm), and the invalid patent database (http://db.istic.ac.cn/demo/patent.htm).
China's patent digest database (Fig. 5.5) covers all invention patent and utility model patent applications published by the Patent Office of the People's Republic of China since September 10, 1985. Every record contains keywords, the title of the invention, the International Patent Classification number, the category classification number, the applicant, the application number, the announcement number, the priority, the code name, the application date, the declaration date, the applicant's address, the agency, and the country and province or city code of origin. Users can run full-text searches over all contents of the database. The digest database of invalid patents is particularly distinctive. Invalid patents are patents and patent applications whose rights have been given up or lost for various reasons. Invalid patented technology has lost the protection of patent law, but its value is unquestionable.
Figure 5-5 Patent digest database of China
5.4.3 Collecting New Product Information from Scientific Research Institutions and Universities
Generally speaking, larger enterprises such as P&G and Microsoft have their own research and development departments and have set up their own research and development centres or research institutes in China, where they develop new products independently. For small and medium-sized enterprises, universities and scientific research institutions, where talent is relatively concentrated, are a promising place to look for partners in new product development. Every university has its own research focus and strengths, and comprehensive, detailed information about them is easy to find on the Internet.
The Scientific and technological network of Chinese university (www.unitech.net.cn) is a comprehensive website linking universities and enterprises. Enterprises can search for relevant university research results (as Fig. 5.6 shows) by university name and by industry, then look further for the inventors or designers, and thereby identify the new products they hope to develop.
Figure 5-6 Scientific and technological network of Chinese university
5.5 Collection of the Statistical Information
The information collected through the various channels on the Internet is scattered and unordered, so processing and organizing it is very important work for an enterprise, because investigation results are often the basis of enterprise decisions. What deserves attention is that this work must be timely and effective. The Internet faces the whole world and covers a geographically wider range of information than traditional media such as TV, newspapers, and radio. If we process information too slowly, others may get ahead of us, and the business opportunity in our hands may be lost.
Few tools exist for processing information on the net. Counters on some websites can automatically tally mailing-list volumes, which helps show which information users are most interested in (as Fig. 5.7 shows).
Figure 5-7 Exchange counters
Many websites currently offer statistical information, for example Hotbot (www.hotbot.com). Users enter terms such as "demographic", "population", or "census", choose the time limit, language, and display format on the left of the webpage, and then click the "search" button. The result is illustrated in Fig. 5.8.
Figure 5-8 Searching demographic data on Hotbot
To look for business partners, one may search the Yellow Pages of Lycos. The search details then need to be specified further (see Fig. 5.9), such as the category or company name and the city and state. This is the simple search. One can also choose the detailed search, or search by phone number or by distance.
Figure 5-9 Confirming the search details on the Lycos Yellow Pages
If an automobile enterprise in our country wants statistics on automobile distributors in Los Angeles, California, it can fill in Automobile Dealers, Los Angeles, and California in the respective fields. The search shows the subcategories that meet these conditions (as Fig. 5.10 shows). Choosing any subcategory leads to the final result (as Fig. 5.11 shows).
Figure 5-10 Subcategories matching the search terms in Lycos
Figure 5-11 The final result of the statistics in Lycos
5.6 Network commerce information reorganization
5.6.1 Information Efficiency in Web Search Services
Numerous Web search services exist on the Internet, some with access to information on tens of millions of Web pages. As the Web becomes the dominant information channel, it is important to focus on how Web search services are organized and to evaluate their efficiency in providing relevant information to users.
Surfing the net by following links from one page to the next is still the only way some Web pages can be found, because not all pages are indexed or cataloged. Users of search services are essentially limited to the Web space that their search intermediaries have mapped out. While this may be a limitation, it is still easier to rely on search services to find specific information. Since the search space is limited by the choices of the service providers, we gain in efficiency but lose by forgoing some information not included in the search database. The extent of the loss or gain depends on how the search intermediaries filter information when preparing their databases.
Information filtering is done by search intermediaries in two stages: (1) selection and (2) presentation. In each stage, some arbitrary value judgment is imposed that may or may not affect the information efficiency for the consumers. In terms of selection, search services quote how many unique Web addresses (URLs) their databases cover. The numbers range from several hundred thousand to tens of millions. Some URLs are not visited, and bot-resistant sites may be omitted. Some URLs are not added if they are deemed to be of minor interest. The criteria used include, for example, informational content, graphical presentation, and other interesting features. Knowing how each search database is compiled will help users in selecting a search provider. For example, some search databases give high marks for jazzy graphical content and technological sophistication. For content-oriented users, these sites, albeit valued highly by database compilers, may appear to be a poor source of information. Selection criteria are often discussed in the 'About' and 'FAQ' pages of search services.
Once databases of this information are made, search intermediaries can use different methods of accessing them for consumers. In one polar case, the database may be presented as is so that when consumers search by keywords, the results are displayed based on some relevancy criteria only. Relevancy criteria are such measures as how many words in the document match the search words, or whether the search word appears in the title or the URL, which results in a higher relevancy score. Keyword strings can be enclosed in quotes as in "historical fiction," which selects only those documents that contain the phrase. Even with this and other improvements in querying, the result of a search is often overwhelming. Instead of using relevancy tests such as keyword matching, some intermediaries organize their database by categories, e.g. Yahoo's subject listings. While keyword searches may end up presenting irrelevant information that uses the search word in a totally different context, subject listings or directories present more reliable information on a given subject. However, it is sometimes difficult to characterize a Web page in one subject, and intermediaries must exercise certain value judgment in deciding under what subject a Web page must be classified. This arbitrary decision introduces errors as significant as those borne by keyword searches.
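The relevancy criteria described above can be illustrated with a short Python sketch. It is only one possible reading of those criteria, not the method of any particular search service: the bonus weights, the page dictionaries, and the example.com addresses are assumptions made for illustration. A quoted phrase must appear in the document, each matching word adds to the score, and a match in the title or URL earns an extra bonus.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_query(query):
    """Split a query into quoted phrases and loose keywords."""
    phrases = re.findall(r'"([^"]+)"', query)
    remainder = re.sub(r'"[^"]+"', " ", query)
    return phrases, tokenize(remainder)

def relevancy(query, page, title_bonus=2, url_bonus=2):
    """Score one page; the bonus weights are illustrative assumptions."""
    phrases, keywords = parse_query(query)
    body_words = tokenize(page["body"])
    body = " ".join(body_words)
    # A quoted phrase must occur as consecutive whole words in the body.
    for phrase in phrases:
        needle = " ".join(tokenize(phrase))
        if f" {needle} " not in f" {body} ":
            return 0
    title_words = set(tokenize(page["title"]))
    url_words = set(tokenize(page["url"]))
    terms = keywords + [w for p in phrases for w in tokenize(p)]
    score = 0
    for term in terms:
        score += body_words.count(term)   # one point per occurrence
        if term in title_words:
            score += title_bonus          # match in the title
        if term in url_words:
            score += url_bonus            # match in the URL
    return score

# Hypothetical pages used only to exercise the sketch.
page_a = {"url": "http://example.com/fiction",
          "title": "Historical Fiction Reviews",
          "body": "Reviews of historical fiction for new readers."}
page_b = {"url": "http://example.com/history",
          "title": "History notes",
          "body": "Fiction is rare here; this covers historical records."}

for page in (page_a, page_b):
    print(page["url"], relevancy('"historical fiction"', page))
```

With the quoted query, the second page is excluded because the two words never appear together, although an unquoted search for the same two words would still return it with a low score. This is the kind of out-of-context match that keyword searching tends to produce.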
5.6.2 Information Acquisition and Efficiency
The process of information filtering has more facets than you may imagine. Information filtering is based on a simple procedure that places a filtering program between a user and the content server (see Figure 5.12). The filtering agent carries out selection processes based on user-determined filtering criteria known as scripts or profiles, which are continuously updated via feedback. In a collaborative filtering scheme, scripts and profiles are exchanged among different users. An increasingly popular use of filtering agents, especially among parents and educators, is to block Web sites that contain inappropriate or indecent materials; examples are Cyberpatrol (http://www.cyberpatrol.com), Cybersitter (http://www.solidoak.com) and NetNanny (http://www.netnanny.com). While these examples are software programs that can be downloaded and installed by individual users, N2H2 (http://www.n2h2.com) provides server-based solutions, where filtering is implemented for all users connected to the server. The same filtering scheme is used to remove only the unwanted portion of a Web document. In the WebFilter implementation developed by Axel Boldt (http://www.math.ucsb.edu/~boldt/), the filter is a proxy server that retrieves a document and removes prescribed features such as advertising banners or large graphics before presenting it to a user.
Figure 5.12: Various functions of a filtering agent
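The two uses of a filtering agent described above, blocking whole sites and removing unwanted parts of a document, can be sketched in a few lines of Python. This is a minimal illustration and not the code of WebFilter or of any product named in the text. The profile contents, the host names, and the function names are hypothetical, and the document is assumed to have been retrieved already.

```python
import re
from urllib.parse import urlparse

# Hypothetical user profile: hosts to block and URL fragments that mark banners.
PROFILE = {
    "blocked_hosts": {"ads.example.com", "adult.example.net"},
    "banner_markers": ("banner", "/ads/"),
}

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r"""src\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)

def is_blocked(url, profile):
    """True if the host of the URL is on the profile's block list."""
    host = urlparse(url).hostname or ""
    return host.lower() in profile["blocked_hosts"]

def strip_banners(html, profile):
    """Remove <img> tags whose source looks like an advertising banner."""
    def replace(match):
        tag = match.group(0)
        src = SRC_ATTR.search(tag)
        if src and any(m in src.group(1).lower()
                       for m in profile["banner_markers"]):
            return ""          # drop the banner image
        return tag             # keep ordinary images
    return IMG_TAG.sub(replace, html)

def filter_document(url, html, profile=PROFILE):
    """Return None for a blocked site, otherwise the cleaned document."""
    if is_blocked(url, profile):
        return None
    return strip_banners(html, profile)

sample = '<p>News</p><img src="/ads/top.gif"><img src="photo.jpg">'
print(filter_document("http://news.example.org/a", sample))
print(filter_document("http://ads.example.com/x", sample))
```

Because the criteria live in a plain dictionary, a profile of this kind could be updated from user feedback or exchanged among users, as in the collaborative scheme mentioned above.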
Finally, the filter can be used as an agent that selects, that is, filters, documents among all incoming messages. This use of information filtering is gaining popularity because of the tremendous growth in junk emails and spamming on Usenet newsgroups. The filter assigns each message a score: a message from a known advertiser gets a zero score, and a message dealing with your favorite subject gets a higher score. The result is then displayed on your screen so that you can decide which messages to read and respond to. InfoScan (http://www.machinasapiens.qc.ca/infoscanang.html), a filtering program, displays the result on a radar screen (see Figure 5.13), where only five out of 50 messages are selected as relevant, the ones closer to the center of the radar screen having higher scores.
Figure 5-13 InfoScan's radar screen presents its result of filtering 50 documents
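The scoring idea behind this kind of message filter can be sketched as follows. The sketch does not reproduce InfoScan. The advertiser list, the interest weights, and the sample messages are assumptions chosen for illustration, and a real filter would use far richer text analysis than substring matching.

```python
def score_message(message, known_advertisers, interests):
    """Return 0 for a known advertiser, else the summed weights of matched topics."""
    if message["sender"].lower() in known_advertisers:
        return 0
    text = (message["subject"] + " " + message["body"]).lower()
    return sum(weight for topic, weight in interests.items() if topic in text)

def select_relevant(messages, known_advertisers, interests, limit=5):
    """Keep messages with a positive score, highest score first."""
    scored = [(score_message(m, known_advertisers, interests), m)
              for m in messages]
    relevant = [(s, m) for s, m in scored if s > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return relevant[:limit]

# Hypothetical profile and inbox.
known_advertisers = {"deals@example.com"}
interests = {"jazz": 3, "search engine": 2}
messages = [
    {"sender": "deals@example.com", "subject": "Jazz CDs on sale",
     "body": "Buy now"},
    {"sender": "ann@example.org", "subject": "Jazz festival",
     "body": "A search engine for concert dates"},
    {"sender": "bob@example.org", "subject": "Lunch", "body": "Noon?"},
    {"sender": "cy@example.org", "subject": "New search engine",
     "body": "Indexing notes"},
]

for score, message in select_relevant(messages, known_advertisers, interests):
    print(score, message["subject"])
```

The first message mentions a favorite subject but still scores zero because its sender is a known advertiser, so only the second and fourth messages are shown, in descending order of score.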
An interesting application of this filtering agent is gaining support as a way to counter spamming on the Usenet. A Usenet newsgroup can be either moderated or unmoderated. A moderated group has one or more moderators who screen all messages before forwarding them to the Usenet. The majority of newsgroups are unmoderated for several reasons: Usenet users prefer unfettered, equal participation; unpaid moderators have to spend time and effort to screen messages; and messages may be delayed unnecessarily. However, due to the increasing level of abuse in many newsgroups, some type of moderation will be needed for most newsgroups in the near future. A hybrid solution is to use an intelligent software agent. This 'bot moderation' or 'robomoderation' screens messages, rejecting those that contain phrases such as "MAKE EASY MONEY" or that are cross-posted to many newsgroups. A robomoderator also handles notification, acceptance and forwarding automatically, reducing the workload of human moderators. For example, Secure Team-based Usenet Moderation Program (STUMP), a freely available program (see Online Resources at the end of this chapter), not only saves the time needed for moderation but also allows messages to be archived as Web pages.
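The two screening rules mentioned above can be written as a small rule-based check. The Python sketch below is hypothetical and does not reproduce STUMP. The cross-posting threshold and the post structure are assumptions, and the only fact borrowed from Usenet itself is that the Newsgroups header lists groups separated by commas.

```python
BANNED_PHRASES = ("make easy money",)  # phrase quoted in the text
MAX_NEWSGROUPS = 5                      # assumed cross-posting threshold

def moderate(post):
    """Return ("reject", reason) or ("accept", "") for one post."""
    text = (post["subject"] + "\n" + post["body"]).lower()
    for phrase in BANNED_PHRASES:
        if phrase in text:
            return "reject", f"banned phrase: {phrase}"
    # The Newsgroups header is a comma-separated list of group names.
    groups = [g.strip() for g in post["newsgroups"].split(",") if g.strip()]
    if len(groups) > MAX_NEWSGROUPS:
        return "reject", f"cross-posted to {len(groups)} newsgroups"
    return "accept", ""

# Hypothetical posts.
spam = {"subject": "Hi", "body": "Make easy money fast",
        "newsgroups": "misc.test"}
crosspost = {"subject": "Read this", "body": "Hello all",
             "newsgroups": "a.b, c.d, e.f, g.h, i.j, k.l"}
normal = {"subject": "Question about modems", "body": "Which one works?",
          "newsgroups": "comp.dcom.modems"}

for post in (spam, crosspost, normal):
    print(moderate(post))
```

In a hybrid arrangement, posts rejected by such rules could be returned to the sender with the stated reason, while borderline cases are passed to a human moderator.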
While user-oriented filtering agents are acquiring more diverse uses, in terms of network efficiency, a middle ground may entail using intermediaries, where information filtering occurs in the middle of the acquisition process. If a large number of such intermediaries exist, consumers can also be guaranteed a choice. One thing to note, however, is that intermediaries are increasingly using advertising, which may unfortunately cause consumers to doubt the objectivity of their search results. In some economic activities, independent third-party status is clearly important, and information search is one of these activities. Trust and neutrality are required; without them, filtering may be seen as the result of censorship or blatant advertising. Therefore, instead of advertising, search intermediaries may benefit from the adoption of micropayment methods by which consumers pay a small amount, say a penny, for each search and intermediaries guarantee full and unbiased access to their databases.
References
Diamond, P.A., 1971. "A Model of Price Adjustment." Journal of Economic Theory 3:156-168.
Hahn, H., 1996. The Internet: Complete Reference. Second edition. Berkeley: Osborne McGraw-Hill.
Robert, R. and D.O. Stahl, 1993. "Informative Price Advertising in a Sequential Search Model." Econometrica 61(3): 657-686.
Stahl, D.O., 1996. "Oligopolistic pricing with heterogeneous consumer search." International Journal of Industrial Organization, 14:242-268.