AI for Teachers, An Open Textbook: Edition 1

AI Speak : Search Engine Ranking

Compared to the search engines of the early 2000s, today's search engines perform richer and deeper analysis. For example, rather than just counting words, they can analyze and compare the meaning behind words.1 Much of this richness happens in the ranking process:

Step 4: Query terms are matched with index terms

Once the user types the query and clicks on search, the query is processed. Tokens are created using the same process as for the document text. The query may then be expanded by adding other keywords. This avoids the case where relevant documents are missed because the query uses slightly different words from those used by the authors of web content. It also captures differences in custom and usage. For example, words like President, Prime Minister and Chancellor may be used interchangeably depending on the country.1
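The tokenising and expansion steps described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine works: the SYNONYMS table and the function names are made up for the example, and a real engine would derive its expansions from logs or a thesaurus.

```python
import re

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "president": ["prime minister", "chancellor"],
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def expand_query(query):
    """Return the query tokens followed by any synonyms found in the table."""
    tokens = tokenize(query)
    expanded = list(tokens)
    for token in tokens:
        for synonym in SYNONYMS.get(token, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded

print(expand_query("President of Germany"))
# ['president', 'of', 'germany', 'prime minister', 'chancellor']
```

The same tokenize function would be applied to documents at indexing time, so that query tokens and index terms are comparable.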

Most search engines keep track of user searches (see the descriptions of some popular search engines to learn more). Queries are recorded with the user data in order to personalise content and serve advertisements. Alternatively, the records from all users are pooled to see how and where search engine performance can be improved.

User logs contain previous queries, the results pages and information on what worked: what the user clicked on and what they spent time reading. With the user logs, each query can be matched with relevant documents (the user clicks, reads and closes the session) and non-relevant documents (the user did not click, did not read, or tried to rephrase the query).2
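As a rough sketch of how a log could be turned into relevant and non-relevant documents for each query, consider the Python below. The LogEntry fields and the 30-second reading threshold are assumptions made for the example; real logs are far richer and the rules more subtle.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    query: str
    document: str
    clicked: bool
    seconds_read: float

# Assumed cut-off: a click followed by at least this much reading counts as relevant.
MIN_READ_SECONDS = 30.0

def label_documents(entries):
    """Split logged documents into relevant and non-relevant sets per query."""
    relevant, non_relevant = {}, {}
    for entry in entries:
        was_read = entry.clicked and entry.seconds_read >= MIN_READ_SECONDS
        target = relevant if was_read else non_relevant
        target.setdefault(entry.query, set()).add(entry.document)
    return relevant, non_relevant

log = [
    LogEntry("jaguar price", "dealer-page", True, 45.0),   # clicked and read
    LogEntry("jaguar price", "zoo-page", True, 3.0),       # clicked, left quickly
    LogEntry("jaguar price", "wildlife-page", False, 0.0), # never clicked
]
relevant, non_relevant = label_documents(log)
print(relevant)  # {'jaguar price': {'dealer-page'}}
```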

With these logs, each new query can be matched with a similar past query. One way of telling whether two queries are similar is to see if ranking turns up the same documents: similar queries may not always contain the same words, but their results are likely to be identical.2
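One simple way to measure whether two queries "turn up the same documents" is the overlap between their result lists. The sketch below uses the Jaccard measure (shared results divided by all results); this particular measure and the document names are assumptions chosen for illustration.

```python
def result_overlap(results_a, results_b):
    """Jaccard overlap of two result lists: 1.0 means identical, 0.0 means disjoint."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

# Hypothetical top results for three past queries.
past_results = {
    "cheap flights london": ["d1", "d2", "d3"],
    "low cost airfare to london": ["d1", "d2", "d4"],
    "london weather": ["d7", "d8"],
}

print(result_overlap(past_results["cheap flights london"],
                     past_results["low cost airfare to london"]))  # 0.5
print(result_overlap(past_results["cheap flights london"],
                     past_results["london weather"]))              # 0.0
```

The first two queries share only the word "london", yet their results overlap, which is the signal the logs provide.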

Spelling errors can be corrected using similar queries. New keywords and synonyms can be added to expand the query. This is done by looking at other words that occur frequently in the relevant documents from the past. In general, words that occur more frequently in the relevant documents than in the non-relevant documents are added to the query or given additional weight.2
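The idea of picking words that are more common in relevant than in non-relevant documents can be sketched as follows. The scoring used here (the difference between the fraction of relevant and of non-relevant documents containing a word) is an assumption made to keep the example short; the function name and sample documents are hypothetical.

```python
from collections import Counter

def expansion_terms(relevant_docs, non_relevant_docs, query_terms, top_n=2):
    """Suggest words seen in a larger share of relevant than non-relevant documents."""
    def doc_frequency(docs):
        counts = Counter()
        for doc in docs:
            counts.update(set(doc.lower().split()))  # count each word once per document
        return counts

    rel = doc_frequency(relevant_docs)
    non = doc_frequency(non_relevant_docs)
    n_rel = max(len(relevant_docs), 1)
    n_non = max(len(non_relevant_docs), 1)
    scores = {
        term: rel[term] / n_rel - non[term] / n_non
        for term in rel
        if term not in query_terms
    }
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return [term for term in ranked if scores[term] > 0][:top_n]

relevant = ["jaguar car dealer price", "jaguar car engine review"]
non_relevant = ["jaguar cat habitat", "jaguar animal zoo"]
print(expansion_terms(relevant, non_relevant, {"jaguar"}))  # ['car', 'dealer']
```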

Step 5: Relevant documents are ranked

Each document is scored for relevance and ranked according to this score. Relevance here is both topic relevance (how well the index terms of a document match those of the query) and user relevance (how well it matches the preferences of the user). Part of document scoring can be done while indexing. The speed of the search engine depends on the quality of its indexes. Its effectiveness depends on how the query is matched to the document and on the ranking system.2
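To make topic relevance concrete, here is a small sketch that builds an index once and then scores documents for a query. It uses a TF-IDF style weighting (a word counts more if it is frequent in the document and rare across the collection), which is an assumption on our part: the text does not say which scoring formula any engine uses. The document names are made up.

```python
import math
from collections import Counter

def build_index(documents):
    """Map each term to {doc_id: count}. This part is done once, at indexing time."""
    index = {}
    for doc_id, text in documents.items():
        for term, count in Counter(text.lower().split()).items():
            index.setdefault(term, {})[doc_id] = count
    return index

def rank(query, index, num_docs):
    """Return doc ids ordered by a TF-IDF style score, best first."""
    scores = Counter()
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + num_docs / len(postings))  # rarer terms weigh more
        for doc_id, count in postings.items():
            scores[doc_id] += count * idf
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered]

documents = {
    "d1": "football match in manchester",
    "d2": "flights to london",
    "d3": "manchester football club football news",
}
index = build_index(documents)
print(rank("manchester football", index, len(documents)))  # ['d3', 'd1']
```

Because the counts are stored in the index, answering a query only requires looking up the query terms, which is why index quality affects speed.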

User relevance is measured by creating user models (or personality types) based on their previous search terms, sites visited, email messages, the device they are using, language and geographic location. Cookies are used to store user preferences. Some search engines also buy user information from third parties (refer to the descriptions of some search engines). If a person is interested in football, their results for “Manchester” will be different from those of a person who has just booked a flight to London. Words that occur frequently in the documents associated with a person will be given the highest importance.
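The "Manchester" example can be sketched as a re-ordering of the same results for two different user models. Here a user model is reduced to a dictionary of words and weights, which is a strong simplification; the profiles, titles and weights are invented for the example.

```python
def personalise(results, user_interests):
    """Re-order (title, text) results so that pages matching the user's interests come first."""
    def interest_score(text):
        words = set(text.lower().split())
        return sum(user_interests.get(word, 0.0) for word in words)
    # sorted() is stable, so results with equal scores keep their original order.
    return sorted(results, key=lambda result: -interest_score(result[1]))

results = [
    ("Manchester Airport", "manchester airport flights and travel"),
    ("Manchester United", "manchester united football fixtures"),
]

# Hypothetical user models: words seen often in each person's history.
football_fan = {"football": 1.0, "fixtures": 0.5}
traveller = {"flights": 1.0, "travel": 0.5}

print(personalise(results, football_fan)[0][0])  # Manchester United
print(personalise(results, traveller)[0][0])     # Manchester Airport
```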

Commercial web search engines incorporate hundreds of features in their ranking algorithms, many derived from the huge collection of user interaction data in the query logs. A ranking function combines the document, the query and user relevance features. Whatever ranking function is used, it has a solid mathematical foundation. The output is the probability that a document satisfies the user’s information need. Above a certain probability of relevance, the document is classified as relevant.2
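A ranking function that turns features into a probability can be sketched with a logistic function, one common way of mapping a weighted sum to a value between 0 and 1. This choice, the three feature names, the weights and the 0.5 threshold are all assumptions for illustration; real engines combine hundreds of features with weights learned from data.

```python
import math

# Hypothetical feature weights; in practice these would be learned from query logs.
WEIGHTS = {"topic_match": 2.0, "user_match": 1.0, "past_click_rate": 1.5}
BIAS = -2.0
THRESHOLD = 0.5  # assumed cut-off above which a document counts as relevant

def relevance_probability(features):
    """Combine the features into a probability using a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def is_relevant(features):
    return relevance_probability(features) >= THRESHOLD

strong = {"topic_match": 0.9, "user_match": 0.8, "past_click_rate": 0.6}
weak = {"topic_match": 0.2, "user_match": 0.1, "past_click_rate": 0.0}

print(round(relevance_probability(strong), 2), is_relevant(strong))  # 0.82 True
print(round(relevance_probability(weak), 2), is_relevant(weak))      # 0.18 False
```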

Machine learning is used to learn ranking from the implicit user feedback in the logs (what worked in previous queries). Machine learning has also been used to develop sophisticated models of how humans use language, which are then used to decipher queries.1,2

Advances in web search have been phenomenal in the last decade. However, when it comes to understanding the context of a specific query, there is no substitute for the user providing a better query. Typically, better queries come from users examining the results and reformulating the query.2

Step 6: Results are displayed

Finally, the results are ready to be displayed. The page's title and URL are displayed, with query terms in bold. A short summary is generated and displayed after each link. The summary highlights important passages in the document. For this, sentences are taken from headings, the metadata description or from text that corresponds best with the query. If all query terms appear in the title, they need not be repeated in the snippet.2 Sentences are also selected based on how readable they are.
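A very reduced version of snippet generation is to pick the sentence containing the most query terms and mark those terms in bold. The sketch below does only that; it ignores headings, metadata and readability, and the use of <b> tags for bold is an assumption about the output format.

```python
import re

def make_snippet(sentences, query):
    """Pick the sentence with the most query terms and wrap those terms in <b> tags."""
    terms = set(query.lower().split())

    def matches(sentence):
        return len(terms & set(re.findall(r"\w+", sentence.lower())))

    best = max(sentences, key=matches)  # on a tie, the earliest sentence wins

    def embolden(match):
        word = match.group(0)
        return f"<b>{word}</b>" if word.lower() in terms else word

    return re.sub(r"\w+", embolden, best)

sentences = [
    "Our shop opened in 1990.",
    "We repair road bikes and mountain bikes.",
    "Contact us for prices.",
]
print(make_snippet(sentences, "bikes repair"))
# We <b>repair</b> road <b>bikes</b> and mountain <b>bikes</b>.
```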

Appropriate advertising is added to the results. Advertising is how most search engines generate revenue. In some search engines, ads are clearly marked as sponsored content, while in others they are not. Since many users look at only the first few results, ads change the whole process substantially.

Advertisements are chosen according to the context of the query and the user model. Search engine companies maintain a database of advertisements, which is searched to find the most relevant advertisements for a given query. Advertisers bid for keywords that describe topics associated with their product. Both the amount bid and the popularity of an advertisement are significant factors in the selection process.2
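The interplay of bid and popularity can be sketched as follows. Multiplying the bid by the click rate is one plausible way of combining the two factors and is an assumption here, as are the Ad fields and the sample advertisers; the sketch also leaves out the user model.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    name: str
    keywords: set       # keywords the advertiser has bid on
    bid: float          # amount offered per click
    click_rate: float   # popularity: fraction of viewers who click

def select_ad(ads, query):
    """Among ads whose keywords match the query, pick the highest bid x click rate."""
    terms = set(query.lower().split())
    candidates = [ad for ad in ads if ad.keywords & terms]
    if not candidates:
        return None
    return max(candidates, key=lambda ad: ad.bid * ad.click_rate)

# Hypothetical advertisers.
ads = [
    Ad("LuxuryShoes", {"shoes"}, bid=2.0, click_rate=0.01),
    Ad("RunFast", {"running", "shoes"}, bid=1.0, click_rate=0.05),
    Ad("PizzaNow", {"pizza"}, bid=5.0, click_rate=0.10),
]
print(select_ad(ads, "running shoes").name)  # RunFast
```

In this example the highest bidder for "shoes" loses to a cheaper but more popular advertisement, and the pizza advertisement is never considered because its keyword does not match the query.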

For questions on facts, some engines use their own collection of facts. Google's Knowledge Vault contains over a billion facts indexed from different sources.3 Results are clustered by machine learning algorithms into appropriate groups. Finally, the user is also presented with alternatives to the query to see if they better fit the user's actual need.

Some references:
The origin of Google can be found in Brin and Page’s original paper
Some of the math behind PageRank is on Wikipedia's PageRank page
For the mathematically minded, a nice explanation of PageRank

------------------------------------------------------------------------------------------------------
1 Russell, D., What Do You Need to Know to Use a Search Engine? Why We Still Need to Teach Research Skills, AI Magazine, 36(4), 2015
2 Croft, B., Metzler, D., Strohman, T., Search Engines, Information Retrieval in Practice, W.B. Croft, D. Metzler, T. Strohman, 2015
3 Spencer, S., Google Power Search: The Essential Guide to Finding Anything Online With Google, Koshkonong, Kindle Edition.


 
