Yandex search engine what's new. Yandex - what is Yandex and why is it called Yandex. Composition and principles of operation of the search system

Search engines have long been an integral part of the Russian Internet. They are now huge and complex mechanisms that serve not only as an information search tool but also as an attractive arena for business.

Most search engine users have never thought (or thought about it, but did not find an answer) about the principle of operation of search engines, the scheme for processing user requests, what these systems consist of and how they function...

This master class is designed to answer the question of how search engines work. However, you will not find here the factors that influence the ranking of documents, and you should not count on a detailed explanation of the Yandex algorithm. According to Ilya Segalovich, director of technology and development of the Yandex search engine, that can only be learned “under torture” from Ilya Segalovich himself...

2. Concept and functions of a search engine

A search system is a software and hardware complex designed to search the Internet and respond to a user request, given as a text phrase (a search query), by producing a list of links to sources of information ordered by relevance to the request. The largest international search engines are Google, Yahoo and MSN. On the Russian Internet these are Yandex, Rambler and Aport.

Let's take a closer look at the concept of a search query using the Yandex search engine as an example. The user should formulate the search query in accordance with what he wants to find, as briefly and simply as possible. Let's say we want to find information in Yandex on how to choose a car. To do this, we open the Yandex main page and enter the text of the search query “how to choose a car.” Next, our task comes down to opening the links to sources of information that are returned for our request. However, it is quite possible that we will not find the information we need. If this happens, then either we need to rephrase the request, or the search engine database really has no relevant information for it (this can happen with very “narrow” queries such as “how to choose a car in Arkhangelsk”).

The primary goal of any search engine is to deliver to people exactly the information they are looking for. It is impossible to teach users to make “correct” requests, i.e. queries that fit the operating principles of search engines. Therefore, developers design the algorithms and operating principles of search engines so that users can find the information they are looking for.

This means the search engine must “think” the same way the user thinks when searching for information. When a user makes a request to a search engine, he wants to find what he needs as quickly and easily as possible. Receiving the result, he evaluates the performance of the system, guided by several basic parameters. Did he find what he was looking for? If he didn’t find it, how many times did he have to rephrase the query to find what he was looking for? How much relevant information could he find? How quickly did the search engine process the query? How conveniently were the search results presented? Was the result he was looking for the first or the hundredth? How much unnecessary garbage was found along with useful information? Will the necessary information be found when accessing the search engine again, say, in a week or in a month?

To give satisfactory answers to all these questions, search engine developers are constantly improving search algorithms and principles, adding new functions and capabilities, and trying in every possible way to speed up the operation of the system.

3. Main characteristics of a search engine

Let us describe the main characteristics of search engines:

Completeness
Completeness is one of the main characteristics of a search system, which is the ratio of the number of documents found by request to the total number of documents on the Internet that satisfy the given request. For example, if there are 100 pages on the Internet containing the phrase “how to choose a car,” and only 60 of them were found for the corresponding query, then the completeness of the search will be 0.6. Obviously, the more complete the search, the less likely it is that the user will not find the document he needs, provided that it exists on the Internet at all.
Accuracy
Accuracy is another main characteristic of a search engine, which is determined by the degree to which the found documents match the user's query. For example, if the query “how to choose a car” returns 100 documents, of which 50 contain the phrase “how to choose a car” and the rest merely contain these words (“how to choose the right radio and install it in a car”), then the search accuracy is 50/100 = 0.5. The more accurate the search, the faster the user finds the documents he needs, the less “garbage” appears among them, and the less often the found documents fail to match the request.
Relevance
Relevance, here in the sense of timeliness, is an equally important component of search. It is characterized by the time that passes from the moment documents are published on the Internet until they are entered into the search engine's index database. For example, the day after interesting news appears, a large number of users turn to search engines with related queries. Less than a day has passed since the news was published, but the main documents have already been indexed and are available for search, thanks to the so-called “fast database” of large search engines, which is updated several times a day.
Search speed
Search speed is closely related to its load resistance. For example, according to Rambler Internet Holding LLC, today, during business hours, the Rambler search engine receives about 60 requests per second. Such workload requires reducing the processing time of an individual request. Here the interests of the user and the search engine coincide: the visitor wants to get results as quickly as possible, and the search engine must process the request as quickly as possible, so as not to slow down the calculation of subsequent queries.
Visibility
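
Returning to the first two characteristics: completeness and accuracy are both simple ratios, and the figures from the examples above (60 of 100 and 50 of 100) can be reproduced with a minimal sketch. The function and parameter names are illustrative assumptions, not part of any search engine's interface.

```python
def completeness(found_relevant: int, total_relevant: int) -> float:
    """Share of all matching documents on the Internet that the search returned."""
    return found_relevant / total_relevant


def accuracy(exact_matches: int, returned: int) -> float:
    """Share of the returned documents that really match the query."""
    return exact_matches / returned


# 100 pages contain the phrase, 60 of them were found.
print(completeness(60, 100))  # 0.6

# 100 documents were returned, 50 of them contain the exact phrase.
print(accuracy(50, 100))  # 0.5
```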

4. Brief history of the development of search engines

In the initial period of Internet development, the number of its users was small, and the amount of available information was relatively small. For the most part, only research staff had access to the Internet. At this time, the task of searching for information on the Internet was not as urgent as it is now.

One of the first ways to organize access to network information resources was the creation of open directories of sites, in which links to resources were grouped by topic. The first such project was the Yahoo.com website, which opened in the spring of 1994. After the number of sites in the catalog increased significantly, the ability to search the catalog was added. It was not yet a search engine in the full sense, since the search area was limited to the resources present in the catalog rather than all Internet resources.

Link directories were widely used in the past but have almost completely lost their popularity, since even modern catalogs, huge in volume, cover only a negligible part of the Internet. The largest directory on the network, DMOZ (also called the Open Directory Project), contains information about 5 million resources, while the Google search engine database consists of more than 8 billion documents.

In 1995, the search engines Lycos and AltaVista appeared. The latter was a leader in the field of information search on the Internet for many years.

In 1997, Sergey Brin and Larry Page created the Google search engine as part of a research project at Stanford University. Google is currently the most popular search engine in the world!

In September 1997, the Yandex search engine, which is the most popular on the Russian-language Internet, was officially announced.

Currently, there are three main international search engines, Google, Yahoo and MSN, which have their own databases and search algorithms. Most other search engines (of which there are a large number) use the results of these three in one form or another. For example, AOL search (search.aol.com) uses the Google database, while AltaVista, Lycos and AllTheWeb use the Yahoo database.

5. Composition and principles of operation of the search system

In Russia, the main search engine is Yandex, followed by Rambler.ru, Google.ru, Aport.ru, Mail.ru. Moreover, at the moment, Mail.ru uses the Yandex search engine and database.

Almost all major search engines have their own structure, different from others. However, it is possible to identify the main components common to all search engines. Differences in structure can only be in the form of implementation of the mechanisms of interaction of these components.

Indexing module

The indexing module consists of three auxiliary programs (robots):

Spider is a program designed to download web pages. The spider downloads the HTML code of each page and extracts all internal links from it. Robots use the HTTP protocol to download pages. The spider works as follows: the robot sends the request “GET /path/document” and some other HTTP request commands to the server. In response, the robot receives a text stream containing service information and the document itself. The following is stored for each downloaded page:

Page URL
Date the page was downloaded
HTTP header of the server response
Page body (HTML code)
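
The record a spider keeps for each page can be sketched as follows. This is a minimal illustration that works on a canned response string rather than a real network connection; `make_page_record`, `LinkCollector` and the field names are hypothetical and do not describe how any real robot is implemented.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def make_page_record(url, raw_response):
    """Split a raw HTTP response into header and body and build the stored record."""
    header, _, body = raw_response.partition("\r\n\r\n")
    collector = LinkCollector()
    collector.feed(body)
    return {
        "url": url,
        "downloaded_at": datetime.now(timezone.utc).isoformat(),
        "http_header": header,
        "body": body,
        "links": collector.links,
    }


# A canned response stands in for what a server returns to "GET /path/document".
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    '<html><body><a href="/about">About</a> <a href="/news">News</a></body></html>'
)
record = make_page_record("http://example.com/path/document", raw)
print(record["links"])  # ['/about', '/news']
```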

Crawler (“traveling” spider) is a program that automatically follows all the links found on a page. It selects all the links present on the page and determines where the spider should go next, based either on those links or on a predetermined list of addresses. By following the links it finds, the crawler discovers new documents that are still unknown to the search engine.
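
One common way to organize this kind of link following is a queue of addresses still to visit plus a set of addresses already seen. The sketch below assumes a breadth-first order and uses a small in-memory dictionary (`toy_web`, a made-up example) in place of real pages; an actual crawler's policy may well differ.

```python
from collections import deque


def crawl(start_urls, get_links):
    """Visit pages breadth-first and return URLs in the order they were visited."""
    queue = deque(start_urls)   # addresses the spider should go to next
    seen = set(start_urls)      # addresses already known to the search engine
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:    # a new, still unknown document
                seen.add(link)
                queue.append(link)
    return order


# Toy link graph: each page maps to the links found on it.
toy_web = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
visited = crawl(["a"], lambda url: toy_web.get(url, []))
print(visited)  # ['a', 'b', 'c', 'd']
```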

Indexer (robot indexer) is a program that analyzes web pages downloaded by spiders. The indexer parses the page into its component parts and analyzes them using its own lexical and morphological algorithms. Various page elements are analyzed, such as text, headings, links, structural and style features, special service HTML tags, etc.

Thus, the indexing module allows you to crawl a given set of resources using links, download encountered pages, extract links to new pages from received documents, and perform a complete analysis of these documents.

Database

A database, or search engine index, is a data storage system, an information array in which specially converted parameters of all documents downloaded and processed by the indexing module are stored.

Search server

The search server is the most important element of the entire system, since the quality and speed of the search directly depend on the algorithms that underlie its functioning.

The search server works as follows:

The request received from the user is subjected to morphological analysis. The information environment of each document contained in the database is generated (it will subsequently be displayed in the form of a snippet, that is, text information corresponding to the request on the search results page).
The received data is passed as input parameters to a special ranking module. Data is processed for all documents, and as a result each document receives its own rating, which characterizes how well the query entered by the user matches the various components of this document stored in the search engine index.
Depending on the user’s choice, this rating can be adjusted by additional conditions (for example, the so-called “advanced search”).
Next, a snippet is generated, that is, for each document found, the title, a short abstract that best matches the query, and a link to the document itself are extracted from the document table, and the words found are highlighted.
The resulting search results are transmitted to the user in the form of a SERP (Search Engine Result Page) – a search results page.
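
The sequence of steps above can be sketched in a few lines. Everything here is a deliberate simplification under stated assumptions: lowercasing stands in for morphological analysis, the rating is a plain count of matching words, and `DOCS`, `normalize` and `search` are hypothetical names rather than parts of a real search server.

```python
import re

DOCS = {
    1: {"title": "Choosing a car", "url": "http://example.com/1",
        "text": "How to choose a car: compare price, mileage and condition."},
    2: {"title": "Radio installation", "url": "http://example.com/2",
        "text": "How to choose the right radio and install it in a car."},
    3: {"title": "Cake recipes", "url": "http://example.com/3",
        "text": "A classic recipe for a layered cake."},
}


def normalize(text):
    """Stand-in for morphological analysis: lowercase and split into words."""
    return re.findall(r"\w+", text.lower())


def search(query, docs):
    """Rate every document, build a snippet for each match, return a sorted list."""
    terms = set(normalize(query))
    results = []
    for doc_id, doc in docs.items():
        words = normalize(doc["title"] + " " + doc["text"])
        rating = sum(1 for word in words if word in terms)  # toy ranking module
        if rating:
            # Snippet: the text with the words that were found marked by asterisks.
            abstract = " ".join(
                f"*{w}*" if w.lower().strip(".,:") in terms else w
                for w in doc["text"].split()
            )
            results.append((rating, doc_id, doc["title"], abstract, doc["url"]))
    results.sort(key=lambda r: (-r[0], r[1]))  # highest rating first
    return results


serp = search("choose car", DOCS)
for rating, doc_id, title, abstract, url in serp:
    print(rating, title, url)
    print("   ", abstract)
```

In this toy data the first document rates higher than the second, and the third is not returned at all because none of its words match the query.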

As you can see, all these components are closely related to each other and work in interaction, forming a clear, rather complex mechanism for the operation of the search system, which requires huge amounts of resources.

6. Conclusion

Now let's summarize all of the above.

The primary goal of any search engine is to deliver to people exactly the information they are looking for.
Main characteristics of search engines:
1. Completeness
2. Accuracy
3. Relevance
4. Search speed
5. Visibility
The first full-fledged search engine was the WebCrawler project, launched in 1994.
The search system includes the following components:
1. Indexing module
2. Database
3. Search server

We hope that our master class will allow you to become more familiar with the concept of a search engine and better understand the main functions, characteristics and operating principles of search engines.

Hello dear friends! In this article we will continue to look at the Yandex search engine, and as you remember, in previous articles we discussed the history of the creation of this great company, which ranks first among its competitors in Russia and beyond.

All this is good, but beginners and experienced site builders are interested in the most important question, of course, related to how to bring their projects to the first places in the TOP search results.

Therefore, let's look at how the Yandex search engine works in order to understand what pitfalls you can run into, and what to expect from a Russian search engine in general.

In the last article we discussed. The topic turned out to be quite interesting and useful. Therefore, I decided to supplement it, deepen it, so to speak.

So, I probably got a little carried away with the question “Why does a search engine index documents”? It’s a no brainer. All that remains is to figure out the “how” question.

Website ranking algorithms

First, let's get acquainted with some algorithms that are fundamental to any search engine:

— Direct search algorithm.

What is it? Suppose you remember reading a wonderful story in one of your books, and you start looking through them one by one. You take one book, look through it, don't find the story, take another... The principle is clear, but this method is extremely slow. This is also understandable.
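
As a minimal sketch, direct search amounts to scanning every document in turn. The `books` dictionary and the function name are made up for illustration.

```python
def direct_search(documents, phrase):
    """Look through every document one by one and return those containing the phrase."""
    phrase = phrase.lower()
    return [name for name, text in documents.items() if phrase in text.lower()]


books = {
    "book-1": "A long novel about the sea.",
    "book-2": "A wonderful story about a search robot.",
    "book-3": "Recipes for every day.",
}
print(direct_search(books, "wonderful story"))  # ['book-2']
```

The time this takes grows with the total amount of text, which is why the method is described as slow.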

— Reverse search algorithm.

For this algorithm, a text file is created from each page of your blog. This file lists in alphabetical order ALL the words you used. Even the position of this word in the text is indicated (coordinates in the text).

This is a fairly fast method, but the search already occurs with some error.

The main thing to understand here is that this algorithm does not search the Internet or your blog. It searches a separate text file that was created long before, when the robot visited you. These files (reverse indexes, more commonly called inverted indexes) are stored on Yandex servers.
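
A reverse index of this kind can be sketched as a mapping from each word to the pages that contain it and the word's positions there. This is an in-memory illustration with made-up page names; how Yandex actually stores its files is not described in the article.

```python
import re
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to {page_id: [positions of the word in that page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][page_id].append(position)
    return index


pages = {
    "post-1": "Yandex indexes the blog",
    "post-2": "The robot visits the blog again",
}
index = build_inverted_index(pages)

print(sorted(index))        # all words, in alphabetical order
print(dict(index["blog"]))  # {'post-1': [3], 'post-2': [4]}
```

Answering a query then means looking up a word in the prepared index instead of rereading the pages themselves.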

So, these were the basic search algorithms, i.e. how Yandex simply finds the necessary documents. There shouldn't seem to be any problems with this.

But Yandex knows not one document or even 100: according to the latest data from my sources, Yandex knows about 11 billion documents (10,727,736,489 pages).

And among all this quantity, you need to select the documents that match the request. More importantly, you need to rank them somehow, i.e. arrange them according to their degree of importance, or rather their degree of usefulness for the reader.

Mathematical search models

To solve this issue, mathematical models come to the rescue. Now we’ll talk about the simplest models.

Boolean mathematical model: if a word appears in a document, the document is considered found. Just a match and nothing complicated.
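
A minimal sketch of the Boolean model: a document either contains a word or it does not, with no notion of weight. The sketch assumes that a query of several words means all of them must be present (logical AND), and the toy index is invented for illustration.

```python
def boolean_search(index, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Toy index: word -> set of documents containing it.
index = {
    "history": {"doc1", "doc3"},
    "yandex": {"doc1", "doc2", "doc3"},
    "in": {"doc1", "doc2", "doc3", "doc4"},
}
print(sorted(boolean_search(index, "history yandex")))  # ['doc1', 'doc3']
print(len(boolean_search(index, "in")))                 # 4
```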

But there are problems here. For example, if you, as a user, enter some popular word, or even better the preposition “в” (“in”), which is the most common word in the Russian language and is found in EVERY document, then you will be given more results than you can even imagine. Therefore, the next mathematical model appeared.

Vector mathematical model: this model determines the “weight” of the document. A match alone is not enough: the word should occur several times. Moreover, the more often a word appears, the higher the relevance (compliance).

It is the vector model that ALL search engines use.
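
The idea of “weight” can be sketched by turning the query and each document into word-count vectors and multiplying them. This is only the simplest form of the vector model; production systems typically also give rarer words more weight (for example with TF-IDF), which is omitted here. The document texts and names are made up.

```python
import re
from collections import Counter


def to_vector(text):
    """Represent a text as a vector of word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def weight(query, document):
    """Dot product of the query vector and the document vector."""
    q, d = to_vector(query), to_vector(document)
    return sum(q[word] * d[word] for word in q)


docs = {
    "a": "Yandex history: how Yandex began and how Yandex grew",
    "b": "A short note that mentions Yandex once",
    "c": "Cake recipes",
}
ranked = sorted(docs, key=lambda name: weight("yandex", docs[name]), reverse=True)
print(ranked)  # ['a', 'b', 'c']
```

Document “a” mentions the query word three times, “b” once and “c” never, which gives the order shown.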

Probabilistic model: more complex. The principle is this: the search engine itself picks a reference page (a template). For example, you are looking for information about the history of Yandex. Yandex stores some kind of standard; let's say this will be my previous article about Yandex.

It then compares all other documents with this article. The logic here is this: the more similar your blog page is to my article, the MORE LIKELY it is that your blog page also tells about the history of Yandex and will also be useful to the reader.

To reduce the number of documents that need to be shown to the user, the concept of relevance was introduced, i.e. compliance.

How relevant is your blog page to the topic? This is an important topic when it comes to search quality.

Assessors - who are they and what are they responsible for?

This relevance is also needed to assess the quality of the algorithms.

For this purpose there is a special team whose members are called assessors. These are people who review search results by hand.

They have instructions on how to check sites, how to evaluate, etc. And they manually determine whether your pages are suitable for search queries or not.

And the quality of search algorithms depends on the opinion of assessors. If all the assessors say that the search results do not correspond to the requests, this means that the ranking algorithm is incorrect and Yandex is the only one to blame.

If the assessors say that only one site does not meet the request, it means that the site drops far down in the search results. More precisely, not the entire site but only that one article, but that is beside the point.

Of course, assessors cannot review and evaluate ALL articles with their hands and eyes. This is understandable.

And other parameters by which pages are ranked come to the rescue.

There are a lot of them, for example:

page weight (vIC, PageRank, and vanity metrics in general);
domain authority;
relevance of the text to the request;
relevance of external link texts to the query;
as well as many other ranking factors.

Assessors make comments, and the people who are responsible for setting up the mathematical ranking model, in turn, edit the formula, as a result of which the search engine works more efficiently.

The main criteria for evaluating the performance of the formula:

1. Search engine results accuracy: the percentage of returned documents that match the request (are relevant). I.e. the fewer pages that do not match the request, the better.

2. Completeness of search engine results: the ratio of the relevant web pages returned for a given query to the total number of relevant documents in the collection (the set of all pages known to the search engine).

For example, if there are more relevant pages in the entire collection than in the search results, the results are incomplete. This happens because some of the relevant web pages were filtered out.

3. Relevance of search engine results, in the sense of freshness: how well the web page still matches what is written in the snippet. For example, a document may have changed greatly or no longer exist at all, yet still be present in the search results.

This kind of relevance directly depends on how often the search robot rescans the documents in its collection.

The collection is built (site pages are indexed) by a special program, a search robot.

The search robot receives a list of addresses for indexing, downloads the pages, and then sends their contents for processing to an algorithm that converts them into reverse indexes.

Well, “in a nutshell,” so to speak, we discussed the principles of the search engine.

Let's summarize:

A search robot comes to your blog.
The search robot stores the reverse index of the page for subsequent searches.
Using a mathematical model, the document is processed and displayed in search results using formulas and taking into account the opinion of the assessor.

This is very, very simplified. Just to get a basic understanding of how the Yandex search engine works.

I have now written so much text, and perhaps so much is not clear. Therefore, I suggest you return to this article a little later and watch this video.

This is an excellent guide, which I also learned from at one time.

I hope this information will help you better understand why each of your sites occupies the positions it does in search, and do everything to improve them.

With this I say goodbye to you, if you have any questions, I’m always happy to answer them in the comments. Or maybe you want to add to the article?

In any case, express your opinion!

We are not as unique as we think: millions of people before us puzzled and millions after us will puzzle the search engine with almost identical questions. On the other hand, we are too unpredictable: the formulation of our request is influenced by a huge number of factors that we are not aware of. And at least for this reason, the request of each of us, no matter how banal it may be, requires an individual approach.

In fact, the entire work of the Yandex search engine comes down to two simple things: to understand what a person really wants to know, and in a few seconds to find suitable ones among billions of documents on the Internet.

Take fingerprints

The way the search engine operates is somewhat similar to the Matrix, and the search robot (the complex program it created, which makes decisions independently) is similar to Agent Smith.

In order not to search the entire Internet every time someone needs to know something, the search engine does part of the work in advance: it checks what is on the Web and where it is, using thousands of search robots. They come in two types: main and fast. The main one crawls and processes the Internet as a whole, and the fast one handles documents that appeared a minute or even a couple of seconds ago. The task of the robot programs is to select information that is suitable and useful for users and to process it, weeding out everything outdated and unnecessary. In some ways, this is reminiscent of sorting garbage: paper in one container, glass in another, plastic in a third, food waste in a fourth...

The information collected by robots forms the so-called cast of the Internet. It is stored on thousands of Yandex servers and is constantly updated. The cast is like a list that tells you where to find what information. In this list, each keyword has not one, but millions of “pages”. To make all updates to the cast available to users, they are moved from the repository to the “base search”. Data from the main robot is transferred every few days, and from the fast robot in real time.

Bring to light

ILLUSTRATION: EVGENY TONKONOGY

While searching for the answer to a given question in the prepared database, the machine faces two main difficulties. The first difficulty is language. Before looking for an answer to a question, the machine must understand in what language it should do so. For example, for a Russian speaker the query “дружина князя Игоря” (“Prince Igor's squad”) will find documents about his army, while for a Ukrainian the same query will also return documents mentioning Princess Olga, his wife, since in Ukrainian “дружина” means “wife”. And in the rich Russian language, the same word or its derivatives can mean different things. For example, the word “стали” is a form both of the noun “сталь” (“steel”) and of the verb “стать” (“to become”). The second difficulty is human psychology. When entering a request, we expect a quick and accurate answer and naturally do not worry about whether the wording of the request fits the principles of mathematical analysis by which the machine's brain works. For example, by entering the word “Napoleon” into the search bar, what does a person want: a cake recipe or a biography of the French emperor, to buy cognac or to find the address of a psychiatric hospital?

In such situations, several technologies come into play. The machine can offer several hints under the search bar that narrow down the request: choose what you need, Napoleon recipes or Napoleon Bonaparte. If the user does not respond to the hints and does not add words to “Napoleon”, the “Spectrum” technology helps: without counting on help, the machine immediately searches for information in several categories (about the cake, about the emperor, about the horse...). In addition, personalization mechanisms help to understand the user, that is, the machine's knowledge of what this user searched for on his computer a day, two, three, or months ago: if you often asked Yandex questions about cooking, then the machine will first show you results in which Napoleon is a cake.

Combinations: interest clubs

The task of a search engine is not simply to select documents that contain words and phrases from the search query. The machine must understand which documents meet our conflicting requirements and why they meet them. Do we want information about Napoleon the cake, or did we perhaps attend a fitness club with a pretentious name for a couple of years, or are we even preoccupied with the complexes of short people? In any case, solving the problem requires a non-trivial approach.

The creators of the Yandex search program found this approach by delegating the right of choice to the machine. On the one hand, a soulless, but very fast and smart machine does not know and does not want to know anything about us as individuals, and on the other hand, it tries to find out as much as possible about everyone.

In addition to the geographic location of the user and linguistic analysis of his queries, the search engine uses several thousand criteria that are not at all obvious to humans.

The trick is that the machine develops and updates these criteria independently.

It simply uses data on the preferences and user behavior of millions of people and relates this “arithmetic average” to the history of our queries. The principles that guide the Matrix within itself, comparing the thousands of categories of user interests it has developed, often do not fit into traditional human ideas about what “interests” can be in principle. There are tens of thousands of them. They create different, sometimes funny, combinations with each other. For example, one of these combinations could be that search results match the interests of a person who breeds newts. At the same time, a person is not just interested in newts, but is already breeding them, but only for the first year.

Ratings. Helping hands

The Matrix, of course, decides for itself (with the help of higher mathematics) what needs to be shown to users and in what sequence, based on tens of thousands of criteria. But the Matrix also uses living people: 1000 Yandex employees, the so-called assessors, evaluate search results for particular requests (of course, not every request is evaluated, and this is not done in real time) to determine whether they meet the expectations of an ordinary user, who is not as rational as a machine, not as precise in formulation, contradictory and emotional.

Good afternoon, dear readers of my SEO blog. This article is about how the Yandex search engine works, what technologies and algorithms it uses to rank sites, and what it does to prepare a response to users. Many people know that this flagship of Russian search sets the tone in Runet, owns the largest database in Eurasia, handles the content of more than a billion pages, and knows the answer to any question. According to Liveinternet data for August 2012, Yandex's share in Russia is 60.5%. The monthly audience of the portal is 48.9 million people. But the most important thing for us bloggers is how the search engine receives our requests, how it processes them and what comes out as a result. On the one hand, knowing and understanding this information makes it easier for us to use all Yandex resources; on the other hand, it makes it easier to promote our blogs. Therefore, I propose to look with me at the most important technologies of the best Runet search engine.

When an Internet user first wants to turn to a search engine for information, he may have one question: “How does the search work?” But when he receives it, this question often changes to another: “Why so fast?” And really, why does searching for a file on a computer take 20 seconds, and the result of a request from an entire network of computers around the world appears in a second? The most interesting thing is that the first two questions (how the search occurs and why 1 second) can be answered in one answer - the search engine has prepared in advance for the user’s request.

To understand the principle of operation of Yandex, like other search engines, let’s draw an analogy with a telephone directory. To find any phone number, you need to know the subscriber's last name, and any search in this case takes a maximum of a minute, because all pages of the directory are a continuous alphabetical index. But imagine if the search was carried out using a different option, where phone numbers were ordered by the numbers themselves. After such searches, which will drag on for a longer time, the numbers will remain before the eyes of the searcher for a very long time. 🙂
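
The telephone directory analogy can be sketched directly: looking up a surname in an alphabetically ordered list can use binary search, while looking up a person by number in the same list requires checking every entry. The names and numbers below are made up for illustration.

```python
from bisect import bisect_left

# The directory is ordered by surname, like an alphabetical index.
DIRECTORY = sorted([
    ("Ivanov", "555-0101"),
    ("Petrov", "555-0102"),
    ("Sidorov", "555-0103"),
    ("Kuznetsov", "555-0104"),
])
NAMES = [name for name, _ in DIRECTORY]


def find_by_name(name):
    """Binary search: each step halves the part of the directory left to check."""
    i = bisect_left(NAMES, name)
    if i < len(NAMES) and NAMES[i] == name:
        return DIRECTORY[i][1]
    return None


def find_by_number(number):
    """The directory is not ordered by number, so every entry must be checked."""
    for name, phone in DIRECTORY:
        if phone == number:
            return name
    return None


print(find_by_name("Petrov"))      # 555-0102
print(find_by_number("555-0104"))  # Kuznetsov
```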

Likewise, the search engine puts all the information from the Internet into a form convenient for it. And most importantly, all this data is placed in its directory in advance, before the visitor arrives with his requests. That is, when we ask Yandex a question, it already knows the answer and gives it to us in a second. But this second includes a number of important processes, which we will now consider in detail.

Internet Indexing

Yandex collects all the information it can get its hands on on the Internet. Using special equipment, all content is reviewed, including images, which are analyzed by their visual parameters. This collection is done by a search machine, and the process of collecting and preparing data is called indexing. The basis of such a machine is a computer system, which is otherwise called a search robot. It regularly crawls indexed sites, checks them for new content, and also scans the Internet for deleted pages. If it discovers that a page no longer exists or is closed from indexing, it removes it from the search.

How does a search robot find new sites? Firstly, thanks to links from other sites: if an already indexed site links to a new web resource, then on its next visit to the indexed site the robot will also visit the new one. Secondly, there is a wonderful service, popularly called “addurlka” (from the English phrase “add url”, add address). In it you can enter the address of your new site, which will be visited by a search robot after a while. Thirdly, with the help of a special program, “Yandex.Bar”, the visits of users who use it are tracked. Accordingly, if a person lands on a new web resource, a robot will soon appear there.

Are all pages included in the search? Millions of pages are indexed every day. Among them there are pages of varying quality, which can contain different information - from unique content to complete garbage. Moreover, as statistics say, there is much more garbage on the Internet. The search robot analyzes each document using special algorithms. It determines whether it has any useful information and whether it can answer the user's request. If not, then such pages are not accepted as “cosmonauts,” but if so, then it is included in the search.

After a robot has visited a page and judged it useful, the page enters the search engine's storage. Here every document is taken apart down to the cogs, as car mechanics say. The page is stripped of HTML markup and the clean text is fully inventoried: the position of each word is recorded. In this disassembled form the page becomes a table of numbers and letters, which is called an index. From now on, whatever happens to the web resource that contains this page, its latest copy remains available in the search. Even if the site no longer exists, copies of its documents are kept for some time.
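The disassembly described above (strip the markup, then record where each word stands) can be sketched as a small positional index. This is an illustration under simple assumptions, not Yandex's actual format: the page contents are made up, and the sketch keeps all text nodes, including any script or style text.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the text between tags, dropping the markup itself."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_html(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

def build_index(pages):
    """Map each word to {url: [positions of the word in that page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, html in pages.items():
        words = re.findall(r"\w+", strip_html(html).lower())
        for position, word in enumerate(words):
            index[word][url].append(position)
    return index

# Hypothetical pages used only for the example.
pages = {
    "a.example": "<h1>Choose a car</h1><p>How to choose a car wisely</p>",
    "b.example": "<p>Car repair</p>",
}
index = build_index(pages)
print(dict(index["car"]))
```

Looking up a word now means reading one row of the table instead of rereading every page.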

Each index, together with data on document type, encoding and language, and together with the saved copies, makes up the search database. It is updated periodically and is located on special servers, which process the requests of search engine users.

How often does indexing occur? First of all, it depends on the type of site. The first type of web resource changes the content of its pages very often: every time a search robot comes, the pages contain something different. A user would not find next time what the robot saw, so such sites are not included in the index. The second type of site is a data warehouse, whose pages periodically receive new links to downloadable documents. The content of such a site usually does not change, so the robot visits it extremely rarely. For all other sites, the schedule depends on how often the material is updated: the faster new content appears on a site, the more often the search robot comes. Priority is given to the most important web resources (a news site is an order of magnitude more important than any blog, for example).
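The last rule (frequently updated and more important sites are visited sooner) can be sketched as a priority queue ordered by the time until the next visit. The formula, the site names and the numbers are assumptions made up for this illustration; the source does not say how Yandex computes its schedule.

```python
import heapq

def revisit_interval(updates_per_day, importance):
    """Hours until the next visit; both inputs are hypothetical signals."""
    base = 24.0 / max(updates_per_day, 0.01)  # rarely updated -> long interval
    return base / max(importance, 0.1)        # important site -> shorter interval

# (site, updates per day, importance weight), all invented for the example.
sites = [
    ("archive.example", 0.05, 1.0),
    ("blog.example", 1, 1.0),
    ("news.example", 48, 10.0),
]

# The heap always yields the site whose next visit is due soonest.
schedule = [(revisit_interval(u, w), name) for name, u, w in sites]
heapq.heapify(schedule)
visit_order = [heapq.heappop(schedule)[1] for _ in range(len(sites))]
print(visit_order)
```

With these numbers the news site is visited first, the blog second and the archive last.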

Indexing allows you to perform the first function of a search engine - collecting information on new pages on the Internet. But Yandex also has a second function - searching for an answer to a user’s request in an already prepared search database.

Yandex prepares a response

Processing the request and producing relevant answers is handled by a computer system called "Metasearch". It first collects all the input information: which region the request came from, what class it belongs to, whether it contains errors, and so on. After this processing, metasearch checks whether exactly the same query with the same parameters is already in its database. If so, the system shows the user the previously saved results. If not, metasearch turns to the search database that contains the index data.
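The "check for an identical earlier query first" step is essentially a cache keyed by the normalized query and its parameters. A minimal sketch follows; the `Metasearch` class, the `normalize` helper and the fake backend are hypothetical names, and only the region is used as a parameter.

```python
def normalize(query):
    """Lowercase the query and collapse repeated spaces."""
    return " ".join(query.lower().split())

class Metasearch:
    def __init__(self, backend):
        self.backend = backend   # function that searches the index
        self.cache = {}          # (query, region) -> saved results
        self.backend_calls = 0

    def search(self, query, region):
        key = (normalize(query), region)
        if key not in self.cache:      # unseen query: go to the search database
            self.backend_calls += 1
            self.cache[key] = self.backend(*key)
        return self.cache[key]         # seen query: return saved results

def fake_backend(query, region):
    # Stand-in for the real index lookup.
    return [f"{region}: result for '{query}'"]

meta = Metasearch(fake_backend)
first = meta.search("How to  choose a car", "Moscow")
second = meta.search("how to choose a car", "Moscow")      # served from cache
third = meta.search("how to choose a car", "Stavropol")    # new region, new lookup
print(meta.backend_calls)
```

The same words asked from another region count as a different query, which is why the region is part of the key.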

And this is where amazing things happen. Imagine one super-powerful computer that stores the entire Internet as processed by search robots. The user enters a query, and a search begins through the memory cells for all documents that match it. The answer is found and everyone is happy. But consider another case, when many requests contain the same words. The system must go through the same memory cells each time, which can significantly increase the processing time. Such delays can cost the search engine its user, who will turn to a competitor for help.

To avoid such delays, the index is split into parts that are distributed across different computers. After receiving the request, metasearch instructs each of these servers to search its own part. All the data from these machines is then returned to the central computer, which combines the results and gives the user the ten best answers. This technology kills two birds with one stone: search time is reduced several times (the answer is obtained in a split second) and, because there are more machines, information is duplicated (data is not lost in sudden breakdowns). The computers holding the duplicated information make up a data center - a room full of servers.
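The scatter-and-gather scheme above can be sketched in a few lines: each shard searches only its own documents, and the central step merges the partial results and keeps the ten best. The shards, the documents and the word-count scoring are invented for the example, and threads stand in for separate servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each "server" holds its own part of the collection.
SHARDS = [
    {"doc1": "how to choose a car", "doc2": "car repair tips"},
    {"doc3": "choose a used car how to", "doc4": "cake recipe"},
]

def search_shard(shard, terms):
    """Score each document in one shard by counting query-word occurrences."""
    hits = []
    for doc_id, text in shard.items():
        words = text.split()
        score = sum(words.count(term) for term in terms)
        if score:
            hits.append((score, doc_id))
    return hits

def metasearch(shards, query, limit=10):
    terms = query.lower().split()
    # Every shard is searched independently; threads imitate separate machines.
    with ThreadPoolExecutor() as pool:
        partial = pool.map(lambda shard: search_shard(shard, terms), shards)
        merged = [hit for hits in partial for hit in hits]
    # Central merge: best score first, document id breaks ties.
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in merged[:limit]]

top = metasearch(SHARDS, "choose car")
print(top)
```

Each shard does a fraction of the work, so the total time is bounded by the slowest shard rather than by the size of the whole collection.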

When a search engine user enters a query, 20 times out of 100 the goal behind it is ambiguous. For example, if he writes the word "Napoleon" in the search bar, it is not yet known what answer he expects - a cake recipe or a biography of the great commander. Or take the phrase "Brothers Grimm" - fairy tales, films, a musical group. To narrow this range of possible goals to specific answers, Yandex has a special technology called Spectrum. It takes user needs into account using search query statistics. From all the questions visitors ask Yandex, Spectrum identifies the objects mentioned in them (names of people, titles of books, car models, etc.). These objects are distributed into categories, of which there are currently more than 60. With their help, the search engine keeps in its database the different meanings of words in user queries. Interestingly, these categories are rechecked periodically (the analysis runs a couple of times a week), which allows Yandex to answer the questions posed more accurately.

Based on Spectrum technology, Yandex introduced interactive hints. They appear below the search bar in which the user types an ambiguous query. This line shows the categories to which the subject of the question may belong. Further search results depend on which category the user chooses.
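How category hints might be derived from query statistics can be shown with a toy lookup. Everything here is hypothetical: the category names, the counts and the `hints` function are invented for illustration and say nothing about how Spectrum is really implemented.

```python
from collections import Counter

# Hypothetical statistics: how often each object was searched for in each sense.
OBJECT_CATEGORIES = {
    "napoleon": Counter({"recipes": 60, "people": 40}),
    "brothers grimm": Counter({"books": 50, "films": 30, "music": 20}),
}

def hints(query, limit=3):
    """Return the most frequent categories for an ambiguous query, if known."""
    counts = OBJECT_CATEGORIES.get(query.strip().lower())
    if not counts:
        return []  # unknown object: no hints are shown
    return [category for category, _ in counts.most_common(limit)]

print(hints("Napoleon"))
print(hints("Brothers Grimm", limit=2))
```

The user's click on one of the returned categories would then narrow the results to that sense of the word.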

From 15 to 30% of all users of the Yandex search engine want only local information (data from the region in which they live), for example about new films in the cinemas of their city. The answer to such a request should therefore be different for each region. For this purpose Yandex uses its region-based search technology. For example, these are the answers that residents of one region looking for the film schedule of their Oktyabr cinema may receive:

But this is the result that residents of the city of Stavropol will receive for the same request:

The user's region is determined primarily by IP address. Sometimes this data is inaccurate, because some providers work in several regions at once and therefore change the IP addresses of their users. If this happens to you, you can easily change your region in the search engine settings: it is shown in the upper right corner of the results page.
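The rule "use the IP address unless the user has set a region manually" can be sketched with Python's standard `ipaddress` module. The network-to-region table is entirely hypothetical (it uses address ranges reserved for documentation), as are the function name and the default value.

```python
import ipaddress

# Hypothetical table; these ranges are reserved for documentation examples.
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "Moscow",
    ipaddress.ip_network("198.51.100.0/24"): "Stavropol",
}

def detect_region(ip, user_setting=None, default="Russia"):
    """A region chosen in the settings wins; otherwise look the IP up."""
    if user_setting:
        return user_setting
    address = ipaddress.ip_address(ip)
    for network, region in REGION_BY_NETWORK.items():
        if address in network:
            return region
    return default  # unknown address: fall back to a broad region

print(detect_region("198.51.100.7"))
print(detect_region("198.51.100.7", user_setting="Moscow"))
```

The manual setting covers exactly the case described above, where a provider's addresses do not match the user's real location.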

Yandex search engine - search results

When Metasearch has prepared an answer, the Yandex search engine should display it on the results page. It is a list of links to found documents with a little information on each. The task of the technology for issuing results is to provide the user with the most relevant answers in the most informative way. The template for one such link looks like this:

Let's look at this form of result in more detail. For the search result title, Yandex usually uses the page title (what optimizers write in the title tag). If there is none, the words from the heading of the article or post appear here. If the title text is long, the search engine places in this field the fragment most relevant to the given query.

Very rarely, but it happens that the title does not match the content of the request. In this case, Yandex forms its search result title using the text in the article or post. It will definitely have query words.

For the snippet, the search engine uses all the text on the page. It finds all the fragments that contain an answer to the query, then selects the most relevant one and inserts it into the result under the link to the document. Thanks to this approach, a competent optimizer who has seen the snippet can rework the page text, thereby making the link more attractive.

To make the results easier to read, titles are formatted as links (highlighted in blue and underlined). To make the web resource attractive and recognizable, a favicon is added - a small branded icon of the site. It appears on the first line, to the left of the title. All words from the request that occur in the answer are also highlighted in bold.
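Choosing the most relevant fragment and highlighting the query words can be sketched as follows. This is a simplified illustration: it picks the sentence with the most query words, and it marks bold text with HTML `<b>` tags, which is an assumption about the output format. The page text is invented.

```python
import re

def make_snippet(text, query):
    """Pick the sentence with the most query words and bold those words."""
    terms = set(query.lower().split())
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    def hits(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(1 for word in words if word in terms)

    best = max(sentences, key=hits)  # first sentence wins a tie
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in sorted(terms)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(r"<b>\1</b>", best)

page = "Our shop sells bicycles. Learn how to choose a car for the city. Contact us."
snippet = make_snippet(page, "choose car")
print(snippet)
```

The sketch assumes a non-empty query; a real system would also trim long fragments and weigh words by importance.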

Recently, the Yandex search engine has been adding various information to the snippet that helps the user find an answer even faster and more accurately. For example, if a user writes the name of an organization in the request, Yandex adds its address, contact numbers and a link to its location on the map. If the search engine knows the structure of the site that contains the document with the answer, it shows that structure as well. In addition, Yandex can add the most visited pages of such a web resource to the snippet, so that the visitor can go straight to the section he needs and save time.

There are snippets that contain the price of a product for an online store, a hotel or restaurant rating in the form of stars, and other interesting information with various numbers about objects in search documents. The purpose of such information is to provide a complete list of data about those items or objects that are of interest to the user.

In general, with various examples, the page with answers will look like this:

Ranking and assessors

Yandex’s task includes not only finding all possible answers, but also selecting the best (most relevant) ones. After all, the user will not rummage through every link that Yandex returns. The process of ordering search results is called ranking. It is the ranking that determines the quality of the proposed answers.

There are rules by which Yandex determines relevant pages:

Sites that degrade the search quality will be downgraded in positions on the results page. Usually these are web resources whose owners are trying to deceive the search engine. For example, these are sites with pages containing meaningless or invisible text. Of course, it is visible and understandable to a search robot, but not to a visitor reading this document. Or sites that, when clicking on a link in the search results area, immediately transfer the user to a completely different site.
Sites containing erotic content are not included in the results or are greatly reduced in ranking. This is due to the fact that such web resources often use aggressive promotion methods.
Sites infected with viruses are neither lowered in search results nor excluded from them; instead, the user is warned about the danger with a special icon. This is because Yandex assumes that such web resources may still contain documents important for the visitor's request.
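The three rules above can be expressed as a small post-processing step over scored results. This is only a sketch: the penalty factor, the field names and the sample sites are assumptions, and the source also mentions outright exclusion of erotic sites, which the sketch replaces with a demotion.

```python
PENALTY = 0.1  # assumed demotion factor, not a real Yandex value

def apply_rules(results):
    """Demote deceptive and adult sites; keep infected ones but flag them."""
    ranked = []
    for result in results:
        score = result["relevance"]
        if result.get("deceptive"):  # hidden text, forced redirects and so on
            score *= PENALTY
        if result.get("adult"):
            score *= PENALTY
        ranked.append({
            "url": result["url"],
            "score": score,
            "warning": bool(result.get("infected")),  # shown as an icon
        })
    return sorted(ranked, key=lambda item: item["score"], reverse=True)

# Hypothetical input with made-up relevance scores.
results = [
    {"url": "spam.example", "relevance": 0.9, "deceptive": True},
    {"url": "good.example", "relevance": 0.7},
    {"url": "infected.example", "relevance": 0.8, "infected": True},
]
ranked = apply_rules(results)
print([item["url"] for item in ranked])
```

The infected site keeps its position and only gains a warning flag, while the deceptive site drops to the bottom despite its high raw relevance.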

For example, this is how Yandex will rank sites for the query “apple”:

In addition to ranking factors, Yandex uses special samples of queries and the answers that search engine users consider most suitable. No machine can produce such samples at the moment - this is the prerogative of humans. At Yandex, these specialists are called assessors. Their task is to analyze search documents and evaluate the answers to specified queries. They select the best answers and build a special training set. From it, the search engine learns the relationship between relevant pages and their properties. With this information, Yandex can select the optimal ranking formula for each request. The method for constructing such a formula is called MatrixNet. Its advantage is resistance to overfitting, which makes it possible to take into account a large number of ranking factors without increasing the number of assessor ratings and without finding patterns that do not exist.

At the end of my post, I want to show you interesting statistics collected by the Yandex search engine in the process of its work.

1. Popularity of personal names in Russia and Russian cities (data taken from accounts of bloggers and social network users in March 2012).

