Miscellaneous information on search engines

    Google sandbox
 
In early 2004, a new and mysterious term appeared among SEO specialists - Google sandbox. This is the name of a new Google spam filter that excludes new sites from search results. The effect of the Sandbox filter is that new sites are absent from the search results for practically any phrase. This happens even to sites with high-quality, unique content that are promoted using legitimate techniques.

   Sandbox is currently applied only to the English segment of the Internet; sites in other languages are not yet affected by this filter. However, the filter may expand its influence. It is assumed that the purpose of the sandbox filter is to exclude spam sites - indeed, no search spammer will be able to wait for months to get results. However, many perfectly valid new sites suffer the consequences. So far, there is no precise information about what the sandbox filter actually is. Here are some guesses based on practical SEO experience:

   - Sandbox is a filter applied to new sites. A new site is placed in the sandbox and kept there for some time until the search engine starts treating it as a normal site.

   - Sandbox is a filter applied to new inbound links to new sites. There is a fundamental difference between this and the previous assumption: the filter is based not on the age of the site, but on the age of the inbound links to the site. In other words, Google treats the site normally, but it refuses to recognize any inbound links to it until they have existed for several months. Since inbound links are one of the main ranking factors, ignoring them is equivalent to the site being absent from the search results. It is hard to say which of these assumptions is true; quite possibly both are.

   - A site can be held in the sandbox from three months to a year or more. It has also been noticed that sites are released from the sandbox in batches. This means that the time a site spends in the sandbox is calculated not individually for each site, but for groups of sites: all sites created within a certain period of time are placed in the same group and are eventually released at the same time. Thus, individual sites in a group can spend different amounts of time in the sandbox, depending on where in the group's capture-release cycle they were added.

   Typical signs that your site is in the sandbox include:

   - Your site is indexed by Google and its robot visits it regularly.
   - Your site has a PageRank; the search engine knows about and correctly displays inbound links to your site.
   - A search for the site address (www.site.com) displays correct results, with the correct title, snippet (resource description), etc.
   - Your site is found for rare and unique word combinations present in the text of its pages.
   - Your site is not displayed in the first thousand results for any other queries, even those it was originally created for. Sometimes there are exceptions and the site appears at positions 500-600 for some queries, but this does not change the overall picture, of course.

   There is no practical way to bypass the sandbox filter. There have been some suggestions about how it might be done, but they are no more than suggestions and are of little use to a regular webmaster. The best course of action is to continue working on the site's content and structure and wait patiently until the site leaves the sandbox, after which you can expect a dramatic increase in rankings, sometimes by 400-500 positions.

    Google LocalRank
 
On 25 February 2003, Google patented a new algorithm for ranking pages called LocalRank. It is based on the idea of ranking pages not by their global link citation, but by how they are cited among the pages that deal with topics related to the specific query. The LocalRank algorithm is not used in practice (at least, not in the form described in the patent). However, the patent contains several interesting ideas that we think any SEO specialist should know about. Almost all search engines now take into account the topics of the pages that link to a page. Apparently, algorithms rather different from LocalRank are used for this, but studying the patent will give us a general idea of how it can be implemented.

   While reading this section, please keep in mind that it contains theoretical information rather than practical guidelines.

   These three steps constitute the main idea of the LocalRank algorithm:

   1. An algorithm is used to select a certain number of pages relevant to the query (let this number be N). These pages are initially sorted according to some criterion (this can be PageRank, relevance, or a combination of other criteria). Let us call the numerical value of this criterion OldScore.

   2. Each of the N selected pages goes through a new ranking procedure and receives a new score. Let us call it LocalScore.

   3. The OldScore and LocalScore values for each page are multiplied together to give a new value, NewScore. The pages are finally ranked according to NewScore.

   The key part of this algorithm is the new ranking procedure, which assigns each page its new score, LocalScore. Let us examine this procedure in more detail:

   0. An initial ranking algorithm is used to select the N pages relevant to the query. Each of these N pages is assigned an OldScore value by this algorithm. The new ranking procedure only needs to work on these N selected pages.

   1. When calculating LocalScore for a given page, the system selects from the N pages those that have links pointing to this page. Let their number be M. Pages from the same host (as determined by IP address) and pages that are mirrors of the given page are excluded from M.

   2. The set of M pages is divided into subsets Li. Pages are grouped into the same subset according to the following criteria:
   - They belong to the same (or similar) hosts, that is, pages whose IP addresses share the same first three octets. In other words, pages whose IP addresses lie in the range xxx.xxx.xxx.0 to xxx.xxx.xxx.255 are considered to belong to one group (see the sketch after this list).
   - They have the same or similar content (mirrors).
   - They are pages of the same site (domain).
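
   As an illustration of the first criterion, here is a minimal sketch of the "same host" check described above. The function name and the use of Python are our own choices for illustration; the patent only describes the idea of comparing the first three octets of the IP addresses.

      def same_host(ip_a: str, ip_b: str) -> bool:
          """True if two IPv4 addresses share their first three octets,
          i.e. both fall within the same xxx.xxx.xxx.0-xxx.xxx.xxx.255 range."""
          return ip_a.split(".")[:3] == ip_b.split(".")[:3]

      # Both addresses belong to the 192.168.1.* group:
      print(same_host("192.168.1.10", "192.168.1.200"))  # True
      print(same_host("192.168.1.10", "192.168.2.10"))   # False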

   3. Each page in each subset Li has its own OldScore. The page with the largest OldScore is taken from each subset; the rest of the pages are excluded from the analysis. We thus obtain a set of K pages that link to the page under consideration.

   4. The K pages are sorted by OldScore, and only the first k pages (k is a predetermined number) are kept; the rest of the pages are excluded from the analysis.

   5. LocalScore is calculated in this step. The OldScore values of the remaining k pages are combined into a single value according to a formula given in the patent. The combination involves a predetermined parameter m that can vary from one to three; unfortunately, the patent does not describe this parameter in detail.
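
   The exact formula is given only in the patent. Judging by the description above (the OldScore values of the k remaining pages are combined, with a parameter m between one and three), a plausible reconstruction - our assumption, not a quotation from the patent - is:

      LocalScore(i) = OldScore(1)^m + OldScore(2)^m + ... + OldScore(k)^m

   A minimal Python sketch of steps 3-5 under this assumption (the names subsets, old_score, k and m are ours; the real values of k and m are not disclosed):

      def local_score(subsets, old_score, k=10, m=2):
          """Steps 3-5 of the LocalScore procedure, as we understand them.

          subsets   - the subsets Li: lists of pages already grouped by
                      host, mirror and domain (steps 1-2);
          old_score - dict mapping each page to its OldScore value;
          k, m      - the predetermined parameters from the patent
                      (their actual values are unknown, defaults here
                      are arbitrary)."""
          # Step 3: keep only the page with the largest OldScore in each subset.
          best = [max(pages, key=lambda p: old_score[p]) for pages in subsets]

          # Step 4: sort the remaining K pages by OldScore and keep the top k.
          top_k = sorted(best, key=lambda p: old_score[p], reverse=True)[:k]

          # Step 5 (assumed form): combine the OldScore values of the k pages.
          return sum(old_score[p] ** m for p in top_k)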

   After LocalScore has been calculated for every page in the N set, NewScore values are calculated and the pages are re-ranked according to this new criterion. The following formula is used to calculate NewScore:

   NewScore(i) = (a + LocalScore(i) / MaxLS) * (b + OldScore(i) / MaxOS)

   i - the page for which the new score is being calculated;

   a and b - numerical constants (the patent gives no detailed information about these parameters);

   MaxLS - the maximum LocalScore among those calculated;

   MaxOS - the maximum OldScore among those calculated.
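
   For illustration, here is a short sketch of how this final combination could be computed for the whole set of N pages. This is a direct reading of the published formula, not Google's actual code; the values of the constants a and b are placeholders.

      def new_scores(old_score, local_score, a=1.0, b=1.0):
          """Combine OldScore and LocalScore into NewScore for every page.
          a and b are the undisclosed constants from the patent."""
          max_os = max(old_score.values())
          max_ls = max(local_score.values())
          return {
              page: (a + local_score[page] / max_ls) * (b + old_score[page] / max_os)
              for page in old_score
          }

      # Pages are then re-ranked in descending order of NewScore:
      scores = new_scores({"p1": 5.0, "p2": 3.0}, {"p1": 0.5, "p2": 2.0})
      ranking = sorted(scores, key=scores.get, reverse=True)  # ["p2", "p1"]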

   Now let us put the math aside and explain these steps in plain terms.

   In step 0), the pages relevant to the query are selected. Algorithms that do not take the topics of linking pages into account are used for this; for example, text relevance and overall link popularity are used. We now have a set of OldScore values: OldScore is an assessment of each page based on relevance, overall link popularity, and other factors.

   In step 1), the pages that link to the page of interest are selected from the set obtained in step 0). The group is then whittled down in steps 2), 3), and 4) by removing mirrors and other pages from the same hosts, so that we are left with a set of truly unique pages that all share a common theme with the page under analysis. By analyzing inbound links from pages in this group only (ignoring all other pages on the Internet), we obtain the local (thematic) link popularity.

   LocalScore values are then calculated in step 5). LocalScore is an assessment of a particular page among the pages related to the topic. Finally, the pages are scored and sorted using a combination of LocalScore and OldScore.

    SEO tips, assumptions, observations
 
This section provides information based on an analysis of various SEO articles, communication with optimization specialists, practical experience, and so on. It is a collection of interesting ideas, useful tips, and assumptions. Do not treat this section as written in stone, but rather as information and suggestions for your consideration.

   - Outbound links. Publish links to authoritative sources in your subject area, using appropriate keywords. Search engines place a high value on links to other resources on the same theme.

   - Outbound links. Do not publish links to FFA (free-for-all) pages and other sites excluded from search engine indexes. Doing so may lower the rating of your own site.

   - Outbound links. A page should not contain more than 50-100 outbound links. More links will not hurt the rating of your site, but the links beyond that number will not be recognized by search engines.

   - Site-wide inbound links. These are links placed on every page of a site. It is believed that search engines do not approve of such links and do not take them into account when ranking pages. Another opinion is that this is only true for large sites with thousands of pages.

   - Ideal keyword density is a common theme in SEO discussions. The real answer is that there is no single ideal keyword density: it is different for each query, and search engines calculate it dynamically for each query. Our advice is to analyze the top pages in the search results for a particular query; this will allow you to estimate the approximate optimal density for that specific query (see the sketch at the end of this section).

   - Site age. Search engines prefer old sites because they are considered more stable.

   - Site updates. Search engines prefer sites that are constantly developing. Developing sites are those on which new information and new pages periodically appear.

   - Domain zone. Search engines give preference to sites located in the .edu, .mil, .gov, and similar zones. Only the corresponding organizations can register such domains, so they are considered more trustworthy.

   - Search engines track the percentage of visitors who immediately return to the search after visiting a site through a search result link. A large number of immediate returns probably means that the content is not related to the topic of the query, and the ranking of such a page is lowered.

   - Search engines track how often a link is clicked in the search results. If a link is only rarely clicked, it means the site is of little interest, and the rating of such a page is lowered.

   - Use synonyms and derived (stemmed) forms of your keywords; search engines will appreciate that.

   - Search engines regard a very rapid increase in inbound links as artificial promotion, and this results in a lowered rating. This is a controversial topic, because such a method could be used to lower the ratings of one's competitors.

   - It is said that Google disregards inbound links if they come from the same (or similar) hosts, the host being determined by IP address: pages whose IP addresses lie within the range xxx.xxx.xxx.0 to xxx.xxx.xxx.255 are considered to be on the same host. This opinion most likely stems from the fact that Google has expressed this idea in its patents. However, Google employees state that no IP address restrictions are placed on inbound links, and there is no reason not to believe them.

   - Search engines check information about the owners of domains. Inbound links originating from a number of sites that all belong to one owner are considered less important than normal links. This idea is also described in a patent.

   - Search engines prefer sites with long-term domain registrations.
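
   Returning to the keyword density tip above, here is a minimal sketch of how the density of a keyword in a page's text can be estimated. It is a simple word-count ratio; real search engines apply far more sophisticated text processing.

      import re

      def keyword_density(text: str, keyword: str) -> float:
          """Share of words in the text equal to the keyword (0..1)."""
          words = re.findall(r"[a-z0-9]+", text.lower())
          return words.count(keyword.lower()) / len(words) if words else 0.0

      # Compare your own page against the text of the top-ranked pages
      # for the same query to estimate the typical density.
      print(round(keyword_density("SEO tips and more SEO advice", "seo"), 2))  # 0.33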