[SEO] Google Search Internal Engineering Documents Leaked!?


Introduction.

On May 27, 2024, it came to light that internal documentation for Google’s search algorithm had leaked, in what is probably the largest(?) such leak in history.

The leak also makes clear that a number of algorithms long suspected by the SEO industry, though denied by Google itself, do in fact exist.

So, here is a summary.

Click here to read the original article.

Secrets from the Algorithm: Google Search’s Internal Engineering Documentation Has Leaked
Learn what you always wish you knew about Google's algorithms.

Google Search Algorithm Leak Contents

More than 14,000 ranking factors

The leaked Google API documentation describes more than 14,000 ranking factors (attributes). This shows which kinds of factors can influence the ranking of search results, although not how heavily each one is weighted. The factors are wide-ranging and include content quality, link quality, and user behavior.

Existence of Domain Authority

Although Google officially denies the existence of domain authority, the internal documents confirm the existence of “site authority.” This suggests that certain sites are deemed more authoritative than others and may receive preferential treatment in search results.

As part of the compressed quality signals stored with each document, Google computes a feature called “siteAuthority”.

Use of click data

This also contradicts Google’s official position: user click data is used for ranking by a ranking system called NavBoost. This means that data such as which links users click and how much time they spend on a page influence how a site is rated. Sites with higher user engagement tend to receive higher rankings.

NavBoost has been in existence since 2005 and is based on click data from the past 18 months.
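To make the idea of "click data from the past 18 months" concrete, here is a minimal sketch of a rolling window over a click log. It is only an illustration: the function names, the `(url, date)` log format, and the month-based cutoff are all assumptions, and nothing here reflects how NavBoost is actually implemented.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (days are ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def clicks_in_window(click_log, today, window_months=18):
    """Count clicks per URL, keeping only clicks inside the rolling window.

    `click_log` is a hypothetical list of (url, click_date) tuples.
    """
    counts = {}
    for url, clicked_on in click_log:
        age = months_between(clicked_on, today)
        # Keep clicks that are not in the future and newer than the cutoff.
        if 0 <= age < window_months:
            counts[url] = counts.get(url, 0) + 1
    return counts


log = [
    ("/a", date(2024, 1, 10)),   # recent, counted
    ("/a", date(2022, 11, 30)),  # 18 months old, dropped
    ("/b", date(2022, 12, 1)),   # 17 months old, counted
]
print(clicks_in_window(log, today=date(2024, 5, 27)))
```

The point of the sketch is simply that old clicks fall out of the window over time, so engagement has to be earned continuously.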

Existence of a sandbox

A “sandbox” period is established for new websites and low-trust sites. During this period, a site’s ranking will be limited.

This prevents new sites from ranking highly right away; they are expected to build credibility over time.

Using Chrome’s Data

The official position is that Chrome usage data does not affect search rankings, but the documents show that Chrome data is in fact used.

Panda Algorithm

Panda’s “Site Quality Score” patent indicates that the ratio of referring queries to user selections and clicks affects the score.

Sites that want to maintain their rankings need to earn more successful clicks across a broader range of queries, as well as a greater variety of links.

Authors are explicitly a feature

In line with Google’s official EEAT guidance, the author is an explicit feature: Google stores the author associated with a document as text. It also tries to determine whether an entity on the page is also the author of the page.

For more information on the evaluation standard called EEAT, which was introduced in September 2023, click here.

Retention of the 20 most recent page updates

Google maintains a history of the 20 most recent versions of each indexed page. This guards against abuse in which someone builds up a page’s reputation and then swaps in different content to lead visitors to it. For example, it prevents people from creating high-quality pages and later replacing them with spam content.

→It is important to update highly rated pages carefully and maintain their quality. The content and quality of updates matter more than the frequency of updates itself.
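As a rough illustration of keeping only the 20 most recent versions of a page, the sketch below uses a bounded queue. The `PageHistory` class, the word-overlap comparison, and the 0.5 threshold are hypothetical choices made for this example; the leaked documents do not describe how Google compares versions.

```python
from collections import deque


class PageHistory:
    """Keeps only the most recent `max_versions` snapshots of a page."""

    def __init__(self, max_versions=20):
        # A deque with maxlen silently discards the oldest entry when full.
        self.versions = deque(maxlen=max_versions)

    def record(self, content):
        self.versions.append(content)

    def changed_sharply(self, threshold=0.5):
        """True if the newest version shares few words with the oldest kept one."""
        if len(self.versions) < 2:
            return False
        old = set(self.versions[0].lower().split())
        new = set(self.versions[-1].lower().split())
        if not old and not new:
            return False
        # Jaccard overlap of the two word sets (an assumed, very crude measure).
        overlap = len(old & new) / len(old | new)
        return overlap < threshold


history = PageHistory()
history.record("a detailed guide to baking bread at home")
history.record("cheap pills buy now limited offer")
print(history.changed_sharply())
```

A page that is quietly swapped for unrelated content stands out immediately when earlier versions are still on record, which is why careful, consistent updates are the safer practice.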

Short content is valued for originality.

The OriginalContentScore attribute indicates that short content is scored based on its originality. In other words, how content is rated does not necessarily depend on its character count.

Date of content (freshness of article) is very important

Google is very focused on up-to-date results, and the documents show numerous attempts to associate dates with pages. The best practice is to specify a date and keep it consistent across structured data, page titles, and XML sitemaps. Putting dates in URLs that conflict with dates elsewhere on the page may hurt the performance of the content.
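A simple way to act on this is to collect the date from each place it appears and check that they agree. The following is a minimal sketch under stated assumptions: the helper names are made up for this example, the URL pattern `/YYYY/MM/DD/` is just one common convention, and the dates from structured data and the sitemap are assumed to have been parsed already.

```python
import re
from datetime import date


def date_from_url(url):
    """Extract a /YYYY/MM/DD/ date from a URL path, or None if there is none."""
    match = re.search(r"/(\d{4})/(\d{2})/(\d{2})(?:/|$)", url)
    if match is None:
        return None
    try:
        return date(int(match.group(1)), int(match.group(2)), int(match.group(3)))
    except ValueError:
        # Digits that do not form a real calendar date, e.g. /2024/13/45/.
        return None


def has_date_conflict(sources):
    """True if the known dates in `sources` (name -> date or None) disagree."""
    known = [d for d in sources.values() if d is not None]
    return len(set(known)) > 1


url = "https://example.com/2024/05/27/google-leak/"
sources = {
    "url": date_from_url(url),
    "structured_data": date(2024, 5, 27),  # e.g. parsed from datePublished
    "sitemap": date(2024, 6, 3),           # e.g. parsed from lastmod
}
print(has_date_conflict(sources))
```

Running a check like this across a site makes it easy to spot pages whose URL, markup, and sitemap tell different stories about when they were published.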

Sites specializing in video are treated differently.

If more than 50% of the pages on a site contain video, the site is considered video-centric and treated differently.
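The 50% rule can be expressed as a one-line ratio check. This is only a sketch of the threshold as described above; the function name and the list-of-booleans input are assumptions for illustration.

```python
def is_video_centric(pages_have_video, threshold=0.5):
    """True if more than `threshold` of the pages contain video.

    `pages_have_video` is a hypothetical list with one boolean per page.
    """
    if not pages_have_video:
        return False
    with_video = sum(1 for has_video in pages_have_video if has_video)
    return with_video / len(pages_have_video) > threshold


print(is_video_centric([True, True, False]))  # 2 of 3 pages have video
print(is_video_centric([True, False]))        # exactly half does not exceed 50%
```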

Topic Checking by Embedding

Google vectorizes pages and sites and compares page embeddings to site embeddings to see how far off topic a page is (the topic being the subject or theme of the page or of the site as a whole).
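The comparison can be pictured as a distance between two vectors. The sketch below is an assumption-heavy illustration: it treats the site embedding as the average of its page embeddings and uses cosine distance, neither of which is confirmed by the leaked documents, and the tiny two-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def site_embedding(page_embeddings):
    """Assumed site vector: the element-wise mean of all page vectors."""
    count = len(page_embeddings)
    return [sum(column) / count for column in zip(*page_embeddings)]


def off_topic_distance(page, site):
    """0.0 means perfectly on topic; larger values mean further off topic."""
    return 1.0 - cosine_similarity(page, site)


pages = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # the third page is the outlier
site = site_embedding(pages)
for page in pages:
    print(round(off_topic_distance(page, site), 3))
```

In this toy data the third page sits furthest from the site vector, which is the kind of signal that would flag an article as drifting away from the site's main theme.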

SEO measures based on the above

Personally, although none of the information struck me as particularly new, I was impressed by the fact that embeddings are used to check whether the content of a page is consistent with the purpose of the site.

In other words, on a site with many articles, Google looks not only at each article’s title and content but also at the overall theme of the site and how consistent each article is with it.

I was also interested in the existence of the sandbox and in the fact that character count is not the only thing being evaluated.
