The author vectors patent is a Google patent, filed in 2018, that describes using neural networks to distinguish who wrote a piece of content based on text classification. Author classification could someday become an influential ranking factor, helping search engines determine which popular content in the SERPs (search engine result pages) was written by distinguished authors. Author classification has been used in libraries for decades to identify writing styles you may prefer to follow, so it makes sense to use it in search results.
A neural network is a series of algorithms that endeavor to recognize underlying relationships in a set of data through a process that mimics how the human brain operates.
Author vectors use an approach similar to the website representation vectors patent, which uses neural networks to process website content and classify sites by industry and expertise level.
How do Author Vectors work?
The patent abstract:
“Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating author vectors.
One of the methods includes obtaining a set of sequences of words, the set of sequences of words comprising a plurality of first sequences of words and, for each first sequence of words, a respective second sequence of words that follows the first sequence of words, wherein each first sequence of words and each second sequence of words has been classified as being authored by a first author; and training a neural network system on the first sequences and the second sequences to determine an author vector for the first author, wherein the author vector characterizes the first author.”
In plain terms, the abstract describes using neural networks to learn the writing styles of authors well enough to tell those authors apart.
The patent tells us:
“The author vector generated by the author vector system for a given author is a vector of numeric values that characterizes the author.
In particular, depending on the context of the use of the author vector, the author vector can characterize one or more of the communication style of the author, the author’s personality type, the author’s likelihood of selecting certain content items, and other characteristics of the author.”
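To make "a vector of numeric values" more concrete, here is a minimal sketch of how two author vectors could be compared. The vectors and their four dimensions are made up for illustration, and cosine similarity is simply a common way to compare vectors; the patent text quoted here does not say which comparison Google would use.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical author vectors (invented numbers, not real data).
author_a = [0.9, 0.1, 0.3, 0.7]
author_b = [0.8, 0.2, 0.4, 0.6]    # assumed to write in a style close to author_a
author_c = [-0.5, 0.9, -0.2, 0.1]  # assumed to write in a very different style

print("a vs b:", cosine_similarity(author_a, author_b))
print("a vs c:", cosine_similarity(author_a, author_c))
```

A score closer to 1 means the two vectors point in nearly the same direction, which in this toy setup stands in for "similar authors".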
The author vectors patent explains how it may classify authors:
“Text classification systems can classify pieces of electronic text, e.g., electronic documents. For example, text classification systems can classify a piece of text related to one or more of a set of predetermined topics. Some text classification systems receive input features of the piece of text and use the features to generate the classification for the piece of text.”
The patent also explains how neural networks operate:
“Neural networks are machine learning models that employ one or more layers of models to generate an output, e.g., a classification, for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer of the network. Each layer of the network generates an output from a received input by current values of a respective set of parameters.”
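The layer-by-layer flow in that quote can be sketched in a few lines of Python. This is a toy illustration only: the weights, the layer sizes, and the choice of ReLU and softmax are assumptions made for the example, not details taken from the patent.

```python
import math

def dense(inputs, weights, biases):
    """One layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Hidden-layer activation: negative values become zero."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def forward(inputs, params):
    """The hidden layer's output becomes the input of the output layer."""
    hidden = relu(dense(inputs, params["W1"], params["b1"]))
    return softmax(dense(hidden, params["W2"], params["b2"]))

# Hypothetical "current values of a respective set of parameters".
params = {
    "W1": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],  # hidden layer: 3 inputs -> 2 units
    "b1": [0.0, 0.1],
    "W2": [[1.0, -1.0], [-1.0, 1.0]],            # output layer: 2 units -> 2 classes
    "b2": [0.0, 0.0],
}

probs = forward([1.0, 2.0, 3.0], params)
print(probs)  # one probability per class
```

Training a network means adjusting the numbers in `params` until the outputs match known examples; the sketch above only shows the forward direction.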
So what does this mean?
The author vector patent analyzes a set of word sequences that are all attributed to the same author. Each first sequence is paired with the second sequence that follows it, and a neural network is trained on these pairs. The result of that training is an author vector that captures the author's specific writing style and can be used to classify particular authors.
A sequence can consist of:
- A sentence.
- A paragraph.
- A collection of multiple paragraphs.
- A search query.
- Another group of multiple natural language words.
A neural network system could be trained on those sets of sequences to determine authorship and characterize a particular set of text.
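As a rough sketch of what such a training set might look like, the snippet below turns one author's text into (first sequence, second sequence) pairs, using sentences as the sequences. The function names and the simple sentence splitter are my own assumptions for illustration; the patent does not describe this code.

```python
import re

def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def make_training_pairs(text):
    """Pair each sentence with the sentence that follows it."""
    sentences = split_sentences(text)
    return list(zip(sentences, sentences[1:]))

# Hypothetical text, assumed to be written by a single author.
sample = "I like tea. It is warm. I drink it daily."

pairs = make_training_pairs(sample)
for first, second in pairs:
    print(repr(first), "->", repr(second))
```

Each pair gives the network a "what came first" and a "what came next" from the same author, which is the shape of data the abstract describes.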
Author classification can be used on text without the author ever being labeled. This classification means content can be attributed to a specific author without attaching a name to the articles.
Once an author's vector has been generated, it can be used to characterize different properties of that author. This means it could be applied across websites, social media, email platforms, files (e.g., PDF, Word), and so on.
What are some real-world uses for Author Vectors?
Author classification could be used to generate reputation scores, act as a ranking factor, better understand the context of words in queries, classify websites, and identify authorship.
We do not know when author vectors were developed or whether they have ever been deployed in production, but the patent tells us Google has a method for identifying and organizing writing styles.
Patent listing can be found at:
Generating author vectors
Inventors: Brian Patrick Strope and Quoc V. Le
Assignee: Google LLC
US Patent: 10,599,770
Granted: March 24, 2020
Filed: May 29, 2018
Credit must be given to Bill Slawski; I started reading his articles and found them too advanced, so I started re-writing them to understand the patents better. For a more advanced understanding of these patents, please check out his blog.
Published on: 2021-04-16
Updated on: 2021-05-25