So what is LSA?
There has been a lot of talk and hype about search relevancy scores recently. Some of those in the know attribute this to latent semantic analysis. Even if Google is not using LSA, it has likely been using other word-relationship technologies for a while and has recently increased their weighting.
The next question, for all those who are not in the know, is “What is latent semantic analysis?”
Latent semantic analysis (LSA) is a technique in natural language processing, in particular in vectorial semantics, of analysing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.
LSA was patented in 1988 [1] by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter. In the context of its application to information retrieval, it is sometimes called latent semantic indexing (LSI).
Occurrence matrix
LSA can use a term-document matrix which describes the occurrences of terms in documents; it is a sparse matrix whose rows correspond to terms (typically stemmed words that appear in the documents) and whose columns correspond to documents. A typical example of the weighting of the elements of the matrix is tf-idf (term frequency–inverse document frequency): each element of the matrix is proportional to the number of times the term appears in the document, where rare terms are up-weighted to reflect their relative importance.
This matrix is also common to standard semantic models, though it is not necessarily explicitly expressed as a matrix, since the mathematical properties of matrices are not always used.
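To make the occurrence matrix concrete, here is a minimal Python sketch that builds a tf-idf weighted term-document matrix for three tiny documents. The documents, the function name tfidf_matrix and the plain log(N / df) form of idf are assumptions chosen for illustration; real systems also stem words and often use smoothed variants of the weighting.

```python
import math
from collections import Counter

# Hypothetical toy collection, made up for illustration.
docs = [
    "apple imac ipod apple",
    "apple pie recipe",
    "ipod music player",
]

def tfidf_matrix(documents):
    """Return (terms, matrix) where matrix[i][j] weights term i in document j."""
    counts = [Counter(doc.split()) for doc in documents]
    terms = sorted(set(term for c in counts for term in c))
    n_docs = len(documents)
    matrix = []
    for term in terms:
        # Document frequency: how many documents contain the term.
        df = sum(1 for c in counts if term in c)
        # Rare terms get a larger idf, so they are up-weighted.
        idf = math.log(n_docs / df)
        matrix.append([c[term] * idf for c in counts])
    return terms, matrix

terms, matrix = tfidf_matrix(docs)
for term, row in zip(terms, matrix):
    print(f"{term:8s}", [round(weight, 2) for weight in row])
```

Each printed row is one term and each column is one document. Most entries are zero, which is why the matrix is described as sparse. Note that "imac", which appears once in a single document, ends up with a higher weight in the first document than "apple", which appears twice there but also occurs elsewhere.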
LSA transforms the occurrence matrix into a relation between the terms and some concepts, and a relation between those concepts and the documents. Thus the terms and documents are now indirectly related through the concepts.
Source: http://en.wikipedia.org/wiki/Latent_semantic_analysis
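LSA is usually described as getting from the occurrence matrix to concepts with a singular value decomposition (SVD), keeping only the strongest few concepts. As a rough sketch of that idea, the Python below extracts just the single strongest concept of a small count matrix using power iteration, which is a simplification swapped in for a full SVD. The matrix values, term names and the function name top_concept are hypothetical.

```python
import math

# Hypothetical term-document counts: rows are terms, columns are documents.
terms = ["apple", "imac", "ipod", "pie", "recipe"]
X = [
    [2, 1, 1],  # apple
    [1, 0, 0],  # imac
    [1, 1, 0],  # ipod
    [0, 0, 1],  # pie
    [0, 0, 1],  # recipe
]

def top_concept(matrix, iterations=100):
    """Approximate the strongest concept (largest singular triplet) of matrix."""
    n_terms, n_docs = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_docs)] * n_docs  # initial concept-to-document weights
    for _ in range(iterations):
        # u = X v, then w = X^T u: one power-iteration step on X^T X.
        u = [sum(row[j] * v[j] for j in range(n_docs)) for row in matrix]
        w = [sum(matrix[i][j] * u[i] for i in range(n_terms)) for j in range(n_docs)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_docs)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))  # strength of the concept
    u = [x / sigma for x in u]                # term-to-concept weights
    return u, sigma, v

u, sigma, v = top_concept(X)
print("concept strength:", round(sigma, 3))
print("term weights:    ", dict(zip(terms, (round(x, 3) for x in u))))
print("document weights:", [round(x, 3) for x in v])
```

The term weights relate terms to the concept and the document weights relate the same concept to the documents, so terms and documents are linked only indirectly, through the concept. A real LSA setup would keep many concepts rather than one and would normally rely on a numerical library instead of hand-written iteration.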
How Does Latent Semantic Indexing Work?
Latent semantic indexing allows a search engine to determine what a page is about beyond simply matching the text of the search query.
A page about Apple computers will naturally tend to include terms such as iMac or iPod.
Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn’t understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent.
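The "many words in common" idea can be sketched in a few lines of Python using cosine similarity over raw word counts. This is only the word-overlap intuition, not LSI itself, and the page texts and the function name cosine_similarity are made up for illustration.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Similarity between 0.0 and 1.0 based on the words two texts share."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)

# Hypothetical page texts.
page_a = "apple imac ipod mac computers"
page_b = "apple ipod mac accessories"
page_c = "apple pie baking recipe"

print(round(cosine_similarity(page_a, page_b), 3))  # pages sharing several words
print(round(cosine_similarity(page_a, page_c), 3))  # pages sharing only "apple"
```

The two computer pages score much higher against each other than the computer page does against the recipe page, even though all three contain the word "apple".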
By placing additional weight on related words in content, or words in similar positions in other related documents, LSI has a net effect of lowering the value of pages which only match the specific term and do not back it up with related terms.
LSI vs Semantically Related Words:
I realised that many SEO analysts (myself included) have blended the concept of semantically related words with latent semantic indexing, and that, due to the constraints of the web, it is highly unlikely that large-scale search engines are using LSI on their main search indexes. That said, this is only an assumption based on limited research into how current search engine retrieval models are built.
Nonetheless, it is obvious to anyone who studies search relevancy algorithms by watching the results and the pages that rank that the following are true for Google:
Search engines such as Google do try to figure out phrase relationships when processing queries, improving the rankings of pages with related phrases even if those pages are not focused on the target term.
Pages that are too focused on one phrase tend to rank worse than one would expect (sometimes even being filtered out for what some SEOs call being over-optimized).
Pages that are focused on a wider net of related keywords tend to have more stable rankings for the core keyword and rank for a wider net of keywords.
Given the above, here are tips to help increase your page relevancy scores and make your rankings far more stable…
Mix Your Anchor Text!
Latent semantic indexing (or similar technologies) can also be used to look at the link profile of your website. If all your links are heavy in a few particular phrases and light on other similar phrases then your site may not rank as well.
Example Related Terms:
Implement other anchor text combinations to make the linkage data appear less manipulative.
Instead of using SEO in all the links, some of them could use phrases like:
search engine optimisation
search engine marketing
search engine placement
search engine positioning
search engine promotion
search engine ranking
etc.
Instead of using book in all the links, some other good common words might be:
ebook
manual
guide
tips
report
tutorial
etc.
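One way to check how mixed your anchor text already is would be to count how often each phrase appears across your inbound links. The Python below is a minimal sketch of that idea; the list of anchors, the function names and the 50% cut-off are all hypothetical, and no search engine publishes a threshold like this.

```python
from collections import Counter

def anchor_text_shares(anchors):
    """Map each anchor phrase to its share of all inbound links."""
    counts = Counter(anchor.strip().lower() for anchor in anchors)
    total = sum(counts.values())
    return {phrase: n / total for phrase, n in counts.most_common()}

def overused(anchors, max_share=0.5):
    """Phrases used in more than max_share of links (an arbitrary cut-off)."""
    return [p for p, share in anchor_text_shares(anchors).items() if share > max_share]

# Hypothetical link profile: anchor text of ten inbound links.
inbound = ["SEO"] * 6 + [
    "seo ",
    "search engine optimisation",
    "search engine marketing",
    "search engine ranking",
]

for phrase, share in anchor_text_shares(inbound).items():
    print(f"{share:5.0%}  {phrase}")
print("over-used:", overused(inbound))
```

In this made-up profile "seo" accounts for seven of the ten links, so it is the phrase to start replacing with related alternatives.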
How Do I Know What Words Are Related?
There are several ways to find out which words are related to one another.
Search Google with a ~ in front of a term to get results that include related terms. For example, Google Search: ~seo will return pages with terms matching or related to seo and will highlight some of the related words in the search results.
Understanding the semantic relationships of words is just another piece of the relevancy algorithms, though many sites will significantly shift in rankings due to it.
Copywriting and keyword relevancy on your site will contribute to building the relationship between keyword density and ROI (return on investment).
Victor Quinteros (SEO Analyst)
CEO
Q Interactive Pty Ltd