
How to Measure Content Relevance Using Cosine Similarity

A step-by-step tutorial on computing cosine similarity between queries and content, with simple math examples you can follow along with.

Want to know exactly how relevant your content is to a search query? Cosine similarity gives you a precise, mathematical answer. In this tutorial, we'll walk through computing cosine similarity step-by-step, with simple examples you can follow along with.

Whether you're optimizing content for SEO, building a search system, or analyzing text relevance, cosine similarity is one of the most widely used metrics for measuring semantic similarity. And the best part? The math is actually quite straightforward once you break it down.

What Is Cosine Similarity?

Cosine similarity measures the cosine of the angle between two vectors. In text analysis, we convert text into vectors (embeddings), then measure how similar those vectors are. The result is a number between -1 and 1:

  • 1.0 = Identical meaning (vectors point in the same direction)
  • 0.8-0.9 = Very similar (highly relevant)
  • 0.5-0.7 = Moderately similar (somewhat relevant)
  • 0.0-0.4 = Different meanings (low relevance)
  • -1.0 = Opposite meanings (vectors point in opposite directions)
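As a rough sketch, the bands above can be turned into a small helper function. Note that the exact cutoffs below (for example, 0.95 for "identical") are my own choices made to close the gaps between the listed ranges, not values from a standard:

```python
def interpret_similarity(score: float) -> str:
    """Map a cosine similarity score to a rough relevance label.

    Cutoffs loosely follow the bands listed above; scores falling in
    the gaps between bands are assigned to the lower band.
    """
    if score >= 0.95:
        return "identical meaning"
    elif score >= 0.8:
        return "very similar"
    elif score >= 0.5:
        return "moderately similar"
    elif score >= 0.0:
        return "different meanings"
    else:
        return "opposite meanings"

print(interpret_similarity(0.89))  # very similar
print(interpret_similarity(0.3))   # different meanings
```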

Step 1: Convert Text to Embeddings

Before we can compute cosine similarity, we need to convert our text into numerical vectors called embeddings. These embeddings capture the semantic meaning of the text.

📝 Example Setup

Let's say we want to measure how relevant this content is to a search query:

  • Query: "best coffee maker"
  • Content: "Our premium coffee brewing machine features automatic drip technology and thermal carafe for optimal flavor extraction."

Note: In practice, you'd use an embedding model like OpenAI's embeddings API. For this tutorial, we'll use simplified 3-dimensional vectors to illustrate the concept.

Using an embedding model (like OpenAI's), we convert both texts into vectors. For our simplified example, let's say:

Query embedding (A):

A = [0.8, 0.6, 0.2]

Content embedding (B):

B = [0.7, 0.5, 0.3]

These vectors represent the semantic meaning in a high-dimensional space. In reality, embeddings have many more dimensions (OpenAI's text-embedding-ada-002 model produces 1536-dimensional vectors), but we're using 3 dimensions here to make the math easier to follow.

Step 2: Understand the Cosine Similarity Formula

The cosine similarity formula is:

cosine_similarity = (A · B) / (||A|| × ||B||)

Where:

  • A · B = Dot product of vectors A and B
  • ||A|| = Magnitude (length) of vector A
  • ||B|| = Magnitude (length) of vector B

Step 3: Calculate the Dot Product

The dot product (A · B) is calculated by multiplying corresponding elements and summing them up:

Dot Product Calculation:

A · B = (A₁ × B₁) + (A₂ × B₂) + (A₃ × B₃)

A · B = (0.8 × 0.7) + (0.6 × 0.5) + (0.2 × 0.3)

A · B = 0.56 + 0.30 + 0.06

A · B = 0.92
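You can verify this step with NumPy, using the simplified example vectors from above:

```python
import numpy as np

A = np.array([0.8, 0.6, 0.2])  # query embedding
B = np.array([0.7, 0.5, 0.3])  # content embedding

# Dot product: multiply corresponding elements and sum
dot = np.dot(A, B)
print(round(dot, 2))  # 0.92
```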

Step 4: Calculate Vector Magnitudes

The magnitude (or length) of a vector is its Euclidean norm: the square root of the sum of squared components, a direct generalization of the Pythagorean theorem:

Magnitude of Vector A:

||A|| = √(A₁² + A₂² + A₃²)

||A|| = √(0.8² + 0.6² + 0.2²)

||A|| = √(0.64 + 0.36 + 0.04)

||A|| = √1.04

||A|| ≈ 1.02

Magnitude of Vector B:

||B|| = √(B₁² + B₂² + B₃²)

||B|| = √(0.7² + 0.5² + 0.3²)

||B|| = √(0.49 + 0.25 + 0.09)

||B|| = √0.83

||B|| ≈ 0.91
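NumPy's norm function computes these magnitudes directly:

```python
import numpy as np
from numpy.linalg import norm

A = np.array([0.8, 0.6, 0.2])  # query embedding
B = np.array([0.7, 0.5, 0.3])  # content embedding

# Euclidean norm: sqrt of the sum of squared components
print(round(norm(A), 2))  # 1.02
print(round(norm(B), 2))  # 0.91
```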

Step 5: Compute Cosine Similarity

Now we have everything we need. Let's plug the values into the formula:

Final Calculation:

cosine_similarity = (A · B) / (||A|| × ||B||)

cosine_similarity = 0.92 / (1.02 × 0.91)

cosine_similarity = 0.92 / 0.9282

cosine_similarity ≈ 0.991
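Putting all three steps together in code gives approximately 0.990 rather than 0.991; the small difference comes from rounding the magnitudes to two decimals in the worked example above. Either way, the interpretation is the same:

```python
import numpy as np
from numpy.linalg import norm

A = np.array([0.8, 0.6, 0.2])  # query embedding
B = np.array([0.7, 0.5, 0.3])  # content embedding

# Full formula: dot product divided by the product of magnitudes
similarity = np.dot(A, B) / (norm(A) * norm(B))
print(f"{similarity:.3f}")  # 0.990
```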

✅ Interpretation

A cosine similarity of 0.991 means the query and content are extremely similar semantically. This indicates the content is highly relevant to the search query "best coffee maker."

Real-World Example: Comparing Multiple Content Pieces

In practice, you'll often compare one query against multiple content pieces to find the most relevant one. Let's see how this works:

Query: "how to make coffee"

Let's compare it against three different content pieces:

Content 1: "Step-by-step guide to brewing coffee"

Cosine Similarity: 0.89

Highly relevant - directly addresses the query

Content 2: "History of coffee cultivation"

Cosine Similarity: 0.45

Low relevance - related topic but doesn't answer the query

Content 3: "Best coffee maker reviews"

Cosine Similarity: 0.67

Moderate relevance - related but focuses on products, not process

Content 1 has the highest cosine similarity (0.89), making it the most relevant. This is, in simplified form, how semantic search systems rank results: they compute cosine similarity between the query and candidate pages, then rank by similarity score (production search engines combine this with many other signals).
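The ranking step can be sketched as follows. The 3-dimensional vectors here are made-up toy embeddings chosen for illustration, so the scores they produce won't match the example numbers above; in practice each title would be embedded with a real model:

```python
import numpy as np
from numpy.linalg import norm

def cosine_similarity(a, b):
    return np.dot(a, b) / (norm(a) * norm(b))

# Toy 3-d embeddings for the query and the three candidate pieces
query_vec = np.array([0.8, 0.6, 0.2])
candidates = {
    "Step-by-step guide to brewing coffee": np.array([0.7, 0.6, 0.3]),
    "History of coffee cultivation":        np.array([0.1, 0.9, 0.4]),
    "Best coffee maker reviews":            np.array([0.6, 0.2, 0.7]),
}

# Rank candidates by similarity to the query, highest first
ranked = sorted(
    candidates.items(),
    key=lambda kv: cosine_similarity(query_vec, kv[1]),
    reverse=True,
)
for title, vec in ranked:
    print(f"{cosine_similarity(query_vec, vec):.3f}  {title}")
```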

Why Cosine Similarity Works So Well

Cosine similarity is well suited to measuring text relevance because:

📏 Scale Invariant

It measures direction (meaning) rather than magnitude (length). A short phrase and a long article about the same topic can have identical cosine similarity.

🎯 Normalized

The result is always between -1 and 1, making it easy to interpret and compare across different queries and content.

⚡ Efficient

Computationally fast, making it practical for real-time search ranking and content analysis at scale.

🌐 Language Agnostic

Works with any language. The same embedding model can handle English, Spanish, Chinese, and more.
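The scale-invariance property above is easy to verify numerically: scaling a vector changes its magnitude but not its direction, so its cosine similarity with any other vector is unchanged. A minimal check:

```python
import numpy as np
from numpy.linalg import norm

def cosine_similarity(a, b):
    return np.dot(a, b) / (norm(a) * norm(b))

v = np.array([0.8, 0.6, 0.2])
w = np.array([0.7, 0.5, 0.3])

# Scaling w by 10 changes its length, not its direction,
# so the similarity is identical (up to floating-point error)
print(round(cosine_similarity(v, w), 6) == round(cosine_similarity(v, 10 * w), 6))  # True
```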

Implementing Cosine Similarity in Practice

Here's how you'd actually implement this using Python and OpenAI embeddings:

```python
# Python example using OpenAI embeddings
# (note: this uses the legacy openai<1.0 Embedding API)
import openai
import numpy as np
from numpy.linalg import norm

# Get embeddings
query = "best coffee maker"
content = "Premium coffee brewing machine..."

query_embedding = openai.Embedding.create(
    input=query, model="text-embedding-ada-002"
)['data'][0]['embedding']

content_embedding = openai.Embedding.create(
    input=content, model="text-embedding-ada-002"
)['data'][0]['embedding']

# Calculate cosine similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (norm(a) * norm(b))

similarity = cosine_similarity(query_embedding, content_embedding)

print(f"Cosine similarity: {similarity:.3f}")
```

Using Cosine Similarity for Content Optimization

Now that you understand how cosine similarity works, here's how to use it to optimize your content:

1. Measure Current Relevance

Start by measuring the cosine similarity between your target search query and your existing content. This gives you a baseline score.

2. Identify Gaps

If your score is low (below 0.7), your content likely doesn't address the query's intent well. Analyze what concepts the query embedding captures that your content misses.

3. Optimize Iteratively

Revise your content to better match the query's semantic meaning. Re-measure after each revision. Aim for scores above 0.8 for highly relevant content.

4. Compare Against Competitors

Measure the cosine similarity of top-ranking pages for your target query. This shows you what level of semantic relevance you need to achieve.
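The optimization loop above can be sketched as a simple before/after comparison. The draft embeddings here are hypothetical toy vectors standing in for real model output, and the 0.8 threshold is the target suggested in step 3:

```python
import numpy as np
from numpy.linalg import norm

def cosine_similarity(a, b):
    return np.dot(a, b) / (norm(a) * norm(b))

# Hypothetical toy embeddings for a target query and two content drafts;
# in practice each would come from an embedding model.
query_vec = np.array([0.8, 0.6, 0.2])
drafts = {
    "baseline draft": np.array([0.3, 0.2, 0.9]),
    "revised draft":  np.array([0.7, 0.5, 0.3]),
}

# Re-measure after each revision; aim for scores above 0.8
for name, vec in drafts.items():
    score = cosine_similarity(query_vec, vec)
    verdict = "on target" if score >= 0.8 else "needs revision"
    print(f"{name}: {score:.3f} ({verdict})")
```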

Ready to Measure Your Content?

Meaning IQ makes it easy to measure cosine similarity between your content and search queries. No coding required - just paste your query and content, and get instant relevance scores.


Powered by OpenAI embeddings and cosine similarity calculations.

Cosine similarity isn't just a mathematical concept - it's the practical tool that powers modern search and content optimization. Master it, and you master semantic relevance.