semanticlens package

Submodules

semanticlens.lens module

Lens: Main class for visual concept analysis and exploration.

This module provides the primary interface for semantic analysis of neural networks, combining component visualization with foundation models to explore relationships between visual concepts and text embeddings.

class semanticlens.lens.Lens(fm, device=None)[source]

Bases: object

Orchestration layer for feature extraction and concept probing.

The Lens class is the main entry point for using the semanticlens library. It provides a high-level, stateful interface that manages a foundation model and orchestrates the entire semantic analysis workflow, from computing a concept database to searching it and evaluating its interpretability.

This class simplifies the process by holding the state of the foundation model and providing convenient methods that wrap the core functionalities of the package.

Parameters:
  • fm (AbstractVLM) – An initialized vision-language foundation model that will be used for all embedding and probing tasks.

  • device (str or torch.device, optional) – The device to run the foundation model on (e.g., “cuda”, “cpu”). If None, the model’s current device is used.

fm

The foundation model instance used by the Lens.

Type:

AbstractVLM

device

The device on which the foundation model is located.

Type:

torch.device

Examples

>>> import torch
>>> from semanticlens import Lens
>>> from semanticlens.foundation_models import ClipMobile
>>> from semanticlens.component_visualization import ActivationComponentVisualizer
>>>
>>> # 1. Initialize the Lens with a foundation model
>>> fm = ClipMobile(device="cpu")
>>> lens = Lens(fm=fm)
>>>
>>> # 2. Assume `cv` is an initialized ActivationComponentVisualizer
>>> cv = ActivationComponentVisualizer(...)
>>>
>>> # 3. Compute the concept database
>>> concept_db = lens.compute_concept_db(cv)
>>>
>>> # 4. Probe the database with a text query
>>> aggregated_db = {"layer4": concept_db["layer4"].mean(dim=1)}
>>> scores = lens.text_probing("a photo of a cat", aggregated_db)
__init__(fm, device=None)[source]
compute_concept_db(cv, **kwargs)[source]

Compute or load from cache the concept database for a visualizer.

This method orchestrates the creation of the concept database, which is a semantic representation of the concepts learned by a model’s components. It follows an Inversion of Control (IoC) pattern by calling the internal _compute_concept_db method of the provided component visualizer cv.

If caching is enabled in the component visualizer, this method will first attempt to load the concept database from a pre-computed cache file. If the file does not exist, it will compute the database and save it to the cache for future use.

Parameters:
  • cv (AbstractComponentVisualizer) – An initialized component visualizer that has already collected the maximally activating samples for the target model’s components (i.e., cv.run() has been called).

  • **kwargs – Additional keyword arguments to be passed to the visualizer’s _compute_concept_db method, such as batch_size or num_workers.

Returns:

A dictionary mapping layer names to their concept databases. Each database is a tensor of shape (n_components, n_samples, embedding_dim).

Return type:

dict[str, torch.Tensor]
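
A minimal usage sketch (assuming `cv` is an ActivationComponentVisualizer that has already been run on the target model and `lens` is the instance from the class example above; the layer name and batch size are illustrative):

>>> concept_db = lens.compute_concept_db(cv, batch_size=64)
>>> concept_db["layer4"].ndim  # (n_components, n_samples, embedding_dim)
3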

eval_clarity(concept_db)[source]

Compute the clarity score for concepts in the database.

Clarity measures how semantically coherent the examples for each concept are. A high clarity score suggests that a neuron has learned a well-defined, easily understandable concept.

This method wraps the clarity_score() function.

Parameters:

concept_db (torch.Tensor or dict[str, torch.Tensor]) – A concept database tensor of shape (n_components, n_samples, embedding_dim), or a dictionary mapping layer names to such tensors.

Returns:

A tensor of clarity scores for each component, or a dictionary of such tensors.

Return type:

torch.Tensor or dict[str, torch.Tensor]

See also

semanticlens.scores.clarity_score

The underlying function for the score.
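
A short usage sketch, continuing the class example with `concept_db` obtained from compute_concept_db() (layer name illustrative):

>>> clarity = lens.eval_clarity(concept_db)
>>> # one score per component, e.g. clarity["layer4"][42] for component 42
>>> well_defined = clarity["layer4"].topk(10).indices  # most clearly encoded components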

eval_polysemanticity(concept_db)[source]

Compute the polysemanticity score for concepts in the database.

Polysemanticity measures whether a single neuron encodes multiple, semantically distinct concepts. The score is calculated by clustering the examples for each concept and measuring the diversity of the resulting cluster centers.

This method wraps the polysemanticity_score() function.

Parameters:

concept_db (torch.Tensor or dict[str, torch.Tensor]) – A concept database tensor of shape (n_components, n_samples, embedding_dim), or a dictionary mapping layer names to such tensors.

Returns:

A tensor of polysemanticity scores for each component, or a dictionary of such tensors.

Return type:

torch.Tensor or dict[str, torch.Tensor]

See also

semanticlens.scores.polysemanticity_score

The underlying function for the score.

eval_redundancy(aggregated_concept_db)[source]

Compute the redundancy score for concepts in the database.

Redundancy measures the degree of semantic overlap between different components (e.g. neurons) in a layer. It is calculated as the average maximal similarity of each component to any other component in the set.

This method wraps the redundancy_score() function.

Parameters:

aggregated_concept_db (torch.Tensor or dict[str, torch.Tensor]) – An aggregated concept database tensor of shape (n_components, embedding_dim), or a dictionary mapping layer names to such tensors.

Returns:

A tensor representing the mean redundancy score, or a dictionary of such scores for each layer.

Return type:

torch.Tensor or dict[str, torch.Tensor]

See also

semanticlens.scores.redundancy_score

The underlying function for the score.
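
Unlike eval_clarity() and eval_polysemanticity(), this method expects the aggregated database. A minimal sketch, continuing the earlier example (layer name illustrative):

>>> aggregated_db = {name: db.mean(dim=1) for name, db in concept_db.items()}
>>> redundancy = lens.eval_redundancy(aggregated_db)
>>> redundancy["layer4"]  # mean redundancy score for the layer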

image_probing(query, aggregated_concept_db)[source]

Probe a concept database with image queries to find matching concepts.

This method is a convenient wrapper around the stateless image_probing() function. It uses the foundation model stored within the Lens instance to perform the search.

Parameters:
  • query (PIL.Image.Image or list[PIL.Image.Image]) – A single PIL image or a list of PIL images to use as the query.

  • aggregated_concept_db (torch.Tensor or dict[str, torch.Tensor]) – The aggregated concept database to search within, with tensors of shape (n_components, embedding_dim).

Returns:

A tensor or a dictionary of tensors containing the cosine similarity scores between the query and the concepts.

Return type:

torch.Tensor or dict[str, torch.Tensor]
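
A minimal sketch (the image path is hypothetical; `aggregated_db` is the aggregated database from the examples above):

>>> from PIL import Image
>>> img = Image.open("cat.jpg")  # hypothetical example image
>>> scores = lens.image_probing(img, aggregated_db)
>>> scores["layer4"].argmax()  # component most similar to the image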

text_probing(query, aggregated_concept_db, templates=None, batch_size=None)[source]

Probe a concept database with text queries to find matching concepts.

This method is a convenient wrapper around the stateless text_probing() function. It uses the foundation model stored within the Lens instance to perform the search.

Parameters:
  • query (str or list[str]) – The text query or a list of text queries to search for.

  • aggregated_concept_db (torch.Tensor or dict[str, torch.Tensor]) – The aggregated concept database to search within, with tensors of shape (n_components, embedding_dim).

  • templates (list[str], optional) – A list of prompt templates, e.g., “a photo of {}”.

  • batch_size (int, optional) – The batch size for embedding text queries.

Returns:

A tensor or a dictionary of tensors containing the cosine similarity scores between the query and the concepts.

Return type:

torch.Tensor or dict[str, torch.Tensor]

semanticlens.lens.compute_concept_db(cv, fm, **kwargs)[source]

Compute a concept database in a stateless manner.

This function delegates the computation of the concept database to the provided component visualizer instance. It follows an Inversion of Control (IoC) pattern where the visualizer, which holds the logic for extracting concepts, is controlled by this function to perform the embedding using the provided foundation model.

Parameters:
  • cv (AbstractComponentVisualizer) – An initialized component visualizer instance (e.g., ActivationComponentVisualizer) that has already been run to find concept examples.

  • fm (AbstractVLM) – An initialized foundation model instance (e.g., OpenClip) used for embedding the concept examples.

  • **kwargs – Additional keyword arguments to be passed to the component visualizer’s internal computation method.

Returns:

A dictionary mapping layer names to their corresponding concept database. Each concept database is a tensor of shape (n_components, n_samples, embedding_dim).

Return type:

dict[str, torch.Tensor]
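
A minimal stateless usage sketch (assuming `cv` has already been run and `fm` is an initialized foundation model, as in the Lens examples above):

>>> from semanticlens.lens import compute_concept_db
>>> concept_db = compute_concept_db(cv, fm)
>>> {name: db.shape for name, db in concept_db.items()}  # one tensor per layer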

semanticlens.lens.image_probing(fm, query, aggregated_concept_db)[source]

Probe a concept database with image queries to find matching concepts.

This function searches for concepts that are semantically similar to a given query image or a set of query images. It embeds the image(s) using the foundation model and computes the cosine similarity against the concept embeddings in the database.

If a list of images is provided, their embeddings are averaged to form a single probe vector. This is useful for finding concepts that represent the common theme across multiple images.

Parameters:
  • fm (AbstractVLM) – An initialized foundation model instance for encoding the image query.

  • query (PIL.Image.Image or list[PIL.Image.Image]) – A single PIL image or a list of PIL images to use as the query.

  • aggregated_concept_db (torch.Tensor or dict[str, torch.Tensor]) – The aggregated concept database to search within. This should contain the mean embedding for each concept, resulting in a tensor of shape (n_components, embedding_dim). It can be a single tensor or a dictionary mapping layer names to tensors.

Returns:

A tensor or a dictionary of tensors containing the cosine similarity scores between the image query embedding and the concept embeddings. Higher scores indicate a closer semantic match.

Return type:

torch.Tensor or dict[str, torch.Tensor]
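
A minimal sketch of the multi-image case described above (file names are hypothetical; `fm` is an initialized foundation model and `aggregated_db` an aggregated concept database as described in the parameter list):

>>> from PIL import Image
>>> from semanticlens.lens import image_probing
>>> queries = [Image.open(p) for p in ("cat_1.jpg", "cat_2.jpg")]  # hypothetical images
>>> scores = image_probing(fm, queries, aggregated_db)  # embeddings are averaged into one probe
>>> scores["layer4"].topk(5).indices  # components closest to the shared theme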

semanticlens.lens.text_probing(fm, query, aggregated_concept_db, templates=None, batch_size=None)[source]

Probe a concept database with text queries to find matching concepts.

This function searches for concepts in a model’s learned representations using natural language. It works by embedding the text query using the foundation model and then computing the cosine similarity between the query embedding and the concept embeddings in the database.

Parameters:
  • fm (AbstractVLM) – An initialized foundation model instance for encoding the text query.

  • query (str or list[str]) – The text query or a list of text queries to search for.

  • aggregated_concept_db (torch.Tensor or dict[str, torch.Tensor]) – The aggregated concept database to search within. This should contain a single embedding for each concept (e.g. mean aggregated), resulting in a tensor of shape (n_components, embedding_dim). It can be a single tensor or a dictionary mapping layer names to tensors.

  • templates (list[str], optional) – A list of prompt templates, e.g., “a photo of {}”. Using templates can improve search fidelity by averaging out the influence of the prompt’s structure. If provided, the embedding of an empty template is subtracted from the embedding of the filled template.

  • batch_size (int, optional) – The batch size for embedding text queries if multiple templates are used. If None, all templated queries are processed in one batch.

Returns:

A tensor or a dictionary of tensors containing the cosine similarity scores between the query embedding(s) and the concept embeddings. Higher scores indicate a closer semantic match.

Return type:

torch.Tensor or dict[str, torch.Tensor]

Examples

>>> import torch
>>> from semanticlens import Lens
>>> from semanticlens.foundation_models.clip import OpenClip
>>> from semanticlens.lens import text_probing
>>>
>>> # Foundation model and concept database; assume `cv` is an initialized
>>> # component visualizer that has already been run
>>> fm = OpenClip(url="...")
>>> lens = Lens(fm)
>>> concept_db = lens.compute_concept_db(cv)
>>> aggregated_db = {name: db.mean(dim=1) for name, db in concept_db.items()}
>>> # Find neurons related to "dogs"
>>> scores = text_probing(fm=fm, query="dog", aggregated_concept_db=aggregated_db, templates=["a photo of a {}"])
>>> top_neuron = scores["layer4"].argmax()
>>> print(f"Top matching neuron for 'dog': {top_neuron.item()}")
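
The template handling described above can be pictured with the following sketch. It is not the library's implementation, and the text-encoding call (`fm.encode_text`) is a hypothetical method name used only for illustration:

>>> import torch.nn.functional as F
>>> filled = fm.encode_text("a photo of a dog")  # query inserted into the template (hypothetical API)
>>> empty = fm.encode_text("a photo of a ")      # the same template left empty
>>> probe = filled - empty                       # cancels the prompt's own contribution
>>> F.cosine_similarity(aggregated_db["layer4"], probe)  # one score per component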

semanticlens.scores module

Scoring functions for evaluating concept quality in semantic analysis.

This module provides various metrics to assess the quality and characteristics of learned concepts in neural networks, including clarity, redundancy, and polysemanticity scores.

semanticlens.scores.clarity_score(V)[source]

Compute clarity score for concept representations.

Measures how uniform the concept examples are, indicating how clear the representation is. Higher values indicate better clarity.

Parameters:

V (torch.Tensor) – Concept tensor of shape (n_neurons, n_samples, n_features).

Returns:

Clarity scores of shape (n_neurons,). Values in range [-1/(n_samples-1), 1], where higher values indicate clearer concepts.

Return type:

torch.Tensor

Examples

>>> V = torch.randn(10, 20, 512)  # 10 neurons, 20 samples, 512 features
>>> clarity = clarity_score(V)
>>> clarity.shape
torch.Size([10])
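
The documented range [-1/(n_samples-1), 1] matches what one obtains by averaging the off-diagonal pairwise cosine similarities of each neuron's sample embeddings. The following is a rough sketch of that reading, not the library's implementation:

>>> import torch
>>> import torch.nn.functional as F
>>> def clarity_sketch(V):
...     """Mean off-diagonal pairwise cosine similarity per neuron (illustrative only)."""
...     Vn = F.normalize(V, dim=-1)            # unit-normalize sample embeddings
...     sim = Vn @ Vn.transpose(1, 2)          # (n_neurons, n_samples, n_samples)
...     n = sim.shape[-1]
...     off_diag = sim.sum(dim=(1, 2)) - n     # drop the diagonal of ones
...     return off_diag / (n * (n - 1))        # lies in [-1/(n-1), 1]
...
>>> clarity_sketch(torch.randn(10, 20, 512)).shape
torch.Size([10])
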
semanticlens.scores.polysemanticity_score(V, replace_empty_clusters=True, random_state=123, n_clusters=2)[source]

Compute polysemanticity score for concept representations.

Measures how polysemantic (multi-meaning) concepts are by clustering concept examples and computing clarity of cluster centers. Higher values indicate more polysemantic concepts.

Parameters:
  • V (torch.Tensor) – Concept tensor of shape (n_neurons, n_samples, n_features).

  • replace_empty_clusters (bool, default=True) – Whether to replace empty clusters with alternative computation.

  • random_state (int, default=123) – Random seed for K-means clustering reproducibility.

  • n_clusters (int, default=2) – Number of clusters for K-means algorithm.

Returns:

Polysemanticity scores of shape (n_neurons,). Values in range [0, 1], where higher values indicate more polysemantic concepts.

Return type:

torch.Tensor

Examples

>>> V = torch.randn(10, 20, 512)  # 10 neurons, 20 samples, 512 features
>>> poly = polysemanticity_score(V)
>>> poly.shape
torch.Size([10])
semanticlens.scores.redundancy_score(cones)[source]

Compute redundancy score for concept representations.

Measures the redundancy across neurons by computing pairwise similarities, taking the maximum similarity of each neuron to any other neuron, and averaging these maxima over all neurons.

Parameters:

cones (torch.Tensor) – Concept tensor of shape (n_neurons, n_features).

Returns:

A scalar tensor containing the mean redundancy score. Higher values indicate more redundant representations.

Return type:

torch.Tensor

Examples

>>> cones = torch.randn(10, 512)  # 10 neurons, 512 features
>>> redundancy = redundancy_score(cones)
>>> redundancy.shape
torch.Size([])
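
Consistent with the description above and the scalar result in the example, the score can be read as the mean over neurons of each neuron's maximal similarity to any other neuron. A rough sketch of that reading, not the library's implementation:

>>> import torch
>>> import torch.nn.functional as F
>>> def redundancy_sketch(cones):
...     """Mean of per-neuron maximal cosine similarity to any other neuron (illustrative only)."""
...     C = F.normalize(cones, dim=-1)         # unit-normalize concept embeddings
...     sim = C @ C.T                          # (n_neurons, n_neurons) pairwise similarities
...     sim.fill_diagonal_(float("-inf"))      # exclude self-similarity
...     return sim.max(dim=1).values.mean()    # scalar mean redundancy
...
>>> redundancy_sketch(torch.randn(10, 512)).shape
torch.Size([])
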
semanticlens.scores.similarity_score(x, y)[source]

Compute similarity score between two tensors.

Calculates cosine similarity between tensors x and y, handling different tensor shapes appropriately.

Parameters:
  • x (torch.Tensor) – First tensor for similarity computation.

  • y (torch.Tensor) – Second tensor for similarity computation.

Returns:

Similarity scores. Shape depends on input dimensions. For matrices: (x_n, y_n) where x_n and y_n are the number of vectors. For vectors: scalar similarity score.

Return type:

torch.Tensor

Raises:

ValueError – If tensor shapes are incompatible for similarity computation.

Examples

>>> x = torch.randn(5, 512)
>>> y = torch.randn(3, 512)
>>> sim = similarity_score(x, y)
>>> sim.shape
torch.Size([5, 3])

Module contents

SemanticLens: A package for mechanistic understanding and validation of large AI models.

SemanticLens provides tools for visual concept analysis and exploration of deep learning models, specifically designed for mechanistic interpretability and semantic analysis of foundation models.

Modules

foundation_models

Contains foundation model implementations including CLIP variants.

scores

Provides scoring functions for concept clarity, redundancy, and polysemanticity.

Classes

ConceptTensor

A tensor subclass for storing embeddings with associated metadata.

Lens

Main class for visual concept analysis and exploration.

Functions

label

Compute alignment of text embeddings with concept embeddings.

clarity_score

Measure how uniform concept examples are.

polysemanticity_score

Measure concept polysemanticity using clustering.

redundancy_score

Measure concept redundancy across neurons.
