How does the segmentation section work?
The Segmentation section lets you group and analyze the characteristics of your risk factors (customers, counterparties, channels, jurisdictions, and products) through statistical modeling, strengthening your organization’s anti-money laundering (AML) efforts. By monitoring and tracking their behavior and transactions, you can verify that they are not involved in suspicious activities related to money laundering, terrorist financing, or the proliferation of weapons of mass destruction.
The information entered into the Segmentation section is supported by a dedicated infrastructure that stores the data independently and ensures it receives the level of protection required given its sensitive nature.
→ Please note that to access this module, you must have the AML+ Management System under the Enterprise plan ⚡
How do you create a segmentation?
To create a segmentation for your customers and counterparties, you must be in the AML (Anti-Money Laundering) Management System. Go to the left-hand sidebar, select the “Segmentation” section, and click on “Create Segmentation.”

Next, you must enter a name to clearly identify and reference the segmentation you are about to create. You must also select the risk factor to be segmented: customers, counterparties, channels, jurisdictions, or products.

After selecting the risk factor, the section “Dataset Partitions” will appear. These partitions help you distinguish between data subsets, making the segmentation process more efficient and accurate.
For example, the variable Type of Person divides the segmentation into two datasets: natural persons (individuals) and legal entities (corporate entities), since their behavior patterns differ.
In addition to the Type of Person variable, you may select only one additional partition variable. This must be one of the variables included in the risk factor form you are segmenting — in this example, the customer form. To add this variable, click the “+” icon located to the right of the Type of Person variable.

A dropdown field will then appear, allowing you to select a variable from those available in the form of the risk factor you previously chose. Once you select the variable, the system will display, alongside it, the data subsets that will be created based on that partition.
Finally, click on “Create and Generate Models.”

The tool will now display the datasets created from the selected partitions, showing all possible combinations.
In our example, four datasets appear because we selected two partition variables, each with two possible values. The resulting combinations are: natural–active, legal–active, natural–inactive, and legal–inactive.
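The combination logic can be sketched as a Cartesian product of the partition values (the variable names and values below are illustrative, not taken from the platform):

```python
from itertools import product

# Hypothetical partition variables mirroring the example above:
# "Type of Person" (natural/legal) and an assumed status variable.
partitions = {
    "type_of_person": ["natural", "legal"],
    "status": ["active", "inactive"],
}

# Each dataset corresponds to one combination of partition values.
datasets = ["-".join(combo) for combo in product(*partitions.values())]
print(datasets)  # 2 x 2 = 4 datasets
```

Two partition variables with two values each always yield four datasets; adding a value to either variable would grow the count multiplicatively.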

How do I manage multiple models within a segmentation?
From the “Actions” column, you can select “View Models.”
A segmentation may contain more than one model. For example:
- Natural persons (individuals)
- Legal entities (corporate entities)
- Different customer or counterparty typologies

Data mining now begins for each dataset, following the CRISP-DM methodology. You must select each dataset individually to carry out its segmentation process.
How do I perform the segmentation process?
To start the segmentation process, you must first have the datasets created through data partitioning. Then, click on one of the datasets to begin processing the segmentation.
The segmentation process consists of five steps, which we will explain below.
How do I select the variables for segmentation?
In this first stage, you must choose the variables that will feed the model.
The system will display:
- A list of available variables
- Previously selected variables
- A segmentation evaluation projection
- A Data Completeness Indicator per variable
How is data completeness calculated?
Data completeness measures what proportion of the uploaded records contain valid values for each variable.
It indicates:
- How many records have been properly filled out
- What percentage of the information is complete
- How robust the data is for running the model
The higher the completeness level, the greater the statistical reliability of the segmentation.
This step allows you to validate the quality of the information before proceeding to the next stage.
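As a rough sketch of how such a percentage can be computed (the variable names and values are made up; the platform’s exact formula is not documented here), completeness per variable is simply the share of non-empty records:

```python
import pandas as pd

# Illustrative customer records; None marks a missing value.
records = pd.DataFrame({
    "monthly_income": [2500, None, 4100, 3300],
    "economic_activity": ["retail", "services", None, "services"],
})

# Completeness per variable: percentage of records with a value present.
completeness = records.notna().mean() * 100
print(completeness)  # both variables are 75% complete (3 of 4 records)
```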

How are the data prepared before running the model?
Once the variables have been selected, the system begins the statistical preparation phase.
During this stage, the following processes are performed automatically:
- Data cleaning and validation
- Adjustments and transformations
- Normalization (where applicable)
- Preliminary statistical calculations
The system displays each stage of the processing to ensure:
- Transparency in how the information is handled
- Clarity regarding how the model processes the data
- Proper documentation to support audits and regulatory compliance
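A minimal sketch of two of these steps, basic validation and z-score normalization, on made-up amounts (the platform’s actual pipeline is not shown here):

```python
import numpy as np

# Toy transaction amounts; the negative entry fails a basic validity rule.
amounts = np.array([120.0, 300.0, -1.0, 80.0, 500.0])

# Cleaning and validation: keep only records that pass the rule.
valid = amounts[amounts >= 0]

# Normalization (z-score): variables on different scales become comparable.
normalized = (valid - valid.mean()) / valid.std()
print(normalized)  # mean ~0, standard deviation ~1
```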

How is the segmentation model selected?
Model selection combines automated analysis with an intelligent recommendation based on the information uploaded by the user.
Once the data preparation phase is complete, the system:
- Generates a preview of the uploaded information
- Presents a representative data sample
- Displays 2D or 3D distribution charts, depending on the selected variables
- Allows you to observe preliminary grouping behavior
How does Copilot intervene?
The processed information is sent to Copilot, which automatically analyzes:
- The typology of the variables
- The statistical distribution
- Data density and dispersion
- The overall structure of the information
- Potential natural segmentation patterns
Based on this analysis, Copilot suggests the most appropriate model among:
- Two-Step Clustering
- K-Means
- DBSCAN
The recommendation depends directly on the information uploaded by the user and on how the data behaves after preparation.
Why are there multiple models?
Each model performs better with different data structures:
- Some work best when groups are clearly defined.
- Others identify segments based on density.
- Others adapt better to combinations of numerical and categorical variables.
For this reason, the system does not impose a single model but instead recommends the most suitable one according to the actual behavior of the data.
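To make this concrete, here is a sketch using scikit-learn (not the platform itself) that fits two of the listed techniques to the same well-separated synthetic data. Two-Step Clustering is an SPSS-style technique with no standard scikit-learn implementation, so only K-Means and DBSCAN appear:

```python
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three clearly separated synthetic groups.
X, _ = make_blobs(n_samples=300, centers=[(0, 0), (8, 0), (0, 8)],
                  cluster_std=0.6, random_state=42)

# K-Means needs the number of segments up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# DBSCAN discovers groups from density instead (-1 marks noise).
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

print("K-Means silhouette:", round(silhouette_score(X, kmeans_labels), 3))
print("DBSCAN clusters found:", len(set(dbscan_labels) - {-1}))
```

On data this cleanly separated both techniques agree; the recommendation matters most when the data are noisier or mix numerical and categorical variables.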
Who makes the final decision?
Although Copilot provides a technical recommendation based on statistical analysis, the final decision always rests with the user, ensuring methodological control and alignment with the objectives of the segmentation process.


How are hyperparameters configured?
After selecting the modeling technique, the system allows you to define or validate the hyperparameters necessary to optimize the segmentation.
Hyperparameters are configuration values that directly influence the model’s behavior and the way segments are generated.
How are hyperparameters determined according to the model?
K-Means and Two-Step Clustering
For these models, the main hyperparameter is the number of segments, which can be determined using the following methods:
Elbow Method
- Calculates the average distance of objects to their centroid.
- Evaluates how close the elements are within the same group.
- Plots internal variation as the number of segments increases.
- The point where the curve stops dropping sharply is called the “elbow” and suggests the optimal number of segments.
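The elbow heuristic can be sketched with scikit-learn on synthetic data (four artificial groups; the centers and counts are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Four well-separated synthetic groups.
X, _ = make_blobs(n_samples=400, centers=[(0, 0), (8, 0), (0, 8), (8, 8)],
                  cluster_std=0.7, random_state=0)

# Inertia: total within-cluster sum of squared distances to the centroid.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 8)}

# The drop in inertia is large up to k=4 and marginal afterwards: the elbow.
for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```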
Silhouette Method
- Measures how similar elements are within their own group compared to other groups.
- Helps validate whether the selected number of segments is statistically consistent.
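A sketch of the silhouette check on the same kind of synthetic data (four artificial groups), again with scikit-learn:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400, centers=[(0, 0), (8, 0), (0, 8), (8, 8)],
                  cluster_std=0.7, random_state=0)

# Average silhouette for each candidate number of segments.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print("best k:", best_k)  # peaks at the true number of groups, 4
```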
DBSCAN
In this model, the number of segments is not predefined; instead, clusters are identified based on density.
K-Distance Method
- Helps estimate the optimal value of eps (neighborhood radius).
- Analyzes the distance of each point to its nearest neighbors.
- The point where a significant change in the slope of the graph occurs indicates a suggested value for eps.
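A sketch of the k-distance computation with scikit-learn (the percentile shortcut at the end stands in for visually locating the bend in the plotted curve; the data are synthetic):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=300, centers=[(0, 0), (8, 0), (0, 8)],
                  cluster_std=0.6, random_state=0)

# Distance from each point to its k-th neighbor; column 0 is the point
# itself (distance 0), so the last column is the (k-1)-th true neighbor.
k = 5
dists, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
k_dist = np.sort(dists[:, -1])

# Plotting k_dist reveals a sharp bend; a high percentile is a crude proxy.
eps_suggestion = float(np.percentile(k_dist, 95))
print("suggested eps:", round(eps_suggestion, 2))
```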
Density Parameters
- eps (epsilon): Maximum distance between points to be considered neighbors. Controls the proximity required to form a cluster.
- min_samples: Minimum number of points within the eps radius required to consider a group as a real cluster.
These parameters determine:
- The detection of density-based groups
- The separation of points considered noise
- The granularity of the segmentation
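A sketch of how the two parameters play out in scikit-learn’s DBSCAN, including a deliberately isolated point that ends up labeled as noise:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=[(0, 0), (8, 0), (0, 8)],
                  cluster_std=0.6, random_state=0)
# An isolated point no dense neighborhood reaches: DBSCAN labels it -1 (noise).
X = np.vstack([X, [[20.0, 20.0]]])

labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
n_clusters = len(set(labels) - {-1})
n_noise = int((labels == -1).sum())
print("clusters:", n_clusters, "| noise points:", n_noise)
```

A larger eps or smaller min_samples yields coarser segments; tightening them increases both the granularity and the number of points set aside as noise.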

How are segmentation results visualized?
Once the model has been executed, you can view:
- Distribution of the generated segments
- Graphical representation of the clusters
- Model quality indicators
- Technical execution details
You also have access to:
- Model used
- Selected variables
- Analyzed customers, transactions, and products
- Evaluated period
- Downloadable resources
- Segmentation PDF (when available)
How are the quality indicators interpreted?
Quality indicators are generated automatically once segmentation is complete.
Silhouette Indicator
- Measures internal cohesion within clusters
- Evaluates how similar elements are within the same cluster
- Includes a traffic-light system and equivalence table
Interpretation:
- Low value → Weak segmentation
- Moderate value → Acceptable if supported by other indicators
- High value → Better comparative quality
Davies–Bouldin Indicator
- Measures separation between groups
- Evaluates how distinct clusters are from one another
- Includes a traffic-light system
Interpretation:
- High value → Weak segmentation
- Medium value → Acceptable
- Low value → Better differentiation
Note: For Davies–Bouldin, lower values are better.
Calinski–Harabasz Indicator
- Assesses the overall balance between homogeneity and heterogeneity
- Higher values are better
- Should always be compared with another execution; cannot be interpreted in isolation
This indicator requires at least a second execution to determine which configuration is superior.
How do I decide which model is better?
You should compare the indicators across different executions. General rules:
- Silhouette → Higher is better
- Davies–Bouldin → Lower is better
- Calinski–Harabasz → Higher is better (comparative)
The best segmentation will be the one that:
- Shows consistency across indicators
- Is statistically robust
- Is interpretatively useful for business decisions
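The comparison rules can be sketched with scikit-learn’s three metrics on two hypothetical executions over the same synthetic data, one at the true number of groups and one deliberately too coarse:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

X, _ = make_blobs(n_samples=400, centers=[(0, 0), (8, 0), (0, 8), (8, 8)],
                  cluster_std=0.7, random_state=0)

def evaluate(k):
    """Run K-Means with k segments and return the three quality indicators."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return {"silhouette": silhouette_score(X, labels),
            "davies_bouldin": davies_bouldin_score(X, labels),
            "calinski_harabasz": calinski_harabasz_score(X, labels)}

run_a, run_b = evaluate(4), evaluate(2)  # second execution is too coarse
# run_a wins on all three rules: higher silhouette, lower Davies-Bouldin,
# higher Calinski-Harabasz.
print(run_a)
print(run_b)
```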
Available options
- New Execution → Starts a new run
- Resume Execution → Allows adjustments to a previous run
Each model can be executed independently.
