How does the segmentation section work?
The Segmentation section lets you group and analyze the characteristics of your risk factors (customers, counterparties, channels, jurisdictions, and products) through statistical modeling, strengthening your organization’s anti-money laundering (AML) efforts. By monitoring and tracking their behavior and transactions, you can verify that they are not involved in suspicious activities related to money laundering, terrorist financing, or the proliferation of weapons of mass destruction.
The information entered into the Segmentation section is supported by a dedicated infrastructure that stores the data independently and ensures it receives the level of protection required given its sensitive nature.
→ Please note that to access this module, you must have the AML+ Management System under the Enterprise plan ⚡
How do you create a segmentation?
To create a segmentation for your customers and counterparties, you must be in the AML (Anti-Money Laundering) Management System. Go to the left-hand sidebar, select the “Segmentation” section, and click on “Create Segmentation.”

Next, you must enter a name to clearly identify and reference the segmentation you are about to create. You must also select the risk factor to be segmented: customers, counterparties, channels, jurisdictions, or products.

After selecting the risk factor, the section “Dataset Partitions” will appear. These partitions help you distinguish between data subsets, making the segmentation process more efficient and accurate.
For example, the variable Type of Person divides the segmentation into two datasets: natural persons (individuals) and legal entities (corporate entities), since their behavior patterns differ.
In addition to the Type of Person variable, you may select only one additional partition variable. This must be one of the variables included in the risk factor form you are segmenting — in this example, the customer form. To add this variable, click the “+” icon located to the right of the Type of Person variable.

A dropdown field will then appear, allowing you to select a variable from those available in the form of the risk factor you previously chose. Once you select the variable, the system will display, alongside it, the data subsets that will be created based on that partition.
Finally, click on “Create and Generate Models.”

The tool will now display the datasets created from the selected partitions, showing all possible combinations.
In our example, four datasets appear because we selected two partition variables, each with two possible values. The resulting combinations are: natural–active, legal–active, natural–inactive, and legal–inactive.
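The combination logic can be sketched as a Cartesian product of the partition values (the variable names and values below are illustrative, not taken from the platform):

```python
from itertools import product

# Hypothetical partition variables mirroring the example above:
# "Type of Person" (natural/legal) and an assumed status variable.
partitions = {
    "type_of_person": ["natural", "legal"],
    "status": ["active", "inactive"],
}

# Each dataset corresponds to one combination of partition values.
datasets = ["-".join(combo) for combo in product(*partitions.values())]
print(datasets)  # 2 x 2 = 4 datasets
```

Two partition variables with two values each always yield four datasets; adding a value to either variable would grow the count multiplicatively.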

How do I manage multiple models within a segmentation?
From the “Actions” column, you can select “View Models.”
A segmentation may contain more than one model. For example:
- Natural persons (individuals)
- Legal entities (corporate entities)
- Different customer or counterparty typologies

Data mining now begins for each dataset, following the CRISP-DM methodology. You must select each dataset individually to carry out its segmentation process.
How do I perform the segmentation process?
To start the segmentation process, you must first have the datasets created through data partitioning. Then, click on one of the datasets to begin processing the segmentation.
The segmentation process consists of five steps, which we will explain below.
How do I select the variables for segmentation?
In this first stage, you must choose the variables that will feed the model.
The system will display:
- A list of available variables
- Previously selected variables
- A segmentation evaluation projection
- A Data Completeness Indicator per variable
How is data completeness calculated?
Data completeness measures what proportion of the uploaded records contain valid values for each variable.
It indicates:
- How many records have been properly filled out
- What percentage of the information is complete
- How robust the data is for running the model
The higher the completeness level, the greater the statistical reliability of the segmentation.
This step allows you to validate the quality of the information before proceeding to the next stage.
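As a rough sketch of how such a percentage can be computed (the variable names and values are made up; the platform’s exact formula is not documented here), completeness per variable is simply the share of non-empty records:

```python
import pandas as pd

# Illustrative customer records; None marks a missing value.
records = pd.DataFrame({
    "monthly_income": [2500, None, 4100, 3300],
    "economic_activity": ["retail", "services", None, "services"],
})

# Completeness per variable: percentage of records with a value present.
completeness = records.notna().mean() * 100
print(completeness)  # both variables are 75% complete (3 of 4 records)
```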

How are the data prepared before running the model?
Once the variables have been selected, the system begins the statistical preparation phase.
During this stage, the following processes are performed automatically:
- Data cleaning and validation
- Adjustments and transformations
- Normalization (where applicable)
- Preliminary statistical calculations
The system displays each stage of the processing to ensure:
- Transparency in how the information is handled
- Clarity regarding how the model processes the data
- Proper documentation to support audits and regulatory compliance
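A minimal sketch of two of these steps, basic validation and z-score normalization, on made-up amounts (the platform’s actual pipeline is not shown here):

```python
import numpy as np

# Toy transaction amounts; the negative entry fails a basic validity rule.
amounts = np.array([120.0, 300.0, -1.0, 80.0, 500.0])

# Cleaning and validation: keep only records that pass the rule.
valid = amounts[amounts >= 0]

# Normalization (z-score): variables on different scales become comparable.
normalized = (valid - valid.mean()) / valid.std()
print(normalized)  # mean ~0, standard deviation ~1
```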

How is the segmentation model selected?
Model selection combines automated analysis with an intelligent recommendation based on the information uploaded by the user.
Once the data preparation phase is complete, the system:
- Generates a preview of the uploaded information
- Presents a representative data sample
- Displays 2D or 3D distribution charts, depending on the selected variables
- Allows you to observe preliminary grouping behavior
How does Copilot intervene?
The processed information is sent to Copilot, which automatically analyzes:
- The typology of the variables
- The statistical distribution
- Data density and dispersion
- The overall structure of the information
- Potential natural segmentation patterns
Based on this analysis, Copilot suggests the most appropriate model among:
- Two-Step Clustering
- K-Means
- DBSCAN
The recommendation depends directly on the information uploaded by the user and on how the data behaves after preparation.
Why are there multiple models?
Each model performs better with different data structures:
- Some work best when groups are clearly defined.
- Others identify segments based on density.
- Others adapt better to combinations of numerical and categorical variables.
For this reason, the system does not impose a single model but instead recommends the most suitable one according to the actual behavior of the data.
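To make this concrete, here is a sketch using scikit-learn (not the platform itself) that fits two of the listed techniques to the same well-separated synthetic data. Two-Step Clustering is an SPSS-style technique with no standard scikit-learn implementation, so only K-Means and DBSCAN appear:

```python
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three clearly separated synthetic groups.
X, _ = make_blobs(n_samples=300, centers=[(0, 0), (8, 0), (0, 8)],
                  cluster_std=0.6, random_state=42)

# K-Means needs the number of segments up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# DBSCAN discovers groups from density instead (-1 marks noise).
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

print("K-Means silhouette:", round(silhouette_score(X, kmeans_labels), 3))
print("DBSCAN clusters found:", len(set(dbscan_labels) - {-1}))
```

On data this cleanly separated both techniques agree; the recommendation matters most when the data are noisier or mix numerical and categorical variables.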
Who makes the final decision?
Although Copilot provides a technical recommendation based on statistical analysis, the final decision always rests with the user, ensuring methodological control and alignment with the objectives of the segmentation process.


How are hyperparameters configured?
After selecting the modeling technique, the system allows you to define or validate the hyperparameters necessary to optimize the segmentation.
Hyperparameters are configuration values that directly influence the model’s behavior and the way segments are generated.
How are hyperparameters determined according to the model?
K-Means and Two-Step Clustering
For these models, the main hyperparameter is the number of segments, which can be determined using the following methods:
Elbow Method
- Calculates the average distance of objects to their centroid.
- Evaluates how close the elements are within the same group.
- Plots internal variation as the number of segments increases.
- The point where the curve stops dropping sharply is called the “elbow” and suggests the optimal number of segments.
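The elbow heuristic can be sketched with scikit-learn on synthetic data (four artificial groups; the centers and counts are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Four well-separated synthetic groups.
X, _ = make_blobs(n_samples=400, centers=[(0, 0), (8, 0), (0, 8), (8, 8)],
                  cluster_std=0.7, random_state=0)

# Inertia: total within-cluster sum of squared distances to the centroid.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 8)}

# The drop in inertia is large up to k=4 and marginal afterwards: the elbow.
for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```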
Silhouette Method
- Measures how similar elements are within their own group compared to other groups.
- Helps validate whether the selected number of segments is statistically consistent.
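A sketch of the silhouette check on the same kind of synthetic data (four artificial groups), again with scikit-learn:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400, centers=[(0, 0), (8, 0), (0, 8), (8, 8)],
                  cluster_std=0.7, random_state=0)

# Average silhouette for each candidate number of segments.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print("best k:", best_k)  # peaks at the true number of groups, 4
```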
DBSCAN
In this model, the number of segments is not predefined; instead, clusters are identified based on density.
K-Distance Method
- Helps estimate the optimal value of eps (neighborhood radius).
- Analyzes the distance of each point to its nearest neighbors.
- The point where a significant change in the slope of the graph occurs indicates a suggested value for eps.
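A sketch of the k-distance computation with scikit-learn (the percentile shortcut at the end stands in for visually locating the bend in the plotted curve; the data are synthetic):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=300, centers=[(0, 0), (8, 0), (0, 8)],
                  cluster_std=0.6, random_state=0)

# Distance from each point to its k-th neighbor; column 0 is the point
# itself (distance 0), so the last column is the (k-1)-th true neighbor.
k = 5
dists, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
k_dist = np.sort(dists[:, -1])

# Plotting k_dist reveals a sharp bend; a high percentile is a crude proxy.
eps_suggestion = float(np.percentile(k_dist, 95))
print("suggested eps:", round(eps_suggestion, 2))
```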
Density Parameters
- eps (epsilon): Maximum distance between points to be considered neighbors. Controls the proximity required to form a cluster.
- min_samples: Minimum number of points within the eps radius required to consider a group as a real cluster.
These parameters determine:
- The detection of density-based groups
- The separation of points considered noise
- The granularity of the segmentation
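A sketch of how the two parameters play out in scikit-learn’s DBSCAN, including a deliberately isolated point that ends up labeled as noise:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=[(0, 0), (8, 0), (0, 8)],
                  cluster_std=0.6, random_state=0)
# An isolated point no dense neighborhood reaches: DBSCAN labels it -1 (noise).
X = np.vstack([X, [[20.0, 20.0]]])

labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
n_clusters = len(set(labels) - {-1})
n_noise = int((labels == -1).sum())
print("clusters:", n_clusters, "| noise points:", n_noise)
```

A larger eps or smaller min_samples yields coarser segments; tightening them increases both the granularity and the number of points set aside as noise.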

How are segmentation results visualized?
Once the model has been executed, you can view:
- Distribution of the generated segments
- Graphical representation of the clusters
- Model quality indicators
- Technical execution details
You also have access to:
- Model used
- Selected variables
- Analyzed customers, transactions, and products
- Evaluated period
- Downloadable resources
- Segmentation PDF (when available)
How are the quality indicators interpreted?
Quality indicators are generated automatically once segmentation is complete.
Silhouette Indicator
- Measures internal cohesion within clusters
- Evaluates how similar elements are within the same cluster
- Includes a traffic-light system and equivalence table
Interpretation:
- Low value → Weak segmentation
- Moderate value → Acceptable if supported by other indicators
- High value → Better comparative quality
Davies–Bouldin Indicator
- Measures separation between groups
- Evaluates how distinct clusters are from one another
- Includes a traffic-light system
Interpretation:
- High value → Weak segmentation
- Medium value → Acceptable
- Low value → Better differentiation
Note: For Davies–Bouldin, lower values are better.
Calinski–Harabasz Indicator
- Assesses the overall balance between homogeneity and heterogeneity
- Higher values are better
- Should always be compared with another execution; cannot be interpreted in isolation
This indicator requires at least a second execution to determine which configuration is superior.
How do I decide which model is better?
You should compare the indicators across different executions. General rules:
- Silhouette → Higher is better
- Davies–Bouldin → Lower is better
- Calinski–Harabasz → Higher is better (comparative)
The best segmentation will be the one that:
- Shows consistency across indicators
- Is statistically robust
- Is interpretatively useful for business decisions
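The comparison rules can be sketched with scikit-learn’s three metrics on two hypothetical executions over the same synthetic data, one at the true number of groups and one deliberately too coarse:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

X, _ = make_blobs(n_samples=400, centers=[(0, 0), (8, 0), (0, 8), (8, 8)],
                  cluster_std=0.7, random_state=0)

def evaluate(k):
    """Run K-Means with k segments and return the three quality indicators."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return {"silhouette": silhouette_score(X, labels),
            "davies_bouldin": davies_bouldin_score(X, labels),
            "calinski_harabasz": calinski_harabasz_score(X, labels)}

run_a, run_b = evaluate(4), evaluate(2)  # second execution is too coarse
# run_a wins on all three rules: higher silhouette, lower Davies-Bouldin,
# higher Calinski-Harabasz.
print(run_a)
print(run_b)
```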
Available options
- New Execution → Starts a new run
- Resume Execution → Allows adjustments to a previous run
Each model can be executed independently.
