# Classification et extraction

## Overview

In the **Classification and Extraction** settings, you can:

* Enable **Document Splitting** based on QR codes
* Configure **amount formatting**
* Set up **table extraction**
* Toggle processing of unsupported **ZUGFeRD** files
* Define special classification rules
* Monitor Custom-Trained **AI Models** used in the classification process

This page provides a detailed explanation of all available settings.

## **Accessing Classification and Extraction Settings**

To access the **Classification and Extraction** settings, go to:\
**Settings → Document Processing → Classification and Extraction**

## Document Splitting

In the **Document Splitting** section, you can configure whether an uploaded document should be split into multiple documents whenever a **barcode** appears on one of its pages.

To activate this feature:

1. Go to the **Document Splitting** section.
2. Open the dropdown menu.
3. Select **Split by Barcode/QR Code**.

You will then have the option to:

* Select one or more barcode types to be detected.
* Specify a regex pattern that the barcode must match in order to trigger document splitting.

## Amount Formatting

In the **Amount Formatting** section, you have two options:

* **Allow Rounding During Amount Comparison:**\
  If enabled, a tolerance of ±0.5 is allowed during amount comparison.\
  If disabled, a default tolerance of ±0.05 applies.
* **Require Exact Match for Amount Comparison:**\
  If enabled, amounts must match exactly with zero tolerance.\
  If disabled, a tolerance of ±0.05 is allowed.

<mark style="color:red;">**Note**</mark>: Only one of these settings can be active at a time.

## Table Extraction

You can extract tables from documents by enabling either **Table Extraction** or **AI Table Extraction**. A trained table—whether AI-based or manual—will always be linked to a specific supplier.

**Table Extraction:** Activates manual **table extraction**. Tables must be trained manually.\
Learn more about manual training [here](https://docs-fr.docbits.com/administration-and-setup/setup/document-training/training-line-fields-table-training/defining-tables-and-columns).

**AI Table Extraction:** Uses AI to automatically extract tables. If the results are not accurate enough, it's recommended to switch to manual **Table Extraction** for better control and training.

**Table Extraction for Costing Element:** When enabled, DocBits can extract costing elements from tables at the line level and classify them accordingly.\
Detailed explanation available [here](https://docs-fr.docbits.com/administration-and-setup/settings/document-processing/classification-and-extraction/table-extraction-for-costing-element).

**Auto Extract Tax Code:** When enabled, the system automatically fills the **Tax Code** field on the Validation Screen—provided that a tax code field is configured.\
More information on this setting [here](https://docs-fr.docbits.com/administration-and-setup/settings/document-processing/classification-and-extraction/auto-extract-tax-code).

**AI Model:** Allows you to specify which **AI model** is used for table extraction.\
You’ll also see a table showing:

* Which **suppliers** are using which AI model
* Whether they use E-Text
* Options to delete an entry or reset the training data

This setting is explained in detail [here](https://docs-fr.docbits.com/administration-and-setup/settings/document-processing/classification-and-extraction/ai-model).

## Electronic Document

**Process Unsupported ZUGFeRD PDF:** If enabled, unsupported **ZUGFeRD** versions will be processed as standard PDFs, and the embedded XML will be ignored.

The list of supported **ZUGFeRD** versions can be found [here](https://github.com/Fellow-Consulting-AG/docbits/blob/fr/readme/administration-and-setup/settings/global-settings/document-types/edi/zugferd-1.0-2.1-and-2.3.md).

## **Classification Rules**

In the **Classification Rules** section, you can define specific **regex** patterns and criteria to help the system automatically classify documents during processing.

To access this section, click the **Classification Rules** tab at the top of the page.

### **Add a New Classification Rule**

To create a new rule:

1. Click **Add** in the top-right corner.
2. Fill in the following fields:
   * **Pattern**: The regex pattern the system should search for to trigger classification.
   * **Type**: Where the pattern should be searched (e.g., **Barcode**).
   * **Sub Organization** *(optional)*: Specify which sub organization the rule applies to.
   * **Document Type**: Define the document type to assign when the pattern is matched.
   * **Sub Document Type** *(optional)*: Specify a sub type for more detailed classification.
3. Click **Save** to save your classification rule.

### **Edit a Classification Rule**

To edit an existing rule:

1. Click the three dots in the **Actions** column.
2. Select **Edit**.
3. Make your desired changes.
4. Click **Save** to apply the updates.

### **Delete a Classification Rule**

To delete a rule:

1. Click the three dots in the **Actions** column.
2. Select **Delete**.

## AI Models

The **AI Models** section displays all custom-trained models that have been specifically fine-tuned for your needs.

### Accessing the AI Models Section

To open this section, click the **AI Models** tab located at the top of the page.

### Model Categories

Models are organized into categories. Below each category name, the number of models it contains is shown.\
Click on a category to view its details.

At the top of the selected category page, you’ll see key information about each model:

* **Type**: The type of model.
* **First Page Only**: Indicates whether the model processes only the first page of a document.
* **Version**: The version number of the model.

### Model Table

All models within a category are listed in a table, which includes the following information:

* **Name**: The name of the model.
* **Next Model**: The model that will further process the output of the current model.
* **Document Type**: The primary document type assigned by the model during classification.
* **Document Sub Types**: The sub types into which the document is further classified.
* **Priority**: The priority level that determines the model’s position in the classification queue.

### Editing a Model

To edit a model:

1. Click the pen icon in the **Actions** column next to the model you want to edit.
2. Update the available fields:
   * **Next Model**: Select the model that should process the output from the current model.
   * **Document Type**: Choose the document type the model should classify the input as.
3. Click **Save** to apply your changes.
