How to auto categorise and classify documents with SharePoint Syntex

By Leslie Anjan, Prometix

Many organisations are facing challenges in setting metadata and applying properties to content stored in SharePoint. At Prometix we experienced this issue recently when delivering a large document management rollout project.

Educating end users to apply metadata or taxonomy was difficult, and there were many questions asked on how to automate the metadata tagging process. Unfortunately, there wasn’t any technology available at that time to solve this easily.

Fast forward, and things have changed now as Microsoft’s SharePoint Syntex addresses this exact issue.

SharePoint Syntex, the first product to come out of Microsoft’s larger Project Cortex, is now available under general release. Syntex takes advantage of the advanced AI and machine learning coming out of Project Cortex to automatically categorise and classify documents based on models set up by the user.

Using these models, Syntex can extract specific data and apply it as metadata to documents, as well as applying Sensitivity and Retention labels for information protection.

SharePoint Syntex lets users create two types of model: Form Processing and Document Understanding. The difference between the two is that Form Processing extracts values from a structured form, while Document Understanding is trained to pick out information from an unstructured document.

To implement a Document Understanding model, you need to provision a Content Center. A Content Center is a SharePoint site that’s used to create and store the different document models, as well as to apply those models to your Document Libraries.

Graphical user interface, applicationDescription automatically generated

When you have identified a document type you wish to model – such as invoices, receipts, board papers of purchase orders – you can create the new model in the Content Center. On creation, the default action is to create a new Content Type; however, you can change this to use an existing Content Type if you have one already set up in your Content Type Hub.

Training the model to understand the patterns

After creating the model, you need to train it by adding in some example files. Syntex only requires five positive examples and one negative example, but more is always better if possible. Once the example files have been uploaded, explanations need to be created alongside them to train the model. The model can then be tested within the Content Center.

You can optionally create Extractors. Extractors provide the ability to draw specific information out of a document – such as a key date, a client name, or a contract value. The information an extractor can pull out of a document can then be mapped to a metadata field on the document.

The final part of a Document Understanding model is the ability to apply a Sensitivity and/or Retention label to the type of document. This ensures that the different document types that are being processed by Syntex can conform to organisational or regulatory requirements for being retained due to the nature of their data, as well as being marked for sensitivity.

Form processing made easy

If your organisation stores documents that are in structured format (like forms) – such as Purchase Orders – specific key/value pairs from these forms can be extracted and used to populate metadata fields for your documents. Setting this type of model up is done at the Document Library level using Power Automate’s AI Builder functionality.

As with Document Understanding models, you must upload sample documents in order to train the model. Once analysed, it’s simply a case of selecting the fields to be extracted, and ensuring they have been given a suitable name.

The automated categorising and tagging of documents can become very powerful when coupled with other tools within Microsoft 365, such as Power Automate or Power BI, but the value of these tools can only come with engaging with different areas of your business to identify the use cases.

The administration and governance of the functionality will also need to be considered before rolling out at scale. Document Understanding models are added to SharePoint Document Libraries from the Content Center; however, they can only be applied to libraries the acting user can access, so there is the option to devolve the running of this to specified users within your business units. There is also the option to provision more than one Content Center for further separation if this is required.

Every organisation will have its own requirements when coming to implement the functionality made available by SharePoint Syntex. Taking the time to consider how it will be used at the start of the journey will most likely help save some headaches further down the road.

Prometix as a Microsoft Gold certified partner and certified O365 consultants (Sydney, Canberra, Melbourne & Perth), we have extensive experience in delivering Microsoft 365 based solutions. If you need any assistance or demo with SharePoint Syntex then please contact us from enquiries@prometix.com.au.

Leslie Anjan is a Solutions Architect in Office365/Azure with solution provider Prometix

Business Solution

Business Process & Workflow

Document & Records Management

Enterprise Content Management