How to create a custom Content Control List for US Medical Record Number detection

  • Article ID: 112192
  • Updated: 27 Apr 2016


The HIPAA (Health Insurance Portability and Accountability Act) and HITECH (Health Information Technology for Economic and Clinical Health) Act require companies holding electronic protected health information (EPHI) to secure it.

This knowledge base article is designed to guide you through the process of creating a custom Content Control List for the MRN format used in your health organization. Please note that Sophos does not provide technical support for the creation of custom Content Control Lists.


Sophos provides a set of Content Control Lists (CCLs) that allow administrators to track/block the movement of EPHI. Relevant CCLs have been tagged with the "HIPAA" identifier and are provided within the following products:

  • Endpoint Security and Data Protection
  • Email Security and Data Protection
  • PureMessage
  • SG UTM
  • XG Firewall

A Medical Record Number (MRN) is classified as EPHI so it is important to be able to identify the use of the MRN within outbound documents and email communications. Typically, a HIPAA data control rule would be configured to identify an MRN alongside other EPHI e.g. a Social Security Number or National Health Provider Identifier.

Across the US there is not a widely adopted standard for Medical Record Numbers. There are likely to be hundreds of MRN formats in use, so Sophos is unable to provide a universal CCL for Medical Record Number detection.

Example Medical Record Number identification CCL

This simplified example will detect a document that contains an MRN qualifying phrase (i.e. "MRN" or "Medical Record Number") and at least ten instances of a six-digit number (e.g. "123456", "234567", "345678" etc.).

<?xml version="1.0" encoding="utf-16"?>
<contentConditions xmlns:xsi="" xmlns:xsd="" xmlns="">

<!-- The triggerWeight equals "11"; this means that the CCL requires a phrase match (scoring "1") plus ten numeric matches (scoring "1" each) -->
<contentCondition triggerWeight="11" comment="An example custom CCL for identifying data tagged with a Medical Record Number." name="Medical Record Number">


<!-- 'count' is equal to "1" as we only care that a MRN phrase exists somewhere, how many times does not matter -->
<expression value="&quot;Medical Record Number&quot; OR MRN" count="1" weight="1" nearDistance="50" />


<!-- 'count' & 'weight' are set so that this expression cannot reach the triggerWeight of "11" by itself -->
<expression value="\b\d{6}\b" count="10" weight="1" />

</contentCondition>
</contentConditions>




Changing both the 'triggerWeight' and the 'count' for the numeric test will adjust how many numeric matches are needed before the CCL rule causes an action to occur.
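
The interaction between 'count', 'weight' and 'triggerWeight' can be sketched in a few lines of Python. This is only an illustration of the scoring arithmetic described in the XML comments above, not the data control engine itself. It assumes that each expression contributes its 'weight' per match up to a maximum of 'count' matches, it ignores 'nearDistance', and it writes the phrase test as an ordinary regular expression. The function names ccl_score and ccl_fires are hypothetical.

```python
import re

def ccl_score(text, expressions):
    """Sum the capped, weighted match counts of every expression."""
    total = 0
    for pattern, count, weight in expressions:
        matches = len(re.findall(pattern, text, flags=re.IGNORECASE))
        # Assumption: matches beyond 'count' add nothing to the score.
        total += min(matches, count) * weight
    return total

def ccl_fires(text, expressions, trigger_weight):
    """Return True when the total score reaches the trigger weight."""
    return ccl_score(text, expressions) >= trigger_weight

# The two expressions from the example above, as (pattern, count, weight).
EXAMPLE = [
    (r"\bMedical Record Number\b|\bMRN\b", 1, 1),  # qualifying phrase
    (r"\b\d{6}\b", 10, 1),                          # six-digit numbers
]

numbers = " ".join(str(100000 + i) for i in range(10))  # ten six-digit numbers
with_phrase = "MRN list: " + numbers
without_phrase = "Order list: " + numbers

print(ccl_fires(with_phrase, EXAMPLE, 11))     # phrase (1) + ten numbers (10) = 11
print(ccl_fires(without_phrase, EXAMPLE, 11))  # ten numbers alone score only 10
```

Under these assumptions the numeric expression can never score more than 10 on its own, which is why the qualifying phrase is needed to reach a 'triggerWeight' of "11".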

NOTE: If you create a new custom CCL XML file, it must be saved in UTF-16 Little Endian file encoding. Otherwise it may be rejected by the import.
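
If your text editor does not offer a UTF-16 Little Endian option, the file can also be written from a script. The following Python sketch is one way to do this. The function names and the file name "MRN_custom.xml" are hypothetical, and writing a byte order mark at the start of the file is an assumption (it mirrors what Windows text editors typically write for this encoding) rather than a documented import requirement.

```python
import codecs
import tempfile
from pathlib import Path

def save_ccl_utf16le(path, xml_text):
    """Write xml_text as UTF-16 Little Endian, preceded by a byte order mark."""
    data = codecs.BOM_UTF16_LE + xml_text.encode("utf-16-le")
    Path(path).write_bytes(data)

def is_utf16le_with_bom(path):
    """Heuristic check: does the file start with the UTF-16 LE byte order mark?"""
    return Path(path).read_bytes().startswith(codecs.BOM_UTF16_LE)

# Usage with a throwaway directory and a placeholder document.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "MRN_custom.xml"
    save_ccl_utf16le(
        target,
        '<?xml version="1.0" encoding="utf-16"?>\n<contentConditions />',
    )
    print(is_utf16le_with_bom(target))
```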

How to create custom Content Control Lists (CCLs) using the Sophos template

SophosLabs have created a more comprehensive template custom CCL for MRN detection which is available below for download. The goal for this custom CCL is to identify a number/code which can match the MRN format(s) used in your organization. It may be necessary to also identify additional qualifying terms such as "MRN" or "Medical Record Number" to extend coverage whilst still reducing the risk of false positives (numbers being identified as an MRN when they are not).

  • Click here to download XML file sample 

    Note: For download purposes this file is saved in .zip format. Unzip the download and open the sample file "MRN.xml" and the sample test data file "MRN_test_data.txt" contained within it. Use the XML contained in this file; do NOT copy and paste the XML from this knowledge base article, as it might not import correctly due to encoding differences.

How to customize the Medical Record Number XML file

The downloadable example contains additional options and comments. To customize the MRN XML file you will need to:-

  • Enable the most appropriate expressions by uncommenting.
  • Make any required alterations to the regular expressions within the expressions so that they precisely match the MRN format that you use.
  • Remove or comment-out expressions that are not appropriate.
  • Adjust expression 'count's and the 'triggerWeight' to suit.

How to uncomment sample expressions so that they are enabled:-

  • Before (commented): <!--ExpX <expression value="expression" count="X" weight="X" /> -->
  • After (uncommented): <expression value="expression" count="X" weight="X" />

The following steps will guide you through the process of customizing the XML:

  1. Open the supplied XML file (see link above) in a text editor (e.g. Notepad).

  2. You can change the name of the CCL (label: name) and the description (label: comment) by editing the values of the 'name' and 'comment' attributes in the following line:

    <contentCondition triggerWeight="2" comment="An example custom CCL for identifying data tagged with a Medical Record Number." name="Medical Record Number">

  3. Medical Record Numbers are often shown alongside a text label, such as Medical Record Number or MRN. We recommend that you leave the following expression uncommented to make use of these qualifying terms, as this will significantly reduce the risk of false positives:

    <expression value="&quot;Medical Record Number&quot; OR MRN" count="1" weight="100" nearDistance="50" />

    'simpleExpression' terms are case insensitive so the term medical record number will match Medical Record Number and Medical record number.

  4. You can add additional qualifying terms alongside those already supplied. This can be done by adding an extra OR operand within the expression. To match an exact phrase enclose the phrase using &quot;. For example OR &quot;Acme Medical Record&quot; will match Acme Medical Record.

  5. As supplied, the CCL requires one match from the qualifying phrase expression and one match from the enabled number format expressions (e.g. "MRN something or other 123456").
    • To detect multiple MRN formats, uncomment each number format expression that applies.
    • To increase the number of numeric matches required before the CCL fires:-
      • increase each numeric expression's 'count' value to the required amount
      • increase the 'triggerWeight' value by the same amount (e.g. numeric 'count' = "5" and 'triggerWeight' = "105").
    • To detect on numeric matches alone, without a qualifying phrase:-
      • delete or comment-out the qualifying phrase expression
      • change the 'triggerWeight' from "101" to the number of numeric matches required (e.g. a 'triggerWeight' of just "1" will detect a single numeric match, but this is likely to carry a high risk of false positive detections).

  6. Once you have finished editing the CCL it can be saved. Ensure that you save the file with a .xml file extension rather than .txt.

  7. You can verify the structure of the edited XML file by reviewing it using a web browser.
    Note: in Firefox you may see the following message which can be ignored: 'This XML file does not appear to have any style information associated with it. The document tree is shown below.'

  8. Once you have reviewed the changes made you can import the XML file into either Sophos Enterprise Console or the Email Security Appliance web interface:
    • Sophos Enterprise Console: Open the Tools | Manage data control | Data Control Content Control Lists | Import. The CCL Import dialog box is displayed.
    • Email Security Appliance: Under Content control lists, click Import. The CCL Import dialog box is displayed.
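
As an alternative to reviewing the file in a web browser, the structure and the weights can be checked with a short script before import. The Python sketch below parses a CCL and compares the highest score that the enabled expressions can reach ('count' multiplied by 'weight', summed) with the 'triggerWeight'. It is an illustration under several assumptions: the SAMPLE document is a hypothetical edited template and not the contents of the downloadable file, the scoring follows the arithmetic described in step 5, a missing 'count' or 'weight' is treated as "1", and the names local_name and summarize_ccl are invented for this example.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]

def summarize_ccl(xml_text):
    """Return (name, triggerWeight, maximum score) for each contentCondition."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    summary = []
    for element in root.iter():
        if local_name(element.tag) != "contentCondition":
            continue
        # Commented-out expressions are XML comments, so they are skipped here.
        max_score = sum(
            int(child.get("count", "1")) * int(child.get("weight", "1"))
            for child in element
            if local_name(child.tag) == "expression"
        )
        summary.append(
            (element.get("name"), int(element.get("triggerWeight")), max_score)
        )
    return summary

# Hypothetical edited template: one qualifying phrase, one enabled number format.
SAMPLE = r"""<contentConditions>
  <contentCondition triggerWeight="101" comment="Hypothetical edited template" name="Medical Record Number">
    <expression value="&quot;Medical Record Number&quot; OR MRN" count="1" weight="100" nearDistance="50" />
    <!--Exp1 <expression value="\b\d{8}\b" count="1" weight="1" /> -->
    <expression value="\b\d{6}\b" count="1" weight="1" />
  </contentCondition>
</contentConditions>"""

for name, trigger, max_score in summarize_ccl(SAMPLE):
    verdict = "can fire" if max_score >= trigger else "can never fire"
    print(name, trigger, max_score, verdict)
```

A rule whose maximum score is below its 'triggerWeight' can never fire, which usually means that a 'count' or the 'triggerWeight' was changed without the other.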

For additional information on testing Content Control Lists please refer to:

Addendum - Regular Expression primer

Value          Description
\b             Matches a word boundary, i.e. the position between a word character and a non-word character such as a space, comma or period.
\d             Matches any single digit.
\d{3}          Matches exactly 3 digits.
[ -]           Matches either a space or a hyphen (always ensure the hyphen is the last character inside the brackets).
?              Matches zero or one of the preceding item; in the case of [ -]? this means zero or one of either a space or a hyphen.
[a-z]          Matches any single lowercase alphabetic character between a and z.
[a-zA-Z]       Matches any single alphabetic character.
[A-HK-NP-Z]    Matches any single uppercase alphabetic character except I, J and O (which could be mistaken for numbers).
(?:...)        A non-capturing grouping of an expression.
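
Candidate expressions can be tried against sample text before they are placed in the XML file. The Python sketch below combines the constructs from the table above into three hypothetical MRN formats. The formats and the sample text are invented for illustration, and the sketch assumes that the data control engine interprets these constructs in the same way as Python's re module, which this article does not confirm.

```python
import re

# Hypothetical MRN formats, built only from the constructs in the table above.
SIX_DIGITS = re.compile(r"\b\d{6}\b")                # e.g. 123456
SPLIT_DIGITS = re.compile(r"\b\d{3}[ -]?\d{3}\b")    # e.g. 123-456 or 123 456
LETTER_PREFIX = re.compile(r"\b[A-HK-NP-Z]\d{6}\b")  # e.g. K654321, never O654321

sample = (
    "MRN 123456, ref 123-456, chart 123 456, "
    "id K654321, bad O654321, long 1234567"
)

print(SIX_DIGITS.findall(sample))     # ['123456']
print(SPLIT_DIGITS.findall(sample))   # ['123456', '123-456', '123 456']
print(LETTER_PREFIX.findall(sample))  # ['K654321']
```

The seven-digit number is not reported by any of the patterns because \b requires the match to start and end at a word boundary, which is what keeps a six-digit test from matching part of a longer number.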

For more detail refer to this external web page:

If you need more information or guidance, then please contact technical support.
