Content Scanning: Custom Regular Expression & Keyword Search



Content scanning in BetterCloud enables you to locate, alert on, and automatically remediate files that contain sensitive data across your SaaS Integrations. For a guide to setting up Content Scanning in BetterCloud, refer to our Content Scanning article here.

That article covers topics such as:

  • How to build content scans
  • Supported file types
  • Supported integrations
  • Auditing results
  • Triggering workflows off of Go-forward policies
  • Taking action on files that trigger a violation
  • Other important information and requirements

Please note Content Scanning is only available for customers on the following versions of BetterCloud (see Understanding BetterCloud Versions for more information):

  • Secure
  • Pro (Legacy SKU)
  • Enterprise (Legacy SKU)

What is a Regular Expression?

A regular expression (commonly referred to as "regex") is a string of characters that defines a search pattern. This allows users to create their own custom expressions to find the exact violation or criteria they specify. For more information about creating your own regular expression, please refer to our "Creating your own Custom Regular Expression" help center article here.

Please note BetterCloud uses the RE2 Golang flavor of regular expression.
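Because Go's standard "regexp" package implements RE2 syntax, it can serve as a rough local check before you paste a pattern into BetterCloud. The following is a minimal sketch, not BetterCloud's own validation; the sample patterns and the helper name compilesInRE2 are illustrative assumptions. It shows that lookaheads and backreferences, which are common in other regex flavors, are rejected by RE2.

```go
package main

import (
	"fmt"
	"regexp"
)

// compilesInRE2 reports whether Go's regexp package (RE2 syntax)
// accepts the pattern. The helper name is hypothetical.
func compilesInRE2(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`\bEMP-\d{6}\b`,    // character classes and counted repetition: supported
		`(?i)confidential`, // inline case-insensitive flag: supported
		`(?=.*\d)\w+`,      // lookahead: not part of RE2 syntax
		`(\w+)\s\1`,        // backreference: not part of RE2 syntax
	}
	for _, p := range patterns {
		fmt.Printf("%-20s compiles: %v\n", p, compilesInRE2(p))
	}
}
```

If a pattern copied from another tool fails this check, it likely relies on a feature RE2 does not offer and needs to be rewritten before use.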

Custom Regular Expression and Keyword Search in BetterCloud:

Using custom regular expressions in BetterCloud allows you to define sensitive data that falls outside the 95+ pre-defined datasets already populated in BetterCloud. Examples include datasets unique to your environment, such as company computer names, customer numbers, employee numbers, and purchase order numbers.

A keyword search allows you to query for exact information such as a particular email address or a particular customer number. This type of search can be useful for GDPR or CCPA requests.

Navigating to Custom Regular Expression and Keyword Search in BetterCloud:

You can use custom regular expressions and keyword search in BetterCloud in three ways:

Navigating to Custom Regular Expression in a File Audit:

To get started with a File Audit, navigate to Files > Scans and click “New Scan” in the top right corner: 


A new window will appear where you can start building your File Audit. On the first page of the audit, you will be presented with file-specific criteria to help narrow the files your Audit will target. 


Criteria you can filter by on this page include:

  1. File Sharing Settings: allows you to target files with specific sharing settings (i.e., Public, External, and Internal)
  2. Integrations: allows you to target all Integrations or a specific Integration 
  3. File Owner: allows you to target all Files Owned by a specific User
  4. Shared With: allows you to target all Files Shared With a specific User

Please Note:

  • the “File Owner” and “Shared With” fields are wildcard fields, meaning BetterCloud will search for partial matches instead of exact matches for the content you enter (e.g., entering "" in the "Shared With" field will find all documents shared with addresses)
  • scans targeting over 500,000 files may take a significant amount of time to complete

After all information is populated, click "Next: Data Selection" in the bottom right-hand corner. Jump here to learn how to populate custom regular expressions and/or a keyword search.


Navigating to Select Scan/Targeted Scan:

To get started with a Select Scan, navigate to the Files grid by going to Files > Browse. In this grid, you can do some initial filtering using the column filters to search for the files you want to target. Once you’ve located the files you want to scan, select the box next to them: 


Next, select the Actions menu in the top right corner > choose “BetterCloud” to filter by BetterCloud specific Actions > and select “Scan Content”:


The "Data Selection" screen will appear. Jump here to learn how to populate custom regular expressions and/or a keyword search.


Navigating to Go Forward Policies and Alerts:

There are many different alerts in BetterCloud that can utilize content scanning. Some examples of those alerts are:

  • Sensitive Data Scanned Alerts for Google, Dropbox, Box, Slack, and Office 365.
  • Exposure Focused alerts such as Files Shared Externally, Files Shared Publicly, etc. 
  • For a full list of those alerts please refer to our "Content Scanning" help center article here.

To get started with Go-Forward Policies and Alerts, navigate to Alerts > Manage and find one of the content scanning alerts. From there, you can choose to add content scanning to the alert. See "Adding Custom Regular Expression" below to learn how to populate custom regular expressions and/or a keyword search.


Adding Custom Regular Expression:

To create a new custom regular expression, select "Custom Data" > "Regular Expressions" > "Create New"


On the next screen, you will be able to create and enter your own regular expression. You have the ability to save the regex you created for future use.


  1. Custom Regex: Where you can create your own custom regular expression. Currently, there is a 400 character limit on the length of your regular expression.
  2. Name your Regex: The name of the regular expression. **Do not store sensitive information or financial data in the name of your regular expression for security purposes.**
  3. Save for future Scans: Toggling the “Save for future scans” checkbox allows you to save the regular expression to be used in future content scans. We allow you to save up to 15 different regular expressions. 
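The limits above can be checked locally before you enter a pattern. The following is a rough sketch under stated assumptions: it counts characters as Unicode code points (the article does not say how the 400-character limit is measured), and the function name checkRegex and its parameters are hypothetical rather than part of any BetterCloud API.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

const (
	maxRegexChars = 400 // length limit stated in this article
	maxSavedRegex = 15  // saved-expression limit stated in this article
)

// checkRegex validates a pattern against the documented limits.
// alreadySaved is the number of expressions currently saved;
// save mirrors the "Save for future scans" checkbox.
func checkRegex(pattern string, alreadySaved int, save bool) error {
	if n := utf8.RuneCountInString(pattern); n > maxRegexChars {
		return fmt.Errorf("pattern is %d characters; limit is %d", n, maxRegexChars)
	}
	if _, err := regexp.Compile(pattern); err != nil {
		return fmt.Errorf("not valid RE2 syntax: %w", err)
	}
	if save && alreadySaved >= maxSavedRegex {
		return errors.New("saved regular expression limit reached")
	}
	return nil
}

func main() {
	fmt.Println(checkRegex(`\bCUST-\d{8}\b`, 3, true))          // within all limits
	fmt.Println(checkRegex(strings.Repeat("a", 401), 0, false)) // too long
	fmt.Println(checkRegex(`\d+`, 15, true))                    // no save slots left
}
```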

Adding Keyword Search:

To add a keyword to your content scan, select "Custom Data" > "Keywords".


A new screen will appear that will allow you to enter in the details of your keyword search:


  1. Keyword list: Where you can enter keywords you wish to scan for. You can separate different keywords with a comma. Keyword lists are NOT case sensitive. Currently, you can enter up to 60KB of text, which is roughly equal to 2,000 different email addresses.
  2. Name your Keyword List: The name of your keyword list. **Do not store sensitive information or financial data in the name of your keyword list for security purposes.** For example, enter 123-45-6789 in the keyword list and name the list "Clark Kent SSN". This keeps your search easily identifiable while keeping sensitive information secure.

Data Selection:

You can use a combination of pre-defined and custom data sets in your content scan. Currently, you can include up to 15 custom regular expressions, 1 keyword list, and any additional pre-defined data sets in one scan. Please note that the more datasets you include in your scan, the longer the scan will take.


In the above example screenshot, we are including two custom regular expressions in our scan: "Class C IP Addresses" and "Purchase Order Numbers". We have also included a keyword list named "Clark Kent SSN". Finally, we have selected two pre-defined data sets: "U.S. Bank Routing #" and "Credit Card".

1. This is where you can select the different saved custom regular expressions and/or pre-defined data sets to include in your scans.

2. This is where you can see the different data sets that you have already selected for this scan.


The "Summary" page breaks down the settings/details of the scan you have crafted so far including:



  1. Scan Name: The name of the scan
  2. File Selected: The number of files selected
  3. File Sharing Settings: The sharing settings of the selected files
  4. File Filters: The integrations, owners, and "shared with" details of the scan
  5. Data Selected: The names of the data sets you have selected for this scan

Auditing results:

File Audit and Select Scan/Targeted Scan:

You can see the results of your File Audit and Select Scan/Targeted Scan by navigating to Files > Scans. There are two sections: one for "In-progress scans" and one for "Completed" scans. You can audit the results of a scan while it is in progress as well as after it has completed. You can find a guide on auditing results for content scans here.

The names of your Keyword Lists and Custom Regular Expressions will show if any violations have matched your data set: 


**When viewing violations of a content scan please be aware that we mask 50% of the violation for security purposes.**
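The sketch below illustrates what masking 50% of a violation can look like. The article does not specify which half is hidden or which masking character is used, so masking the leading half with asterisks (rounding up for odd lengths) is purely an assumption, and maskHalf is a hypothetical name.

```go
package main

import "fmt"

// maskHalf replaces the first half of the characters with '*'.
// Which half BetterCloud masks is not stated; this is an assumption.
func maskHalf(s string) string {
	r := []rune(s)
	n := (len(r) + 1) / 2 // round up so odd lengths mask the extra character
	for i := 0; i < n; i++ {
		r[i] = '*'
	}
	return string(r)
}

func main() {
	fmt.Println(maskHalf("123-45-6789")) // ******-6789
}
```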

Go Forward Policies:

Once your Go Forward Policy Alert has triggered, it will display on the Triggered Alerts page, under Alerts > Triggered from the left nav.



Best Practices/Acceptable Use:

Scan time increases with the number of scans running, the number of files targeted in each scan, and the number of datasets included in each scan. Note that scans targeting more than 500,000 documents will take a significant amount of time to finish. As a best practice, either run content scans in smaller batches or, for larger scans targeting more than 500,000 files, run them no more than once per week.

Important information and requirements:

  • Content Scanning is only available for customers on the Secure, Pro (Legacy SKU), and Enterprise (Legacy SKU) versions of BetterCloud.
  • In order to perform content scanning, BetterCloud downloads each file that is subject to be scanned. You will see these download operations appear in the different provider audit logs.
  • Scans are only performed on documents that have not been edited for at least 5 minutes.
  • You cannot delete a completed scan.
  • A "View File" link will only show if certain sharing setting requirements are met.
  • BetterCloud may be unable to scan files for various reasons. Here are some common messages/reasons you may see.



