Quick Start for the Single Table Deduplicator

 

INDEX - Click the topic to go to the subject

 

***IMPORTANT NOTES ABOUT MERGING***

General Duping Guidelines:

1. Run dupes in the proper order. Ensures most information retained during merging with the least amount of time and effort expended by the CRM Admin.

Recommended Order: Account; Contact; Lead; Lead to Contact; Lead to Account; Opportunity; Custom Objects

2. Use a 3 or more pass strategy for identifying dupes

-Start with very rigid criteria
-Loosen criteria with each pass
-STD comes with pre-built scenarios for primary objects - These are just a starting point, NEED REVIEW AND MODIFICATION TO SPECIFIC CUSTOMER NEEDS.

3. Objects owned by inactive users

In PRE-1.7 versions of DemandTools, Contacts, Opportunities, Leads, and Cases would only be merged if they were owned by an Active user. For example, if an
opportunity within an Account that is being merged into a master Account was owned by an Inactive User, the Opportunity would NOT be merged to the master
account. You would need to first change the ownership of the object to an active user, and then run the merge.

In this release, we have added the ability to merge objects owned by inactive users, and KEEP THE OWNERSHIP HISTORY. In step#3 Merge Control, you can
select the “Keep Ownership” option under “Inactive Ownership Exceptions”. This will re-activate the inactive user just long enough to transfer the object
to the master, and then de-activate the user once again. YOU WILL NEED A SPARE SF LICENSE TO USE THIS OPTION. You also have the option to change the
ownership of these objects to the Master Owner, and/or the Current User.

Tasks and Events that are owned by inactive users will be moved in the deduplication process.

4. Archived completed activities

Completed Activities will only be moved if they are less than a year old. Salesforce.com will "Archive" activities older than a year and they are not
accessible via the API, and hence, the DemandTools.

You can request from salesforce.com that only activities older than 3 years be archived so you will be able to move tasks and events less than 3 years old.
You will need to contact salesforce.com technical support to have this done.

The process to do this as follows:
  1. Create a case salesforce.com with a subject of "unarchive activity request"
  2. In the body of the message specify:
    - the justification for this request.
    - how many years you'd like your activities to remain active (unarchived.)
    - if the previously archived records should be unarchived as well.
  3. The request must be made by a user with the System Administrator profile on the Account.

All requests for archive date changes are evaluated individually and implementation of the request is not guaranteed. Possible ramifications of changes to
the archive period include a possible performance impact on activity related reports and list views for the specific organization.

5. In order for objects that have to be cloned vs. reparented to retain the proper CreatedDate and CreatedByID field values, you need to create a case with salesforce.com to allow you to modify system fields.

There is a feature in salesforce.com called "Modifiable System Fields" and is only available when you are creating new records (cannot be used when updating existing records).

The process to do this is as follows:

1. Create a case with Salesforce.com with a subject of "Modifiable System Fields Request"
2. In the body of the message specify the justification for this request. Include as MUCH DETAIL as possible. (i.e. migrating from a legacy system and wanting to keep the integrity of the original data, merging data using DT ).
3. The request must be made by a user with the System Administrator profile on the Account.

All requests for are evaluated individually and implementation of the request is not guaranteed.

 

Overview

The Single Table Deduplicator allows extremely flexible viewing of potential duplicates within salesforce.com.   Standard scenarios for deduping different objects are already built and may be loaded, modified, and saved.   Additionally, custom scenarios may be built and saved.   Access the module from either the DemandTools Today screen using the menu in the left column or from the pull down menus across the top of the interface to be taken to the Single Table Deduplicators first wizard screen.

Screen 1 of the Single Table Deduplicator


Back to top

Step #1 - Select the object (Salesforce.com table) to deduplicate from the pull down list.   Click “Use Object”.

Step #2 - Select the fields that you would like to display when the Single Table Deduplicator shows the list of potential duplicates found by clicking the appropriate box next to the field.   You can also adjust the order in which you would like the fields displayed.

Step #3 - Decide whether you would like to search all records, or narrow down the list by creating custom conditions.   Customers with small databases will usually select to deduplicate all objects in the table. The ”Use conditions” button allows you to narrow the area of the database that you are deduplicating by developing dupe criteria by country, state, owner etc.

Step #4 - Press "Screen 2 - Mapping" to develop the criteria you will use to find potential duplicates.

  Screen 2 of the Single Table Deduplicator

Back to top

Step #1 - Select the "Add Mapping Condition" button to insert the blank mapping condition in the interface.

Step #2 - Select the field from the available "Fields to Match" list (all fields in the table you are deduplicating) that you wish to use as criteria for finding potential duplicates.

Step #3 - Using the combination of the "Match Category" and "Match Type" options (click in field to change), choose a mapping technique.

Step #4 - Select the corresponding/applicable mapping options you would also like to apply.

Step #5 - Go back to Step 1 to add additional mapping conditions (if required).

Step #6 - Select the "Screen 3 - Merge Control" button to search for potential duplicates and move to screen 3 to analyze and eventually process.

TIP: To remove a mapping condition, highlight the row by clicking on the box to the left of the Match Blank Values Column and hit Delete on your keyboard.

Screen 3 of the Single Table Deduplicator

Back to top

Step #1 - Work in the Grid:   View, analyze, edit, process, single record merge

Step #2 - Choose or design a master rule

The Single Table Deduplicator is packaged at initial installation to have default master rules ready for use for most of the standard salesforce.com objects that are typically chosen for deduplication.   These may include rules like oldest or newest record, most recently modified record, owned by, etc.   Custom objects and less frequently de-duped objects may require that even the simplest rules such as oldest record and newest record will have to be created by the end user.

Select a Master Rule

When using strict deduplication criteria (looking for records that are nearly for sure duplicates) you may decide that (after a quick scan maybe) that you want to process all the records in the grid view.  

In order to process duplicates in this mode you do not need to apply master rules to each duplicate set. Simply select the button "Check All" at the bottom of the grid and all the duplicate sets will be checked.     All records are now checked but no master servant selection has been pre-made.   Masters  will automatically be selected based on the currently selected master rule in the upper right hand of the interface when the end user selects "Merge Checked Objects" and the merge process will be completed.


Currently selected master rule

Back to top

 

Master Rules Creation and Editing of Master Rules

Any object can have a master rule written for it. When building a master rule you have the option to add unlimited conditions for analyzing either the record (as a whole) or the field values that the record contains.

At the record level, options such as oldest, newest, most recently modified and many other advanced constraints are available. For field values, the STD displays constraints appropriate for the data type of the field being analyzed.

Sample constraints on different field types
Rule Dialog Box

After the setup of the initial field and constraint you have the ability to score the positive incident of the constraint with an arbitrary point value. The default rules that ship with the software are based upon a base 100 scoring system but any system can be used. When adding multiple constraints the record is scored in cumulative fashion with each additional positive expression adding to the total score of the records.   Additionally negative scores can also be applied to penalize a record.


A complex rule with multiple expressions and a score delta set


The final component to building rules after determining the desired expressions is to decide to assign a score delta. The score delta will be used by the deduper to automatically determine master vs . servant records. The value of the score delta determines how many points away a master record has to score from the next nearest servant record in order to automatically be selected as the master.

If the score delta threshold is set and not met:

If no score delta is set:

Back to top

Methods for Applying Master Rules Against Duplicates

In order to apply rules on a record by record basis the end user can right click in the duplicate key set box. This will display the popup menu with the manual merge options on the left and the available master rules in the right hand column. Simply select the master rule to apply to the duplicate set and the records will be scored according to the rules scoring system.

If a score delta has been set, the master record will be automatically selected (pin turns green) for all the record s with the largest cumulative point score that surpass the next nearest servant by at least the delta point value.   If no records are determined to be master worthy, then the pins will remain gray (no master selected) but the score of the records will be shown in the interface.

As opposed to right clicking in each duplicate set the interface provides for two more automated processes for apply master rules to larger duplicate sets.

The first button "Rule to Checked" will find all records/record sets that have a check mark in the square left box but do not currently have a master vs . servant selection made to them. The records will be scored according to the selected Master Rule and assigned a master vs. servant if the score delta is set and surpassed by any one record.   As above, if no records are determined to be master worthy, then the pins will remain gray (no master selected).

The second button is the "Apply Rules" button which will process all the records in the duplicate grid (regardless of check status) and score and select all records.

Once the records you wish to merge are checked, and the master records determined (either by score delta or manual selection) select "Merge Checked Objects" to begin the merge process.

The records will be merged according to the merge options in Step 3 below.

 

Back to top

Step #3 -Select the merge control options

Update Fields where Master is Empty

Combine Field Values

Merge Objects

Reassign to Master Owner

Options

Non-Master Objects

 

Duplicate Campaign Exception

Inactive Ownership Exceptions

Reassignment

Additional log fields

has not yet been enabled in this version of the beta  

Merging of custom object sub-objects

Step #4 - Selecting to process records in batch mode


Executing the duplicates batch

     

Back to top


Developing and Executing Dedupe Scenarios

Through the development of this deduplication pass all the way from the selection of the object, the sub set of objects, match criteria and options, master rule and score delta through the merge options can now be saved in a new DemandTools object called the deduplication scenario. The dedupe scenario saves all setting except for specific record custom actions such as field edits, manual master vs . slave selections etc.

The scenario is given a name and a self generated description of what the scenario settings are. After the system self creates this file it is end user modifiable to allow for custom notations etc. All fields and value settings are automatically recorded in the description file to save time.


Save scenario dialogue box


Scenarios pre built or built by user being displayed in Screen 1

 

Load Scenario

Loads the scenario and all settings. User must manually select the next screen for each step.

Run Scenario

Selection of Run Scenario takes the user to the final Single Table Deduplication screen. Duplicates are displayed, master records are selected. User must execute the merge.

Delete Scenario

Deletes the source xml file and removes the file from the interface listing

Save Scenario

Used for newly created or modified scenarios. Scenarios are stored in the Demand Tools directory.

Clear Scenario

Lets you start over with either selecting another scenario or selecting an object to deduplicate

 

     

Back to top

Fully Automatic Deduplication

Schedule Automatic Non-Attended Dedupe Scenarios

Instructions for the scheduling of the Single Table Deduplication module are located here: Scheduling Events. The STD is fully capable of working within the Windows scheduler interface and will allow for times execution of scenarios as long as the host cpu has an active internet connection.

     

Back to top

Match Categories

Back to top

Mapping Types

Cleaned Account Name:   Uses the built in Account Name Cleaning List. To see the Cleaning list tool select the "Edit Account Cleaning List" button.   The cleaning list standardizes punctuation, spaces, removes common business prefixes and suffixes. These lists are customizable to your language(s) and/or line of business.

Country Match:   The country mapping type is used to standardize field values for the recognized countries of the world. It makes the long name, 2 digit ISO short form, 3 digit ISO short form and the numeric ISO country value all to appear to be matches of each other.

Domain:   The domain mapping type is used when mapping web pages and/or email addresses. It allows for the independent analysis of the domain information contained within the URL or the email address. For email addresses it uses any information to the right of the @ sign. For web pages it parses the XXXXX.com portion. This tool allows for easy comparison of web page field vs . web page field or email field vs . email field. It also by nature allows for the comparison of email addresses compared to web pages and vice versa.

Exact : The exact mapping type in the Single Table Deduplication tool is exactly that, a 100% match of every character (assuming no options apply).

FirstName:   Uses the built in Nickname List. To see the Nickname tool select the "Edit Nickname List" button at the top of the interface.

The Nickname list allows the deduplication tool to see Bill, William, Billy, etc. as potential duplicates of each other. This list is also customizable by the end user for localization or even in theory for non contact substitution on any field by replacing the nickname list with synonyms.

First XX Letters:   Compares only the first XX letters in a field. Text fields are the only applicable field type.   The user can select as many letters as they would like to compare.

Numeric:   Compares only the numeric values in a field.   Other characters that the field contains, such as spaces or punctuation, will be ignored and not seen by the deduper.   A field with a value of " Apt # 31" is seen to the deduper as only the numeric characters "31".   This is often used with phone number fields, so that (999) 555-1212 will match to 999-555-1212.   In this case the deduper will see this as 9995551212.

Relaxed Address Match:   Parses the street address to the lowest common denominator. Based on North American standards, it has also proved effective with most country address formats.

With relaxed address match the following addresses are all seen in the lowest common denominator of: 123 Pavillion:


Relaxed NA Phone Match: Removes all non-numeric characters and spaces. If the first is a 1 or 0 removes it. If just 7 digits are left use those seven digits, else just return digits 4 - 10. It will not match the "Phone-word" values and will trim off the "SPOT" in the phone number and only look at the numeric portion.

Salesforce.com ObjectID Match :   Used to match any Salesforce.com objects' 15 digit ID to its 18 digit equivalent ID and vice versa.

Street Address Match:   The street address match is a slightly more rigid criteria than the relaxed address match tool. It will ignore the differences in street type short forms such as crescent - cres, road - rd, street – st.

Zip 5 and 9 Match:   This mapping type will automatically match USPS 5 and 9 digit zip codes together without the need to standardize them first to a common number of digits.

Back to top

Mapping Options

Type Description Mapping Types
Phonetics engine capable of analyzing words for how they sound when pronounced. Through a technique of removing vowels and analyzing the remaining consonants the fuzzy engines works very well for matching fields with spelling mistakes. Cleaned Account Name
Exact
FirstName

The transpositional engine allows for fields to appear to be duplicates even if the have differences in their word order. For example Jones, Smith and Jackson will appear to be a duplicate of Jackson, Smith and Jones.

Cleaned Account Name
Exact
FirstName
Street

The alpha cleaner extends some of the capabilities of the account name cleaner to other fields for matching. The alpha cleaner is used when you know you only have ascii (north american) data and you would like to ensure that the only characters that are analyzed are the 26 characters of the english alphabet and the numbers 0-9. Any other character that the field may contain will be ignored and not seen by the deduplication matching algorithms.

Cleaned Account Name
Exact
FirstName
Numeric
Street
Zip 5 and 9


Back to top

Master Rule Options

Field

Constraint

Value

Record

  • Owned by
  • Not Owned by
  • Oldest
  • Newest
  • Most Recently Modified
  • Most Opportunities
  • Most Contacts
  • Equals
  • Not Equal to
  • Prepopulates with sfdc values

Field Name

  • Equals
  • Not Equal to
  • Starts With
  • Does Not Start With
  • Ends With
  • Does Not End With
  • Is Populated By
  • Not Populated By
  • Contains
  • Does Not Contain

This will vary based on the field selected.   If there is a pick list associated with the field the possible values will be shown in a pop-up menu.

 

Back to top

KNOWN LIMITATIONS

Object Area Description
Contacts SELF SERVICE USERS When a servant contact is merged into a master contact and the servant contains a self service user the SSU will be inactivated on the servant contact and will be recreated on the master contact. You will lose all login information for the SSU and will have to send the user a new password as this is not moved in the creation of the SSU.

Additionally, if the servant was also a "Super User" you will need to manually re-activate this status on the master contact.
Contracts ACTIVE CONTRACTS The SF.com API does not allow the re-parenting of "Active" contracts. Therefore when duping account records DemandTools CANNOT merge "active" contracts from slave accounts to the selected master account.

As a workaround we have added a master rule for "Most Active Contracts", so you can set the master based on active contracts. When you run your Single Table De-dupe on your accounts you will recieve errors if there are still contracts that could not be re-parented. In the log file, it will indicate which slave accounts had active contracts and that it was unable to delete these slave records. You can then select these using SF.com's merge feature and merge them one at a time.
Activities CREATED DATE In order for tasks and activities older than one year to be moved/merged, the customer must contact salesforce.com to 'open the window' for the API to functionally be able to do this. A 'work' around may be that the customer selects the oldest record as the master.
Contacts CAMPAIGN MEMBER STATUS DATES

When Deduping contacts and opting to use the "Duplicate Campaign Exception - Use Most Recent, if the most recent resord is on the slave, when it is cloned to the master, the "Last Modified Date" (which ias also the status date) will become the date of the merge. If the most recent record is on the master, we do not touch the record, so the original date stays intact.

This will happen EVEN If you have "Modify System Fields" turned on by salesforce.com. The only workaround is to create a custom object to track campaign history. There are some 3rd party vendors taht have developed software that does this for you. Search the SF AppExchange for a current list.

Use Conditions OWNER ROLE

Using the "Owner Role" in your "Use Conditions" for selecting which objects to check for duplicates will physically convert eht Owner Role to the individual user id's.

If you update the members of a Role, you should re-do the "use Conditions" (clear and re-add) so the correct user will be inserted.