
EzDeDupe for Microsoft Windows - Product Details

EzDeDupe is designed as an easy-to-use, affordable and effective database/spreadsheet deduplication tool. It lets you load multiple files, apply advanced deduplication algorithms to match mismatched data, and then export the cleaned data (and more) to a variety of database formats.


Using EzDeDupe to load data sources

EzDeDupe acts as a central repository for your data while you standardize, clean and, of course, deduplicate it.

You can fill the central repository by connecting to and importing a variety of data sources, including XLS, MDB, XML, CSV, TXT and DBF files as well as any ODBC- or UDL-compliant database. Once all the data sources are loaded into the EzDeDupe central repository, they are ready for duplicate finding and, where appropriate, merging.

Effective deduplication requires the ability to match “mismatched” data. This ability is what separates good software from poor software, and usable data from unusable data.

Every field in the loaded data set can be used to develop deduplication criteria.

EzDeDupe provides two classes of matching techniques for identifying duplicates: mapping types and mapping options. These can be used individually or in combination with each other. Detailed descriptions of each can be found at the bottom of this document.

 


In EzDeDupe any field can be used with our matching algorithms

Duplicate found Screen (notice mismatched data)

The EzDeDupe found-duplicate screen shows the records grouped in their duplicate sets. Details are available under the view icon, and from here a variety of merge operations can be performed.

The easiest merge is the custom merge, where all the records in the duplicate set are displayed and the user selects or "paints" the perfect record on a field-by-field basis.

For larger data sets where custom merging is not practical, EzDeDupe allows for a more typical IT merge, the master vs. servant merge. In this merge one record is selected as the primary record (master) and the other records are merged into the master. Master records can be selected manually or by using our master rules, which provide a fast and flexible method for merging.


Complex master rules can be developed to automatically select the appropriate record as the master based on the user's specific data requirements. As with all of EzDeDupe, the Master Rules are designed for maximum flexibility.


Developing a Master Rule to select master vs. servant records

After applying rule, notice scored and selected records

After the custom-developed master rule is applied, each record is scored individually. Master records are indicated by green pins and servant records by red pins. The red-pinned records will be merged into the green-pinned records.
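The scoring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EzDeDupe's rule engine: the rule itself (one point per filled field, plus a bonus for the most recently modified record), the field names and the function names are all hypothetical.

```python
# Hypothetical master rule: 1 point per filled field, +2 for the newest record.
def score_record(record, newest):
    filled = sum(
        1 for field, value in record.items()
        if field != "modified" and value
    )
    bonus = 2 if record["modified"] == newest else 0
    return filled + bonus


def apply_master_rule(duplicate_set):
    """Return (index of the master record, list of scores) for one duplicate set."""
    # ISO-formatted date strings compare correctly as plain strings.
    newest = max(record["modified"] for record in duplicate_set)
    scores = [score_record(record, newest) for record in duplicate_set]
    master = scores.index(max(scores))  # highest score gets the "green pin"
    return master, scores
```

Every record other than the returned master index would be treated as a servant.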

EzDeDupe also provides a large number of merge options, including updating a field where the master record has a blank value, as well as field combining and concatenation.
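A minimal Python sketch of two such options, filling blank master fields and concatenating selected fields, might look like the following. The function name, the `combine` parameter and the "; " separator are assumptions made for illustration; the product's actual merge options are configured in its interface.

```python
def merge_into_master(master, servants, combine=()):
    """Merge servant records into a copy of the master record."""
    merged = dict(master)  # work on a copy so the source record is never changed
    for servant in servants:
        for field, value in servant.items():
            if not value:
                continue
            if not merged.get(field):
                # Master value is blank: take the servant's value.
                merged[field] = value
            elif field in combine and value not in merged[field].split("; "):
                # Field is flagged for concatenation: append the new value.
                merged[field] = merged[field] + "; " + value
    return merged
```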

Once the merge is complete on the internal database (source data is never changed), EzDeDupe offers a variety of export options, including the clean database, the change records, the merge reports and more.


Post Merge with Export options along right-hand menu

   

Examples of Mapping Types in EzDeDupe

Cleaned Account Name:   Uses the built-in Account Name Cleaning List. The cleaning list standardizes punctuation, spaces and word synonyms, and removes common business prefixes and suffixes. The list is customizable to your language(s) and/or line of business.
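To make the idea concrete, here is a small Python sketch of this kind of cleaning. The synonym and prefix/suffix lists below are invented examples, not the contents of the built-in cleaning list, and the function name is hypothetical.

```python
import re

# Hypothetical cleaning lists; the real built-in list is not shown in this document.
SYNONYMS = {"intl": "international", "&": "and"}
AFFIXES = {"the", "inc", "incorporated", "corp", "corporation", "ltd", "llc", "co"}


def clean_account_name(name):
    # Lowercase, keep "&" as its own token, turn other punctuation into spaces.
    text = re.sub(r"[^a-z0-9&]+", " ", name.lower().replace("&", " & "))
    words = [SYNONYMS.get(word, word) for word in text.split()]
    # Drop common business prefixes and suffixes from both ends.
    while words and words[0] in AFFIXES:
        words.pop(0)
    while words and words[-1] in AFFIXES:
        words.pop()
    return " ".join(words)
```

Two account names are then compared on their cleaned form rather than on the raw field value.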

Country Match:   The country mapping type is used to standardize field values for the recognized countries of the world. It makes the long name, the 2-letter ISO short form, the 3-letter ISO short form and the numeric ISO country value all appear as matches of each other.
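One way to picture this is a lookup that maps every known form of a country to a single key. The Python sketch below assumes a tiny three-country table for illustration; a real list would cover every ISO 3166-1 entry, and the names `COUNTRIES` and `country_key` are hypothetical.

```python
# Tiny illustrative table: (long name, 2-letter code, 3-letter code, numeric code).
COUNTRIES = [
    ("Canada", "CA", "CAN", "124"),
    ("United States of America", "US", "USA", "840"),
    ("Germany", "DE", "DEU", "276"),
]

# Map every form of each country to its 2-letter code.
LOOKUP = {}
for name, alpha2, alpha3, numeric in COUNTRIES:
    for form in (name, alpha2, alpha3, numeric):
        LOOKUP[form.lower()] = alpha2


def country_key(value):
    cleaned = value.strip().lower()
    # Unrecognized values fall back to the cleaned text itself.
    return LOOKUP.get(cleaned, cleaned)
```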

Domain:   The domain mapping type is used when mapping web pages and/or email addresses. It allows for the independent analysis of the domain information contained within the URL or the email address. For email addresses it uses any information to the right of the @ sign. For web pages it parses the XXXXX.com portion. This allows for easy comparison of web page field vs. web page field or email field vs. email field. By its nature it also allows email addresses to be compared with web pages and vice versa.
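As a rough sketch of how such a comparison could work, the Python function below extracts a comparable domain from either an email address or a URL using the standard library. The function name and the choice to strip a leading "www." are assumptions, not documented EzDeDupe behavior.

```python
from urllib.parse import urlparse


def domain_key(value):
    value = value.strip().lower()
    if "@" in value:
        # Email address: use everything to the right of the @ sign.
        host = value.rsplit("@", 1)[1]
    else:
        # urlparse only fills in the host when a scheme or "//" is present.
        parsed = urlparse(value if "//" in value else "//" + value)
        host = parsed.hostname or ""
    # Assumption: treat "www.example.com" and "example.com" as the same domain.
    return host[4:] if host.startswith("www.") else host
```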

Exact:   The exact mapping type in the Single Table Deduplication tool is exactly that: a 100% match of every character (assuming no options apply).

FirstName:   Uses the built-in Nickname List. To see the Nickname tool, select the "Edit Nickname List" button at the top of the interface.

The Nickname list allows the deduplication tool to see Bill, William, Billy, etc. as potential duplicates of each other. This list can also be customized by the end user for localization or, in theory, for non-contact substitution on any field by replacing the nickname list with synonyms.
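A nickname list of this kind can be thought of as a simple lookup table. The entries and the function name in this Python sketch are hypothetical; the real list is edited through the interface.

```python
# Hypothetical nickname list: each nickname points to one canonical first name.
NICKNAMES = {
    "bill": "william",
    "billy": "william",
    "will": "william",
    "bob": "robert",
}


def first_name_key(name):
    cleaned = name.strip().lower()
    # Names not in the list are compared as-is.
    return NICKNAMES.get(cleaned, cleaned)
```

Replacing the entries with synonyms for another field would give the substitution behavior described above.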

First XX Letters:   Compares only the first XX letters in a field. Text fields are the only applicable field type.   The user can select as many letters as they would like to compare.
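In Python terms this is a prefix comparison. The sketch below assumes the comparison ignores case and surrounding spaces, which the description above does not state; the function name is hypothetical.

```python
def first_letters_key(value, count):
    # Assumption: case and surrounding whitespace are ignored.
    return value.strip().lower()[:count]
```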

Numeric:   Compares only the numeric values in a field. Other characters that the field contains, such as spaces or punctuation, are ignored and not seen by the deduper. A field with a value of " Apt # 31" is seen by the deduper as only the numeric characters "31". This is often used with phone number fields, so that (999) 555-1212 will match 999-555-1212. In this case the deduper sees both values as 9995551212.
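The behavior described here amounts to stripping everything except the digits 0-9, which can be sketched in Python as follows (the function name is hypothetical):

```python
import re


def numeric_key(value):
    # Remove every character that is not one of the digits 0-9.
    return re.sub(r"[^0-9]", "", value)
```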

Relaxed Address Match:   Parses the street address to the lowest common denominator. Based on North American standards, it has also proved effective with most country address formats.

With relaxed address match the following addresses are all reduced to the lowest common denominator, 123 Pavillion:

  • Apt #4, 123 Pavillion Street
  • 123 Pavillion, Apt 4
  • 4-123 Pavillion Ave NW
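The three examples above can be reproduced with a simple heuristic. The Python sketch below is only one possible approach and not the product's parser: the word lists are hypothetical and deliberately short, and the rule that the number before a hyphen is a unit number is an assumption.

```python
import re

# Hypothetical, deliberately short word lists.
STREET_WORDS = {"street", "st", "avenue", "ave", "road", "rd", "crescent", "cres"}
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw"}
UNIT = re.compile(r"\b(?:apt|apartment|suite|ste|unit)\b\s*#?\s*\w+")


def relaxed_address_key(address):
    # Remove unit designators such as "Apt #4" or "Apt 4".
    text = UNIT.sub(" ", address.lower())
    # "4-123 Pavillion": assume the number before the hyphen is a unit number.
    text = re.sub(r"\b\d+\s*-\s*(?=\d)", " ", text)
    words = re.findall(r"[a-z0-9]+", text)
    # Drop street types and directionals, keeping the civic number and street name.
    kept = [w for w in words if w not in STREET_WORDS and w not in DIRECTIONS]
    return " ".join(kept)
```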


Relaxed NA Phone Match:   Removes all non-numeric characters and spaces. If the first digit is a 1 or a 0, it is removed. If exactly 7 digits are left, those seven digits are used; otherwise digits 4 - 10 are returned. It will not match "Phone-word" values: it trims off the "SPOT" in the phone number and looks only at the numeric portion.
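These steps translate fairly directly into Python. The sketch below reads "digits 4 - 10" as the fourth through tenth digits counting from 1, which is an interpretation of the description rather than confirmed behavior; the function name is hypothetical.

```python
import re


def relaxed_na_phone_key(phone):
    # Step 1: keep only the digits 0-9.
    digits = re.sub(r"[^0-9]", "", phone)
    # Step 2: drop a leading 1 or 0 (long-distance prefix).
    if digits[:1] in ("1", "0"):
        digits = digits[1:]
    # Step 3: a 7-digit local number is used as-is.
    if len(digits) == 7:
        return digits
    # Step 4: otherwise use digits 4 through 10, skipping the area code
    # and ignoring anything after the tenth digit (for example an extension).
    return digits[3:10]
```

Under this reading, a number entered with an area code and the same number entered without one produce the same key.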

Street Address Match:   The street address match is a slightly more rigid criterion than the relaxed address match. It ignores differences in street type short forms such as crescent - cres, road - rd, street - st.
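A minimal Python sketch of this normalization is shown below. It uses only the three short forms named above; the dictionary and function names are hypothetical, and ignoring case and punctuation is an assumption.

```python
import re

# Only the short forms mentioned in the text; a real list would be longer.
STREET_TYPES = {"crescent": "cres", "road": "rd", "street": "st"}


def street_address_key(address):
    # Assumption: case and punctuation are ignored.
    words = re.findall(r"[a-z0-9]+", address.lower())
    # Replace each long street type with its short form; keep everything else.
    return " ".join(STREET_TYPES.get(word, word) for word in words)
```

Unlike the relaxed match, the street type stays in the key, so "123 Pavillion Street" and "123 Pavillion Road" remain different.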

Zip 5 and 9 Match:   This mapping type will automatically match USPS 5 and 9 digit zip codes together without the need to standardize them first to a common number of digits.
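One simple way to get this effect is to compare only the first five digits. The Python sketch below assumes that is how 5 and 9 digit codes are matched, which the description does not spell out; the function name is hypothetical.

```python
import re


def zip_key(zip_code):
    # Keep digits only, so "90210-1234" and "902101234" look the same.
    digits = re.sub(r"[^0-9]", "", zip_code)
    # Assumption: a ZIP+4 code matches a 5-digit code on its first five digits.
    return digits[:5]
```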

Examples of Mapping Options

Fuzzy:   Phonetics engine capable of analyzing words for how they sound when pronounced. By removing vowels and analyzing the remaining consonants, the fuzzy engine works very well for matching fields with spelling mistakes. Applicable mapping types: Cleaned Account Name, Exact, FirstName.
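The vowel-removal idea can be illustrated with a very small Python sketch. It is far simpler than a real phonetic engine and is not EzDeDupe's algorithm; collapsing doubled consonants is an added assumption, and the function name is hypothetical.

```python
import re


def consonant_key(word):
    # Keep letters only, then remove the vowels.
    letters = re.sub(r"[^a-z]", "", word.lower())
    consonants = re.sub(r"[aeiou]", "", letters)
    # Assumption: collapse doubled consonants so "Hammond" and "Hamond" agree.
    return re.sub(r"(.)\1+", r"\1", consonants)
```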

Transpose:   The transpositional engine allows fields to appear to be duplicates even if they have differences in their word order. For example, Jones, Smith and Jackson will appear to be a duplicate of Jackson, Smith and Jones. Applicable mapping types: Cleaned Account Name, Exact, FirstName, Street.
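Word-order-independent matching is commonly done by sorting the words before comparing. The Python sketch below takes that approach as an assumption about how such an engine could work; the function name is hypothetical.

```python
import re


def transpose_key(value):
    # Split into lowercase words, ignoring punctuation.
    words = re.findall(r"[a-z0-9]+", value.lower())
    # Sorting makes the key independent of the original word order.
    return " ".join(sorted(words))
```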
Alpha Clean:   The alpha cleaner extends some of the capabilities of the account name cleaner to other fields for matching. The alpha cleaner is used when you know you only have ASCII (North American) data and you would like to ensure that the only characters analyzed are the 26 letters of the English alphabet and the numbers 0-9. Any other character that the field may contain will be ignored and not seen by the deduplication matching algorithms. Applicable mapping types: Cleaned Account Name, Exact, FirstName, Numeric, Street, Zip 5 and 9.
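In Python, the alpha cleaner's effect can be approximated with a single substitution. Treating upper and lower case as equal is an assumption not stated above, and the function name is hypothetical.

```python
import re


def alpha_clean_key(value):
    # Assumption: case is ignored. Keep only the letters a-z and the digits 0-9.
    return re.sub(r"[^a-z0-9]", "", value.lower())
```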