Skip to content
July 10, 2013 / christinazuniga

Best Practice – Data Cleansing and Formatting

Data_IntegrityArt Without Limits (AWoL) is a non-profit organization that creates intensive one-on-one mentorships for aspiring artists by pairing them with professional artists and business experts. Their contact database is filled with experienced and prospective mentors, donors, like minded organizations, and aspiring artists currently being mentored.

They build their contact list through a variety of ways, but often through copy / paste or manually entered from stacks of business cards. This method of data entry is often rife with errors. Duplicates, spelling or data formatting errors, inappropriate special characters…there is an almost endless list of reasons why a contact list can contain incorrect information.

Spelling Errors
One of my tasks was to manually enter a number of contacts into a spreadsheet based on their business card information. Human error is common with this type of repetitive task, such as leaving out or adding letters: instead of .cOm. One of the most common mistakes is spelling difference between a name and an email address, such as JoHn Doe with an email of – the H in his name has been excluded from the email. Most email programs will alert you to errors like this by marking your contact as a bounce.

Special Characters
When copy and pasting an email address or contact details, special characters can accidentally be added even when they are not visible. Microsoft Office Word is notorious for slipping special characters as part of their formatting. Even an ampersand (&) can be considered a special character by many email programs.

Once I built the list of business card contacts, I requested a copy of AWoL’s contact database and performed a VLOOKUP to remove duplicates from the list. VLOOKUP is an Excel function that is used to retrieve information from a table of data by matching based on a cell. I used email address as a marker to see which contacts were in both lists – the business card list and the existing database. Then I sent the organization the list of “new” contacts for them to upload into their database.

Imagine our surprise when the upload was completed and the database was recorded dozens of duplicate contact records! In response I ran another VLOOKUP only to see the same result. That is, until I completed a direct pull of the database myself.

Data Formatting
During the initial download, my contact at the non-profit reviewed the records in Excel, made changes and added formatting to the fields. The formatting was meant to help her track which fields had been modified in the Excel sheet and would change the contact record when the upload was completed. This additional formatting, specifically of the email address cell, caused my VLOOKUP to fail. Excel was unable to recognize identical email addresses because one had special formatting while the other was plain text!

It is best to make changes to individual contact records directly in the database or in the source database if there are multiple databases. Uploading changes, especially to small-medium email or contact database companies is not ideal unless the upload is to a newly created empty field or the sheer number of contacts to update is not possible on an individual basis.

Have you made a mistake that caused data integrity problems?

%d bloggers like this: