Data cleansing

Data cleansing is the process of detecting and correcting corrupt or inaccurate data records in a data repository. It transforms large amounts of ambiguous and heterogeneous information into consistent data sets.

Any data repository of a decent size is likely to contain a certain percentage of “junk” data records: duplicates caused by typos or variant spellings, manual data entry errors, and near-identical words or phrases. Automatic data cleansing technology uses fuzzy search methods to recognise potential duplicates and inconsistencies, which are then reconciled. Once detected, groups of duplicate or similar values are reviewed and, if necessary, corrected. Reconciliation may include correcting typos, validating values against a reference list, discarding inconsistent data, or automatically inserting missing values.
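
As a rough illustration of the detect-then-reconcile idea described above, the following Python sketch groups similar values and validates them against a reference list. It is only a minimal sketch under stated assumptions: it uses the standard library's difflib as a stand-in for whatever fuzzy search method a real product employs, and the function names (similarity, group_similar, reconcile), the sample values and the 0.8 thresholds are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_similar(values, threshold=0.8):
    """Greedily group values that resemble the first member of a group.

    The threshold is an assumed value; real data would need tuning.
    """
    groups = []
    for value in values:
        for group in groups:
            if similarity(value, group[0]) >= threshold:
                group.append(value)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([value])
    return groups


def reconcile(value, reference, cutoff=0.8):
    """Return the closest entry in the reference list, or None if none is close.

    Note that get_close_matches is case-sensitive.
    """
    matches = get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    records = ["London", "Londno", "london", "Paris", "Pariss", "Berlin"]
    reference = ["London", "Paris", "Berlin"]

    # Step 1: detect groups of potential duplicates for review.
    for group in group_similar(records):
        print(group)

    # Step 2: validate each value against the reference list.
    for record in records:
        print(record, "->", reconcile(record, reference))
```

Values that reconcile to None would be candidates for manual review or for discarding, depending on the reconciliation policy in use.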

Data cleansing and validation at entry are built into all of Texunatech’s products, including Soltex, and into its bespoke software solutions.