Match scraped data with your internal databases
Data matching, or record linkage, is the task of finding records in a data set that refer to the same entity across different data sources.
It is an often-needed next step after the retrieving of data from remote sites. For example, after extracting a website’s product prices, you may want to match the site’s products against your own in order to enable price comparison.
This can be an easy sample to obtain, if your data and the sought after data share an identical key. However, if they do not share an identical key this can be a difficult and resource-consuming data acquisition project.
Our data specialists configure our versatile data matching software to provide cost-efficient and accurate results.
The entire spectrum of use cases are supported, these include:
- If your data requires domain expertise to be matched, a human with the knowledge will do the matching, usually someone within your own organisation. Our state of the art user interface is there to make this process as quick as possible. Before starting, we fine-tune the parameters to our software’s algorithms to display the best matches first. These minimize the time it takes to treat each item.
- If there is a pattern that is relatively simple to follow from a software point of view, we configure our software to automatically match your datasets. No action on your part would be required.
- If there are no consistent matching rules and a large amount of data but approximate matching is good enough for your needs, we will select an algorithm that will get mostly accurate results, and check it against a small handmade sample to measure the quality and accuracy.
- If your data must be matched by hand but no particular expertise is required, our data entry operatives will do the matching for you, under close supervision and with quality and accuracy checks.
- If your dataset fits a combination of the above, we will divide it into parts that we will treat separately so that quality can be maximized and costs minimized.