Record Linkage Resources
From DML
This page lists useful record linkage papers, blogs, software, and other resources.
Our Publications
See Genealogical_Record_Linkage#Publications
Links
- Record Linkage in Genealogy - Describes some of the theory behind record linkage technology as applied to genealogy.
- Book List - Recommended books and articles on record linkage.
"...Detecting database records that are approximate duplicates, but not exact duplicates, is an important task. Databases may contain duplicate records concerning the same real-world entity because of data entry errors, unstandardized abbreviations, or differences in the detailed schemas of records from multiple databases – such as what happens in data warehousing where records from multiple data sources are integrated into a single source of information"
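The quoted task can be illustrated with a small sketch. This is not taken from any of the tools listed on this page. It assumes records are plain strings, normalizes case and whitespace, scores each pair with a Levenshtein-based similarity, and flags pairs at or above a threshold. The names `normalize`, `levenshtein`, `similarity`, `likely_duplicates` and the 0.85 threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations


def normalize(s: str) -> str:
    """Lower-case and collapse runs of whitespace (a minimal cleaning step)."""
    return " ".join(s.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after normalization; 1.0 means identical."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def likely_duplicates(records, threshold=0.85):
    """Return index pairs (i, j) whose similarity meets the (hypothetical) threshold."""
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(records), 2)
        if similarity(r1, r2) >= threshold
    ]


names = ["John Smith", "Jon Smith", "Mary Jones", "JOHN  SMITH"]
print(likely_duplicates(names))  # [(0, 1), (0, 3), (1, 3)]
```

"Jon Smith" is one edit away from "john smith" (similarity 0.9), so it is flagged alongside the case and spacing variant. Comparing every pair is quadratic in the number of records, which is why practical linkage systems usually add a blocking or indexing step before pairwise comparison.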
Blogs
- Our Data Mining Lab's Blog - Hosts open discussion forums on genealogical record linkage.
- Record Linkage Resources and Blog - A good list of blogs on record linkage.
General Tools
- Eclipse - A very good open-source Java IDE. See also the Eclipse Wiki.
- SimMetrics - An open-source, extensible library of similarity and distance metrics for string comparison (direct SourceForge download). Steve has also created a number of additional tools, including ensembles of his favorite matching metrics.
- SecondString - An open-source Java package of approximate string-matching techniques.
- DDupe - A tool by Lise Getoor et al. for interactive data deduplication and integration.
- Febrl - (Freely Extensible Biomedical Record Linkage) Performs data standardization (segmentation and cleaning) and probabilistic ("fuzzy") record linkage of one or more files or data sources that do not share a unique record key or identifier.
- The Link King - A free record linkage application for SAS.
- Weka - An open-source machine learning toolkit (see also the Weka Wiki and the Weka API).
- Wikipedia - Quick background information.
- JavaBayes - Bayesian Networks in Java
- Berkeley DB Java Edition - A high-performance, transactional database system written entirely in Java.
- JUnit - A widely used Java unit-testing framework (see its API).
- Databases - Assorted database resources.
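Febrl and The Link King both perform probabilistic record linkage. A common way to explain the idea is the classic Fellegi-Sunter model. Each compared field contributes a log-likelihood weight, built from m (the probability that the field agrees when two records truly match) and u (the probability that it agrees when they do not). The summed weight is then compared against two thresholds. The sketch below is illustrative only and does not reflect the internals or API of either tool. The field names, the m/u values, the thresholds, and all function names are hypothetical. In practice, m and u are estimated from data, for example with EM, rather than fixed by hand.

```python
import math

# Hypothetical per-field probabilities, for illustration only:
#   m = P(field agrees | records are a true match)
#   u = P(field agrees | records are not a match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_year": (0.85, 0.02),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Fellegi-Sunter log2 weight for a single field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(rec_a: dict, rec_b: dict, probs=FIELD_PROBS) -> float:
    """Sum the per-field weights; a higher total means a more likely match."""
    total = 0.0
    for field, (m, u) in probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        # Exact agreement only; a missing value counts as disagreement here.
        # A fuzzier system might use a string similarity score instead.
        agrees = a is not None and a == b
        total += field_weight(agrees, m, u)
    return total


def classify(weight: float, upper: float = 8.0, lower: float = 0.0) -> str:
    """Two hypothetical thresholds: match / possible match (clerical review) / non-match."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


a = {"surname": "Smith", "given_name": "John", "birth_year": 1850}
b = {"surname": "Smith", "given_name": "Jon", "birth_year": 1850}
print(round(match_weight(a, b), 2), classify(match_weight(a, b)))  # 8.73 match
```

The given-name disagreement ("John" vs "Jon") lowers the total, but the strong surname and birth-year agreements still push it above the upper threshold. Pairs that land between the two thresholds are the ones usually sent for manual review.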
