Automatic (near-) duplicate content document detection in a cancer registry
Background: Duplicate and near-duplicate medical documents are problematic in document management, clinical use, and medical research. In this study, we focus on multisourced medical documents in the context of a population-based ...