
Journal of biomedical informatics. 2025 Jun 1:104847. doi: 10.1016/j.jbi.2025.104847

Focused digital cohort selection from social media using the metric backbone of biomedical knowledge graphs


Ziqi Guo 1, Jack Felag 1, Jordan C Rozum 1, Rion Brattig Correia 1, Xuan Wang 2, Luis M Rocha 3

Author affiliations

  • 1 School of Systems Science & Industrial Engineering, Binghamton University, Binghamton, NY, USA.
  • 2 School of Informatics, Computing & Engineering, Indiana University, Bloomington, IN, USA; School of Systems Science & Industrial Engineering, Binghamton University, Binghamton, NY, USA.
  • 3 School of Systems Science & Industrial Engineering, Binghamton University, Binghamton, NY, USA; Universidade Católica Portuguesa, Católica Biomedical Research Centre, Lisbon, Portugal. Electronic address: rocha@binghamton.edu.
  • DOI: 10.1016/j.jbi.2025.104847 PMID: 40460925

    Abstract

    Social media data allows researchers to construct large digital cohorts (groups of users who post health-related content) to study the interplay between human behavior and medical treatment. Identifying the users most relevant to a specific health problem is, however, a challenge because social media sites vary in the generality of their discourse. While X (formerly Twitter), Instagram, and Facebook cater to wide-ranging topics, Reddit subgroups and dedicated patient advocacy forums trade in much more specific, biomedically relevant discourse. To filter relevant users on any social media site, we have developed a general method and tested it on epilepsy discourse. We analyzed the text from posts by users who mention epilepsy drugs at least once in the general-purpose social media sites X and Instagram, the epilepsy-focused Reddit subgroup (r/Epilepsy), and the Epilepsy Foundation of America (EFA) forums. We used a curated medical terminology dictionary to generate a knowledge graph (KG) from each social media site, whereby nodes represent terms and edge weights denote the strength of association between pairs of terms in the collected text. Our method is based on computing the metric backbone of each KG, which yields the (sparsified) subgraph of edges that participate in shortest paths. By comparing the subset of users who contribute to the backbone to the subset who do not, we show that epilepsy-focused social media users contribute to the KG backbone in much higher proportion than do general-purpose social media users. Furthermore, using human annotation of Instagram posts, we demonstrate that users who do not contribute to the backbone are much more likely to use dictionary terms in a manner inconsistent with their biomedical meaning and are rightly excluded from the cohort of interest.
    Our metric backbone approach thus has several benefits: it yields focused user cohorts who engage in discourse relevant to a targeted biomedical problem; unlike engagement-based approaches, it can retain low-engagement users who nonetheless contribute meaningful biomedical insights and filter out very vocal users who contribute no relevant content; it is parameter-free and algebraically principled; it does not require classifiers or human curation; and it is simple to compute with the open-source code we provide.
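
The pipeline described in the abstract (term co-occurrence graph, metric backbone, backbone-contributing users) can be illustrated with a minimal Python sketch. This is not the authors' released code, and several details are assumptions the abstract does not state: edge weights are taken to be the Jaccard proximity p of two terms over posts, distances are d = 1/p - 1, and a user "contributes" to a backbone edge if at least one of their posts mentions both of its terms. The function names (`build_graph`, `shortest_from`, `metric_backbone`) and the toy posts are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def build_graph(posts):
    """posts: list of (user, set_of_dictionary_terms).
    Returns edge distances and, per edge, the users who co-mentioned its terms."""
    term_posts = defaultdict(set)  # term -> ids of posts that mention it
    edge_users = defaultdict(set)  # (term_a, term_b) -> contributing users
    for post_id, (user, terms) in enumerate(posts):
        for term in terms:
            term_posts[term].add(post_id)
        for a, b in combinations(sorted(terms), 2):
            edge_users[(a, b)].add(user)
    dist = {}
    for a, b in edge_users:
        both = len(term_posts[a] & term_posts[b])
        either = len(term_posts[a] | term_posts[b])
        proximity = both / either          # ASSUMED weighting: Jaccard, in (0, 1]
        dist[(a, b)] = 1.0 / proximity - 1.0  # ASSUMED proximity-to-distance map
    return dist, edge_users


def shortest_from(source, adj):
    """Dijkstra: shortest-path length from source to every reachable node."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best


def metric_backbone(dist, tol=1e-12):
    """Keep edges whose direct distance equals the shortest-path distance.
    tol only absorbs floating-point rounding; it is not a tuning parameter."""
    adj = defaultdict(dict)
    for (a, b), w in dist.items():
        adj[a][b] = w
        adj[b][a] = w
    sp = {u: shortest_from(u, adj) for u in adj}
    return {(a, b) for (a, b), w in dist.items() if w <= sp[a][b] + tol}


# Toy data (hypothetical): u4 is the only user linking "seizure" and "dizziness".
posts = [
    ("u1", {"seizure", "levetiracetam"}),
    ("u1", {"seizure", "levetiracetam"}),
    ("u2", {"seizure", "levetiracetam"}),
    ("u2", {"levetiracetam", "dizziness"}),
    ("u3", {"levetiracetam", "dizziness"}),
    ("u3", {"levetiracetam", "dizziness"}),
    ("u4", {"seizure", "dizziness"}),
]
dist, edge_users = build_graph(posts)
backbone = metric_backbone(dist)
selected = set().union(*(edge_users[e] for e in backbone))
excluded = {user for user, _ in posts} - selected
print(sorted(backbone))
print(sorted(selected), sorted(excluded))
```

In this toy graph the direct seizure-dizziness edge (distance about 6) is longer than the indirect path through levetiracetam (about 2.67), so it falls outside the backbone, and the one user who contributed only that edge is left out of the selected cohort. Running all-pairs Dijkstra is fine at this scale; a large knowledge graph would likely call for a more efficient backbone routine.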

    Keywords: Epilepsy; Network science; Network sparsification; Patient cohort selection; Social media mining.

    Keywords: digital cohort selection; social media; metric backbone; biomedical knowledge graphs


    Copyright © Journal of biomedical informatics.

    Related content

    Journal: Journal of biomedical informatics

    Abbreviation: J BIOMED INFORM

    ISSN: 1532-0464

    e-ISSN: 1532-0480

    IF/Quartile: 4.0/Q2
