TY - GEN
T1 - Abstract Mining
AU - Small, Ellie
AU - Cabrera, Javier
AU - Kostis, John B.
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/9/21
Y1 - 2020/9/21
N2 - IMPORTANCE: The marked explosion and fragmentation of bibliographic databases that include large parts pertaining to medical subspecialties has created the opportunity to identify new areas of research using the citations at the interface of subspecialty information. Bibliographic databases such as PubMed are useful to researchers when they wish to identify specific citations of their interest. However, they are not useful because of their size for the purpose of identifying new areas of research. OBJECTIVE: To present a method and two computer applications that identify areas for new research by finding abstracts at the interface between subspecialty parts of PubMed. DESIGN: Here we present a new method and computer applications that aim to ameliorate the problem by examining all abstracts that fulfill the general search terms from PubMed. Using text-mining algorithms of the abstracts to extract all non-trivial words, the researcher can repeatedly cluster the publications by commonality of the words in the abstracts to find unusual or unexpected combinations of words that may lead to new research. When single words are not descriptive enough to identify unique and unexpected ideas for potential new research, we allow the extraction of principal phrases from those abstracts instead. Here we define a principal phrase as a phrase that is common by itself, i.e. not common only as part of another common phrase, does not cross punctuation marks, and is informative (e.g. "and this disease" is not an informative phrase). FINDINGS: We present four examples of identifying new research areas by examining PubMed outcomes after searches for "takotsubo", "embolic stroke" excluding "atrial fibrillation", "impedance mismatch", and "aortic and stenosis". New areas of research were identified including comparisons of the clinical picture and pathophysiology of Takotsubo with scorpion envenomation, and the importance of impedance mismatch in pulmonary and renal circulation.
CONCLUSION AND RELEVANCE: In conclusion, we have developed a method and two computer applications to mine words and/or principal phrases from the abstracts retrieved from PubMed or other databases to identify new ideas for research.
AB - IMPORTANCE: The marked explosion and fragmentation of bibliographic databases that include large parts pertaining to medical subspecialties has created the opportunity to identify new areas of research using the citations at the interface of subspecialty information. Bibliographic databases such as PubMed are useful to researchers when they wish to identify specific citations of their interest. However, they are not useful because of their size for the purpose of identifying new areas of research. OBJECTIVE: To present a method and two computer applications that identify areas for new research by finding abstracts at the interface between subspecialty parts of PubMed. DESIGN: Here we present a new method and computer applications that aim to ameliorate the problem by examining all abstracts that fulfill the general search terms from PubMed. Using text-mining algorithms of the abstracts to extract all non-trivial words, the researcher can repeatedly cluster the publications by commonality of the words in the abstracts to find unusual or unexpected combinations of words that may lead to new research. When single words are not descriptive enough to identify unique and unexpected ideas for potential new research, we allow the extraction of principal phrases from those abstracts instead. Here we define a principal phrase as a phrase that is common by itself, i.e. not common only as part of another common phrase, does not cross punctuation marks, and is informative (e.g. "and this disease" is not an informative phrase). FINDINGS: We present four examples of identifying new research areas by examining PubMed outcomes after searches for "takotsubo", "embolic stroke" excluding "atrial fibrillation", "impedance mismatch", and "aortic and stenosis". New areas of research were identified including comparisons of the clinical picture and pathophysiology of Takotsubo with scorpion envenomation, and the importance of impedance mismatch in pulmonary and renal circulation.
CONCLUSION AND RELEVANCE: In conclusion, we have developed a method and two computer applications to mine words and/or principal phrases from the abstracts retrieved from PubMed or other databases to identify new ideas for research.
KW - Abstracts
KW - Clustering
KW - Phrase Mining
KW - Phrases
KW - Text Mining
UR - http://www.scopus.com/inward/record.url?scp=85096989326&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096989326&partnerID=8YFLogxK
U2 - 10.1145/3388440.3412476
DO - 10.1145/3388440.3412476
M3 - Conference contribution
AN - SCOPUS:85096989326
T3 - Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
BT - Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
PB - Association for Computing Machinery, Inc
T2 - 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
Y2 - 21 September 2020 through 24 September 2020
ER -