Published on 28 August 2015
Electronic health records (EHRs) are a rich and relatively untapped resource for characterizing clinical practice as it actually occurs and for quantifying how closely medical entities such as drugs, diseases, procedures, and devices are related. We provide a unique set of co-occurrence matrices that quantify the pairwise mentions of 3 million terms mapped onto 1 million clinical concepts, calculated from the raw text of 20 million clinical notes spanning 19 years of data. Co-frequencies were computed with a parallelized annotation, hashing, and counting pipeline applied to clinical notes from Stanford Hospitals and Clinics. The co-occurrence matrices quantify the relatedness among medical concepts; they can serve as the basis for many statistical tests and can be used to directly compute Bayesian conditional probabilities, association rules, and test statistics such as relative risks and odds ratios. This dataset can be leveraged to quantitatively assess comorbidity, drug-drug, and drug-disease patterns for a range of clinical, epidemiological, and financial applications.
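To make concrete how conditional probabilities, relative risks, and odds ratios can be derived from co-occurrence counts, here is a minimal Python sketch. It is illustrative only and is not the authors' pipeline. The function name `pair_stats`, the `PairStats` container, and the example counts are hypothetical. The sketch also assumes that each count is a number of notes mentioning a concept and that the total number of notes is known; the dataset's actual file layout and counting unit are not described here.

```python
from dataclasses import dataclass


@dataclass
class PairStats:
    p_b_given_a: float    # P(B | A)
    p_a_given_b: float    # P(A | B)
    lift: float           # P(B | A) / P(B), a common association-rule measure
    relative_risk: float  # P(B | A) / P(B | not A)
    odds_ratio: float     # (a * d) / (b * c) from the 2x2 table


def pair_stats(n_ab: int, n_a: int, n_b: int, n_total: int) -> PairStats:
    """Derive association statistics for concepts A and B from counts.

    n_ab    -- notes mentioning both A and B (the co-occurrence count)
    n_a     -- notes mentioning A
    n_b     -- notes mentioning B
    n_total -- total notes (assumed known; hypothetical input)
    """
    # Cells of the 2x2 contingency table.
    a = n_ab                         # A and B
    b = n_a - n_ab                   # A without B
    c = n_b - n_ab                   # B without A
    d = n_total - n_a - n_b + n_ab   # neither A nor B

    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent with each other")
    if a == 0 or b == 0 or c == 0 or d == 0:
        # A zero cell makes at least one ratio undefined or degenerate;
        # this sketch refuses rather than applying a continuity correction.
        raise ValueError("every cell of the 2x2 table must be positive")

    p_b_given_a = a / (a + b)
    p_a_given_b = a / (a + c)
    p_b = n_b / n_total
    p_b_given_not_a = c / (c + d)

    return PairStats(
        p_b_given_a=p_b_given_a,
        p_a_given_b=p_a_given_b,
        lift=p_b_given_a / p_b,
        relative_risk=p_b_given_a / p_b_given_not_a,
        odds_ratio=(a * d) / (b * c),
    )


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = pair_stats(n_ab=50, n_a=100, n_b=200, n_total=1000)
print(example)
```

With the hypothetical counts above, the 2x2 cells are 50, 50, 150, and 750, so the relative risk works out to about 3 and the odds ratio to 5. Real use would need a policy for sparse pairs (zero cells), which this sketch deliberately rejects.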
Cited on 01 January 2026
Weight: 1.00
Cited on 28 December 2020
Weight: 1.59
Cited on 16 September 2014
Weight: 1.00
Mentioned on 07 October 2025
Weight: 1.79
Mentioned on 06 October 2025
Weight: 1.79
Mentioned on 02 October 2025
Weight: 1.79
Mentioned on 27 September 2025
Weight: 1.79
Mentioned on 27 September 2025
Weight: 1.79
Mentioned on 01 September 2025
Weight: 1.79
Mentioned on 31 August 2025
Weight: 1.79
Mentioned on 31 August 2025
Weight: 1.79
Mentioned on 27 August 2025
Weight: 1.79
Mentioned on 23 August 2025
Weight: 1.79
Publisher: Dryad
Topic Name: Probabilistic Statistics in Medicine
Subfield: Statistics and Probability
Field: Mathematics
Domain: Physical Sciences