Literature

lingtypology

CRAN Peer-reviewed

Linguistic Typology and Mapping

Maintainer

George Moroz

Description

Provides R with the Glottolog database https://glottolog.org/ and some additional tools for linguistic mapping. The Glottolog database contains a catalogue of the languages of the world. This package helps researchers make linguistic maps following the philosophy of the Cross-Linguistic Linked Data project https://clld.org/, which facilitates uniform access to the data across publications. A tutorial for this package is available on GitHub Pages https://docs.ropensci.org/lingtypology/ and in the package vignette. Maps created with this package can be used for both research and linguistic teaching. In addition, the package can download data from typological databases such as WALS, AUTOTYP and some others, and can create your own database website.

Scientific use cases

Maisak, T. (2017). Repetitive prefix in Agul: Morphological copy from a closely related language. International Journal of Bilingualism, 136700691774006. https://doi.org/10.1177/1367006917740060
Roettger, T., & Gordon, M. (2017). Methodological issues in the study of word stress correlates. Linguistics Vanguard, 3(1). http://www.linguistics.ucsb.edu/faculty/gordon/Roettger&Gordon_AcousticMethodologoy.pdf
Hantgan-Sonko, A. (2020). Synchronic and diachronic strategies of mora preservation in Gújjolaay Eegimaa. Journal of African Languages and Literatures, (1), 1-25. http://www.politics.unina.it/index.php/jalalit/article/download/6732/7790
Ye, J. (2020). Independent and dependent possessive person forms. Studies in Language, 44(2), 363–406. https://doi.org/10.1075/sl.19020.ye

View Documentation

medrxivr

CRAN Peer-reviewed

Access and Search MedRxiv and BioRxiv Preprint Data

Maintainer

Luke McGuinness

Description

An increasingly important source of health-related bibliographic content is preprints: preliminary versions of research articles that have yet to undergo peer review. The two preprint repositories most relevant to health-related sciences are medRxiv https://www.medrxiv.org/ and bioRxiv https://www.biorxiv.org/, both of which are operated by the Cold Spring Harbor Laboratory. medrxivr provides programmatic access to the Cold Spring Harbor Laboratory (CSHL) API https://api.biorxiv.org/, allowing users to easily download medRxiv and bioRxiv preprint metadata (e.g. title, abstract, publication date, author list, etc.) into R. medrxivr also provides functions to search the downloaded preprint records using regular expressions and Boolean logic, as well as helper functions that allow users to export their search results to a .BIB file for easy import to a reference manager and to download the full-text PDFs of preprints matching their search criteria.
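To make the idea of searching downloaded records with regular expressions and Boolean logic concrete, here is a minimal, language-agnostic sketch in Python. It is not the medrxivr API: the helper names `matches` and `search`, their parameters, and the three toy records are all invented for illustration.

```python
import re

# Toy records invented for this example; real metadata would come from the API.
records = [
    {"title": "Dementia risk and lipid levels", "abstract": "A cohort study of statin use."},
    {"title": "Vaccine uptake survey", "abstract": "Questionnaire results on hesitancy."},
    {"title": "Lipids in early-onset dementia", "abstract": "Case-control analysis."},
]


def matches(record, pattern):
    """True if the regex matches the title or abstract, ignoring case."""
    text = f"{record['title']} {record['abstract']}"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def search(records, all_of=(), none_of=()):
    """Keep records matching every pattern in all_of (AND) and no pattern in none_of (NOT)."""
    return [
        r for r in records
        if all(matches(r, p) for p in all_of)
        and not any(matches(r, p) for p in none_of)
    ]


# Each pattern can express OR internally via regex alternation.
hits = search(records, all_of=[r"dementia", r"lipids?|statin"], none_of=[r"survey"])
for r in hits:
    print(r["title"])
```

In this sketch AND and NOT are expressed through separate pattern lists, while OR is left to regex alternation inside a single pattern.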

View Documentation

rplos

CRAN Staff maintained

Interface to the Search API for PLoS Journals

Maintainer

Scott Chamberlain

Description

A programmatic interface to the SOLR based search API (http://api.plos.org/) provided by the Public Library of Science journals to search their articles. Functions are included for searching for articles, retrieving articles, making plots, doing faceted searches, highlight searches, and viewing results of highlighted searches in a browser.

Scientific use cases

Hartgerink, C. H. J., van Aert, R. C. M., Nuijten, M. B., Wicherts, J. M., & van Assen, M. A. L. M. (2016). Distributions of p-values smaller than .05 in psychology: what is going on? PeerJ, 4, e1935. https://doi.org/10.7717/peerj.1935
White, E. (2015). Some thoughts on best publishing practices for scientific software. IEE, 8. https://doi.org/10.4033/iee.2015.8.9.c
Gálvez, R. H. (2017). Assessing author self-citation as a mechanism of relevant knowledge diffusion. Scientometrics. https://doi.org/10.1007/s11192-017-2330-1
Li, K., Yan, E., & Feng, Y. (2017). How is R cited in research outputs? Structure, impacts, and citation standard. Journal of Informetrics, 11(4), 989–1002. https://doi.org/10.1016/j.joi.2017.08.003
Federer LM, Belter CW, Joubert DJ, Livinski A, Lu YL, et al. (2018) Data sharing in PLOS ONE: An analysis of Data Availability Statements. PLOS ONE 13(5): e0194768. https://doi.org/10.1371/journal.pone.0194768
Jaspers, S., De Troyer, E., & Aerts, M. (2018). Machine learning techniques for the automation of literature reviews and systematic reviews in EFSA. EFSA Supporting Publications, 15(6), 1427E. https://doi.org/10.2903/sp.efsa.2018.EN-1427
Nuijten, M. B. (2018, April 30). Research on Research: A Meta-Scientific Study of Problems and Solutions in Psychological Science. https://doi.org/10.31234/osf.io/qtk7e
Enkhbayar, A., Haustein, S., Barata, G., & Alperin, J. P. (2019). How much research shared on Facebook is hidden from public view? A comparison of public and private online activity around PLOS ONE papers. arXiv preprint arXiv:1909.01476. https://arxiv.org/abs/1909.01476
Mishra, P., & Narayan Tripathi, L. (2019). Characterization of two‐dimensional materials from Raman spectral data. Journal of Raman Spectroscopy. https://doi.org/10.1002/jrs.5744
Vílchez-Román, C., Huamán-Delgado, F., & Alhuay-Quispe, J. (2020). Social dimension activates the usage and academic impact of Open Access publications in Andean countries: a structural modeling-based approach. Information Development, 026666692090184. https://doi.org/10.1177/0266666920901849
Enkhbayar, A., Haustein, S., Barata, G., & Alperin, J. P. (2020). How much research shared on Facebook happens outside of public pages and groups? A comparison of public and private online activity around PLOS ONE papers. Quantitative Science Studies, 1–22. https://doi.org/10.1162/qss_a_00044

View Documentation

qpdf

CRAN Staff maintained

Split, Combine and Compress PDF Files

Maintainer

Jeroen Ooms

Description

Content-preserving transformations of PDF files such as split, combine, and compress. This package interfaces directly to the qpdf C++ API and does not require any command line utilities. Note that qpdf does not read actual content from PDF files: to extract text and data you need the pdftools package.

View Documentation

fulltext

CRAN Staff maintained

Full Text of Scholarly Articles Across Many Data Sources

Maintainer

Scott Chamberlain

Description

Provides a single interface to many sources of full text scholarly data, including Biomed Central, Public Library of Science, Pubmed Central, eLife, F1000Research, PeerJ, Pensoft, Hindawi, arXiv preprints, and more. Functionality is included for searching for articles, downloading full or partial text, downloading supplementary materials, and converting to various data formats.

Scientific use cases

Bauer, P. C., Barbera, P., & Munzert, S. (2016). The Quality of Citations: Towards Quantifying Qualitative Impact in Social Science Research. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2874549
Piper, A. M., Batovska, J., Cogan, N. O. I., Weiss, J., Cunningham, J. P., Rodoni, B. C., & Blacket, M. J. (2019). Prospects and challenges of implementing DNA metabarcoding for high-throughput insect surveillance. GigaScience, 8(8). https://doi.org/10.1093/gigascience/giz092
Mishra, P., & Narayan Tripathi, L. (2019). Characterization of two‐dimensional materials from Raman spectral data. Journal of Raman Spectroscopy. https://doi.org/10.1002/jrs.5744
Vitale, O., Preste, R., Palmisano, D., & Attimonelli, M. (2019). A data and text mining pipeline to annotate human mitochondrial variants with functional and clinical information. Molecular Genetics & Genomic Medicine, 8(2). https://doi.org/10.1002/mgg3.1085
Joo, R., Picardi, S., Boone, M. E., Clay, T. A., Patrick, S. C., Romero-Romero, V. S., & Basille, M. (2020). A decade of movement ecology. arXiv preprint arXiv:2006.00110 https://arxiv.org/pdf/2006.00110.pdf

View Documentation

hunspell

CRAN Staff maintained

High-Performance Stemmer, Tokenizer, and Spell Checker

Maintainer

Jeroen Ooms

Description

Low level spell checker and morphological analyzer based on the famous hunspell library https://hunspell.github.io. The package can analyze or check individual words as well as parse text, LaTeX, HTML or XML documents. For a more user-friendly interface use the spelling package, which builds on this package to automate checking of files, documentation and vignettes in all common formats.

Scientific use cases

Cichosz, P. (2018) A case study in text mining of discussion forum posts: classification with bag of words and global vectors Int. J. Appl. Math. Comput. Sci., Vol. 28, No. 4, 787–801. https://www.amcs.uz.zgora.pl/?action=paper&paper=1469
Yeomans, M., Kantor, A., & Tingley, D. (2018). The politeness Package: Detecting Politeness in Natural Language. The R Journal. https://journal.r-project.org/archive/2018/RJ-2018-067/RJ-2018-067.pdf
Lee, A. J., Jones, B. C., & DeBruine, L. M. (2019, January 21). Investigating the association between mating-relevant self-concepts and mate preferences through a data-driven analysis of online personal descriptions. https://doi.org/10.31234/osf.io/38zef
Liu, Crocker H., Nowak, Adam, and Smith, Patrick S. 2018. Does the Asset Pricing Premium Reflect Asymmetric or Incomplete Information? Economics Faculty Working Papers Series. 5. https://researchrepository.wvu.edu/econ_working-papers/5
Nicolas, G., Bai, X., & Fiske, S. T. (2019). Automated Dictionary Creation for Analyzing Text: An Illustration from Stereotype Content. https://psyarxiv.com/afm8k/download?format=pdf
Bayer, D., & Michael, S. (2019). Exploring the Daschle Collection using Text Mining. arXiv preprint arXiv:1904.12623 https://arxiv.org/pdf/1904.12623
Green, E. P., Whitcomb, A., Kahumbura, C., Rosen, J. G., Goyal, S., Achieng, D., & Bellows, B. (2019). What is the best method of family planning for me?: a text mining analysis of messages between users and agents of a digital health service in Kenya. Gates Open Research, 3, 1475. https://doi.org/10.12688/gatesopenres.12999.1
Lin, C., Lou, Y.-S., Tsai, D.-J., Lee, C.-C., Hsu, C.-J., Wu, D.-C., … Fang, W.-H. (2019). Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study. JMIR Medical Informatics, 7(3), e14499. https://doi.org/10.2196/14499
Luc, A., Lê, S., & Philippe, M. (2019). Nudging consumers for relevant data using Free JAR profiling: an application to product development. Food Quality and Preference, 103751. https://doi.org/10.1016/j.foodqual.2019.103751
Ramagopalan, S. V., Malcolm, B., Merinopoulou, E., McDonald, L., & Cox, A. (2019). Automated extraction of treatment patterns from social media posts: an exploratory analysis in renal cell carcinoma. Future Oncology. https://doi.org/10.2217/fon-2019-0406
Cinelli, M., Ficcadenti, V., & Riccioni, J. (2019). The interconnectedness of the economic content in the speeches of the US Presidents. Annals of Operations Research. https://doi.org/10.1007/s10479-019-03372-2
Christensen, A. P., & Kenett, Y. (2019, October 22). Semantic Network Analysis (SemNA): A Tutorial on Preprocessing, Estimating, and Analyzing Semantic Networks. https://doi.org/10.31234/osf.io/eht87
Booth, A., Bell, T., Halhol, S., Pan, S., Welch, V., Merinopoulou, E., … Cox, A. (2019). Using Social Media to Uncover Treatment Experiences and Decisions in Patients With Acute Myeloid Leukemia or Myelodysplastic Syndrome Who Are Ineligible for Intensive Chemotherapy: Patient-Centric Qualitative Data Analysis. Journal of Medical Internet Research, 21(11), e14285. https://doi.org/10.2196/14285
Deng, H., Wang, Q., Turner, D. P., Sexton, K. E., Burns, S. M., Eikermann, M., … Houle, T. T. (2020). Sentiment analysis of real-world migraine tweets for population research. Cephalalgia Reports, 3, 251581631989886. https://doi.org/10.1177/2515816319898867
Cinelli, M. (2019). Generalized rich-club ordering in networks. Journal of Complex Networks, 7(5), 702–719. https://doi.org/10.1093/comnet/cnz002
Funk, B., Sadeh-Sharvit, S., Fitzsimmons-Craft, E. E., Trockel, M. T., Monterubio, G. E., Goel, N. J., … Taylor, C. B. (2020). A Framework for Applying Natural Language Processing in Digital Health Interventions. Journal of Medical Internet Research, 22(2), e13855. https://doi.org/10.2196/13855
Cichosz, P. (2020). Unsupervised modeling anomaly detection in discussion forums posts using global vectors for text representation. Natural Language Engineering, 1–28. https://doi.org/10.1017/s1351324920000066
Pruchnik, P. (2020). Identification of Trends in the Polish Media on the Example of the Quarterly Studia Medioznawcze The Use of Big Data Tools. Media Studies, 80(1). http://yadda.icm.edu.pl/yadda/element/bwmeta1.element.desklight-e79ed2c7-fd7d-4a91-8895-c322743c8f48/c/04_Pruchnik_EN.pdf
Hamilton, L. M., & Lahne, J. (2020). Fast and automated sensory analysis: Using natural language processing for descriptive lexicon development. Food Quality and Preference, 83, 103926. https://doi.org/10.1016/j.foodqual.2020.103926
DellaPosta, D., & Nee, V. (2020). Emergence of diverse and specialized knowledge in a metropolitan tech cluster. Social Science Research, 86, 102377. https://doi.org/10.1016/j.ssresearch.2019.102377
Geller, J., Davis, S. D., & Peterson, D. (2020, May 23). Sans forgetica is not desirable for learning. https://doi.org/10.31234/osf.io/ku5bz
Morselli, D., Passini, S., & McGarty, C. (2020). Sos Venezuela: an analysis of the anti-Maduro protest movements using Twitter. Social Movement Studies, 1–22. https://doi.org/10.1080/14742837.2020.1770072
Ficcadenti, V., Cerqueti, R., Ausloos, M., & Dhesi, G. (2020). Words ranking and Hirsch index for identifying the core of the hapaxes in political texts. Journal of Informetrics, 14(3), 101054. https://doi.org/10.1016/j.joi.2020.101054

View Documentation

pdftools

CRAN Staff maintained

Text Extraction, Rendering and Converting of PDF Documents

Maintainer

Jeroen Ooms

Description

Utilities based on libpoppler for extracting text, fonts, attachments and metadata from a PDF file. Also supports high quality rendering of PDF documents into PNG, JPEG, TIFF format, or into raw bitmap vectors for further processing in R.

Scientific use cases

Cole, C. B., Patel, S., French, L., & Knight, J. (2016). Semi-Automated Identification of Ontological Labels in the Biomedical Literature with goldi. https://doi.org/10.1101/073460
Krotov, V., & Tennyson, M. (2018). Scraping Financial Data from the Web Using R Language. Journal of Emerging Technologies in Accounting. https://doi.org/10.2308/jeta-52063
Iqbal, J. (2019). Managerial Self-Attribution Bias and Banks’ Future Performance: Evidence from Emerging Economies. Journal of Risk and Financial Management, 12(2), 73. https://doi.org/10.3390/jrfm12020073
Hanna, A., & Hanna, L.-A. (2019). Topic Analysis of UK Fitness to Practise Cases: What Lessons Can Be Learnt? Pharmacy, 7(3), 130. https://doi.org/10.3390/pharmacy7030130
Hwang, L. J., Pauloo, R. A., & Carlen, J. (2019). Assessing Impact of Outreach through Software Citation for Community Software in Geodynamics. Computing in Science & Engineering, 1–1. https://doi.org/10.1109/mcse.2019.2940221
Ulibarri, N., & Scott, T. A. (2019). Environmental hazards, rigid institutions, and transformative change: How drought affects the consideration of water and climate impacts in infrastructure management. Global Environmental Change, 59, 102005. https://doi.org/10.1016/j.gloenvcha.2019.102005
Lope, D. J., & Dolgun, A. (2020). Measuring the inequality of accessible trams in Melbourne. Journal of Transport Geography, 83, 102657. https://doi.org/10.1016/j.jtrangeo.2020.102657
Verde Arregoitia, L. D., Teta, P., & D’Elía, G. (2020). Patterns in research and data sharing for the study of form and function in caviomorph rodents. Journal of Mammalogy. https://doi.org/10.1093/jmammal/gyaa002
Hagan, A. K., Pollet, R. M., & Libertucci, J. (2020). Suggestions for Improving Invited Speaker Diversity To Reflect Trainee Diversity. Journal of Microbiology & Biology Education, 21(1). https://doi.org/10.1128/jmbe.v21i1.2105
Berkel, C., & Cacan, E. (2020). GAB2 and GAB3 are expressed in a tumor stage-, grade- and histotype-dependent manner and are associated with shorter progression-free survival in ovarian cancer. Journal of Cell Communication and Signaling. https://doi.org/10.1007/s12079-020-00582-3
Scott, T. A., Ulibarri, N., & Perez Figueroa, O. (2020). NEPA and National Trends in Federal Infrastructure Siting in the United States. Review of Policy Research. https://doi.org/10.1111/ropr.12399
Roa-Ureta, R. H., Henríquez, J., & Molinet, C. (2020). Achieving sustainable exploitation through co-management in three Chilean small-scale fisheries. Fisheries Research, 230, 105674. https://doi.org/10.1016/j.fishres.2020.105674
Westgate, M. J., Barton, P. S., Lindenmayer, D. B., & Andrew, N. R. (2020). Quantifying shifts in topic popularity over 44 years of Austral Ecology. Austral Ecology, 45(6), 663–671. https://doi.org/10.1111/aec.12938
Marshall, B. M., Strine, C., & Hughes, A. C. (2020). Thousands of reptile species threatened by under-regulated global trade. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-18523-4
Li, B., Trueman, B. F., Rahman, M. S., & Gagnon, G. A. (2021). Controlling lead release due to uniform and galvanic corrosion — An evaluation of silicate-based inhibitors. Journal of Hazardous Materials, 407, 124707. https://doi.org/10.1016/j.jhazmat.2020.124707

View Documentation

roadoi

CRAN Peer-reviewed

Find Free Versions of Scholarly Publications via Unpaywall

Maintainer

Najko Jahn

Description

This web client interfaces Unpaywall https://unpaywall.org/products/api, formerly oaDOI, a service finding free full-texts of academic papers by linking DOIs with open access journals and repositories. It provides unified access to various data sources for open access full-text links including Crossref and the Directory of Open Access Journals (DOAJ). API usage is free and no registration is required.

Scientific use cases

Ashby, M. P. J. (2020, March 6). Three quarters of new criminological knowledge is hidden from policy makers. https://doi.org/10.31235/osf.io/wnq7h
Ashby, M. P. J. (2020). The Open-Access Availability of Criminological Research to Practitioners and Policy Makers. Journal of Criminal Justice Education, 1–21. https://doi.org/10.1080/10511253.2020.1838588
Robinson-Garcia, N., van Leeuwen, T. N., & Torres-Salinas, D. (2020). Measuring Open Access Uptake: Data Sources, Expectations, and Misconceptions. Scholarly Assessment Reports, 2(1). https://doi.org/10.29024/sar.23
Clayson, P. E., Baldwin, S., & Larson, M. J. (2020). The Open Access Advantage for Studies of Human Electrophysiology: Impact on Citations and Altmetrics. https://doi.org/10.31234/osf.io/5xagd

View Documentation

rorcid

CRAN Staff maintained

Interface to the Orcid.org API

Maintainer

Scott Chamberlain

Description

Client for the Orcid.org API (https://orcid.org/). Functions included for searching for people, searching by DOI, and searching by Orcid ID.

View Documentation

rdatacite

CRAN Staff maintained

Client for the DataCite API

Maintainer

Scott Chamberlain

Description

Client for the web service methods provided by DataCite (https://www.datacite.org/), including functions to interface with their RESTful search API. The API is backed by Elasticsearch, allowing expressive queries, including faceting.

Scientific use cases

Jaspers, S., De Troyer, E., & Aerts, M. (2018). Machine learning techniques for the automation of literature reviews and systematic reviews in EFSA. EFSA Supporting Publications, 15(6), 1427E. https://doi.org/10.2903/sp.efsa.2018.EN-1427
White, L., & Santy, S. (2018). DataDepsGenerators.jl: making reusing data easy by automatically generating DataDeps.jl registration code. Journal of Open Source Software, 3(31), 921. https://doi.org/10.21105/joss.00921

View Documentation

pubchunks

CRAN Staff maintained

Fetch Sections of XML Scholarly Articles

Maintainer

Scott Chamberlain

Description

Get chunks of XML scholarly articles without having to know how to work with XML. Custom mappers for each publisher and for each article section pull out the information you want. Works with outputs from package fulltext, xml2 package documents, and file paths to XML documents.

View Documentation

cld3

CRAN Staff maintained

Google's Compact Language Detector 3

Maintainer

Jeroen Ooms

Description

Google’s Compact Language Detector 3 is a neural network model for language identification and the successor of cld2 (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from cld2. See https://github.com/google/cld3#readme for more information.

View Documentation

citecorp

CRAN Staff maintained

Client for the Open Citations Corpus

Maintainer

Scott Chamberlain

Description

Client for the Open Citations Corpus (http://opencitations.net/). Includes a set of functions for getting one identifier type from another, as well as getting references and citations for a given identifier.

View Documentation

microdemic

CRAN Staff maintained

Microsoft Academic API Client

Maintainer

Scott Chamberlain

Description

The Microsoft Academic Knowledge API provides programmatic access to scholarly articles in the Microsoft Academic Graph (https://academic.microsoft.com/). Includes methods matching all ‘Microsoft Academic’ API routes, including search, graph search, text similarity, and interpret natural language query string.

View Documentation

cld2

CRAN Staff maintained

Google's Compact Language Detector 2

Maintainer

Jeroen Ooms

Description

Bindings to Google’s C++ library Compact Language Detector 2 (see https://github.com/cld2owners/cld2#readme for more information). Probabilistically detects over 80 languages in plain text or HTML. For mixed-language input it returns the top three detected languages and their approximate proportion of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes). There is also a cld3 package on CRAN which uses a neural network model instead.

Scientific use cases

Martín-Martín, A., Orduna-Malea, E., Thelwall, M., & López-Cózar, E. D. (2018). Google Scholar, Web of Science, and Scopus: a systematic comparison of citations in 252 subject categories. arXiv preprint arXiv:1808.05053 https://arxiv.org/abs/1808.05053
Albrecht, U.-V., Hasenfuß, G., & von Jan, U. (2018). Description of Cardiological Apps From the German App Store: Semiautomated Retrospective App Store Analysis. JMIR mHealth and uHealth, 6(11), e11753. https://doi.org/10.2196/11753
Green, E. P., Whitcomb, A., Kahumbura, C., Rosen, J. G., Goyal, S., Achieng, D., & Bellows, B. (2019). What is the best method of family planning for me?: a text mining analysis of messages between users and agents of a digital health service in Kenya. Gates Open Research, 3, 1475. https://doi.org/10.12688/gatesopenres.12999.1
Jaric, I., & Djeric, M. (2019). Curriculum and labor market: Comparative analysis of the curricular outcomes of the study program in sociology at the Faculty of Philosophy, University of Belgrade and the required competences in the labor market. Sociologija, 61(Suppl. 1), 718–741. https://doi.org/10.2298/soc19s1718j

View Documentation

rcrossref

CRAN Staff maintained

Client for Various CrossRef APIs

Maintainer

Scott Chamberlain

Description

Client for various CrossRef APIs, including metadata search with their old and newer search APIs, get citations in various formats (including bibtex, citeproc-json, rdf-xml, etc.), convert DOIs to PMIDs, and vice versa, get citations for DOIs, and get links to full text of articles when available.

Scientific use cases

Jahn, N., & Tullney, M. (2016). A study of institutional spending on open access publication fees in Germany. PeerJ, 4, e2323. https://doi.org/10.7717/peerj.2323
Lammey, R. (2016). Using the Crossref Metadata API to explore publisher content. Sci Ed, 3(2), 109–111. https://doi.org/10.6087/kcse.75
Bauer, P. C., Barbera, P., & Munzert, S. (2016). The Quality of Citations: Towards Quantifying Qualitative Impact in Social Science Research. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2874549
Cho, H., & Yu, Y. (2018). Link prediction for interdisciplinary collaboration via co-authorship network. arXiv preprint arXiv:1803.06249. https://arxiv.org/pdf/1803.06249.pdf
Jaspers, S., De Troyer, E., & Aerts, M. (2018). Machine learning techniques for the automation of literature reviews and systematic reviews in EFSA. EFSA Supporting Publications, 15(6), 1427E. https://doi.org/10.2903/sp.efsa.2018.EN-1427
Hicks, D. J., Coil, D. A., Stahmer, C. G., & Eisen, J. A. (2019). Network analysis to evaluate the impact of research funding on research community consolidation. https://doi.org/10.1101/534495
Olsson-Collentine, A., van Assen, M. A. L. M., & Hartgerink, C. H. J. (2019). The Prevalence of Marginally Significant Results in Psychology Over Time. Psychological Science, 095679761983032. https://doi.org/10.1177/0956797619830326
Matthias, L., Jahn, N., & Laakso, M. (2019). The Two-Way Street of Open Access Journal Publishing - Flip It and Reverse It. Publications. 7(2), 23. https://doi.org/10.3390/publications7020023
Mishra, P., & Narayan Tripathi, L. (2019). Characterization of two‐dimensional materials from Raman spectral data. Journal of Raman Spectroscopy. https://doi.org/10.1002/jrs.5744
Fu, D. Y., & Hughey, J. J. (2019). Releasing a preprint is associated with more attention and citations for the peer-reviewed article. eLife, 8. https://doi.org/10.7554/elife.52646
Fraser, N., Momeni, F., Mayr, P., & Peters, I. (2020). The relationship between bioRxiv preprints, citations and altmetrics. Quantitative Science Studies, 1–21. https://doi.org/10.1162/qss_a_00043
Dion, M. L., Mitchell, S. M., & Sumner, J. L. (2020). Gender, seniority, and self-citation practices in political science. Scientometrics, 125(1), 1–28. https://doi.org/10.1007/s11192-020-03615-1
Puschmann, C., & Pentzold, C. (2020). A field comes of age: tracking research on the internet within communication studies, 1994 to 2018. Internet Histories, 1–19. https://doi.org/10.1080/24701475.2020.1749805
Benard, S., & Correll, S. J. (2010). Normative Discrimination and the Motherhood Penalty. Gender & Society, 24(5), 616–646. https://doi.org/10.1177/0891243210383142
Clayson, P. E., Baldwin, S., & Larson, M. J. (2020). The Open Access Advantage for Studies of Human Electrophysiology: Impact on Citations and Altmetrics. https://doi.org/10.31234/osf.io/5xagd

View Documentation

rcitoid

CRAN Staff maintained

Client for Citoid

Maintainer

Scott Chamberlain

Description

Client for Citoid (https://www.mediawiki.org/wiki/Citoid), an API for getting citations for various scholarly work identifiers found on Wikipedia.

View Documentation

europepmc

CRAN Peer-reviewed

R Interface to the Europe PubMed Central RESTful Web Service

Maintainer

Najko Jahn

Description

An R Client for the Europe PubMed Central RESTful Web Service (see https://europepmc.org/RestfulWebService for more information). It gives access to both metadata on life science literature and open access full texts. Europe PMC indexes all PubMed content and other literature sources including Agricola, a bibliographic database of citations to the agricultural literature, or Biological Patents. In addition to bibliographic metadata, the client allows users to fetch citations and reference lists. Links between life-science literature and other EBI databases, including ENA, PDB or ChEMBL are also accessible. No registration or API key is required. See the vignettes for usage examples.

View Documentation

handlr

CRAN Staff maintained

Convert Among Citation Formats

Maintainer

Scott Chamberlain

Description

Converts among many citation formats, including BibTeX, Citeproc, Codemeta, RDF XML, RIS, Schema.org, and Citation File Format. A low level R6 class is provided, as well as stand-alone functions for each citation format for both read and write.

View Documentation

patentsview

CRAN Peer-reviewed

An R Client to the PatentsView API

Maintainer

Christopher Baker

Description

Provides functions to simplify the PatentsView API (http://www.patentsview.org/api/doc.html) query language, send GET and POST requests to the API’s seven endpoints, and parse the data that comes back.

View Documentation

bibtex

CRAN

Bibtex Parser

Maintainer

James Joseph Balamuta

Description

Utility to parse a bibtex file.

View Documentation

oai

CRAN Peer-reviewed Staff maintained

General Purpose OAI-PMH Services Client

Maintainer

Scott Chamberlain

Description

A general purpose client to work with any OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) service. The OAI-PMH protocol is described at http://www.openarchives.org/OAI/openarchivesprotocol.html. Functions are provided to work with the OAI-PMH verbs: GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords, and ListSets.
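As a rough illustration of what a request for one of these verbs looks like on the wire, the following Python sketch builds OAI-PMH GET URLs. It is not the oai package's interface: the helper `oai_request_url` and the endpoint `https://example.org/oai` are hypothetical. The `verb` and `metadataPrefix` arguments come from the protocol itself.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs listed above.
OAI_VERBS = {
    "GetRecord", "Identify", "ListIdentifiers",
    "ListMetadataFormats", "ListRecords", "ListSets",
}


def oai_request_url(base_url, verb, **params):
    """Build the GET URL for an OAI-PMH request."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # 'verb' is always sent first; remaining arguments follow in sorted order
    # so the generated URL is deterministic.
    query = [("verb", verb)] + sorted(params.items())
    return f"{base_url}?{urlencode(query)}"


# Hypothetical repository endpoint, used only for illustration.
BASE = "https://example.org/oai"

identify_url = oai_request_url(BASE, "Identify")
records_url = oai_request_url(BASE, "ListRecords", metadataPrefix="oai_dc")
print(identify_url)
print(records_url)
```

Rejecting unknown verbs before any request is sent keeps mistakes local. A real client would also need to parse the XML responses and follow resumption tokens, both of which this sketch leaves out.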

Scientific use cases

Peters, I., Kraker, P., Lex, E., Gumpenberger, C., & Gorraiz, J. I. (2017). Zenodo in the Spotlight of Traditional and New Metrics. Frontiers in Research Metrics and Analytics, 2. https://doi.org/10.3389/frma.2017.00013

View Documentation

rromeo

CRAN Peer-reviewed

Access Publisher Copyright & Self-Archiving Policies via the SHERPA/RoMEO API

Maintainer

Matthias Grenié

Description

Fetches information from the SHERPA/RoMEO API http://www.sherpa.ac.uk/romeo/apimanual.php, which indexes the policies of journals regarding the archival of scientific manuscripts before and/or after peer review, as well as formatted manuscripts.

Scientific use cases

Ashby, M. P. J. (2020, March 6). Three quarters of new criminological knowledge is hidden from policy makers. https://doi.org/10.31235/osf.io/wnq7h
Ashby, M. P. J. (2020). The Open-Access Availability of Criminological Research to Practitioners and Policy Makers. Journal of Criminal Justice Education, 1–21. https://doi.org/10.1080/10511253.2020.1838588

View Documentation

jstor

CRAN Peer-reviewed

Read Data from JSTOR/DfR

Maintainer

Thomas Klebel

Description

Functions and helpers to import metadata, ngrams and full-texts delivered by Data for Research by JSTOR.

View Documentation

unrtf

CRAN Staff maintained

Extract Text from Rich Text Format (RTF) Documents

Maintainer

Jeroen Ooms

Description

Wraps the unrtf utility to extract text from RTF files. Supports document conversion to HTML, LaTeX or plain text. Output in HTML is recommended because unrtf has limited support for converting between character encodings.

View Documentation

textreuse

CRAN Peer-reviewed

Detect Text Reuse and Document Similarity

Maintainer

Lincoln Mullen

Description

Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
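As a rough illustration of the shingled n-gram and similarity ideas mentioned above, here is a minimal Python sketch. It is not the textreuse API: the function names `shingles` and `jaccard_similarity` are chosen for this example, and tokenization is assumed to be a simple lowercase whitespace split.

```python
def shingles(text, n=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = text.lower().split()
    # Slide a window of n words over the text; short texts yield an empty set.
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox leaps over the lazy dog"

score = jaccard_similarity(shingles(doc_a), shingles(doc_b))
print(score)
```

The two sentences differ by one word, yet only four of the ten distinct trigrams are shared. This is why shingle size matters when tuning sensitivity to small edits.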

Scientific use cases

Funk, K. R., & Mullen, L. A. (2017). The Spine of American Law: Digital Text Analysis and US Legal Practice. The American Historical Review. https://doi.org/10.1093/ahr/123.1.132
Mullen, L. A., Benoit, K., Keyes, O., Selivanov, D., & Arnold, J. (2018). Fast, Consistent Tokenization of Natural Language Text. Journal of Open Source Software, 3(23), 655. https://doi.org/10.21105/joss.00655
García, F. T., Villalba, L. J. G., Orozco, A. L. S., Ruiz, F. D. A., Juárez, A. A., & Kim, T. H. (2018). Locating similar names through locality sensitive hashing and graph theory. Multimedia Tools and Applications, 1-14. https://link.springer.com/article/10.1007/s11042-018-6375-9
Catalano, J. (2018). Digitally Analyzing the Uneven Ground: Language Borrowing Among Indian Treaties. Current Research in Digital History, 1. https://doi.org/10.31835/crdh.2018.02
Schmidt, B. (2018). Stable random projection: lightweight, general-purpose dimensionality reduction for digitized libraries. Journal of Cultural Analytics. https://doi.org/10.22148/16.025
Sanger, W., & Warin, T. (2019). Dataset of Jaccard similarity indices from 1,597 European political manifestos across 27 countries (1945–2017). Data in Brief, 103907. https://doi.org/10.1016/j.dib.2019.103907
Jaric, I., & Djeric, M. (2019). Curriculum and labor market: Comparative analysis of the curricular outcomes of the study program in sociology at the Faculty of Philosophy, University of Belgrade and the required competences in the labor market. Sociologija, 61(Suppl. 1), 718–741. https://doi.org/10.2298/soc19s1718j
Marple, T. (2020). The social management of complex uncertainty: Central Bank similarity and crisis liquidity swaps at the Federal Reserve. The Review of International Organizations. https://doi.org/10.1007/s11558-020-09378-x
Callaghan, T., Karch, A., & Kroeger, M. (2020). Model State Legislation and Intergovernmental Tensions over the Affordable Care Act, Common Core, and the Second Amendment. Publius: The Journal of Federalism. https://doi.org/10.1093/publius/pjaa012
Vogler, D., Udris, L., & Eisenegger, M. (2020). Measuring Media Content Concentration at a Large Scale Using Automated Text Comparisons. Journalism Studies, 1–20. https://doi.org/10.1080/1461670x.2020.1761865
Vogler, D., & Schäfer, M. S. (2020). Growing Influence of University PR on Science News Coverage? A Longitudinal Automated Content Analysis of University Media Releases and Newspaper Coverage in Switzerland, 2003‒2017. International Journal of Communication, 14, 22. https://ijoc.org/index.php/ijoc/article/download/13498/3113
James, S., Pagliari, S., & Young, K. L. (2020). The internationalization of European financial networks: a quantitative text analysis of EU consultation responses. Review of International Political Economy, 1–28. https://doi.org/10.1080/09692290.2020.1779781
Hansen, E. R., & Jansa, J. M. (2020). Complexity, Resources, and Text Borrowing in State Legislatures. http://ehansen4.sites.luc.edu/documents/Hansen_Jansa_Complexity.pdf

View Documentation

rtika

CRAN Peer-reviewed

R Interface to Apache Tika

Maintainer

Sasha Goodman

Description

Extract text or metadata from over a thousand file types, using Apache Tika https://tika.apache.org/. Get either plain text or structured XHTML content.

View Documentation

googleLanguageR

CRAN Peer-reviewed

Call Google's Natural Language API, Cloud Translation API, Cloud Speech API and Cloud Text-to-Speech API

Maintainer

Mark Edmondson

Description

Call Google Cloud machine learning APIs for text and speech tasks. Call the Cloud Translation API https://cloud.google.com/translate/ for detection and translation of text, the Natural Language API https://cloud.google.com/natural-language/ to analyse text for sentiment, entities or syntax, the Cloud Speech API https://cloud.google.com/speech/ to transcribe sound files to text and the Cloud Text-to-Speech API https://cloud.google.com/text-to-speech/ to turn text into sound files.

View Documentation

refsplitr

Peer-reviewed

Author name disambiguation, author georeferencing, and mapping of coauthorship networks with Web of Science data

Maintainer

Emilio Bruna

Description

Tools to parse and organize reference records downloaded from the Web of Science citation database into an R-friendly format, disambiguate the names of authors, geocode their locations, and generate/visualize coauthorship networks. This package has been peer-reviewed by rOpenSci (v. 1.0).

Scientific use cases

Hazlett, M. A., Henderson, K. M., Zeitzer, I. F., & Drew, J. A. (2020). The geography of publishing in the Anthropocene. Conservation Science and Practice, 2(10). https://doi.org/10.1111/csp2.270
Smith, T. B., Vacca, R., Krenz, T., & McCarty, C. (2021). Great minds think alike, or do they often differ? Research topic overlap and the formation of scientific teams. Journal of Informetrics, 15(1), 101104. https://doi.org/10.1016/j.joi.2020.101104

View Documentation

seasl

Staff maintained

Citation Style Language (CSL) Utilities

Maintainer

Scott Chamberlain

Description

Tools for working with the Citation Style Language (CSL) (https://citationstyles.org), an XML-based format describing the formatting of citations, notes and bibliographies. Functions are included for downloading and searching for styles and locales, and loading and parsing styles and locales. seasl aims to help users fetch and modify CSL files for work combining code and writing that requires citations.

View Documentation

tif

Text Interchange Format

Maintainer

Taylor Arnold

Description

Provides validation functions for common interchange formats for representing text data in R. Includes formats for corpus objects, document term matrices, and tokens. Other annotations can be stored by overloading the tokens structure.

View Documentation

tidypmc

CRAN

Parse Full Text XML Documents from PubMed Central

Maintainer

Chris Stubben

Description

Parse XML documents from the Open Access subset of Europe PubMed Central https://europepmc.org, including section paragraphs, tables, captions and references.

View Documentation

refimpact

Peer-reviewed

API Wrapper for the UK REF 2014 Impact Case Studies Database

Maintainer

Perry Stephenson

Description

Provides wrapper functions around the UK Research Excellence Framework 2014 Impact Case Studies Database API http://impact.ref.ac.uk/. The database contains relevant publication and research metadata about each case study as well as several paragraphs of text from the case study submissions. Case studies in the database are licenced under a CC-BY 4.0 licence http://creativecommons.org/licenses/by/4.0/legalcode.

View Documentation

rAltmetric

CRAN Staff maintained

Retrieves Altmetrics Data for Any Published Paper from Altmetric.com

Maintainer

Karthik Ram

Description

Provides a programmatic interface to the citation information and alternate metrics provided by Altmetric. Data from Altmetric allows researchers to immediately track the impact of their published work, without having to wait for citations. This allows for faster engagement with the audience interested in your work. For more information, visit https://www.altmetric.com/.

Scientific use cases

Madden, K., Evaniew, N., Scott, T., Domazetoska, E., Dosanjh, P., Li, C. S., … Sprague, S. (2016). Knowledge Dissemination of Intimate Partner Violence Intervention Studies Measured Using Alternative Metrics Results From a Scoping Review. Journal of Interpersonal Violence. https://doi.org/10.1177/0886260516657914
Na, J.-C., & Ye, Y. E. (2017). Content Analysis of Scholarly Discussions of Psychological Academic Articles on Facebook. Online Information Review, 41(3). https://doi.org/10.1108/oir-02-2016-0058
Ruano, J., Aguilar-Luque, M., Gómez-Garcia, F., Alcalde Mellado, P., Gay-Mimbrera, J., Carmona-Fernandez, P. J., … Isla-Tejera, B. (2018). The differential impact of scientific quality, bibliometric factors, and social media activity on the influence of systematic reviews and meta-analyses about psoriasis. PLOS ONE, 13(1), e0191124. https://doi.org/10.1371/journal.pone.0191124
Nabout, J. C., Teresa, F. B., Machado, K. B., do Prado, V. H. M., Bini, L. M., & Diniz-Filho, J. A. F. (2018). Do traditional scientometric indicators predict social media activity on scientific knowledge? An analysis of the ecological literature. Scientometrics. https://doi.org/10.1007/s11192-018-2678-x
Araujo, R. F., & Alves, M. (2018). The altmetric performance of publications authored by Brazilian researchers: analysis of CNPq productivity scholarship holders. arXiv preprint arXiv:1807.06366. https://arxiv.org/abs/1807.06366
Sun, Z., Cang, J., Ruan, Y., & Zhu, D. (2019). Reporting gaps between news media and scientific papers on outdoor air pollution–related health outcomes: A content analysis. The International Journal of Health Planning and Management. https://doi.org/10.1002/hpm.2894
Fu, D. Y., & Hughey, J. J. (2019). Releasing a preprint is associated with more attention and citations for the peer-reviewed article. eLife, 8. https://doi.org/10.7554/elife.52646
Clayson, P. E., Baldwin, S., & Larson, M. J. (2020). The Open Access Advantage for Studies of Human Electrophysiology: Impact on Citations and Altmetrics. https://doi.org/10.31234/osf.io/5xagd

View Documentation

aRxiv

CRAN

Interface to the arXiv API

Maintainer

Karl Broman

Description

An interface to the API for arXiv (https://arxiv.org), a repository of electronic preprints for computer science, mathematics, physics, quantitative biology, quantitative finance, and statistics.

Scientific use cases

Jaspers, S., De Troyer, E., & Aerts, M. (2018). Machine learning techniques for the automation of literature reviews and systematic reviews in EFSA. EFSA Supporting Publications, 15(6), 1427E. https://doi.org/10.2903/sp.efsa.2018.EN-1427

View Documentation

antiword

CRAN Staff maintained

Extract Text from Microsoft Word Documents

Maintainer

Jeroen Ooms

Description

Wraps the AntiWord utility to extract text from Microsoft Word documents. The utility only supports the old doc format, not the newer XML-based docx format. Use the xml2 package to read the latter.

View Documentation

tabulizer

CRAN Peer-reviewed

Bindings for Tabula PDF Table Extractor Library

Maintainer

Tom Paskhalis

Description

Bindings for the Tabula http://tabula.technology/ Java library, which can extract tables from PDF documents. The tabulizerjars package https://github.com/ropensci/tabulizerjars provides versioned Java .jar files, including all dependencies, aligned to releases of Tabula.

Scientific use cases

Baquero, O. S., & Machado, G. (2018). Spatiotemporal dynamics and risk factors for human Leptospirosis in Brazil. Scientific Reports, 8(1). https://doi.org/10.1038/s41598-018-33381-3
Prats, J., & Danis, P.-A. (2019). An epilimnion and hypolimnion temperature model based on air temperature and lake characteristics. Knowledge & Management of Aquatic Ecosystems, (420), 8. https://doi.org/10.1051/kmae/2019001

View Documentation

IEEER

Interface to the IEEE Xplore Gateway

Maintainer

Saul Wiggin

Description

An interface to the IEEE Xplore Gateway, for searching IEEE publications.

View Documentation