rOpenSci | Computing Infrastructure

Computing Infrastructure

Workflow Tools for Your Code and Data
tarchetypes
CRAN Peer-reviewed

Archetypes for Targets

William Michael Landau
Description

Function-oriented Make-like declarative workflows for statistics and data science are supported in the targets R package. As an extension to targets, the tarchetypes package provides convenient user-side functions to make targets easier to use. By establishing reusable archetypes for common kinds of targets and pipelines, these functions help express complicated reproducible workflows concisely. The methods in this package were influenced by the drake R package by Will Landau (2018) doi:10.21105/joss.00550.

View Documentation

Dynamic Function-Oriented Make-Like Declarative Workflows

William Michael Landau
Description

As a pipeline toolkit for statistics and data science in R, the targets package brings together function-oriented programming and Make-like declarative workflows. It analyzes the dependency relationships among the tasks of a workflow, skips steps that are already up to date, runs the necessary computation with optional parallel workers, abstracts files as R objects, and provides tangible evidence that the results match the underlying code and data. The methodology in this package borrows from GNU Make (2015, ISBN:978-9881443519) and drake (2018, doi:10.21105/joss.00550).
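As a rough illustration of the declarative style described above, a pipeline is usually written in a file named _targets.R as a list of targets. The sketch below assumes the targets package is installed; the target names (cars_data, fit, slope) and the use of the built-in mtcars data are illustrative choices, not taken from the package description.

```r
# _targets.R (hypothetical example pipeline)
library(targets)

# Each tar_target() pairs a name with the R command that builds it.
# Dependencies are inferred from the symbols each command mentions.
pipeline <- list(
  tar_target(cars_data, mtcars),
  tar_target(fit, lm(mpg ~ wt, data = cars_data)),
  tar_target(slope, coef(fit)[["wt"]])
)

# The script must end with the list of targets.
pipeline
```

With this file in the project directory, calling tar_make() would build the three targets in dependency order, a repeated call should skip whatever is still up to date, and tar_read(slope) retrieves a stored result.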

View Documentation

Casts (R)Markdown Files to XML and Back Again

Maëlle Salmon
Description

Casts (R)Markdown files to XML and back to allow their editing via XPath.

View Documentation

Helper for rOpenSci Package Developers

Maëlle Salmon
Description

Provides helpers for rOpenSci package developers, mostly helping with metadata management (badges, DESCRIPTION) and GitHub infrastructure (GitHub issue and PR templates).

View Documentation

Simple Git Client for R

Jeroen Ooms
Description

Simple git client for R based on libgit2 with support for SSH and HTTPS remotes. All functions in gert use basic R data types (such as vectors and data-frames) for their arguments and return values. User credentials are shared with command line git through the git-credential store and ssh keys stored on disk or ssh-agent.
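A minimal sketch of what working with plain R data types can look like, assuming the gert package is installed. The temporary directory, file name, and author identity are made up for illustration.

```r
library(gert)

# Work in a throwaway directory so no existing repository is touched.
repo <- tempfile("gert-demo")
dir.create(repo)
git_init(repo)

# Create a file, stage it, and commit it with an explicit (made-up) author.
writeLines("hello", file.path(repo, "notes.txt"))
git_add("notes.txt", repo = repo)
git_commit(
  "Add notes",
  author = git_signature("Jane Doe", "jane@example.org"),
  repo = repo
)

# The history comes back as an ordinary data frame, one row per commit.
history <- git_log(repo = repo)
```

Because the result is a data frame, it can be filtered or summarized with the usual R tools.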

View Documentation
beautier
CRAN Peer-reviewed

BEAUti from R

Richèl J.C. Bilderbeek
Description

BEAST2 (https://www.beast2.org) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAUti 2 (which is part of BEAST2) is a GUI tool that allows users to specify the many possible setups and generates the XML file BEAST2 needs to run. This package provides a way to create BEAST2 input files without active user input, using R function calls instead.

View Documentation

CI-Agnostic Workflow Definitions

Kirill Müller
Description

Provides a way to describe common build and deployment workflows for R-based projects: packages, websites (e.g. blogdown, pkgdown), or data processing (e.g. research compendia). The recipe is described independently of the continuous integration tool used for processing the workflow (e.g. GitHub Actions or Circle CI). This package has been peer-reviewed by rOpenSci (v0.3.0.9004).

View Documentation

Compact and Flexible Summaries of Data

Elin Waring
Description

A simple-to-use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user, as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the “Using skimr” vignette and the README.

Scientific use cases
  1. Sinval, J., Marques-Pinto, A., Queirós, C., & Marôco, J. (2018). Work Engagement among Rescue Workers: Psychometric Properties of the Portuguese UWES. Frontiers in Psychology, 8. https://doi.org/10.3389/fpsyg.2017.02229
  2. Sinval, J., Pasian, S., Queirós, C., & Marôco, J. (2018). Brazil-Portugal Transcultural Adaptation of the UWES-9: Internal Consistency, Dimensionality, and Measurement Invariance. Frontiers in Psychology, 9. https://doi.org/10.3389/fpsyg.2018.00353
  3. Almeida, L. S., Pérez Fuentes, M. del C., Casanova, J. R., Gázquez Linares, J. J., & Molero Jurado, M. del M. (2018). Alcohol Expectancy-Adolescent Questionnaire (AEQ-AB): Validation for portuguese college students. Health and Addictions/Salud y Drogas, 18(2), 155. https://doi.org/10.21134/haaj.v18i2.389
  4. António, N., de Almeida, A., & Nunes, L. (2018). Hotel booking demand datasets. Data in Brief. https://doi.org/10.1016/j.dib.2018.11.126
  5. Sinval, J., Casanova, J. R., Marôco, J., & Almeida, L. S. (2018). University student engagement inventory (USEI): Psychometric properties. Current Psychology. https://doi.org/10.1007/s12144-018-0082-6
  6. Rodrigues, S., Sinval, J., Queirós, C., Marôco, J., & Kaiseler, M. (2019). Transitioning from recruit to officer: An investigation of how stress appraisal and coping influence work engagement. International Journal of Selection and Assessment. https://doi.org/10.1111/ijsa.12238
  7. Sinval, J., Sirgy, M. J., Lee, D.-J., & Marôco, J. (2019). The Quality of Work Life Scale: Validity Evidence from Brazil and Portugal. Applied Research in Quality of Life. https://doi.org/10.1007/s11482-019-09730-3
  8. Nalborczyk, L., Grandchamp, R., Koster, E. H. W., Perrone-Bertolotti, M., & Loevenbruck, H. (2019). Can we decode phonetic features in inner speech using surface electromyography? https://doi.org/10.31234/osf.io/8v5yd
  9. Correia, C. N., McLoughlin, K. E., Nalpas, N. C., Magee, D. A., Browne, J. A., Rue-Albrecht, K., … MacHugh, D. E. (2018). RNA Sequencing (RNA-Seq) Reveals Extremely Low Levels of Reticulocyte-Derived Globin Gene Transcripts in Peripheral Blood From Horses (Equus caballus) and Cattle (Bos taurus). Frontiers in Genetics, 9. https://doi.org/10.3389/fgene.2018.00278
  10. Long, J. D., & Turner, D. (2020). Applied R in the Classroom. Australian Economic Review, 53(1), 139–157. https://doi.org/10.1111/1467-8462.12362
  11. Sinval, J., & Marôco, J. (2020). Short Index of Job Satisfaction: Validity evidence from Portugal and Brazil. PLOS ONE, 15(4), e0231474. https://doi.org/10.1371/journal.pone.0231474
  12. Lam, K.-L., Cheng, W.-Y., Su, Y., Li, X., Wu, X., Wong, K.-H., … Cheung, P. C.-K. (2020). Use of random forest analysis to quantify the importance of the structural characteristics of beta-glucans for prebiotic development. Food Hydrocolloids, 108, 106001. https://doi.org/10.1016/j.foodhyd.2020.106001
  13. McKnelly, K. J., Howitz, W. J., Lam, S., & Link, R. D. (2020). Extraction on Paper Activity: An Active Learning Technique to Facilitate Student Understanding of Liquid–Liquid Extraction. Journal of Chemical Education, 97(7), 1960–1965. https://doi.org/10.1021/acs.jchemed.9b00975
  14. Behrendt, I., Fasshauer, M., & Eichner, G. (2020). Gluten intake and metabolic health: conflicting findings from the UK Biobank. European Journal of Nutrition. https://doi.org/10.1007/s00394-020-02351-9
  15. Aragão e Pina, J., Passos, A. M., Maynard, M. T., & Sinval, J. (2021). Self-efficacy, mental models and team adaptation: A first approach on football and futsal refereeing. Psychology of Sport and Exercise, 52, 101787. https://doi.org/10.1016/j.psychsport.2020.101787
  16. España, S., Ochoa de Olza, M., Sala, N., Piulats, J. M., Ferrandiz, U., Etxaniz, O., … Font, A. (2020). PSA Kinetics as Prognostic Markers of Overall Survival in Patients with Metastatic Castration-Resistant Prostate Cancer Treated with Abiraterone Acetate. Cancer Management and Research, Volume 12, 10251–10260. https://doi.org/10.2147/cmar.s270392
  17. Wadley, A. L., Venter, W. D. F., Moorhouse, M., Akpomiemie, G., Serenata, C., Hill, A., … Kamerman, P. R. (2020). High individual pain variability in people living with HIV: A graphical analysis. European Journal of Pain, 25(1), 160–170. https://doi.org/10.1002/ejp.1658
  18. Schrag, N. F. D., Apley, M. D., Godden, S. M., Lubbers, B. V., & Singer, R. S. (2020). Antimicrobial use quantification in adult dairy cows – Part 1 – Standardized regimens as a method for describing antimicrobial use. Zoonoses and Public Health, 67(S1), 51–68. https://doi.org/10.1111/zph.12766
  19. Nopp-Mayr, U., Reimoser, S., Reimoser, F., Sachser, F., Obermair, L., & Gratzer, G. (2020). Analyzing long-term impacts of ungulate herbivory on forest-recruitment dynamics at community and species level contrasting tree densities versus maximum heights. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-76843-3
View Documentation
beastier
CRAN Peer-reviewed

Call BEAST2

Richèl J.C. Bilderbeek
Description

BEAST2 (https://www.beast2.org) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAST2 is a command-line tool. This package provides a way to call BEAST2 from an R function call.

View Documentation

Working with Sets the Tidy Way

Lluís Revilla Sancho
Description

Implements a class and methods to work with sets, doing intersection, union, complementary sets, power sets, cartesian product and other set operations in a “tidy” way. These set operations are available for both classical sets and fuzzy sets. Import sets from several formats or from several other data structures.

View Documentation
ruODK

An R Client for the ODK Central API

Florian W. Mayer
Description

Access and tidy up data from the ODK Central API.
ODK Central is a clearinghouse for digitally captured data https://docs.getodk.org/central-intro/. The ODK Central API is documented at https://odkcentral.docs.apiary.io/.

View Documentation
RefManageR
CRAN Peer-reviewed

Straightforward BibTeX and BibLaTeX Bibliography Management

Mathew W. McLean
Description

Provides tools for importing and working with bibliographic references. It greatly enhances the bibentry class by providing a class BibEntry which stores BibTeX and BibLaTeX references, supports UTF-8 encoding, and can be easily searched by any field, by date ranges, and by various formats for name lists (author by last names, translator by full names, etc.). Entries can be updated, combined, sorted, printed in a number of styles, and exported. BibTeX and BibLaTeX .bib files can be read into R and converted to BibEntry objects. Interfaces to NCBI Entrez, CrossRef, and Zotero are provided for importing references and references can be created from locally stored PDF files using Poppler. Includes functions for citing and generating a bibliography with hyperlinks for documents prepared with RMarkdown or RHTML.

View Documentation
chlorpromazineR
CRAN

Convert Antipsychotic Doses to Chlorpromazine Equivalents

Eric Brown
Description

As different antipsychotic medications have different potencies, the doses of different medications cannot be directly compared. Various strategies are used to convert doses into a common reference so that comparison is meaningful. Chlorpromazine (CPZ) has historically been used as a reference medication into which other antipsychotic doses can be converted, as “chlorpromazine-equivalent doses”. Using conversion keys generated from widely-cited scientific papers, e.g. Gardner et al. 2010 doi:10.1176/appi.ajp.2009.09060802 and Leucht et al. 2016 doi:10.1093/schbul/sbv167, antipsychotic doses are converted to CPZ (or any specified antipsychotic) equivalents. The use of the package is described in the included vignette. Not for clinical use.

Scientific use cases
  1. Kim, J., Plitman, E., Iwata, Y., Nakajima, S., Mar, W., Patel, R., … Graff-Guerrero, A. (2020). Neuroanatomical profiles of treatment-resistance in patients with schizophrenia spectrum disorders. Progress in Neuro-Psychopharmacology and Biological Psychiatry, 99, 109839. https://doi.org/10.1016/j.pnpbp.2019.109839
View Documentation
jsonvalidate
CRAN

Validate JSON Schema

Rich FitzJohn
Description

Uses the node library is-my-json-valid or ajv to validate JSON against a JSON schema. Drafts 04, 06 and 07 of JSON schema are supported.
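A short sketch of schema validation, assuming the jsonvalidate package (and the V8 package it relies on) is installed. The schema and the two JSON documents are hypothetical examples.

```r
library(jsonvalidate)

# A hypothetical draft-07 schema: an object that must have a string "name".
schema <- '{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {"name": {"type": "string"}},
  "required": ["name"]
}'

# json_validate() returns TRUE or FALSE; engine selects the node library.
ok  <- json_validate('{"name": "rOpenSci"}', schema, engine = "ajv")
bad <- json_validate('{"name": 42}', schema, engine = "ajv")
```

Here ok should be TRUE and bad should be FALSE, since 42 is not a string.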

View Documentation

A Pipeline Toolkit for Reproducible Computation at Scale

William Michael Landau
Description

A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website https://docs.ropensci.org/drake/ and the online manual https://books.ropensci.org/drake/.

View Documentation

Make Fake Data

Scott Chamberlain
Description

Make fake data, supporting addresses, person names, dates, times, colors, coordinates, currencies, digital object identifiers (DOIs), jobs, phone numbers, DNA sequences, doubles and integers from distributions and within a range.

View Documentation
mauricer
CRAN Peer-reviewed

Install BEAST2 Packages

Richèl J.C. Bilderbeek
Description

BEAST2 (https://www.beast2.org) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAST2 is commonly accompanied by BEAUti 2 (https://www.beast2.org), which, among other things, allows one to install BEAST2 packages. This package allows installing BEAST2 packages from R.

View Documentation

Model Comparison Using babette

Richèl J.C. Bilderbeek
Description

BEAST2 (https://www.beast2.org) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. mcbette performs a Bayesian model comparison over some site and clock models, using babette (https://github.com/ropensci/babette/).

View Documentation

Setup, Run and Analyze NetLogo Model Simulations from R via XML

Jan Salecker
Description

Set up, run and analyze NetLogo (https://ccl.northwestern.edu/netlogo/) model simulations in R. nlrx experiments use a structure similar to NetLogo's BehaviorSpace experiments. However, nlrx offers more flexibility and additional tools for running and analyzing complex simulation designs and sensitivity analyses. The user defines all information that is needed in an intuitive framework, using class objects. Experiments are submitted from R to NetLogo via XML files that are dynamically written, based on specifications defined by the user. By nesting model calls in future environments, large simulation designs with many runs can be executed in parallel. This also enables simulating NetLogo experiments on remote high performance computing machines. In order to use this package, Java and NetLogo (>= 5.3.1) need to be available on the executing system.

Scientific use cases
  1. Kaaronen, R. O., & Strelkovskii, N. (2019). Cultural Evolution of Sustainable Behaviours: Pro-Environmental Tipping Points in an Agent-Based Model. https://doi.org/10.31234/osf.io/w6dpa
  2. Wesener, F., Szymczak, A., Rillig, M. C., & Tietjen, B. (2020). Stress priming affects fungal competition – evidence from a combined experimental and modeling study. https://doi.org/10.1101/2020.03.04.976357
  3. Adams, R. I., Bhangar, S., Dannemiller, K. C., Eisen, J. A., Fierer, N., Gilbert, J. A., … Bibby, K. (2016). Ten questions concerning the microbiomes of buildings. Building and Environment, 109, 224–234. https://doi.org/10.1016/j.buildenv.2016.09.001
  4. D’Orazio, M., Bernardini, G., & Quagliarini, E. (2020). Sustainable and resilient strategies for touristic cities against COVID-19: an agent-based approach. arXiv preprint arXiv:2005.12547. https://arxiv.org/pdf/2005.12547.pdf
  5. Kopp, T., & Salecker, J. (2020). How traders influence their neighbours: Modelling social evolutionary processes and peer effects in agricultural trade networks. Journal of Economic Dynamics and Control, 117, 103944. https://doi.org/10.1016/j.jedc.2020.103944
  6. Azizi, A., Mubayi, A., & Mubayi, A. (2020). The Impact of Individual’s Ecological Factors on the Dynamics of Alcohol Drinking among Arizona State University Students: An Application of the Survey Data-driven Agent-based Model. arXiv preprint arXiv:2011.01876. https://arxiv.org/abs/2011.01876
  7. Widyastuti, K., Imron, M. A., Pradopo, S. T., Suryatmojo, H., Sopha, B. M., Spessa, A., & Berger, U. (2020). PeatFire: an agent-based model to simulate fire ignition and spreading in a tropical peatland ecosystem. International Journal of Wildland Fire. https://doi.org/10.1071/wf19213
View Documentation
assertr
CRAN

Assertive Programming for R Analysis Pipelines

Tony Fischetti
Description

Provides functionality to assert conditions that have to be met, so that analysis pipelines fail quickly when the data they use contain errors. Similar to stopifnot() but more powerful, friendly, and easier to use in pipelines.
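A minimal sketch of pipeline-style assertions, assuming the assertr package is installed and R is version 4.1 or later (for the native pipe). The particular checks on the built-in mtcars data are illustrative only.

```r
library(assertr)

# Each check returns the data unchanged when it passes and raises an
# error when it fails, so checks can be chained ahead of an analysis.
checked <- mtcars |>
  verify(mpg > 0) |>                         # a logical expression on columns
  assert(within_bounds(0, Inf), mpg, wt) |>  # a predicate applied per column
  assert(in_set(0, 1), am, vs)               # values restricted to a set
```

If any check failed, the chain would stop with an error at that step instead of passing bad data downstream.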

Scientific use cases
  1. Petersen, A. H., & Ekstrøm, C. T. (2019). dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R. Journal of Statistical Software, 90(6). https://doi.org/10.18637/jss.v090.i06
  2. van der Loo, M. P., & de Jonge, E. (2019). Data Validation Infrastructure for R. arXiv preprint arXiv:1912.09759. https://arxiv.org/pdf/1912.09759.pdf
  3. Brick, C., McDowell, M., & Freeman, A. L. J. (2020). Risk communication in tables versus text: a registered report randomized trial on “fact boxes.” Royal Society Open Science, 7(3), 190876. https://doi.org/10.1098/rsos.190876
  4. Goel, A., & Vitek, J. (2019). On the design, implementation, and use of laziness in R. Proceedings of the ACM on Programming Languages, 3(OOPSLA), 1–27. doi:10.1145/3360579
View Documentation
timefuzz
Staff maintained

Time Travel to Test Time Dependent Code

Scott Chamberlain
Description

Time travel to test time dependent code.

View Documentation
pendulum
Staff maintained

Time Classes

Scott Chamberlain
Description

Time classes, with hooks for mocking time.

View Documentation

Control BEAST2

Richèl J.C. Bilderbeek
Description

BEAST2 (https://www.beast2.org) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAST2 is commonly accompanied by BEAUti 2, Tracer and DensiTree. babette provides an alternative workflow to using all these tools separately. This allows doing complex Bayesian phylogenetics easily and reproducibly from R.

View Documentation
git2r
CRAN

Provides Access to Git Repositories

Stefan Widgren
Description

Interface to the libgit2 library, which is a pure C implementation of the Git core methods. Provides access to Git repositories to extract data and run some basic Git commands.

Scientific use cases
  1. Blischak, J. D., Carbonetto, P., & Stephens, M. (2019). Creating and sharing reproducible research code the workflowr way. F1000Research, 8, 1749. https://doi.org/10.12688/f1000research.20843.1
View Documentation
bowerbird
Peer-reviewed

Keep a Collection of Sparkly Data Resources

Ben Raymond
Description

Tools to get and maintain a data repository from third-party data providers.

View Documentation
ezknitr
CRAN

Avoid the Typical Working Directory Pain When Using knitr

Dean Attali
Description

An extension of knitr that adds flexibility in several ways. One common source of frustration with knitr is that it assumes the directory where the source file lives should be the working directory, which is often not true. ezknitr addresses this problem by giving you complete control over where all the inputs and outputs are, and adds several other convenient features to make rendering markdown/HTML documents easier.

View Documentation

R Bindings for ZeroMQ

Jeroen Ooms
Description

Interface to the ZeroMQ lightweight messaging kernel (see http://www.zeromq.org/ for more information).

View Documentation

Read, Tidy, and Display Data from Microtiter Plates

Sean Hughes
Description

Tools for interacting with data from experiments done in microtiter plates. Easily read in plate-shaped data and convert it to tidy format, combine plate-shaped data with tidy data, and view tidy data in plate shape.

View Documentation

Interface to the Open Science Framework (OSF)

Aaron Wolen
Description

An interface for interacting with OSF (https://osf.io). osfr enables you to access open research materials and data, or create and manage your own private or public projects.

Scientific use cases
  1. Corput, D. V. D. (2020). Locked in Syndrome Machine Learning Classification using Sentence Comprehension EEG Data. arXiv preprint arXiv:2006.12336 https://arxiv.org/pdf/2006.12336.pdf
View Documentation
outcomerate
CRAN Peer-reviewed

AAPOR Survey Outcome Rates

Rafael Pilliard Hellwig
Description

Standardized survey outcome rate functions, including the response rate, contact rate, cooperation rate, and refusal rate. These outcome rates allow survey researchers to measure the quality of survey data using definitions published by the American Association for Public Opinion Research (AAPOR). For details on these standards, see AAPOR (2016) https://www.aapor.org/Standards-Ethics/Standard-Definitions-(1).aspx.
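To make the idea of an outcome rate concrete, the sketch below computes two rates by hand in base R. It does not call the package's own functions. The formulas follow the AAPOR definitions of Response Rate 1 and Refusal Rate 1 as commonly stated, and the disposition counts are invented for illustration.

```r
# Hypothetical final disposition counts for a survey of 1000 cases.
I  <- 500  # complete interviews
P  <- 50   # partial interviews
R  <- 200  # refusals and break-offs
NC <- 150  # non-contacts
O  <- 50   # other non-interviews
UH <- 30   # unknown whether household is occupied
UO <- 20   # unknown, other

# Both rates share the same denominator: all cases that are eligible
# or of unknown eligibility.
denominator <- (I + P) + (R + NC + O) + (UH + UO)

rr1  <- I / denominator  # Response Rate 1: complete interviews only
ref1 <- R / denominator  # Refusal Rate 1

c(RR1 = rr1, REF1 = ref1)
```

With these made-up counts the denominator is 1000, so the response rate is 0.5 and the refusal rate is 0.2.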

View Documentation
outsider
CRAN Peer-reviewed

Install and Run Programs, Outside of R, Inside of R

Dom Bennett
Description

Install and run external command-line programs in R through use of Docker https://www.docker.com/ and online repositories.

View Documentation
tracerer
CRAN Peer-reviewed

Tracer from R

Richèl J.C. Bilderbeek
Description

BEAST2 (https://www.beast2.org) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. Tracer (https://tree.bio.ed.ac.uk/software/tracer/) is a GUI tool to parse and analyze the files generated by BEAST2. This package provides a way to parse and analyze BEAST2 output files without active user input, using R function calls instead.

View Documentation
gitignore
CRAN Peer-reviewed

Create Useful .gitignore Files for your Project

Philippe Massicotte
Description

Simple interface to query gitignore.io to fetch gitignore templates that can be included in the .gitignore file. More than 450 templates are currently available.

View Documentation
baRcodeR
CRAN Peer-reviewed

Label Creation for Tracking and Collecting Data from Biological Samples

Yihan Wu
Description

Tools to generate unique identifier codes and printable barcoded labels for the management of biological samples. The creation of unique ID codes and printable PDF files can be initiated by standard commands, user prompts, or through a GUI addin for RStudio. Biologically informative codes can be included for hierarchically structured sampling designs.

Scientific use cases
  1. Walker, V. K., Das, P., Li, P., Lougheed, S. C., Moniz, K., Schott, S., … Koch, I. (2020). Identification of Arctic Food Fish Species for Anthropogenic Contaminant Testing Using Geography and Genetics. Foods, 9(12), 1824. https://doi.org/10.3390/foods9121824
View Documentation

Simple Jenkins Client for R

Jeroen Ooms
Description

Manage jobs and builds on your Jenkins CI server https://jenkins.io/. Create and edit projects, schedule builds, manage the queue, download build logs, and much more.

View Documentation

Manage Cached Files

Scott Chamberlain
Description

Suite of tools for managing cached files, targeting use in other R packages. Uses rappdirs for cross-platform paths. Provides utilities to manage cache directories, including targeting files by path or by key; cached directories can be compressed and uncompressed easily to save disk space.

View Documentation
datapack
CRAN

A Flexible Container to Transport and Manipulate Data and Associated Resources

Matthew B. Jones
Description

Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated metadata and ancillary files. Individual data objects have associated system-level metadata, and data files are linked together using the OAI-ORE standard resource map which describes the relationships between the files. The OAI-ORE standard is described at https://www.openarchives.org/ore/. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at https://tools.ietf.org/html/draft-kunze-bagit-08.

View Documentation
conditionz
CRAN Staff maintained

Control How Many Times Conditions are Thrown

Scott Chamberlain
Description

Provides the ability to control how many times conditions are thrown (shown to the user) in function calls. Includes control of warnings and messages.

View Documentation

Client for the cranchecks.info API

Scott Chamberlain
Description

Client for the cranchecks.info API.

View Documentation
DataPackageR
CRAN Peer-reviewed

Construct Reproducible Analytic Data Sets as R Packages

Greg Finak
Description

A framework to help construct R data packages in a reproducible manner. Potentially time-consuming processing of raw data sets into analysis-ready data sets is done in a reproducible manner and decoupled from the usual R CMD build process so that data sets can be processed into R objects in the data package and the data package can then be shared, built, and installed by others without the need to repeat computationally costly data processing. The package maintains data provenance by turning the data processing scripts into package vignettes, as well as enforcing documentation and version checking of included data objects. Data packages can be version controlled on GitHub, and used to share data for manuscripts, collaboration and general reproducibility.

Scientific use cases
  1. Finak, G., Mayer, B., Fulp, W., Obrecht, P., Sato, A., Chung, E., … Gottardo, R. (2018). DataPackageR: Reproducible data preprocessing, standardization and sharing using R/Bioconductor for collaborative data analysis. Gates Open Research, 2, 31. https://doi.org/10.12688/gatesopenres.12832.2
View Documentation
staypuft
Staff maintained

Convert Complex Objects to and from R Data Structures

Scott Chamberlain
Description

Convert complex objects to and from R data structures.

View Documentation

Work with GitHub Gists

Scott Chamberlain
Description

Work with GitHub gists from R (e.g., https://en.wikipedia.org/wiki/GitHub#Gist, https://docs.github.com/en/github/writing-on-github/creating-gists/). A gist is simply one or more files with code/text/images/etc. This package allows the user to create new gists, update gists with new files, rename files, delete files, get and delete gists, star and un-star gists, fork gists, open a gist in your default browser, get embed code for a gist, list gist commits, and get rate limit information when authenticated. Some requests require authentication and some do not. Gists website: https://gist.github.com/.

View Documentation
credentials
CRAN Staff maintained

Tools for Managing SSH and Git Credentials

Jeroen Ooms
Description

Set up and retrieve HTTPS and SSH credentials for use with git and other services. For HTTPS remotes the package interfaces the git-credential utility which git uses to store HTTP usernames and passwords. For SSH remotes we provide convenient functions to find or generate appropriate SSH keys. The package both helps the user to set up a local git installation, and also provides a back-end for git/ssh client libraries to authenticate with existing user credentials.

View Documentation
tokenizers
CRAN Peer-reviewed

Fast, Consistent Tokenization of Natural Language Text

Lincoln Mullen
Description

Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, tweets, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the stringi and Rcpp packages for fast yet correct tokenization in UTF-8.
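A brief sketch of the consistent interface mentioned above, assuming the tokenizers package is installed. The example sentence is arbitrary.

```r
library(tokenizers)

text <- "The quick brown fox jumps over the lazy dog."

# Every tokenizer takes a character vector and returns a list with one
# element per input document.
words   <- tokenize_words(text)          # lowercased, punctuation stripped
bigrams <- tokenize_ngrams(text, n = 2)  # shingled two-word n-grams

words[[1]]
bigrams[[1]]
```

Because the tokenizers share this input and output shape, one can usually be swapped for another without changing the surrounding code.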

Scientific use cases
  1. Mullen, L. A., Benoit, K., Keyes, O., Selivanov, D., & Arnold, J. (2018). Fast, Consistent Tokenization of Natural Language Text. Journal of Open Source Software, 3(23), 655. https://doi.org/10.21105/joss.00655
  2. Pajo, J. (2018). Quantitative Falsification for Qualitative Findings. Social Science Computer Review, 089443931876795. https://doi.org/10.1177/0894439318767956
  3. Casey, Jerome (2018). Text Analytics Techniques in the Digital World: a Sentiment Analysis Case Study of the Coverage of Climate Change on US News Networks. Irish Communication Review: Vol. 16: Iss. 1, Article 7. https://arrow.dit.ie/icr/vol16/iss1/7
  4. Gye-Soo, K. 2018. Text Mining and Big Data Analysis in the Relational Database with R. International Journal of Trend in Research and Development. 4(5): 384-386. http://www.ijtrd.com/papers/IJTRD12170.pdf
  5. Ficcadenti, V., Cerqueti, R., & Ausloos, M. (2019). A joint text mining-rank size investigation of the rhetoric structures of the US Presidents’ speeches. Expert Systems with Applications. https://doi.org/10.1016/j.eswa.2018.12.049
  6. Calderone, A. (2019). A Computational Analysis of Natural Languages to Build a Sentence Structure Aware Artificial Neural Network. arXiv preprint arXiv:1906.05491 https://arxiv.org/pdf/1906.05491.pdf
  7. Ulibarri, N., & Scott, T. A. (2019). Environmental hazards, rigid institutions, and transformative change: How drought affects the consideration of water and climate impacts in infrastructure management. Global Environmental Change, 59, 102005. https://doi.org/10.1016/j.gloenvcha.2019.102005
  8. Claes, M., & Mäntylä, M. (2020). 20-MAD–20 Years of Issues and Commits of Mozilla and Apache Development. arXiv preprint arXiv:2003.14015. https://arxiv.org/pdf/2003.14015.pdf
  9. Scott, T. A., Ulibarri, N., & Perez Figueroa, O. (2020). NEPA and National Trends in Federal Infrastructure Siting in the United States. Review of Policy Research. https://doi.org/10.1111/ropr.12399
  10. Grassl, P., Schraffenberger, H., Zuiderveen Borgesius, F., & Buijzen, M. (2020, July 21). Dark and bright patterns in cookie consent requests. https://doi.org/10.31234/osf.io/gqs5h
  11. López Galán, A., Chung, W.-S., & Marshall, N. J. (2020). Dynamic Courtship Signals and Mate Preferences in Sepia plangon. Frontiers in Physiology, 11. https://doi.org/10.3389/fphys.2020.00845
  12. Brandão, L. A. C., Agrelli, A., Bernardo, L., Paparella, F., Moura, R., & Crovella, S. (2020). PlatCOVID: A Novel Web Tool to Analyze, Curate and Share COVID-19 Literature. doi:10.21203/rs.3.rs-42169/v1
View Documentation
outsider.base
CRAN

Base Package for Outsider

Dom Bennett
Description

Base package for outsider https://github.com/ropensci/outsider. The outsider package and its sister packages enable the installation and running of external, command-line software within R. This base package is a key dependency of the user-facing outsider package as it provides the utilities for interfacing between Docker https://www.docker.com and R. It is intended that end-users of outsider do not directly work with this base package.

View Documentation
Rclean
Peer-reviewed

A Tool for Writing Cleaner, More Transparent Code

Matthew Lau
Description

To create clearer, more concise code, this toolbox helps coders isolate the essential parts of a script that produce a chosen result, such as an object, table, or figure written to disk.

View Documentation
outsider.devtools

Build outsider Modules

Dom Bennett
Description

Developer functions and resources for building outsider modules.

View Documentation