rOpenSci | Blog

All posts (Page 50 of 80)

rtimicropem: Using an R package as platform for harmonized cleaning of data from RTI MicroPEM air quality sensors

As you might remember from my blog post about ropenaq, I work as a data manager and statistician for an epidemiology project called CHAI (Cardio-vascular health effects of air pollution in Telangana, India). One of our interests in CHAI is determining exposure, and sources of exposure, to PM2.5: very small airborne particles that have diverse adverse health effects. You can find more details about CHAI in our recently published protocol paper....

FedData - Getting assorted geospatial data into R

The package FedData has gone through software review and is now part of rOpenSci. FedData includes functions to automate downloading geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package enables extraction from six datasets: the National Elevation Dataset (NED) digital elevation models (1 and 1/3 arc-second; USGS); the National Hydrography Dataset (NHD) (USGS); the Soil Survey Geographic (SSURGO) database from the National Cooperative Soil Survey (NCSS), which is led by the Natural Resources Conservation Service (NRCS) under the USDA; the Daymet gridded estimates of daily weather parameters for North America, version 3, available from the Oak Ridge National Laboratory’s Distributed Active Archive Center (DAAC); and the International Tree Ring Data Bank (ITRDB), coordinated by the National Climatic Data Center at NOAA....

Onboarding visdat, a tool for preliminary visualisation of whole dataframes

“Take a look at the data.” This is a phrase that comes up when you first get a dataset. It is also ambiguous. Does it mean to do some exploratory modelling? Or make some histograms, scatterplots, and boxplots? Is it both? Starting down either path, you often encounter the non-trivial growing pains of working with a new dataset: the mix-ups of data types, where height in cm is coded as a factor, categories are numerics with decimals, strings are datetimes, and somehow datetime is one long number....

So you (don't) think you can review a package

Contributing to an open-source community without contributing code is an oft-vaunted idea that can seem nebulous. Luckily, putting vague ideas into action is one of the strengths of the rOpenSci Community, and their package onboarding system offers a chance to do just that. This was my first time reviewing a package, and, as with so many things in life, I went into it worried that I’d somehow ruin the package-reviewing process: not just the package itself, but the actual onboarding infrastructure… maybe even rOpenSci as a whole....

Tesseract and Magick: High Quality OCR in R

Last week we released an update of the tesseract package to CRAN. This package provides R bindings to Google’s OCR library Tesseract.
install.packages("tesseract")
The new version ships with the latest libtesseract 3.05.01 on Windows and macOS. Furthermore, it includes enhancements for managing language data and using tesseract together with the magick package.
Installing Language Data
The new version has several improvements for installing additional language data. On Windows and macOS you use the tesseract_download() function to install additional languages:...

Working together to push science forward

Happy rOpenSci users can be found at