Interactive and publication-quality plotting
AdhereR can generate various real-time interactive plots that allow easy exploration of individual patients, which is useful both for research and in clinical practice. These interactive visualizations use Shiny, which allows intuitive interaction with dynamic plots from a normal web browser (such as Google Chrome or Firefox) on pretty much any platform (Windows, Linux or macOS on a laptop or desktop, but also Android or iOS on a smartphone or tablet), without the need to have R installed on the client (e.g., the smartphone used for the visualization). In fact, the actual data and the R engine on which AdhereR runs may be hosted on dedicated hardware and software infrastructure half a world away, with everything happening transparently (and securely) over the internet. More details about importing data, plotting it and saving the results can be found here.
While interactive visualizations are essential, we may sometimes also want to produce publication-quality plots of a (group of) patient(s); this is easily done with AdhereR, as shown by a few examples below (please note that the images themselves are small JPEGs, but clicking on them allows the download of high-resolution TIFFs).
Use with big databases
AdhereR can process data from a variety of sources, including a “flat” file in the CSV (“comma-separated values”, using commas (,) or other separators) format and other file formats (e.g., Excel, Stata, SAS or SPSS) that can be imported into R using various methods (see here or here), but this is appropriate only when the data is relatively small (say, from a few tens to thousands of records). However, most real-world clinical data far exceeds these sizes, not to mention that it is usually structured across more than one “table”, is stored remotely (sometimes centrally), and access to it is strictly controlled to comply with privacy and security regulations.
AdhereR can access such data stored using a variety of technologies, probably the most popular being variants of “classical” Relational Database Management Systems (or RDBMS) powered by Structured Query Language (or SQL), such as MySQL, MariaDB, SQLite, PostgreSQL, Microsoft SQL Server or Oracle Database. This can be done explicitly (by embedding SQL commands within R) or transparently (by “hiding” the SQL within R calls); in either case, most of the data selection is performed on the server, and only the strictly required data is sent to the client for processing and plotting. AdhereR also supports newer approaches, such as Apache’s Hadoop (using Map/Reduce).
AdhereR can use data stored in dedicated RDBMSs (such as MySQL) using SQL (explicitly or implicitly), or it can process data from Apache Hadoop (through HDFS and Map/Reduce).
This means that AdhereR can access vast amounts of data stored remotely or locally, and it can process it locally (on the client machine) as well as remotely (on dedicated hardware and software platforms, such as a heterogeneous computer cluster). For more info, please see the dedicated vignette.
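To make the idea of server-side selection concrete, here is a minimal sketch in Python using the standard `sqlite3` module with an in-memory database. It is not AdhereR's own interface: the `events` table, its columns, and the `select_events` helper are all hypothetical names chosen for illustration. The point is only that the `WHERE` clause runs inside the database, so just the rows and columns needed for the adherence computation travel to the client.

```python
import sqlite3

# Hypothetical schema and data (assumptions for this sketch); a real
# deployment would connect to a remote RDBMS rather than an in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (patient_id INTEGER, event_date TEXT, "
    "duration INTEGER, category TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01", 30, "medA", "free text not needed downstream"),
        (1, "2020-02-05", 30, "medA", ""),
        (2, "2020-01-10", 60, "medB", ""),
        (3, "2020-03-01", 30, "medA", ""),
    ],
)


def select_events(conn, category, patient_ids):
    """Filter inside the database and return only the columns needed later."""
    # One "?" placeholder per patient ID keeps the query parameterised.
    placeholders = ",".join("?" for _ in patient_ids)
    query = (
        "SELECT patient_id, event_date, duration FROM events "
        f"WHERE category = ? AND patient_id IN ({placeholders}) "
        "ORDER BY patient_id, event_date"
    )
    return conn.execute(query, [category, *patient_ids]).fetchall()


# Only medA events for patients 1 and 2 are transferred to the client.
rows = select_events(conn, "medA", [1, 2])
for row in rows:
    print(row)
```

The same pattern applies to any of the SQL engines listed above, with the connection call swapped for the appropriate driver.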
AdhereR runs efficiently on almost anything
AdhereR is written in “pure” R and, despite various complaints that R is slow, AdhereR’s kernel is heavily optimised (mostly using data.table) and capable of parallel processing. This makes AdhereR quite fast: for example, earlier benchmarks (around 2017) of version 0.1 on a desktop computer with a Core i7-3770 and 16 GB of RAM running Linux, using a database containing 500,000 patients with 4,058,110 events, computed CMA1 in about 10 minutes when run in parallel on all 4 physical cores (see here for details).
Another frequently cited limitation of R (and, implicitly, of AdhereR) is that it can’t process datasets that don’t fit in the computer’s RAM. However, AdhereR is not affected, because it can process subsets of the whole dataset individually, sequentially or in parallel. As detailed in this vignette, the data may be stored in an SQL database, from which the data for groups of patients are selected and sent to AdhereR for processing, with the results written back to the database (this can be done in parallel if multiple cores/CPUs/nodes are available; see the vignette for details). In this manner, huge amounts of data can be processed by leveraging parallelism on machines with multiple cores/CPUs or even across heterogeneous clusters. Thus, AdhereR runs on anything from an Atom-powered tablet or a consumer-grade laptop to a computer cluster, under Windows, macOS or Linux.
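The select, process and write-back loop described above can be sketched as follows, again in Python with the standard `sqlite3` module. Everything here is an assumption made for illustration: the `events` and `results` tables, the `process_in_batches` helper, and especially `toy_adherence`, which is a simplified stand-in ratio and not AdhereR's CMA implementation. Only one batch of patients is held in memory at a time.

```python
import sqlite3
from datetime import date


def toy_adherence(events):
    """Illustrative stand-in for an adherence estimate (NOT AdhereR's CMA code):
    days supplied by all but the last event / days from first to last event."""
    if len(events) < 2:
        return None
    first = date.fromisoformat(events[0][0])
    last = date.fromisoformat(events[-1][0])
    span = (last - first).days
    if span <= 0:
        return None
    return sum(duration for _, duration in events[:-1]) / span


def process_in_batches(conn, batch_size):
    """Select a few patients at a time, process them, write results back."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(patient_id INTEGER PRIMARY KEY, adherence REAL)"
    )
    # The list of IDs is small compared with the events themselves.
    ids = [r[0] for r in conn.execute(
        "SELECT DISTINCT patient_id FROM events ORDER BY patient_id")]
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        placeholders = ",".join("?" for _ in batch)
        rows = conn.execute(
            "SELECT patient_id, event_date, duration FROM events "
            f"WHERE patient_id IN ({placeholders}) "
            "ORDER BY patient_id, event_date",
            batch,
        ).fetchall()
        # Group this batch's events by patient.
        per_patient = {}
        for pid, event_date, duration in rows:
            per_patient.setdefault(pid, []).append((event_date, duration))
        conn.executemany(
            "INSERT OR REPLACE INTO results VALUES (?, ?)",
            [(pid, toy_adherence(evs)) for pid, evs in per_patient.items()],
        )
        conn.commit()


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (patient_id INTEGER, event_date TEXT, duration INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "2020-01-01", 30), (1, "2020-01-31", 30), (1, "2020-03-01", 30),
    (2, "2020-01-01", 30), (2, "2020-03-01", 30),
    (3, "2020-01-01", 30),
])
process_in_batches(conn, batch_size=2)
results = dict(conn.execute("SELECT patient_id, adherence FROM results"))
print(results)
```

Because each batch is independent, the body of the loop is also the natural unit to hand to separate worker processes or cluster nodes, each with its own database connection.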
AdhereR is not just for R
While AdhereR is targeted at R and currently only implemented in R (for several reasons, including its widespread use in research and business, support for data processing and visualisation, flexibility, openness and available libraries), we are aware that there exist other programming languages (such as Python or Julia) and statistical platforms (such as SAS or Stata) for which the methods implemented by AdhereR would be useful.
One alternative would be to develop multiple versions of AdhereR in sync (say, to have one AdherePy, one AdhereJul, one AdhereSAS and one AdhereSta), but this is a bad idea on several levels (not least because of our limited development and testing resources). Therefore, we have opted for the next best thing, which is to implement a bridging interface that allows other languages and platforms to transparently use AdhereR (including its interactive plotting).
We provide a full implementation for Python 3 (described in this vignette), which consists of a Python module (called “adherer”) exposing a hierarchy of Python classes that mirror the original R classes. The module is smart enough to find (in most cases) by itself where R and AdhereR are installed, to call them with the appropriate parameters, and to interpret and convert the results back to Python, providing a “full Python” experience to the user (with all the gory details hidden in its code). Also, despite some overhead costs related to data conversion and calling R, the bridge is fast enough to allow real data processing and visualisation in a production environment.