Architecting a Package Manager

Early this year, I spent a lot of time working with R package management. What follows are the design decisions and classes of failure that made me stop shipping fixes to renv and start building rpx.

The Package Manager is a Package

In most languages, the package manager either comes bundled with the toolchain or is a separate binary you install. Not so in R. The runtime gives you an install.packages() function, which usually installs the latest version of a package into a global library. From the start, renv’s purpose was to isolate each project’s dependencies and record which packages were installed. But renv itself is also a package, which creates a bootstrapping problem.

Say you clone a repository with an renv-generated lockfile and want to reproduce its environment. You first need to install renv, and only then can you install your dependencies. What if you installed the wrong version, one that couldn't read your lockfile? That's why renv generates an activation script that you commit alongside the lockfile. When a new R session starts in the project, the script makes sure the pinned renv version is installed, fetching it from its GitHub release if necessary. Pretty simple, right?
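The decision the activation script has to make before anything else can run fits in a few lines. This is a simplified Python illustration, not renv's actual activation script (which is R code): it assumes the lockfile records renv under "Packages" like any other package, the sample lockfile is trimmed and made up, and `bootstrap_action` is a hypothetical name.

```python
import json
from typing import Optional


def bootstrap_action(lockfile_text: str, installed_version: Optional[str]) -> str:
    """Decide what a hypothetical activation step should do with renv itself.

    Assumes renv.lock lists renv under "Packages" like any other package.
    """
    lock = json.loads(lockfile_text)
    pinned = lock["Packages"]["renv"]["Version"]
    if installed_version is None:
        return "install"    # fresh machine: fetch the pinned renv before anything else
    if installed_version != pinned:
        return "reinstall"  # wrong renv version: it may not understand this lockfile
    return "load"           # pinned version present: safe to read the lockfile


# Trimmed, made-up lockfile for illustration.
LOCKFILE = """{
  "R": {"Version": "4.4.1"},
  "Packages": {
    "renv": {"Package": "renv", "Version": "1.0.7", "Source": "Repository"}
  }
}"""

print(bootstrap_action(LOCKFILE, None))     # -> install
print(bootstrap_action(LOCKFILE, "1.0.3"))  # -> reinstall
```

The catch is visible in the first line of the function body: the package manager has to exist, at the right version, before it can even read the file that says which version it should be.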

But what if renv needs a complex feature that already exists in another package? Depending on that package puts it into your environment alongside renv, and the consumer may well use it too. renv would then have to decide precedence between the version the user wants, the version renv is compatible with, and whatever was installed globally.

Parallelism in R

The biggest performance gain available in package management comes from parallelizing the web requests needed for dependency resolution and package installation. The catch is that base R has no multi-threading. The only way around it is... you guessed it, a package.

The most popular option is parallel, which ships with R and provides multi-processing, but since our workload is network-bound you could also use curl. Its async documentation has you queue multiple requests into a pool and register callbacks for the events each one emits, such as completion or failure. This feature relies on the libcurl multi interface, which, despite being single-threaded internally, hints at how to run async code in R: hand the work to code that runs outside the R interpreter.
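Python's asyncio illustrates the same principle as libcurl's multi interface: a single thread keeps many requests in flight by waiting on all of them at once, with a handler per request. This is an analogy, not R or curl code. Network latency is simulated with `asyncio.sleep`, and `fake_fetch` and `fetch_all` are hypothetical names.

```python
import asyncio


async def fake_fetch(name: str, latency: float, stats: dict) -> str:
    """Stand-in for an HTTP request; asyncio.sleep plays the role of the network wait."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(latency)  # yields control so the other requests can progress
    stats["in_flight"] -= 1
    return f"DESCRIPTION of {name}"


async def fetch_all(names, latency: float = 0.01):
    stats = {"in_flight": 0, "peak": 0}
    results = {}

    def on_done(name):
        # One handler per request, in the spirit of curl's per-request callbacks.
        def handler(task: asyncio.Task) -> None:
            results[name] = task.result()
        return handler

    tasks = []
    for name in names:
        task = asyncio.create_task(fake_fetch(name, latency, stats))
        task.add_done_callback(on_done(name))
        tasks.append(task)
    await asyncio.gather(*tasks)  # one thread, all requests outstanding at once
    return results, stats["peak"]


results, peak = asyncio.run(fetch_all(["cli", "rlang", "vctrs"]))
print(sorted(results), peak)  # -> ['cli', 'rlang', 'vctrs'] 3
```

A peak of 3 means all three simulated requests were waiting simultaneously, even though no second thread was ever created.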

This idea comes to fruition in pak, the performant package installer. It does its work outside your R session, in a background process with its own bundled dependencies, and renv can delegate dependency resolution and installation to it. But that raises the question: do we even need R for package management at all? What does dependency resolution actually require?

Breaking changes

CRAN's package indexes are designed with rolling releases in mind. The PACKAGES index only contains the latest version of each package, along with the dependency declarations for that version.
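PACKAGES uses the same Debian-control-style (DCF) layout as DESCRIPTION files: records separated by blank lines, `Field: value` lines, and indented continuation lines. A minimal Python parser makes the one-version-per-package shape concrete. The package names and versions in the sample are made up, and the dependency splitting is simplified (it ignores edge cases such as platform-specific fields).

```python
def parse_dcf(text):
    """Parse DCF records, the format used by PACKAGES and DESCRIPTION files."""
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():                 # a blank line ends a record
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key:  # continuation of the previous field
            current[last_key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


def index_by_package(records):
    """Key records by name: the index has room for exactly one version per package."""
    return {record["Package"]: record for record in records}


def parse_deps(field):
    """Split 'bar (>= 1.2.0), baz' into [('bar', '>= 1.2.0'), ('baz', None)]."""
    deps = []
    for part in field.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, constraint = part.partition("(")
        deps.append((name.strip(), constraint.rstrip(")").strip() or None))
    return deps


# Made-up sample in PACKAGES format.
PACKAGES_SAMPLE = """\
Package: foo
Version: 2.1.0
Depends: R (>= 4.1.0)
Imports: bar (>= 1.2.0),
    baz

Package: bar
Version: 1.3.0
Imports: baz
"""

index = index_by_package(parse_dcf(PACKAGES_SAMPLE))
print(index["foo"]["Version"])              # -> 2.1.0
print(parse_deps(index["foo"]["Imports"]))  # -> [('bar', '>= 1.2.0'), ('baz', None)]
```

Note that constraints such as `>= 1.2.0` describe what foo needs, but nothing in the index says which older versions of bar exist.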

This works surprisingly well, because each submission is not only checked on its own: every package that depends on it is also checked against the new version, which is meant to catch breaking changes before they land. This keeps the current PACKAGES index consistent at all times, and it is why tools like renv tend to snapshot the current state and update all dependencies at once. Posit Package Manager takes the same idea further by offering CRAN-like snapshots for particular dates.

But what if you have a large project where you want to pin individual dependencies? Or many internal packages that transitively depend on CRAN?

Consider two private packages, A and B, where A depends on B and both depend on CRAN packages. Say a CRAN package introduces a breaking change that affects both A and B. You ship a fix in B and release it as a new version that itself contains a breaking change. Then you release a new version of A that is compatible with both breaking changes. All is well, right?

Now we arrive at the version pinning dilemma. A developer who tries to install version n-1 of A now receives the latest B and the latest CRAN package, both incompatible with it. You might think we could have capped its dependencies below their next major versions, but you're forgetting that the PACKAGES index only ever contains the latest version: the versions that satisfy the cap are simply not listed. There is no reliable way to retrieve a complete list of versions for any package.1
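A toy resolver shows why correct upper bounds are not enough on their own. Everything here is hypothetical: C stands in for the CRAN package with the breaking change, the versions are invented, and the "newest version inside [lower, upper)" rule is a deliberate simplification, not rpx's PubGrub-based resolver.

```python
def parse_version(v):
    """R versions separate components with '.' or '-', e.g. "1.4-2"."""
    return tuple(int(p) for p in v.replace("-", ".").split("."))


def pick(available, lower, upper):
    """Newest version v with lower <= v < upper, or None if nothing qualifies."""
    lo, hi = parse_version(lower), parse_version(upper)
    candidates = [v for v in available if lo <= parse_version(v) < hi]
    return max(candidates, key=parse_version) if candidates else None


def resolve(requirements, index):
    """Pick a version for each requirement independently (no transitive solving)."""
    return {pkg: pick(index.get(pkg, []), lo, hi) for pkg, (lo, hi) in requirements.items()}


# Hypothetical: version n-1 of A was written against the 1.x majors of B and C.
A_PREVIOUS = {"B": ("1.0.0", "2.0.0"), "C": ("1.0.0", "2.0.0")}

# PACKAGES-style view: only the newest release of each package is listed.
LATEST_ONLY = {"B": ["2.0.0"], "C": ["2.0.0"]}
# A repository that keeps every published version.
FULL_HISTORY = {"B": ["1.3.0", "1.4.0", "2.0.0"], "C": ["1.8.2", "1.9.0", "2.0.0"]}

print(resolve(A_PREVIOUS, LATEST_ONLY))   # -> {'B': None, 'C': None}
print(resolve(A_PREVIOUS, FULL_HISTORY))  # -> {'B': '1.4.0', 'C': '1.9.0'}
```

The bounds are identical in both calls; only the second index gives the resolver anything to choose from.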

The reverse dependency check thus becomes the de facto guard against breaking changes, but it rests on a dangerous assumption: that CRAN is the comprehensive2 universe of packages. This assumption is actively hostile to private repositories, whose packages the check never sees.
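A reverse dependency check can only find dependents that exist in the index it scans. The sketch below computes reverse dependencies over the strong dependency fields (Depends, Imports, LinkingTo); the package names are hypothetical and the field parsing is simplified.

```python
def reverse_dependencies(index, target, fields=("Depends", "Imports", "LinkingTo")):
    """Packages in `index` that list `target` in one of the given dependency fields."""
    found = set()
    for name, desc in index.items():
        for field in fields:
            for entry in desc.get(field, "").split(","):
                # Drop any version constraint such as "(>= 1.0.0)" before comparing.
                if entry.split("(")[0].strip() == target:
                    found.add(name)
    return found


# Hypothetical packages; `upstream` is about to ship a breaking change.
CRAN_INDEX = {
    "upstream": {},
    "publicUser": {"Imports": "upstream (>= 1.0.0), utils"},
    "unrelated": {"Imports": "utils"},
}
# A package in a private repository that CRAN never sees.
PRIVATE_INDEX = {"privateUser": {"Imports": "upstream"}}

print(sorted(reverse_dependencies(CRAN_INDEX, "upstream")))
# -> ['publicUser']
print(sorted(reverse_dependencies({**CRAN_INDEX, **PRIVATE_INDEX}, "upstream")))
# -> ['privateUser', 'publicUser']
```

From the publisher's side, only the first result is ever computed, so privateUser breaks without any check having failed.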

How I made a Private Repository

The primary blocker for setting upper bounds was the unavailability of version lists, so I decided the easiest solution was to build my own. It turns out that for $20 a month in Cloudflare object storage bills, you too can host a CRAN mirror! rrepo.org exposes a separate API for listing packages, their versions, and the DESCRIPTION files that declare their dependencies.

What good is an API no one uses? renv does not know my URL structure, and neither does pak. So I made my own package manager! A month of my life and countless Codex credits later, we have rpx, a Rust CLI that, being a standalone binary rather than an R package, conveniently does not have the bootstrapping problem.

By default, whenever you declare a root-level dependency, rpx sets its lower bound at the currently installed version and its upper bound at the next major version. A big reason for choosing the PubGrub resolution algorithm was its ability to explain the conflicts these bounds will inevitably introduce. Adding upper bounds by default is a deliberate choice: it moves the computationally expensive compatibility check out of the publishing step, where CRAN runs its reverse dependency checks, and into dependency resolution.
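The default bounds rule can be sketched as follows. The function names are hypothetical, version parsing is simplified, and treating the upper bound as exclusive (`< next major`) is my reading of "an upper bound to the next major version"; rpx's actual representation may differ.

```python
def parse_version(v):
    """R versions separate components with '.' or '-', e.g. "1.4-2"."""
    return tuple(int(p) for p in v.replace("-", ".").split("."))


def default_bounds(installed):
    """Lower bound: the installed version. Upper bound (exclusive): the next major."""
    major = parse_version(installed)[0]
    return installed, f"{major + 1}.0.0"


def satisfies(version, bounds):
    """True when lower <= version < upper."""
    lower, upper = bounds
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


bounds = default_bounds("1.4-2")
print(bounds)  # -> ('1.4-2', '2.0.0')
for candidate in ["1.4.1", "1.9.0", "2.0.0"]:
    print(candidate, satisfies(candidate, bounds))
# -> 1.4.1 False, 1.9.0 True, 2.0.0 False (one per line)
```

Under this literal rule a 0.x package gets an upper bound of 1.0.0; whether rpx special-cases pre-1.0 versions the way semver caret ranges do is not covered here.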

Both projects are at an early stage and could use some polish, but you should already be able to create a private repository on rrepo.org and add it as a supplementary package source to an rpx-managed package.

rpx supports CRAN as a package source, but when pointed directly at CRAN it is limited to the package universe exposed by PACKAGES. I plan to make rrepo.org CRAN-compatible as well. Once it is, since additional repositories are declared in the standard Additional_repositories field of the DESCRIPTION file, you should even be able to publish your package on CRAN with rrepo packages among its dependencies, provided your repository is public. (CRAN policy only accepts packages from non-mainstream repositories in Suggests and Enhances.)
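Reading the field itself is straightforward, since DESCRIPTION files use the same DCF layout as PACKAGES and Additional_repositories holds a comma-separated list of repository URLs. This is a minimal sketch; the DESCRIPTION contents and URLs are placeholders, and how rpx consumes the field internally is not something this snippet claims to reproduce.

```python
def additional_repositories(description_text):
    """Return the URLs listed in a DESCRIPTION file's Additional_repositories field."""
    fields, last_key = {}, None
    for line in description_text.splitlines():
        if line[:1] in (" ", "\t") and last_key:  # continuation of the previous field
            fields[last_key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            fields[last_key] = value.strip()
    raw = fields.get("Additional_repositories", "")
    return [url.strip() for url in raw.split(",") if url.strip()]


# Placeholder DESCRIPTION; the package names and URLs are illustrative only.
DESCRIPTION = """\
Package: mypkg
Version: 0.1.0
Suggests: internalpkg
Additional_repositories: https://example.org/repo,
    https://example.com/other
"""

print(additional_repositories(DESCRIPTION))
# -> ['https://example.org/repo', 'https://example.com/other']
```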

The core idea behind rpx is to make distributed package hosting viable for R, allowing teams to share build artifacts instead of maintaining a single monorepo. If you decide to try it out, please contact me. I’d be happy to help make it work for you!

Footnotes

  1. Technically, CRAN exposes this list as an HTML directory listing under src/contrib/Archive, but it is inconsistently available across mirrors and almost never available on CRAN-like repositories. It is not an official feature, which makes it an unreliable source.
  2. Ironic, given that CRAN stands for the Comprehensive R Archive Network.