<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Econometrics and Free Software</title>
<link>https://b-rodrigues.github.io/</link>
<atom:link href="https://b-rodrigues.github.io/index.xml" rel="self" type="application/rss+xml"/>
<description></description>
<generator>quarto-1.6.37</generator>
<lastBuildDate>Fri, 03 Apr 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>You can just build your own programming language</title>
  <link>https://b-rodrigues.github.io/posts/2026-04-03-tproject.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/tlogo.png" style="width: 50%; height: auto;"> </a>
</p>
</div>
<p>Last summer, while relaxing on the beaches of Berck, a French town known for treating tuberculosis in kids by exposing them to the fresh maritime air (back in the 19th century; they have antibiotics these days), I found myself daydreaming about building my own programming language.</p>
<p>Spoiler alert: I don’t know how to build programming languages, but I have developed extremely strong opinions over the years about the features a modern data science language <em>should</em> have. So could I use them fancy LLMs to build one?</p>
<p>Also, let’s get one question answered straight away: why create a new language instead of contributing to existing ones? I certainly do contribute: I maintain several R packages like <code>{rix}</code>, <code>{rixpress}</code>, and <code>{chronicler}</code>, and even have two Python packages (<code>cronista</code> and <code>ryxpress</code>). But I wanted a clean slate to build a system centered around a few non-negotiable principles and features I’ve implemented over the years in R:</p>
<ul>
<li><strong>Reproducibility-First</strong>: A language where reproducibility isn’t a bolt-on afterthought managed by external tools, but the very foundation of the runtime.</li>
<li><strong>Aggressive Re-use</strong>: Instead of reinventing the wheel, this language would stand on the shoulders of giants. It’d use <strong>Nix</strong> for package management and environment isolation, and <strong>Apache Arrow</strong> as its high-performance backbone for data frames. R, Python, Julia and other languages would provide the algorithms and models.</li>
<li><strong>First-Class Pipelines</strong>: Scripts shouldn’t be a sequence of side-effects. In this language, pipelines would be mandatory and first-class citizens.</li>
<li><strong>Fail Early and Loudly</strong>: No silent type conversions or hidden NAs. If something is wrong, the language breaks immediately so you can fix it.</li>
<li><strong>Errors as Objects</strong>: Inspired by functional programming, errors are first-class values that can be inspected and handled gracefully.</li>
<li><strong>Two Pipes</strong>: I want two pipes, one for linear transformations, <code>|&gt;</code>, and a maybe-pipe, <code>?|&gt;</code> for error recovery. Unlike the standard pipe, <code>?|&gt;</code> always forwards its value, including Errors, to the next function, allowing you to write handlers that inspect and potentially recover from them. Since Errors are just values, this composes naturally with the rest of the language.</li>
<li><strong>Polyglot by Design</strong>: Rather than re-implementing every statistical algorithm, this language would be designed to orchestrate and bridge R, Python, and Julia seamlessly.</li>
</ul>
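<p>To make the two-pipe idea concrete, here is a minimal sketch in Python rather than T. The names <code>Error</code>, <code>pipe</code>, <code>maybe_pipe</code>, <code>safe_div</code> and <code>recover</code> are hypothetical, and the short-circuiting behaviour of the standard pipe is an assumption inferred from the description above, not T’s actual implementation.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Error:
    """An error represented as an ordinary value (hypothetical shape)."""
    message: str


def pipe(value, fn):
    """Standard pipe (assumed behaviour): skip fn when value is an Error."""
    if isinstance(value, Error):
        return value
    return fn(value)


def maybe_pipe(value, fn):
    """Maybe-pipe: always forward the value, Errors included."""
    return fn(value)


def safe_div(x):
    # Return an Error value instead of raising an exception.
    if x == 0:
        return Error("division by zero")
    return 10 / x


def recover(value):
    # Handler that inspects an Error and substitutes a fallback.
    return 0.0 if isinstance(value, Error) else value


# The Error produced by safe_div skips the lambda and reaches recover.
failed = maybe_pipe(pipe(pipe(0, safe_div), lambda v: v + 1), recover)
# A valid input flows through every step unchanged by recover.
succeeded = maybe_pipe(pipe(pipe(5, safe_div), lambda v: v + 1), recover)
print(failed, succeeded)
```

<p>Because the error is just a value, the handler is an ordinary function and needs no special control flow such as <code>try</code>/<code>except</code>.</p>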
<p>Also, we’re in a post-LLM world, and like them or not, they’re here to stay. They’re pretty useful for writing boilerplate code, so any new language would be dead on arrival if it didn’t play nicely with LLMs. So such a new language would need to be written for LLMs primarily, because I don’t expect anyone to learn any new language. This is where the declarative nature of Nix is a huge advantage. Because environments are precisely described, it is much easier for LLMs to focus on generating code and not have to fight with environment setup. This is also the reason I took another radical decision: since Nix would be mandatory for setting up the environment, why bother building OS-specific binaries? I’d just build a Nix package for this language and let Nix handle the rest.</p>
<p>This architecture results in a DSL for orchestration, making it trivial to transfer data objects between different ecosystems without the usual FFI (Foreign Function Interface) friction.</p>
<p>With these ideas in mind, I started prompting Gemini to brainstorm, beginning with specification files. Very broad at first, but as days went by, more and more focused. The way I went about it (and still go) is that I first brainstorm an idea with an LLM, then I ask it to generate a specification file, then I refine it, ask it to generate a new specification file, and so on. Once I’m happy with the spec, I ask an LLM to generate a minimal implementation of the spec. Usually writing the spec and a first implementation is a task shared between Claude and Gemini (through Antigravity). Then I open a pull request and ask GitHub Copilot to review it (usually with GPT-5.x). I repeat this process until I’m happy with the implementation. I always ask for documentation and unit tests (and golden tests when relevant, more on this later).</p>
<p>I started to really believe that I had something interesting, so I gave it a shot, and called it <strong>T</strong>. I had long joked that the natural successor to R should be called T (because R is the successor to S… and no, I’m not going to call it Q because that sounds like the word for ass in French).</p>
<p>Something else that made me confident I could succeed, besides my own hubris, was that I am pretty familiar with unit testing, test-driven development, trunk-based development and Nix. When you combine all these elements, it makes developing with LLMs quite safe.</p>
<p>So I just started prompting. And now I’m quite happy to announce that there is a beta version of T that you can use today!</p>
<p>By leveraging Nix as a build engine, T can treat complex data science workflows as buildable derivations. A typical T pipeline looks like this:</p>
<pre class="t"><code>p = pipeline {
  -- 1. Python node: read data with pandas
  mtcars_pl = pyn(
    command = &lt;{
import pandas as pd
pd.read_csv("data/mtcars.csv", sep="|")
    }&gt;,
    include = ["data/mtcars.csv"],
    serializer = ^csv
  )

  -- 2. Python node: filter and serialize as CSV
  mtcars_pl_am = pyn(
    command = &lt;{
mtcars_pl[mtcars_pl['am'] == 1]
    }&gt;,
    deserializer = ^csv,
    serializer = ^csv
  )

  -- 3. R node: read CSV and take head using functions.R
  mtcars_head = rn(
    command = &lt;{
my_head(mtcars_pl_am)
    }&gt;,
    functions = ["src/functions.R"],
    deserializer = ^csv,
    serializer = ^csv
  )

  -- 4. R node: select column with dplyr
  mtcars_mpg = rn(
    command = &lt;{
library(dplyr)
mtcars_head %&gt;% select(mpg)
    }&gt;,
    deserializer = ^csv,
    serializer = ^csv
  )

  -- Render Quarto report
  report = node(script = "src/report.qmd", runtime = Quarto)
}

-- Materialize the pipeline
populate_pipeline(p, build = true)
pipeline_copy() -- Copy the outputs from the Nix store to your working directory</code></pre>
<p>As you can see, each node has a <code>command</code> argument where you can write literal R or Python code. It is also possible to provide the path to a script instead. If packages need to be loaded for the code to work, you can just write the calls to load the required packages in the <code>command</code> argument as well.</p>
<p>While T is heavily inspired by the <code>{targets}</code> package in R, it takes the concept a step further by making pipelines <strong>first-class objects</strong> within the language itself. This means you can:</p>
<ul>
<li><strong>Compose Pipelines</strong>: You can define small, modular pipelines and then merge them into larger ones using standard operators.</li>
<li><strong>Static Analysis</strong>: Because the DAG (Directed Acyclic Graph) is defined within the language, T can validate your entire workflow (checking for circular dependencies or missing data) before a single line of code even runs.</li>
<li><strong>Heterogeneous Execution</strong>: A single pipeline can effortlessly mix R, Python, and native T code. Data is passed between these nodes using built-in serializers like <code>^csv</code>, <code>^arrow</code>, or even specialized formats like <code>^pmml</code> for traditional models and <code>^onnx</code> for deep learning architectures. It is also possible to define your own serializers.</li>
<li><strong>Immutable State</strong>: Each node output is managed by Nix, meaning if you haven’t changed the code or the data for a specific node, T (via Nix) will simply pull the cached result from previous runs.</li>
</ul>
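<p>The static analysis point can be illustrated with a short Python sketch. It is not how T does it internally; <code>validate_dag</code> and the dictionary layout are hypothetical, and the node names are borrowed from the pipeline above. The check relies on <code>graphlib</code> from the Python standard library (Python 3.9 or later).</p>

```python
from graphlib import CycleError, TopologicalSorter


def validate_dag(nodes, inputs):
    """Check a pipeline description before running anything.

    nodes maps each node name to the names it depends on;
    inputs is the set of names available from outside the pipeline.
    Returns a valid execution order, or raises ValueError.
    """
    known = set(nodes) | set(inputs)
    for name, deps in nodes.items():
        missing = sorted(set(deps) - known)
        if missing:
            raise ValueError(f"node {name!r} depends on unknown {missing}")
    try:
        order = list(TopologicalSorter(nodes).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes forming the cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err
    # Keep only pipeline nodes; external inputs are not executed.
    return [name for name in order if name in nodes]


pipeline = {
    "mtcars_pl": ["data/mtcars.csv"],
    "mtcars_pl_am": ["mtcars_pl"],
    "mtcars_head": ["mtcars_pl_am"],
    "mtcars_mpg": ["mtcars_head"],
}
print(validate_dag(pipeline, inputs={"data/mtcars.csv"}))
```

<p>Both failure modes (a missing input and a cycle) are reported from the graph description alone, before any node’s command is executed.</p>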
<p>But don’t let the “orchestrator” label fool you; T is also a capable language in its own right. It features a selection of built-in packages inspired by the <code>tidyverse</code> for data manipulation. Thanks to its Arrow backend, it is surprisingly fast. I even maintain a CI benchmark running on NYC Taxi data to ensure performance remains competitive.</p>
<p>I made sure that T is pretty easy to use with LLMs by providing a file called <code>summary.md</code> in the root of the GitHub repository. This file is meant to be used by LLMs to quickly learn the language’s syntax and generate code accordingly. You could also provide the whole help documentation to the LLM (found in the repository under <code>help/docs.json</code>), but I found that a summary is usually enough. There is also another experimental feature I’m thinking about, called <code>intent</code> blocks. These blocks would essentially be first-class structured comments that would be used to anchor an LLM’s behaviour and make it more deterministic. These blocks would be parsed by T and used to generate code accordingly. I have some ideas about what these could look like, something like this:</p>
<pre class="t"><code>intent {
  description: "Customer churn prediction",
  assumptions: ["Age &gt; 18", "NA imputed with multiple imputation"],
  requires: ["dataset.csv"]
}</code></pre>
<section id="is-this-slop" class="level2">
<h2 class="anchored" data-anchor-id="is-this-slop">Is this slop?</h2>
<p>There’s a lot of skepticism about building your own language using LLMs, and I get it. I was pretty skeptical myself. So let me tell you what actually gives me confidence in T’s correctness: as of writing, 1753 unit tests, 122 golden tests, 13 end-to-end tests, and 18 full project demos are executed on every push and PR, on both Linux and macOS via GitHub Actions. That’s the verification regime, and it has to be rigorous precisely because I can’t audit the OCaml implementation by eye. This is actually one of the more interesting lessons from this project: when you can’t rely on code review, you have to over-invest in tests and specifications. The spec files, the enriched changelog, the <code>summary.md</code>, all of that context makes the LLM’s output more predictable, and the test suite tells you immediately when it isn’t.</p>
<p>From personal experience, when I generate R or Python code, the output looks a lot like what I would have written myself. The main failure mode I’ve noticed is lack of context: the more you give the model, the better the result. Letting separate LLMs review PRs and iterating through several loops helps catch what any single model misses.</p>
<p>I’m also confident in T’s safety from a different angle: it’s ultimately orchestrating Python and R code you write yourself, and that you can test independently.</p>
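<p>For readers unfamiliar with the golden tests mentioned above: the idea is to store a known-good output once and compare every later run against it. Below is a minimal Python sketch of that technique. It is not taken from T’s test suite, and <code>render_report</code> and <code>check_golden</code> are hypothetical names.</p>

```python
import csv
import io
import tempfile
from pathlib import Path


def render_report(rows):
    """Code under test: serialize rows to CSV text (stand-in for a real node)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["model", "mpg"])
    writer.writerows(rows)
    return buffer.getvalue()


def check_golden(actual, golden_path, update=False):
    """Compare output with the stored golden file; record it when asked or absent."""
    golden_path = Path(golden_path)
    if update or not golden_path.exists():
        golden_path.write_text(actual, encoding="utf-8")
        return True
    return golden_path.read_text(encoding="utf-8") == actual


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "report.golden.csv"
    # First run records the golden file.
    first = check_golden(render_report([["Mazda RX4", 21.0]]), golden)
    # Identical output matches the stored file.
    same = check_golden(render_report([["Mazda RX4", 21.0]]), golden)
    # Any change in output is flagged as a regression.
    changed = check_golden(render_report([["Mazda RX4", 22.8]]), golden)

print(first, same, changed)
```

<p>This style of test suits generated code well, since it pins down observable behaviour without requiring anyone to read the implementation.</p>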
</section>
<section id="interested" class="level2">
<h2 class="anchored" data-anchor-id="interested">Interested?</h2>
<p>If you’re interested in trying it out or contributing, check out the <a href="https://github.com/b-rodrigues/tlang">official repository</a> or the <a href="https://tstats-project.org/">website</a>, and don’t hesitate to open an issue or a PR or contact me on the dedicated <a href="https://matrix.to/#/#tproject:matrix.org">Matrix channel</a>.</p>
</section>
<section id="appendix" class="level1">
<h1>Appendix</h1>
<p>For the interested reader, here’s how to get started with T.</p>
<section id="how-to-get-started" class="level2">
<h2 class="anchored" data-anchor-id="how-to-get-started">How to get started</h2>
<p>If you have Nix installed, getting started with a new project is just a single command away:</p>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 1. Initialize a new project</span></span>
<span id="cb3-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">nix</span> run github:b-rodrigues/tlang <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--</span> init <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--project</span> my_t_project</span>
<span id="cb3-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> my_t_project</span></code></pre></div>
<p>There will be no other way to start a T project. As explained above, I don’t want to have to deal with providing OS-specific binaries, and since Nix is used by T as the build engine, you’ll need to have Nix installed on your system anyways. Might as well reuse it to manage the installation of T itself!</p>
<p>Inside the project’s folder, you’ll find a <code>tproject.toml</code> file. This is where you list the R and Python packages you’ll need. For example:</p>
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode toml code-with-copy"><code class="sourceCode toml"><span id="cb4-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[project]</span></span>
<span id="cb4-2"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">name</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"r_py_xgboost_t"</span></span>
<span id="cb4-3"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">description</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A T data analysis project"</span></span>
<span id="cb4-4"></span>
<span id="cb4-5"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[dependencies]</span></span>
<span id="cb4-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># T packages this project depends on</span></span>
<span id="cb4-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Format: package = { git = "repository-url", tag = "version" }</span></span>
<span id="cb4-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Example:</span></span>
<span id="cb4-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># stats = { git = "https://github.com/t-lang/stats", tag = "v0.5.0" }</span></span>
<span id="cb4-10"></span>
<span id="cb4-11"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[r-dependencies]</span></span>
<span id="cb4-12"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">packages</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"yardstick"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb4-13"></span>
<span id="cb4-14"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[py-dependencies]</span></span>
<span id="cb4-15"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">version</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"python313"</span></span>
<span id="cb4-16"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">packages</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"numpy"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pandas"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"scikit-learn"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"xgboost"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb4-17"></span>
<span id="cb4-18"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[additional-tools]</span></span>
<span id="cb4-19"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">packages</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"quarto"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb4-20"></span>
<span id="cb4-21"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[t]</span></span>
<span id="cb4-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Minimum T language version required</span></span>
<span id="cb4-23"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">min_version</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0.51.2"</span></span></code></pre></div>
<p>Under “additional tools” you can add any package that is available in <code>nixpkgs</code>. If you need LaTeX, you can also add this dedicated section:</p>
<div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode toml code-with-copy"><code class="sourceCode toml"><span id="cb5-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[latex]</span></span>
<span id="cb5-2"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">packages</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"amsmath"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"geometry"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"hyperref"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"biblatex"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span></span></code></pre></div>
<p>You may have noticed that there is also a section for T packages; that’s right, T supports user-defined packages. Instead of starting a project you’d start a package:</p>
<div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb6-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">nix</span> run github:b-rodrigues/tlang <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--</span> init <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--package</span> my_package</span>
<span id="cb6-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> my_package</span></code></pre></div>
<p>Instead of a <code>tproject.toml</code> file, you’ll have to fill in a <code>DESCRIPTION.toml</code> file:</p>
<div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode toml code-with-copy"><code class="sourceCode toml"><span id="cb7-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[package]</span></span>
<span id="cb7-2"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">name</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"my_package"</span></span>
<span id="cb7-3"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">version</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0.1.0"</span></span>
<span id="cb7-4"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">description</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A brief description of what my_package does"</span></span>
<span id="cb7-5"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">authors</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"brodriguesco"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb7-6"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">license</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EUPL-1.2"</span></span>
<span id="cb7-7"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">homepage</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span></span>
<span id="cb7-8"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">repository</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span></span>
<span id="cb7-9"></span>
<span id="cb7-10"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[dependencies]</span></span>
<span id="cb7-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># T packages this package depends on</span></span>
<span id="cb7-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Format: package = { git = "repository-url", tag = "version" }</span></span>
<span id="cb7-13"></span>
<span id="cb7-14"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[t]</span></span>
<span id="cb7-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Minimum T language version required</span></span>
<span id="cb7-16"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">min_version</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0.5.0"</span></span></code></pre></div>
<p>Another important file is the <code>flake.nix</code> that will be automatically generated. You shouldn’t have to touch it, but this <code>flake.nix</code> is what provides the reproducible development environment for running your project. To enter that environment, simply run:</p>
<div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb8-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">nix</span> develop</span></code></pre></div>
<p>This will install T and activate the environment. If you’ve added stuff to the <code>tproject.toml</code> you’ll have to run <code>t update</code> to sync the packages to the flake, and then rebuild the environment (you’ll need to exit the development environment with <code>exit</code> and rebuild it using <code>nix develop</code> again). Oh and by the way, T requires a Linux-like environment so if you’re on Windows, you’ll have to run T within <strong>WSL2</strong> (Windows Subsystem for Linux).</p>
<p>Once inside the <code>nix develop</code> shell, everything you need, the T interpreter, your specific versions of R/Python, and all project tools, is ready to use. You don’t need to manage virtual environments or Docker containers manually; T handles the heavy lifting via Nix under the hood.</p>
<p>You can browse examples on this <a href="https://github.com/b-rodrigues/t_demos">repository</a>.</p>
</section>
<section id="tooling-and-editor-support" class="level2">
<h2 class="anchored" data-anchor-id="tooling-and-editor-support">Tooling and Editor Support</h2>
<p>A language is only as good as its developer experience. I politely asked LLMs to implement a full Language Server (<strong>LSP</strong>) for T, which provides autocompletion, real-time diagnostics, and “Go to Definition” support.</p>
<ul>
<li>For <strong>VS Code / Positron</strong>: A dedicated extension providing syntax highlighting and LSP integration.</li>
<li>For <strong>Vim / Emacs</strong>: Detailed configuration guides and syntax files are available.</li>
<li>For <strong>Quarto</strong>: T is fully compatible with Quarto for literate programming, allowing you to run executable <code>{t}</code> chunks directly in your documents.</li>
</ul>
<p>For detailed setup instructions, check out the <a href="https://github.com/b-rodrigues/tlang/blob/main/docs/editors.md">Editor Support guide</a> in the official documentation.</p>
<p>There’s much more I haven’t covered here, so <a href="https://github.com/b-rodrigues/tlang">check out the official repository</a> or the <a href="https://tstats-project.org/">website</a>.</p>


</section>
</section>

 ]]></description>
  <category>T</category>
  <category>datascience</category>
  <guid>https://b-rodrigues.github.io/posts/2026-04-03-tproject.html</guid>
  <pubDate>Fri, 03 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>From scripts to pipelines in the age of LLMs</title>
  <link>https://b-rodrigues.github.io/posts/2026-01-13-data_science_llm_age.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/chad_beach.png" style="width: 80%; height: auto;"> </a>
</p>
</div>
<p>I was recently reading Davis Vaughan’s blog post <a href="https://blog.davisvaughan.com/posts/2026-01-09-claude-200-pull-requests/">Semi-automating 200 Pull Requests with Claude Code</a> and it really resonated with me, as I’ve been using LLMs for tedious tasks like that for some time now. Davis’s key insight: <em>structure = success</em>. When you can scope a task tightly and provide clear context, LLMs become genuinely useful tools.</p>
<p>If you’ve been following my work, you know that reproducible pipelines have been my main focus for some time now. It’s the reason I wrote <code>{rix}</code> for reproducible R environments, <code>{rixpress}</code> for declarative pipelines, and even a Python port called <code>ryxpress</code>. I genuinely believe these tools make data science better: more reproducible, more debuggable, more shareable.</p>
<p>But I also know that getting people to adopt new tools is hard. Learning a new way of structuring your code takes time and effort, and most people are busy enough already. Here’s where LLMs enter the picture: they can help translate your existing scripts into this more structured format. You provide your monolithic script, explain what you want, and the LLM does the grunt work of restructuring it.</p>
<p>The typical way we write analytics scripts (long chains of <code>%&gt;%</code> calls in R or method-chaining in Python) works fine for interactive exploration, but quickly turns into spaghetti that’s hard to modify, test, or debug. Take <a href="../posts/2018-11-14-luxairport.html">my old Luxembourg Airport analysis</a> as an example: it works, but turning that kind of script into a proper pipeline with caching, explicit dependencies, and testability is tedious work.</p>
<p>But it’s 2026, and LLMs now make this trivial.</p>
<section id="from-implicit-to-explicit-translating-a-script-into-a-rixpress-pipeline" class="level2">
<h2 class="anchored" data-anchor-id="from-implicit-to-explicit-translating-a-script-into-a-rixpress-pipeline">From implicit to explicit: translating a script into a rixpress pipeline</h2>
<p>Let me show you what I mean by translating that old Luxembourg Airport data cleaning code into a <code>{rixpress}</code> pipeline. The original script uses continuous <code>%&gt;%</code> chaining, standard tidyverse style. The <code>{rixpress}</code> version makes every intermediate step explicit.</p>
<p>But here’s the thing: LLMs don’t know about <code>{rixpress}</code> out of the box. It’s a relatively new package, and it’s not in their training data (or if it is, it’s outdated). So I wrote another tool called <a href="https://github.com/b-rodrigues/pkgctx">pkgctx</a> that extracts structured, LLM-ready API specifications from R or Python packages (and guess what, I did not write a single line of code of that tool, LLMs did).</p>
<p>The idea is simple: run <code>pkgctx r rixpress &gt; rixpress.ctx.yaml</code> and you get a compact YAML file describing every function, its arguments, and its purpose. Feed that to an LLM, and suddenly it “knows” the package. All my packages now have such a file generated on each push thanks to GitHub Actions. Here’s how my conversation went:</p>
<blockquote class="blockquote">
<p><strong>Me:</strong> I want to use a package called rixpress, but I am not familiar with it. You’ll help me, and here is a file with a detailed description of the API, for context.</p>
<p><em>[uploaded rixpress.ctx.yaml]</em></p>
<p><strong>LLM:</strong> Got it. I’ll use this API specification as the authoritative reference for <strong>rixpress</strong> and help you write correct, idiomatic pipelines with it.</p>
<p>A few orientation points to set the frame (no action required yet):</p>
<ul>
<li><strong>rixpress is declarative</strong>: you describe <em>derivations</em> (<code>rxp_r()</code>, <code>rxp_py()</code>, <code>rxp_qmd()</code>, …), then materialize them with <code>rxp_populate()</code> and <code>rxp_make()</code>.</li>
<li><strong>Data flows through the DAG</strong> via serialized artifacts (RDS / pickle / Julia Serialization), not in-memory objects.</li>
<li><strong>Language boundaries are explicit</strong> (<code>rxp_r2py()</code>, <code>rxp_py2r()</code>).</li>
<li><strong>Nix is the execution engine</strong>; <code>default.nix</code> (or equivalents) define the runtime environment.</li>
</ul>
<p>When you’re ready, tell me what you want to do […]</p>
</blockquote>
<p>Then I simply asked:</p>
<blockquote class="blockquote">
<p><strong>Me:</strong> Help me translate this R script into a rixpress pipeline: <em>[pasted the old script]</em></p>
</blockquote>
<p>And that’s how I got a working <code>{rixpress}</code> pipeline. The LLM did the tedious restructuring; I reviewed the output, made minor tweaks, and was done. The combination of <code>pkgctx</code> for context and a clear task (“translate this script”) made the LLM genuinely useful.</p>
<p>Now let’s look at the translated pipeline. First, let’s assume:</p>
<ul>
<li>The data file <code>avia_par_lu.tsv</code> is in the project directory</li>
<li>Required R packages are available via <code>default.nix</code> (we’ll also use an LLM for this one)</li>
<li>The project has been initialized with <code>rxp_init()</code> (this sets up two skeleton files to get started quickly)</li>
</ul>
<details>
<summary>
Click to expand the full rixpress pipeline
</summary>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(rixpress)</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 0: Load the data</span></span>
<span id="cb1-4">avia <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r_file</span>(</span>
<span id="cb1-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> avia,</span>
<span id="cb1-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">path =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"avia_par_lu.tsv"</span>,</span>
<span id="cb1-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">read_function =</span> readr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>read_tsv</span>
<span id="cb1-8">)</span>
<span id="cb1-9"></span>
<span id="cb1-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 1: Select and reshape (wide → long)</span></span>
<span id="cb1-11">avia_long <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-12">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> avia_long,</span>
<span id="cb1-13">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span></span>
<span id="cb1-14">    avia <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-15">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"unit,tra_meas,airp_pr</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">time"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">contains</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-16">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gather</span>(date, passengers, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unit,tra_meas,airp_pr</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>)</span>
<span id="cb1-17">)</span>
<span id="cb1-18"></span>
<span id="cb1-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 2: Split composite key column</span></span>
<span id="cb1-20">avia_split <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-21">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> avia_split,</span>
<span id="cb1-22">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span></span>
<span id="cb1-23">    avia_long <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-24">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">separate</span>(</span>
<span id="cb1-25">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unit,tra_meas,airp_pr</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb1-26">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">into =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"unit"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tra_meas"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"air_pr</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">time"</span>),</span>
<span id="cb1-27">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">","</span></span>
<span id="cb1-28">      )</span>
<span id="cb1-29">)</span>
<span id="cb1-30"></span>
<span id="cb1-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 3: Recode transport measure</span></span>
<span id="cb1-32">avia_recode_tra_meas <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-33">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> avia_recode_tra_meas,</span>
<span id="cb1-34">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span></span>
<span id="cb1-35">    avia_split <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-36">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb1-37">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tra_meas =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_recode</span>(</span>
<span id="cb1-38">          tra_meas,</span>
<span id="cb1-39">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Passengers on board</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PAS_BRD"</span>,</span>
<span id="cb1-40">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Passengers on board (arrivals)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PAS_BRD_ARR"</span>,</span>
<span id="cb1-41">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Passengers on board (departures)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PAS_BRD_DEP"</span>,</span>
<span id="cb1-42">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Passengers carried</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PAS_CRD"</span>,</span>
<span id="cb1-43">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Passengers carried (arrival)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PAS_CRD_ARR"</span>,</span>
<span id="cb1-44">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Passengers carried (departures)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PAS_CRD_DEP"</span>,</span>
<span id="cb1-45">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Passengers seats available</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ST_PAS"</span>,</span>
<span id="cb1-46">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Passengers seats available (arrivals)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ST_PAS_ARR"</span>,</span>
<span id="cb1-47">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Passengers seats available (departures)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ST_PAS_DEP"</span>,</span>
<span id="cb1-48">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Commercial passenger air flights</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CAF_PAS"</span>,</span>
<span id="cb1-49">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Commercial passenger air flights (arrivals)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CAF_PAS_ARR"</span>,</span>
<span id="cb1-50">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Commercial passenger air flights (departures)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CAF_PAS_DEP"</span></span>
<span id="cb1-51">        )</span>
<span id="cb1-52">      )</span>
<span id="cb1-53">)</span>
<span id="cb1-54"></span>
<span id="cb1-55"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 4: Recode unit</span></span>
<span id="cb1-56">avia_recode_unit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-57">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> avia_recode_unit,</span>
<span id="cb1-58">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span></span>
<span id="cb1-59">    avia_recode_tra_meas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-60">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb1-61">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unit =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_recode</span>(</span>
<span id="cb1-62">          unit,</span>
<span id="cb1-63">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Passenger =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PAS"</span>,</span>
<span id="cb1-64">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Flight =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"FLIGHT"</span>,</span>
<span id="cb1-65">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Seats and berths</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SEAT"</span></span>
<span id="cb1-66">        )</span>
<span id="cb1-67">      )</span>
<span id="cb1-68">)</span>
<span id="cb1-69"></span>
<span id="cb1-70"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 5: Recode destination</span></span>
<span id="cb1-71">avia_recode_destination <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-72">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> avia_recode_destination,</span>
<span id="cb1-73">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span></span>
<span id="cb1-74">    avia_recode_unit <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-75">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb1-76">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">destination =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_recode</span>(</span>
<span id="cb1-77">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">air_pr</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>,</span>
<span id="cb1-78">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">WIEN-SCHWECHAT</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_AT_LOWW"</span>,</span>
<span id="cb1-79">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">BRUSSELS</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_BE_EBBR"</span>,</span>
<span id="cb1-80">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">GENEVA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_CH_LSGG"</span>,</span>
<span id="cb1-81">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ZURICH</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_CH_LSZH"</span>,</span>
<span id="cb1-82">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">FRANKFURT/MAIN</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_DE_EDDF"</span>,</span>
<span id="cb1-83">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">HAMBURG</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_DE_EDDH"</span>,</span>
<span id="cb1-84">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">BERLIN-TEMPELHOF</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_DE_EDDI"</span>,</span>
<span id="cb1-85">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">MUENCHEN</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_DE_EDDM"</span>,</span>
<span id="cb1-86">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">SAARBRUECKEN</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_DE_EDDR"</span>,</span>
<span id="cb1-87">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">BERLIN-TEGEL</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_DE_EDDT"</span>,</span>
<span id="cb1-88">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">KOBENHAVN/KASTRUP</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_DK_EKCH"</span>,</span>
<span id="cb1-89">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">HURGHADA / INTL</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_EG_HEGN"</span>,</span>
<span id="cb1-90">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">IRAKLION/NIKOS KAZANTZAKIS</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_EL_LGIR"</span>,</span>
<span id="cb1-91">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">FUERTEVENTURA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_ES_GCFV"</span>,</span>
<span id="cb1-92">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">GRAN CANARIA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_ES_GCLP"</span>,</span>
<span id="cb1-93">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">LANZAROTE</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_ES_GCRR"</span>,</span>
<span id="cb1-94">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">TENERIFE SUR/REINA SOFIA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_ES_GCTS"</span>,</span>
<span id="cb1-95">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">BARCELONA/EL PRAT</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_ES_LEBL"</span>,</span>
<span id="cb1-96">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ADOLFO SUAREZ MADRID-BARAJAS</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_ES_LEMD"</span>,</span>
<span id="cb1-97">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">MALAGA/COSTA DEL SOL</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_ES_LEMG"</span>,</span>
<span id="cb1-98">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">PALMA DE MALLORCA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_ES_LEPA"</span>,</span>
<span id="cb1-99">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">SYSTEM - PARIS</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_FR_LF90"</span>,</span>
<span id="cb1-100">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">NICE-COTE D'AZUR</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_FR_LFMN"</span>,</span>
<span id="cb1-101">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">PARIS-CHARLES DE GAULLE</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_FR_LFPG"</span>,</span>
<span id="cb1-102">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">STRASBOURG-ENTZHEIM</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_FR_LFST"</span>,</span>
<span id="cb1-103">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">KEFLAVIK</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_IS_BIKF"</span>,</span>
<span id="cb1-104">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">MILANO/MALPENSA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_IT_LIMC"</span>,</span>
<span id="cb1-105">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">BERGAMO/ORIO AL SERIO</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_IT_LIME"</span>,</span>
<span id="cb1-106">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ROMA/FIUMICINO</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_IT_LIRF"</span>,</span>
<span id="cb1-107">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">AGADIR/AL MASSIRA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_MA_GMAD"</span>,</span>
<span id="cb1-108">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">AMSTERDAM/SCHIPHOL</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_NL_EHAM"</span>,</span>
<span id="cb1-109">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">WARSZAWA/CHOPINA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_PL_EPWA"</span>,</span>
<span id="cb1-110">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">PORTO</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_PT_LPPR"</span>,</span>
<span id="cb1-111">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">LISBOA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_PT_LPPT"</span>,</span>
<span id="cb1-112">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">STOCKHOLM/ARLANDA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_SE_ESSA"</span>,</span>
<span id="cb1-113">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">MONASTIR/HABIB BOURGUIBA</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_TN_DTMB"</span>,</span>
<span id="cb1-114">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ENFIDHA-HAMMAMET INTERNATIONAL</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_TN_DTNH"</span>,</span>
<span id="cb1-115">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ENFIDHA ZINE EL ABIDINE BEN ALI</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_TN_DTNZ"</span>,</span>
<span id="cb1-116">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">DJERBA/ZARZIS</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_TN_DTTJ"</span>,</span>
<span id="cb1-117">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ANTALYA (MIL-CIV)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_TR_LTAI"</span>,</span>
<span id="cb1-118">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ISTANBUL/ATATURK</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_TR_LTBA"</span>,</span>
<span id="cb1-119">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">SYSTEM - LONDON</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_UK_EG90"</span>,</span>
<span id="cb1-120">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">MANCHESTER</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_UK_EGCC"</span>,</span>
<span id="cb1-121">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">LONDON GATWICK</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_UK_EGKK"</span>,</span>
<span id="cb1-122">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">LONDON/CITY</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_UK_EGLC"</span>,</span>
<span id="cb1-123">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">LONDON HEATHROW</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_UK_EGLL"</span>,</span>
<span id="cb1-124">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">LONDON STANSTED</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_UK_EGSS"</span>,</span>
<span id="cb1-125">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">NEWARK LIBERTY INTERNATIONAL, NJ.</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_US_KEWR"</span>,</span>
<span id="cb1-126">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">O.R TAMBO INTERNATIONAL</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LU_ELLX_ZA_FAJS"</span></span>
<span id="cb1-127">        )</span>
<span id="cb1-128">      )</span>
<span id="cb1-129">)</span>
<span id="cb1-130"></span>
<span id="cb1-131"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 6: Final cleaned dataset</span></span>
<span id="cb1-132">avia_clean <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-133">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> avia_clean,</span>
<span id="cb1-134">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span></span>
<span id="cb1-135">    avia_recode_destination <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-136">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">passengers =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(passengers)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-137">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(unit, tra_meas, destination, date, passengers)</span>
<span id="cb1-138">)</span>
<span id="cb1-139"></span>
<span id="cb1-140"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 7: Quarterly arrivals</span></span>
<span id="cb1-141">avia_clean_quarterly <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-142">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> avia_clean_quarterly,</span>
<span id="cb1-143">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span></span>
<span id="cb1-144">    avia_clean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-145">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(</span>
<span id="cb1-146">        tra_meas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Passengers on board (arrivals)"</span>,</span>
<span id="cb1-147">        <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(passengers),</span>
<span id="cb1-148">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(date, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Q"</span>)</span>
<span id="cb1-149">      ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-150">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">yq</span>(date))</span>
<span id="cb1-151">)</span>
<span id="cb1-152"></span>
<span id="cb1-153"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 8: Monthly arrivals</span></span>
<span id="cb1-154">avia_clean_monthly <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-155">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> avia_clean_monthly,</span>
<span id="cb1-156">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span></span>
<span id="cb1-157">    avia_clean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-158">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(</span>
<span id="cb1-159">        tra_meas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Passengers on board (arrivals)"</span>,</span>
<span id="cb1-160">        <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(passengers),</span>
<span id="cb1-161">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(date, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"M"</span>)</span>
<span id="cb1-162">      ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-163">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ymd</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(date, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"01"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-164">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(destination, date, passengers)</span>
<span id="cb1-165">)</span>
<span id="cb1-166"></span>
<span id="cb1-167"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Populate and build the pipeline</span></span>
<span id="cb1-168"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_populate</span>(</span>
<span id="cb1-169">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-170">    avia,</span>
<span id="cb1-171">    avia_long,</span>
<span id="cb1-172">    avia_split,</span>
<span id="cb1-173">    avia_recode_tra_meas,</span>
<span id="cb1-174">    avia_recode_unit,</span>
<span id="cb1-175">    avia_recode_destination,</span>
<span id="cb1-176">    avia_clean,</span>
<span id="cb1-177">    avia_clean_quarterly,</span>
<span id="cb1-178">    avia_clean_monthly</span>
<span id="cb1-179">  )</span>
<span id="cb1-180">)</span>
<span id="cb1-181"></span>
<span id="cb1-182"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_make</span>()</span></code></pre></div>
</details>
<p>This is a faithful “translation” of the script into a <code>{rixpress}</code> pipeline. However, the original data is no longer available, and recent data sets have changed slightly, so the script would need further adaptation to the current data source. Otherwise, this would be it! You can view the updated script <a href="https://github.com/b-rodrigues/rixpress_demos/tree/master/avia">here</a>. I have also removed all the recoding of factors, because there seems to be something wrong with how <code>{rixpress}</code> handles backticks (<code>`</code>), so writing this blog post actually helped me find something to fix!</p>
</section>
<section id="generating-the-environment" class="level2">
<h2 class="anchored" data-anchor-id="generating-the-environment">Generating the environment</h2>
<p>I also used an LLM to generate the <code>{rix}</code> script that sets up the reproducible environment for this pipeline. I gave it the <code>rix.pkgctx.yaml</code> context file (generated with <code>pkgctx r rix &gt; rix.pkgctx.yaml</code>, which is also available on the rix GitHub repo) and asked: “Using this knowledge, write me an R script that uses rix to set up the right default.nix for this pipeline.”</p>
<p>The LLM correctly identified the packages needed from the pipeline code:</p>
<ul>
<li><code>readr</code> (for <code>read_tsv</code>)</li>
<li><code>dplyr</code> (for <code>select</code>, <code>filter</code>, <code>mutate</code>, <code>%&gt;%</code>)</li>
<li><code>tidyr</code> (for <code>gather</code>, <code>separate</code>)</li>
<li><code>forcats</code> (for <code>fct_recode</code>)</li>
<li><code>lubridate</code> (for <code>yq</code>, <code>ymd</code>)</li>
<li><code>stringr</code> (for <code>str_detect</code>)</li>
<li><code>rixpress</code> (for the pipeline itself)</li>
</ul>
<p>And produced this script:</p>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(rix)</span>
<span id="cb2-2"></span>
<span id="cb2-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rix</span>(</span>
<span id="cb2-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2026-01-10"</span>,</span>
<span id="cb2-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">r_pkgs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb2-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"readr"</span>,</span>
<span id="cb2-7">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span>,</span>
<span id="cb2-8">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tidyr"</span>,</span>
<span id="cb2-9">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"forcats"</span>,</span>
<span id="cb2-10">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lubridate"</span>,</span>
<span id="cb2-11">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"stringr"</span>,</span>
<span id="cb2-12">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rixpress"</span></span>
<span id="cb2-13">  ),</span>
<span id="cb2-14">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ide =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb2-15">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">project_path =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"."</span>,</span>
<span id="cb2-16">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb2-17">)</span></code></pre></div>
<p>There’s only one issue with that script: the selected date is not valid; it should be the 12th of January instead. But that’s actually my fault, as the LLM had no way of knowing that. The only way it could have known is if I had told it to look at the CSV file that lists all the valid dates in <code>{rix}</code>’s repository. After changing the date, you can run this script, then <code>nix-build</code> to build the environment and <code>nix-shell</code> to drop into it. From there, run your pipeline.</p>
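<p>With the date fixed, the call can be sketched as follows. This is a minimal sketch rather than the exact script I ran: the <code>snapshot_date</code> variable and the up-front check are additions for illustration, and the check assumes a version of <code>{rix}</code> that exports <code>available_dates()</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(rix)

# Snapshot date for the environment (the 12th of January instead of the 10th)
snapshot_date &lt;- "2026-01-12"

# Assumption: available_dates() returns the dates that rix() accepts.
# Stop before writing anything if the requested date is not one of them.
stopifnot(snapshot_date %in% available_dates())

rix(
  date = snapshot_date,
  r_pkgs = c(
    "readr",
    "dplyr",
    "tidyr",
    "forcats",
    "lubridate",
    "stringr",
    "rixpress"
  ),
  ide = "none",
  project_path = ".",
  overwrite = TRUE
)</code></pre></div>
<p>The check simply halts the script before <code>rix()</code> is called if the date is not among the supported ones, which would have caught the 10th of January right away.</p>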
<p>What we’ve done here is use LLMs at every step:</p>
<ol type="1">
<li><strong>Gave context about rixpress</strong> (via <code>pkgctx</code>) and asked the LLM to translate my old script into a pipeline</li>
<li><strong>Gave context about rix</strong> (via <code>pkgctx</code>) and asked the LLM to generate the environment setup</li>
</ol>
<p>The pattern is always the same: context + scoped task = useful output.</p>
</section>
<section id="structure-context-outsourceable-grunt-work" class="level2">
<h2 class="anchored" data-anchor-id="structure-context-outsourceable-grunt-work">Structure + context = outsourceable grunt work</h2>
<p>The point I’m making here isn’t really about <code>{rixpress}</code> pipelines specifically. It’s about a broader principle that both Davis Vaughan and I have observed: LLMs are genuinely useful when you give them enough structure and context.</p>
<p>Davis <a href="https://blog.davisvaughan.com/posts/2026-01-09-claude-200-pull-requests/">pre-cloned repositories, pre-generated <code>.Rprofile</code> files, and pre-created task lists</a> so Claude could focus on the actual fixes rather than git management. I used <code>pkgctx</code> to give the LLM a complete API specification and provided a clear starting point (my old script). In both cases, the formula is the same:</p>
<blockquote class="blockquote">
<p><strong>Structure + Context → Scoped Task → LLM can actually help</strong></p>
</blockquote>
<p>I’ve <a href="../posts/2025-07-03-llm_time.html">written before</a> about how you can outsource grunt work to an LLM, but not expertise. The same applies here. I still had to know what data transformations I needed. I still had to review the output and make adjustments. But the tedious restructuring (turning a monolithic script into a declarative pipeline) is exactly the kind of work LLMs can handle if you set them up properly.</p>
<p>If you want LLMs to help with your data science work:</p>
<ol type="1">
<li><strong>Give them context.</strong> Use tools like <code>pkgctx</code> to feed them API specifications. Paste your existing code. Show them examples.</li>
<li><strong>Scope the task tightly.</strong> “Translate this script into a rixpress pipeline” is a well-defined task. “Make my code better” is not.</li>
<li><strong>Review the output.</strong> LLMs do grunt work; you provide expertise.</li>
</ol>
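<p>The first two points can be sketched in a few lines of base R. This is only an illustration: <code>build_prompt()</code> is a hypothetical helper, not part of <code>pkgctx</code> or <code>{rixpress}</code>, and the toy context line is a made-up stand-in for what a real context file would contain.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: combine API context, existing code, and a scoped task
# into one prompt string. The section headings are illustrative only.
build_prompt &lt;- function(context, code, task) {
  paste(
    c(
      "## API context",
      context,
      "## Existing code",
      code,
      "## Task",
      task
    ),
    collapse = "\n"
  )
}

# Toy inputs; in practice these could come from readLines() on a context
# file generated by pkgctx and on your old script
context &lt;- c("rxp_r(name, expr): made-up one-line summary of a function")
code &lt;- c("avia_clean &lt;- avia %&gt;% mutate(passengers = as.numeric(passengers))")
task &lt;- "Translate this script into a rixpress pipeline."

prompt &lt;- build_prompt(context, code, task)
cat(prompt, "\n")</code></pre></div>
<p>Keeping the task as its own short string makes it obvious when it is too vague: if it does not fit in a sentence or two, it is probably not scoped tightly enough.</p>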
<p>If you’re not familiar with <code>{rixpress}</code>, check out <a href="../posts/2025-03-20-announcing_rixpress.html">my announcement post</a> or the <a href="../posts/2025-10-23-rixpress_cran.html">CRAN release post</a>. And if you want to give LLMs context about R or Python packages, <a href="https://github.com/b-rodrigues/pkgctx">pkgctx</a> is there to help. For those who want to dive deeper into Nix, <code>{rix}</code>, and <code>{rixpress}</code>, I’ve recently submitted a paper to the Journal of Statistical Software, which you can read <a href="https://b-rodrigues.github.io/rix_paper/">here</a>. For more examples of <code>{rixpress}</code> pipelines, check out the <a href="https://github.com/b-rodrigues/rixpress_demos">rixpress_demos</a> repository.</p>
<p>LLMs aren’t going anywhere: the genie is out of the bottle. I still see plenty of people online claiming that LLMs aren’t useful, but I genuinely believe it comes down to one of two things:</p>
<ul>
<li>They’re not providing enough context or scoping their tasks well enough.</li>
<li>They have a principled objection to LLMs, AI, and automation in general which, ok, whatever, but it’s not a technical argument about usefulness.</li>
</ul>
<p>Some people might even say that to feel good about themselves: <em>what I program is much too complex and important for mere LLMs to be able to help me</em>. OK, perhaps, but not all of us are working for NASA or whatever. I’ll keep on outsourcing the tedious grunt work to LLMs.</p>


</section>

 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2026-01-13-data_science_llm_age.html</guid>
  <pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Toy Post: Per-Post Nix Dependencies</title>
  <dc:creator>Bruno Rodrigues</dc:creator>
  <link>https://b-rodrigues.github.io/posts/2025-12-31-toy-post.html</link>
  <description><![CDATA[ 




<p>This is a toy post demonstrating per-post Nix dependencies.</p>
<p>This post has a corresponding <code>2025-12-31-toy-post.nix</code> file that adds the <code>purrr</code> package to the environment.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(purrr)</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use purrr to demonstrate it's available</span></span>
<span id="cb1-4">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> .x<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(result)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1]  1  4  9 16 25</code></pre>
</div>
</div>
<p>The <code>purrr</code> package is not in the base <code>posts/default.nix</code>, but it’s available here because of the post-specific <code>.nix</code> file!</p>



 ]]></description>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2025-12-31-toy-post.html</guid>
  <pubDate>Wed, 31 Dec 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>What would a keyboard optimised for Luxembourgish look like? Now with an actual keyboard!</title>
  <dc:creator>Bruno Rodrigues</dc:creator>
  <link>https://b-rodrigues.github.io/posts/2025-12-31-qwertz-lux.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/qwertz-lux.png" style="width: 100%; height: auto;"> </a>
</p>
</div>
<section id="an-optimised-layout-for-a-polyglot-country" class="level2">
<h2 class="anchored" data-anchor-id="an-optimised-layout-for-a-polyglot-country">An optimised layout for a polyglot country</h2>
<p><a href="https://brodrigues.co/posts/2020-03-26-bepo_lu.html">5 years ago</a> I discussed what features a keyboard optimised for Luxembourg should have. I’m talking about Luxembourg the country, and not Luxembourgish the language, because in Luxembourg no one types only Luxembourgish. The most commonly typed language is probably French, with English a close second. German follows in third place, and then Luxembourgish (some people might argue that, in their own experience, German is more widely typed than French, but I seriously doubt it). Whatever the actual ranking, a keyboard optimised for Luxembourg should make typing all 4 languages comfortable. That blog post from 5 years ago dug into this (letter frequency and so on), but until today, I had left the project on the back burner.</p>
<p>Now, with LLMs it was trivial to continue working on this, implement an effort model, and use an optimisation algorithm to minimise said effort. Today, I can present the layout I’ve obtained:</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://b-rodrigues.github.io/posts/2025-12-31-qwertz-lux_files/figure-html/heatmap-qwertz-lux-intro-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>(There is an error in the layout shown here: there are two “G”s. The one to the right of E should not be there; it should be a “,” instead. This is a bug in the function that creates the heatmap, not in the layout itself.)</p>
<blockquote class="blockquote">
<p>⚠️ <strong>This is a work in progress!</strong> The layout, the name, and even the methodology are all open for discussion. I’m sharing this early because I’d love to hear from the community, especially those who type in multiple languages daily. Your feedback will shape the final version.</p>
</blockquote>
<p>Let me also clarify what my goal with this is: do I want to actually suggest a layout that shall be adopted nationally and become a standard? Well, maybe. But I definitely won’t do it alone. So if you’re interested, we could work together and write a layout that could actually be used by different operating systems and pitch it to decision makers.</p>
</section>
<section id="france-shows-the-way" class="level2">
<h2 class="anchored" data-anchor-id="france-shows-the-way">France Shows the Way</h2>
<p>In 2019, France officially adopted two new keyboard standards:</p>
<ol type="1">
<li><strong>A revised AZERTY</strong> that fixes many issues while maintaining familiarity</li>
<li><strong>BÉPO</strong> — a completely redesigned layout optimised for French</li>
</ol>
<p>I’ve been using BÉPO for years, and I can honestly say it’s a game-changer. Typing feels more natural, there’s less finger movement, and the learning curve, while steep at first, pays off significantly.</p>
<p>But here’s the thing: BÉPO was designed primarily for French. Luxembourg is unique, because we need something that works for French <em>and</em> German <em>and</em> Luxembourgish <em>and</em> English.</p>
</section>
<section id="the-qwertz-lux-approach" class="level2">
<h2 class="anchored" data-anchor-id="the-qwertz-lux-approach">The QWERTZ-LUX Approach</h2>
<p>So I started wondering: what would an optimised keyboard layout for Luxembourg look like? You can read about my findings <a href="https://brodrigues.co/posts/2020-03-26-bepo_lu.html">here</a>, but basically, any optimised layout for any of these 4 languages would actually work rather well; this is because 3 out of these 4 languages are Germanic languages, but also, the top 10 most used characters for these 4 languages are essentially the same.</p>
<p>So I had three options:</p>
<ol type="1">
<li>Simply propose that we also adopt BÉPO</li>
<li>Create a fully optimised layout using these 4 languages for setting up the optimisation problem</li>
<li>Keep QWERTZ as a base for familiarity and only move around a minimal set of characters to improve comfort.</li>
</ol>
<p>I chose a middle path, inspired by Carpalx’s <a href="https://mk.bcgsc.ca/carpalx/?partial_optimization">partial optimization</a> philosophy. Martin Krzywinski demonstrated that you don’t need to remap the entire keyboard to reap most of the rewards. Just 5 strategic key swaps can achieve <strong>70% of the effort reduction</strong> of a fully optimised layout, while the learning curve remains manageable.</p>
<p>QWERTZ-LUX (final name pending) follows this principle. It’s essentially a fork of BÉPO’s structure, but with modifications that:</p>
<ul>
<li>Keep familiar elements for QWERTZ users (Ctrl+Z, Ctrl+X, Ctrl+C, Ctrl+V shortcuts)</li>
<li>Optimize the letter arrangement for our multilingual corpus</li>
<li>Provide direct access to accented characters (é, à, ü, ö, ç, ä) without dead keys</li>
</ul>
<p>The home row had to change substantially, because QWERTZ’s home row is simply too inefficient. But I tried to minimize disruption elsewhere, following the partial optimization insight that the first few key swaps matter most. Setting up the optimisation problem with these constraints resulted in a keyboard layout that stays familiar, while being much more efficient.</p>
</section>
<section id="the-corpus-what-does-luxembourg-type" class="level2">
<h2 class="anchored" data-anchor-id="the-corpus-what-does-luxembourg-type">The Corpus: What Does Luxembourg Type?</h2>
<p>To evaluate keyboard layouts fairly, we need text that reflects what people actually type in Luxembourg. I created a balanced multilingual corpus with:</p>
<ul>
<li><strong>30% French</strong> — Administrative and business language</li>
<li><strong>30% English</strong> — International communication</li>
<li><strong>20% German</strong> — Media and education</li>
<li><strong>20% Luxembourgish</strong> — Daily life and national identity</li>
</ul>
<p>Again, your mileage may vary. When I first started working in Luxembourg, I only used English and French. Now I also use Luxembourgish more often, and I know people who only write French. Either way, a national keyboard layout should be comfortable for any of these languages.</p>
<p>What actually matters are the most frequent letters across all four languages: <strong>E, N, S, T, R, I, A</strong>. These are the letters you’d want on your home row.</p>
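<p>To make the weighting concrete, here is a minimal base R sketch of how per-language letter frequencies could be blended using the 30/30/20/20 split above. The four sample sentences and the helper <code>letter_freq()</code> are placeholders of mine for illustration: they are not the actual corpus, and this is not necessarily how <strong>lbkeyboard</strong> does it.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy samples (ASCII only); a real corpus would be far larger
samples &lt;- c(
  fr = "bonjour tout le monde, comment allez-vous",
  en = "the meeting is scheduled for friday morning",
  de = "die sitzung findet am freitag statt",
  lb = "mir wunnen an der stad"
)

# Corpus weights from the list above
weights &lt;- c(fr = 0.3, en = 0.3, de = 0.2, lb = 0.2)

# Relative frequency of each of the 26 basic letters in a text
letter_freq &lt;- function(text) {
  chars &lt;- strsplit(tolower(text), "")[[1]]
  counts &lt;- table(factor(chars[chars %in% letters], levels = letters))
  as.numeric(counts) / sum(counts)
}

# One column per language, one row per letter
freqs &lt;- sapply(samples, letter_freq)
rownames(freqs) &lt;- letters

# Weighted average across languages; the result sums to 1
blended &lt;- drop(freqs[, names(weights)] %*% weights)
head(sort(blended, decreasing = TRUE), 7)</code></pre>
<p>With samples this small the ranking is noisy, so the exact top 7 will not match the list above; the point is only to show how the language weights enter the calculation.</p>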
</section>
<section id="the-effort-model-explained" class="level2">
<h2 class="anchored" data-anchor-id="the-effort-model-explained">The Effort Model Explained</h2>
<p>Before comparing layouts, let’s understand how we measure “typing effort”. I wrote an R package called <strong>lbkeyboard</strong> that implements a model inspired by <a href="https://mk.bcgsc.ca/carpalx/?typing_effort">Carpalx</a>, the seminal keyboard layout optimization tool created by Martin Krzywinski.</p>
<section id="carpalx-vs-lbkeyboard" class="level3">
<h3 class="anchored" data-anchor-id="carpalx-vs-lbkeyboard">Carpalx vs lbkeyboard</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 27%">
<col style="width: 31%">
<col style="width: 41%">
</colgroup>
<thead>
<tr class="header">
<th>Aspect</th>
<th>Carpalx</th>
<th>lbkeyboard</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Unit of analysis</strong></td>
<td>Triads (3-key sequences)</td>
<td>Bigrams + individual keys</td>
</tr>
<tr class="even">
<td><strong>Base effort</strong></td>
<td>Finger travel distance</td>
<td>Position-based (row + column)</td>
</tr>
<tr class="odd">
<td><strong>Penalties</strong></td>
<td>Hand, row, finger costs</td>
<td>Same-finger, same-hand, row-change</td>
</tr>
<tr class="even">
<td><strong>Stroke path</strong></td>
<td>Complex path penalties</td>
<td>Simplified via bigram weights</td>
</tr>
<tr class="odd">
<td><strong>Layer support</strong></td>
<td>Not built-in</td>
<td>Shift, AltGr, dead key penalties</td>
</tr>
</tbody>
</table>
<p>Carpalx uses triads and a sophisticated stroke path model. I simplify this to bigrams (2-key sequences) for faster computation while retaining the key ergonomic insights.</p>
</section>
<section id="our-model-components" class="level3">
<h3 class="anchored" data-anchor-id="our-model-components">Our Model Components</h3>
</section>
<section id="base-effort-weight-3.0" class="level3">
<h3 class="anchored" data-anchor-id="base-effort-weight-3.0">1. Base Effort (Weight: 3.0)</h3>
<p>Every key has an inherent effort based on position. Home row keys (where fingers rest) have the lowest effort; reaching up to the top row or down to the bottom row costs more. Finger strength matters too: index fingers are stronger than pinkies.</p>
<p><strong>Example</strong>: Typing “e” on QWERTZ (top row, middle finger reach) costs more than typing “j” (home row, index finger), even though we use “e” 100× more often!</p>
</section>
<section id="same-finger-bigram-penalty-weight-3.0" class="level3">
<h3 class="anchored" data-anchor-id="same-finger-bigram-penalty-weight-3.0">2. Same-Finger Bigram Penalty (Weight: 3.0)</h3>
<p>Using the same finger twice in a row is slow and uncomfortable. The model heavily penalizes these sequences.</p>
<p><strong>Example</strong>: Typing “de” on QWERTZ uses the same finger (middle finger, D→E). This is penalized. On BÉPO, these letters are on different hands, so no penalty.</p>
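<p>As a rough sketch of how such sequences can be detected, the snippet below maps each QWERTZ letter to a finger and flags bigrams typed with a single finger. The finger assignment is the standard touch-typing one and is my assumption, as is the helper name <code>same_finger_bigrams()</code>; the actual package may encode this differently.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Assumed touch-typing finger assignment for the QWERTZ letter keys
finger_map &lt;- c(
  q = "L_pinky", a = "L_pinky", y = "L_pinky",
  w = "L_ring", s = "L_ring", x = "L_ring",
  e = "L_middle", d = "L_middle", c = "L_middle",
  r = "L_index", f = "L_index", v = "L_index",
  t = "L_index", g = "L_index", b = "L_index",
  z = "R_index", h = "R_index", n = "R_index",
  u = "R_index", j = "R_index", m = "R_index",
  i = "R_middle", k = "R_middle",
  o = "R_ring", l = "R_ring",
  p = "R_pinky"
)

# Return the bigrams of a word typed with one finger (double letters excluded)
same_finger_bigrams &lt;- function(word, finger_map) {
  chars &lt;- strsplit(tolower(word), "")[[1]]
  chars &lt;- chars[chars %in% names(finger_map)]
  if (length(chars) &lt; 2) return(character(0))
  first &lt;- chars[-length(chars)]
  second &lt;- chars[-1]
  hit &lt;- finger_map[first] == finger_map[second] &amp; first != second
  paste0(first, second)[hit]
}

same_finger_bigrams("de", finger_map)
#&gt; [1] "de"
same_finger_bigrams("the", finger_map)
#&gt; character(0)</code></pre>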
</section>
<section id="same-hand-penalty-weight-0.5" class="level3">
<h3 class="anchored" data-anchor-id="same-hand-penalty-weight-0.5">3. Same-Hand Penalty (Weight: 0.5)</h3>
<p>Consecutive keys on the same hand are slightly harder than alternating hands, because one hand must do all the work.</p>
</section>
<section id="row-change-penalty-weight-0.5" class="level3">
<h3 class="anchored" data-anchor-id="row-change-penalty-weight-0.5">4. Row Change Penalty (Weight: 0.5)</h3>
<p>Jumping between rows (e.g., top→bottom) requires more finger travel than staying on the same row.</p>
</section>
<section id="layer-penalties" class="level3">
<h3 class="anchored" data-anchor-id="layer-penalties">5. Layer Penalties</h3>
<p>Characters requiring modifier keys (Shift, AltGr, dead keys) incur extra effort:</p>
<ul>
<li><strong>Shift</strong>: +5% effort</li>
<li><strong>AltGr</strong>: +20% effort</li>
<li><strong>Dead key</strong> (e.g., ˆ + o = ô): +40% effort (two keystrokes!)</li>
</ul>
</section>
<section id="putting-it-together" class="level3">
<h3 class="anchored" data-anchor-id="putting-it-together">Putting It Together</h3>
<p>The total effort formula is:</p>
<pre><code>Total = Σ (base × frequency) + Σ (same_finger_penalty) + Σ (same_hand_penalty)
        + Σ (row_change_penalty) + Σ (layer_penalties)</code></pre>
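<p>To show how the terms add up, here is a toy R version of the formula for a handful of keys. The base effort values in the <code>keys</code> table are invented for illustration, the function <code>toy_effort()</code> is hypothetical, and layer penalties are left out because only lowercase letters are typed. Summing the base effort over every keystroke is equivalent to multiplying each key’s base effort by its frequency in the text. This is a sketch of the idea, not the code used in <strong>lbkeyboard</strong>.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Hypothetical key table for three QWERTZ keys; the base values are made up
keys &lt;- data.frame(
  key = c("t", "h", "e"),
  row = c("top", "home", "top"),
  hand = c("L", "R", "L"),
  finger = c("L_index", "R_index", "L_middle"),
  base = c(2.0, 1.0, 1.5),
  stringsAsFactors = FALSE
)
rownames(keys) &lt;- keys$key

# Weights from the sections above
w &lt;- list(base = 3.0, same_finger = 3.0, same_hand = 0.5, row_change = 0.5)

toy_effort &lt;- function(text, keys, w) {
  chars &lt;- strsplit(text, "")[[1]]
  k &lt;- keys[chars, ]
  n &lt;- nrow(k)
  a &lt;- k[-n, ]  # first key of each bigram
  b &lt;- k[-1, ]  # second key of each bigram
  w$base * sum(k$base) +
    w$same_finger * sum(a$finger == b$finger &amp; a$key != b$key) +
    w$same_hand * sum(a$hand == b$hand) +
    w$row_change * sum(a$row != b$row)
}

# Base: 3.0 * (2.0 + 1.0 + 1.5) = 13.5; two row changes: 2 * 0.5 = 1
toy_effort("the", keys, w)
#&gt; [1] 14.5</code></pre>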
<p><strong>Numerical Example</strong>: Consider typing “the” (one of the most common English trigrams):</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Layout</th>
<th>Key Positions</th>
<th>Same-Finger?</th>
<th>Row Changes</th>
<th>Effort</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>QWERTZ</td>
<td>T(top), H(home), E(top)</td>
<td>No</td>
<td>2</td>
<td>Moderate</td>
</tr>
<tr class="even">
<td>BÉPO</td>
<td>T(home), H(bottom), E(home)</td>
<td>No</td>
<td>1</td>
<td><strong>Low</strong></td>
</tr>
</tbody>
</table>
<p>BÉPO places two of the three letters (T and E) on the home row, both typed with an index finger, so the sequence is very comfortable to type.</p>
</section>
<section id="a-concrete-example-the-pangram" class="level3">
<h3 class="anchored" data-anchor-id="a-concrete-example-the-pangram">A Concrete Example: The Pangram</h3>
<p>Let’s calculate actual effort scores for “the quick brown fox jumps over the lazy dog” (a pangram using every letter of the alphabet):</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load layouts</span></span>
<span id="cb2-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ch_qwertz"</span>)</span>
<span id="cb2-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"afnor_bepo"</span>)</span>
<span id="cb2-4">qwertz_lux <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_qwertz_lux_keyboard</span>()</span>
<span id="cb2-5"></span>
<span id="cb2-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Effort weights (same as used throughout)</span></span>
<span id="cb2-7">effort_weights <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb2-8">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">base =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.0</span>,</span>
<span id="cb2-9">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">same_finger =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.0</span>,</span>
<span id="cb2-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">same_hand =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>,</span>
<span id="cb2-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">row_change =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>,</span>
<span id="cb2-12">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">trigram =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span></span>
<span id="cb2-13">)</span>
<span id="cb2-14"></span>
<span id="cb2-15">pangram <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"the quick brown fox jumps over the lazy dog"</span></span>
<span id="cb2-16"></span>
<span id="cb2-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate effort for each layout</span></span>
<span id="cb2-18">pangram_qwertz <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">calculate_layout_effort</span>(ch_qwertz, pangram, </span>
<span id="cb2-19">                                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">keys_to_evaluate =</span> letters,</span>
<span id="cb2-20">                                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">effort_weights =</span> effort_weights)</span>
<span id="cb2-21">pangram_bepo <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">calculate_layout_effort</span>(afnor_bepo, pangram,</span>
<span id="cb2-22">                                        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">keys_to_evaluate =</span> letters,</span>
<span id="cb2-23">                                        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">effort_weights =</span> effort_weights)</span>
<span id="cb2-24">pangram_lux <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">calculate_layout_effort</span>(qwertz_lux<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>base, pangram,</span>
<span id="cb2-25">                                       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">keys_to_evaluate =</span> letters,</span>
<span id="cb2-26">                                       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">effort_weights =</span> effort_weights)</span>
<span id="cb2-27"></span>
<span id="cb2-28"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(</span>
<span id="cb2-29">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Layout =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"QWERTZ"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"BÉPO"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"QWERTZ-LUX"</span>),</span>
<span id="cb2-30">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Effort =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(pangram_qwertz, pangram_bepo, pangram_lux), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>),</span>
<span id="cb2-31">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">vs QWERTZ</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>((<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(pangram_qwertz, pangram_bepo, pangram_lux) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> </span>
<span id="cb2-32">                               pangram_qwertz) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%"</span>),</span>
<span id="cb2-33">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">check.names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb2-34">)</span>
<span id="cb2-35"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#&gt;       Layout Effort vs QWERTZ</span></span>
<span id="cb2-36"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#&gt; 1     QWERTZ 338.69        0%</span></span>
<span id="cb2-37"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#&gt; 2       BÉPO 268.79     20.6%</span></span>
<span id="cb2-38"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#&gt; 3 QWERTZ-LUX 303.08     10.5%</span></span></code></pre></div>
</div>
<p>Even for this short 35-letter sentence, the optimised layouts require measurably less effort: 20.6% less for BÉPO and 10.5% less for QWERTZ-LUX.</p>
</section>
</section>
<section id="how-much-better-is-it" class="level2">
<h2 class="anchored" data-anchor-id="how-much-better-is-it">How Much Better Is It?</h2>
<p>Using this effort model (with the weights shown above), here’s how the layouts compare:</p>
<div class="cell">
<div class="cell-output-display">
<!-- preamble start -->

    <script>

      function styleCell_on88mmpg0ou10w1hfvrg(i, j, css_id) {
          var table = document.getElementById("tinytable_on88mmpg0ou10w1hfvrg");
          var cell = table.rows[i]?.cells[j];  // Safe navigation to avoid errors
          if (cell) {
              console.log(`Styling cell at (${i}, ${j}) with class ${css_id}`);
              cell.classList.add(css_id);
          } else {
              console.warn(`Cell at (${i}, ${j}) not found.`);
          }
      }
      function insertSpanRow(i, colspan, content) {
        var table = document.getElementById('tinytable_on88mmpg0ou10w1hfvrg');
        var newRow = table.insertRow(i);
        var newCell = newRow.insertCell(0);
        newCell.setAttribute("colspan", colspan);
        // newCell.innerText = content;
        // this may be unsafe, but innerText does not interpret <br>
        newCell.innerHTML = content;
      }
      function spanCell_on88mmpg0ou10w1hfvrg(i, j, rowspan, colspan) {
        var table = document.getElementById("tinytable_on88mmpg0ou10w1hfvrg");
        const targetRow = table.rows[i];
        const targetCell = targetRow.cells[j];
        for (let r = 0; r < rowspan; r++) {
          // Only start deleting cells to the right for the first row (r == 0)
          if (r === 0) {
            // Delete cells to the right of the target cell in the first row
            for (let c = colspan - 1; c > 0; c--) {
              if (table.rows[i + r].cells[j + c]) {
                table.rows[i + r].deleteCell(j + c);
              }
            }
          }
          // For rows below the first, delete starting from the target column
          if (r > 0) {
            for (let c = colspan - 1; c >= 0; c--) {
              if (table.rows[i + r] && table.rows[i + r].cells[j]) {
                table.rows[i + r].deleteCell(j);
              }
            }
          }
        }
        // Set rowspan and colspan of the target cell
        targetCell.rowSpan = rowspan;
        targetCell.colSpan = colspan;
      }
      // tinytable span after
      window.addEventListener('load', function () {
          var cellsToStyle = [
            // tinytable style arrays after
          { positions: [ { i: 3, j: 0 }, { i: 3, j: 1 }, { i: 3, j: 2 }, { i: 3, j: 3 }, { i: 3, j: 4 },  ], css_id: 'tinytable_css_q738uiagwmrnj98zlysu',}, 
          { positions: [ { i: 0, j: 0 }, { i: 0, j: 1 }, { i: 0, j: 2 }, { i: 0, j: 3 }, { i: 0, j: 4 },  ], css_id: 'tinytable_css_3zqx25g8iiiymlzocot4',}, 
          ];

          // Loop over the arrays to style the cells
          cellsToStyle.forEach(function (group) {
              group.positions.forEach(function (cell) {
                  styleCell_on88mmpg0ou10w1hfvrg(cell.i, cell.j, group.css_id);
              });
          });
      });
    </script>

    <style>
      /* tinytable css entries after */
      .table td.tinytable_css_q738uiagwmrnj98zlysu, .table th.tinytable_css_q738uiagwmrnj98zlysu { border-bottom: solid #d3d8dc 0.1em; }
      .table td.tinytable_css_3zqx25g8iiiymlzocot4, .table th.tinytable_css_3zqx25g8iiiymlzocot4 { border-top: solid #d3d8dc 0.1em; border-bottom: solid #d3d8dc 0.05em; }
    </style>
    <div class="container">
      <table class="table table-borderless" id="tinytable_on88mmpg0ou10w1hfvrg" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing="true">
        <thead>
        
              <tr>
                <th scope="col">Layout</th>
                <th scope="col">Effort Score</th>
                <th scope="col">Hand Balance (L/R %)</th>
                <th scope="col">Relative to BÉPO (%)</th>
                <th scope="col">Improvement vs QWERTZ</th>
              </tr>
        </thead>
        
        <tbody>
                <tr>
                  <td>BÉPO          </td>
                  <td> 633624.9</td>
                  <td>43.6/56.4</td>
                  <td>100.0</td>
                  <td>42.1%</td>
                </tr>
                <tr>
                  <td>QWERTZ-LUX    </td>
                  <td> 642547.9</td>
                  <td>52.1/47.9</td>
                  <td>101.4</td>
                  <td>41.3%</td>
                </tr>
                <tr>
                  <td>QWERTZ (Swiss)</td>
                  <td>1094468.2</td>
                  <td>59/41    </td>
                  <td>172.7</td>
                  <td>0%   </td>
                </tr>
        </tbody>
      </table>
    </div>
<!-- hack to avoid NA insertion in last line -->
</div>
</div>
</section>
<section id="visualizing-the-difference-heatmaps" class="level2">
<h2 class="anchored" data-anchor-id="visualizing-the-difference-heatmaps">Visualizing the Difference: Heatmaps</h2>
<p>Heatmaps show where your fingers spend time. Brighter colors = more keystrokes. Ideally, you want the brightness concentrated on the home row.</p>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://b-rodrigues.github.io/posts/2025-12-31-qwertz-lux_files/figure-html/heatmaps-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://b-rodrigues.github.io/posts/2025-12-31-qwertz-lux_files/figure-html/heatmap-qwertz-lux-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<div class="cell">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://b-rodrigues.github.io/posts/2025-12-31-qwertz-lux_files/figure-html/heatmap-bepo-1.png" class="img-fluid figure-img" width="768"></p>
</figure>
</div>
</div>
</div>
<p>Notice how BÉPO and QWERTZ-LUX concentrate activity on the home row, while QWERTZ spreads effort across all rows.</p>
<section id="key-features" class="level3">
<h3 class="anchored" data-anchor-id="key-features">Key Features</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 50%">
<col style="width: 50%">
</colgroup>
<thead>
<tr class="header">
<th>Feature</th>
<th>Benefit</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Optimized home row</strong></td>
<td>High-frequency letters (E, T, N, R, I, S) within easy reach</td>
</tr>
<tr class="even">
<td><strong>Direct accents</strong></td>
<td>é, à, ü, ö, ç, ä accessible without dead key combinations</td>
</tr>
<tr class="odd">
<td><strong>ZXCV preserved</strong></td>
<td>Common shortcuts (Ctrl+Z/X/C/V) stay in familiar positions</td>
</tr>
<tr class="even">
<td><strong>BÉPO-compatible</strong></td>
<td>Same modifier layer philosophy as BÉPO</td>
</tr>
</tbody>
</table>
</section>
</section>
<section id="whats-next" class="level2">
<h2 class="anchored" data-anchor-id="whats-next">What’s Next?</h2>
<p>So I’d like to know what others think. Is this something you’d be interested in exploring further? The next steps would be to write a <code>layout.ini</code> file for <em>Portable Keyboard Layout</em> on Windows, write layout files for Linux and macOS, and have real people test the layout in real-world settings. We could then see if it feels comfortable and what else could be optimised.</p>
<p>But is there a chance that such a layout will ever get adopted? After all, there are obstacles:</p>
<ul>
<li>People just don’t care and won’t ever switch</li>
<li>Even if people cared, Luxembourg is too small a market and keyboards will never ship with a Luxembourg-specific layout</li>
</ul>
<p>In my opinion, rolling out such a keyboard would require a phased and gradual approach:</p>
<ul>
<li>Have kids learn to touch type in school on the optimised layout</li>
<li>Distribute keyboard layout stickers, which are cheap</li>
</ul>
<p>After some years, the switch would be complete, so I think it’s feasible, but only if there is a community around this layout. Also, who knows, this layout could technically be adopted in Switzerland and other German-speaking nations (maybe as a variant without direct access to French accented letters).</p>
</section>
<section id="about-the-name" class="level2">
<h2 class="anchored" data-anchor-id="about-the-name">About the Name</h2>
<p>The current working name “QWERTZ-LUX” is descriptive but not final. The first letters of the layout are <strong>QWFOG</strong>, which doesn’t exactly roll off the tongue like “QWERTY” or “BÉPO.” Here are some alternatives I’m considering:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 40%">
<col style="width: 60%">
</colgroup>
<thead>
<tr class="header">
<th>Name</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>QWERTZ-LUX</strong></td>
<td>Current name; descriptive (QWERTZ variant for Luxembourg)</td>
</tr>
<tr class="even">
<td><strong>LÉTZ</strong></td>
<td>Short for <em>Lëtzebuergesch</em> (Luxembourgish); catchy and culturally meaningful</td>
</tr>
<tr class="odd">
<td><strong>LUXO</strong></td>
<td>Luxembourg Optimized</td>
</tr>
<tr class="even">
<td><strong>TRIGLOT</strong></td>
<td>Reflects the trilingual focus (French, German, Luxembourgish)</td>
</tr>
</tbody>
</table>
<p>What do you think? I’m genuinely open to suggestions: a good name helps adoption!</p>
<p>Oh, and happy new year I guess.</p>


</section>

 ]]></description>
  <category>R</category>
  <category>keyboard</category>
  <guid>https://b-rodrigues.github.io/posts/2025-12-31-qwertz-lux.html</guid>
  <pubDate>Wed, 31 Dec 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Of course, someone has to write imperative code to build reproducible data science pipelines. It doesn’t have to be you.</title>
  <link>https://b-rodrigues.github.io/posts/2025-10-29-imperative-vs-function.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/xkcd-nix.png" style="width: 100%; height: auto;"> </a>
</p>
</div>
<p><a href="https://brodrigues.co/posts/2025-10-23-rixpress_cran.html">Last time</a> I quickly introduced my latest package, <code>{rixpress}</code>, but I think that to really understand what <code>{rixpress}</code> brings to the table, one needs to solve the same problem without it. Incidentally, I think that this exercise also shows what makes Nix so good.</p>
<p>The goal is to build a data science pipeline. The example here is purely illustrative, and compares a Nix-based approach to a non-Nix-based one. So, I built the same polyglot Real Business Cycle model pipeline twice. First, I did it without <code>{rixpress}</code> (or <code>{rix}</code>), using a combination of Docker, Make, and a bunch of wrapper scripts. Then, I did it with <code>{rix}</code> and <code>{rixpress}</code>.</p>
<p>Both pipelines produce the exact same result. But the way to get there is fundamentally different.</p>
<section id="juggling-imperative-tools" class="level2">
<h2 class="anchored" data-anchor-id="juggling-imperative-tools">Juggling imperative tools</h2>
<p>Without Nix, you have to use language-specific package managers and tooling to first set up the environment. For Python, I’ve used <code>uv</code> (which is fantastic, to be honest). To install the right version of R, I’ve used <code>rig</code>, with a Posit CRAN snapshot for packages. For Julia, I’ve simply downloaded a pre-compiled package of the version I needed, and used its built-in package manager to install specific versions of packages as well.</p>
<p>Also, to deal with system level dependencies, I’ve bundled everything inside a Docker image. This is a sketch of the <code>Dockerfile</code>:</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode dockerfile code-with-copy"><code class="sourceCode dockerfile"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add R repository and install specific version</span></span>
<span id="cb1-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">apt-get</span> update <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">apt-get</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-y</span> software-properties-common</span>
<span id="cb1-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">add-apt-repository</span> ppa:...</span>
<span id="cb1-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-L</span> https://rig.r-pkg.org/... <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sh</span></span>
<span id="cb1-5"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">rig</span> add 4.5.1</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install Python with uv</span></span>
<span id="cb1-8"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-LsSf</span> https://astral.sh/uv/install.sh <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sh</span></span>
<span id="cb1-9"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">uv</span> python install 3.13</span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Download and extract Julia</span></span>
<span id="cb1-12"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-fsSL</span> https://julialang-s3.julialang.org/... <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-o</span> julia.tar.gz</span>
<span id="cb1-13"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-xzf</span> julia.tar.gz <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-C</span> /opt/</span>
<span id="cb1-14"></span>
<span id="cb1-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install packages for each language separately</span></span>
<span id="cb1-16"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">echo</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'options(repos = c(CRAN = ...))'</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> /root/.Rprofile</span>
<span id="cb1-17"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Rscript</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-e</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'install.packages(...)'</span></span>
<span id="cb1-18"></span>
<span id="cb1-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install Python packages using uv with specific versions for reproducibility.</span></span>
<span id="cb1-20"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">echo</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pandas==2.3.3"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> /tmp/requirements.txt <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb1-21">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">echo</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"scikit-learn==1.7.2"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;</span> /tmp/requirements.txt <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb1-22">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ... more packages ...</span></span>
<span id="cb1-23"></span>
<span id="cb1-24"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">uv</span> pip install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--no-cache</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-r</span> /tmp/requirements.txt <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb1-25">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rm</span> /tmp/requirements.txt</span>
<span id="cb1-26"></span>
<span id="cb1-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install specific versions of Julia packages for reproducibility</span></span>
<span id="cb1-28"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">julia</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-e</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'using Pkg; \</span></span>
<span id="cb1-29"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    Pkg.add(name="Arrow", version="2.8.0"); \</span></span></code></pre></div>
<p>This <em>traditional</em> approach feels like you’re a sysadmin first and a data scientist second. The <code>Dockerfile</code> is a long, step-by-step, imperative script of shell commands. You have to write <em>how</em> stuff needs to be installed, and this of course varies for each language. Each language needs its own special treatment, its own package installation command, and its own set of dependencies. For example, for Python, I actually even needed more configuration than what I’ve shown above:</p>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode dockerfile code-with-copy"><code class="sourceCode dockerfile"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Ensure the installed binary is on the `PATH`</span></span>
<span id="cb2-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ENV</span> PATH=<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/root/.local/bin/:$PATH"</span></span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install the specified Python version using uv.</span></span>
<span id="cb2-5"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">uv</span> python install <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">${PYTHON_VERSION}</span></span>
<span id="cb2-6"></span>
<span id="cb2-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Setup default virtual env</span></span>
<span id="cb2-8"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">uv</span> venv /opt/venv</span>
<span id="cb2-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use the virtual environment automatically</span></span>
<span id="cb2-10"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ENV</span> VIRTUAL_ENV=/opt/venv</span>
<span id="cb2-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Place entry points in the environment at the front of the path</span></span>
<span id="cb2-12"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ENV</span> PATH=<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/opt/venv/bin:$PATH"</span></span></code></pre></div>
<p>This is because I needed to set the virtual environment installed by <code>uv</code> as the one to be used by default. This is ok inside Docker, but that’s not something you’d likely want to do on a real machine. The final <code>Dockerfile</code> for our “simple” example was over 100 lines long (including comments).</p>
<p>Now that the environment is set up, we actually need to orchestrate the workflow. I’ve used <code>Make</code> for this, which means writing a <code>Makefile</code>. Honestly, nowadays, thanks to LLMs, that’s not so much of an issue. But before LLMs, it was quite annoying, because you need to manually define which file depends on which other file. Here’s what it looks like:</p>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode makefile code-with-copy"><code class="sourceCode makefile"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ==============================================================================</span></span>
<span id="cb3-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Makefile for the Polyglot RBC Model Pipeline</span></span>
<span id="cb3-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ==============================================================================</span></span>
<span id="cb3-4"></span>
<span id="cb3-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define the interpreters for each language.</span></span>
<span id="cb3-6"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">JULIA</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> julia</span></span>
<span id="cb3-7"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">PYTHON</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> python</span></span>
<span id="cb3-8"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">RSCRIPT</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> Rscript</span></span>
<span id="cb3-9"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">QUARTO</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> quarto</span></span>
<span id="cb3-10"></span>
<span id="cb3-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define directory variables for better organization.</span></span>
<span id="cb3-12"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">DATA_DIR</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> data</span></span>
<span id="cb3-13"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">PLOTS_DIR</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> plots</span></span>
<span id="cb3-14"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">REPORT_DIR</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> report</span></span>
<span id="cb3-15"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">FUNCTIONS_DIR</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> functions</span></span>
<span id="cb3-16"></span>
<span id="cb3-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define the final and intermediate data files.</span></span>
<span id="cb3-18"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">SIMULATED_DATA</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">$(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">DATA_DIR</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">/simulated_rbc_data.arrow</span></span>
<span id="cb3-19"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">PREDICTIONS</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">$(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">DATA_DIR</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">/predictions.arrow</span></span>
<span id="cb3-20"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">FINAL_PLOT</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">$(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">PLOTS_DIR</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">/output_plot.png</span></span>
<span id="cb3-21"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">FINAL_REPORT</span> <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">:=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">$(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">REPORT_DIR</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">/readme.html</span></span>
<span id="cb3-22"></span>
<span id="cb3-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Main Rules ---</span></span>
<span id="cb3-24"></span>
<span id="cb3-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The default 'all' rule now points to the final compiled HTML report.</span></span>
<span id="cb3-26"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">all:</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;"> </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">$(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">FINAL_REPORT</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span></span>
<span id="cb3-27"></span>
<span id="cb3-28"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rule to render the final Quarto report.</span></span>
<span id="cb3-29"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Depends on the Quarto source file and the plot from the R step.</span></span>
<span id="cb3-30"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">$(FINAL_REPORT):</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;"> readme.qmd </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">$(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">FINAL_PLOT</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;"> | </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">$(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">REPORT_DIR</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span></span>
<span id="cb3-31"><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">    </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">@</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">echo </span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"--- [Quarto] Compiling final report ---"</span></span>
<span id="cb3-32">    <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">$(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">QUARTO</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span> render <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">$&lt;</span> --to html --output-dir <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">$(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">REPORT_DIR</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span></span>
<span id="cb3-33"></span>
<span id="cb3-34">... and so on ...</span></code></pre></div>
<p>That’s another 65 lines for the orchestration.</p>
<p>Finally, and probably worst of all, you end up writing tons of “glue code.” Because <code>make</code> just runs scripts, every step of your analysis (the Julia simulation, the Python training) needs to be wrapped in a script that does nothing but parse command-line arguments, read an input file, call your <em>actual</em> analysis function, and write an output file. That’s a lot of code just to get things talking to each other.</p>
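<p>To make this concrete, here is a minimal sketch of what such a wrapper might look like for the Python training step. Everything in it is an assumption made for illustration: the <code>train_model</code> stand-in, the <code>--input</code> and <code>--output</code> flags, and the use of JSON files (so that the sketch only needs the standard library) instead of the Arrow files the pipeline actually exchanges.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import argparse
import json
from pathlib import Path


def train_model(rows):
    """Hypothetical stand-in for the actual analysis function."""
    values = [row["output"] for row in rows]
    mean = sum(values) / len(values)
    # "Predict" the mean for every row; a real model would go here.
    return [{"output": v, "prediction": mean} for v in values]


def main(argv=None):
    # Everything below is glue: parse arguments, read, call, write.
    parser = argparse.ArgumentParser(description="Wrapper around train_model")
    parser.add_argument("--input", required=True, type=Path)
    parser.add_argument("--output", required=True, type=Path)
    args = parser.parse_args(argv)

    rows = json.loads(args.input.read_text())
    predictions = train_model(rows)

    args.output.parent.mkdir(parents=True, exist_ok=True)
    args.output.write_text(json.dumps(predictions))

# In a real script, main() would be called under the usual
# if __name__ == "__main__": guard.</code></pre></div>
<p>Only one line of <code>main()</code> has anything to do with the analysis itself. The rest exists purely so that <code>make</code> can call the step and find its output file, and you need one such wrapper per step and per language.</p>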
<p>The final tally for the traditional, imperative, approach? <strong>Nine separate files</strong> just to manage the environment and run the pipeline. It’s a fragile, complicated house of cards, but it takes only 3 minutes to run on a standard Ubuntu GitHub Actions runner.</p>
</section>
<section id="nix-declarative-simple-and-clean" class="level2">
<h2 class="anchored" data-anchor-id="nix-declarative-simple-and-clean">Nix: Declarative, Simple, and Clean</h2>
<p>Nix makes this whole process so much easier that it’s actually not even fair. Instead of telling the computer <em>how</em> to do everything, you just declare <em>what</em> you want. You describe your requirements, and Nix figures the rest out. But because Nix is not that easy to get into, I wrote the <code>{rix}</code> and <code>{rixpress}</code> packages as high-level interfaces to Nix’s power.</p>
<p>For example, to set up the environment, you just list the R, Python, and Julia packages you need, and <code>{rix}</code> handles everything else. It figures out how to install them, resolves all the system-level dependencies, and generates the complex Nix expression for you. You don’t need to be a sysadmin; you just need to know what packages your analysis requires. This is because all the <em>sysadminy</em> work was handled upstream by Nix package maintainers (real MVPs); Nix maintainers encode the build recipes, dependency graphs, and patches needed for each package, so you don’t have to. (Reminds me of this quote from <a href="https://speakerdeck.com/jennybc/row-oriented-workflows-in-r-with-the-tidyverse?slide=16">Jenny Bryan</a>: <em>Of course, someone has to write for loops. It doesn’t have to be you</em>, but here it’s unglamorous Nix code to make packages work well instead of loops.)</p>
<p>Here’s what the <code>gen-env.R</code> script looks like:</p>
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rix</span>(</span>
<span id="cb4-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2025-10-14"</span>,</span>
<span id="cb4-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">r_pkgs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ggplot2"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"arrow"</span>),</span>
<span id="cb4-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">jl_conf =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">jl_version =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lts"</span>, ...),</span>
<span id="cb4-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">py_conf =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">py_version =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"3.13"</span>, ...),</span>
<span id="cb4-6">  ...</span>
<span id="cb4-7">)</span></code></pre></div>
<p>Then, for the pipeline, it’s the same story. You just write what you need, not how it’s done. Nix can handle this. Here’s what the <code>gen-pipeline.R</code> script looks like:</p>
<div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># gen-pipeline.R - a small part</span></span>
<span id="cb5-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_jl</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> simulated_rbc_data, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"simulate_rbc_model(...)"</span>),</span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> predictions, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train_model(simulated_rbc_data)"</span>),</span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> output_plot, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"plot_predictions(predictions)"</span>),</span>
<span id="cb5-6">  ...</span>
<span id="cb5-7">)</span></code></pre></div>
<p>Dependencies are inferred automatically. <code>{rixpress}</code> sees that <code>predictions</code> uses the <code>simulated_rbc_data</code> object and knows to run the Julia step first. It handles all the I/O as well: objects get serialised and unserialised transparently for you.</p>
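<p>The principle behind this kind of inference can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual <code>{rixpress}</code> implementation: the <code>steps</code> dictionary and the <code>infer_order</code> helper are made up for the example, and the expressions are parsed as Python for simplicity.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import ast
from graphlib import TopologicalSorter

# Hypothetical pipeline: step name mapped to the expression that computes it.
steps = {
    "output_plot": "plot_predictions(predictions)",
    "predictions": "train_model(simulated_rbc_data)",
    "simulated_rbc_data": "simulate_rbc_model(...)",
}


def infer_order(steps):
    """Return the step names in an order that respects their dependencies."""
    graph = {}
    for name, expr in steps.items():
        # Collect every identifier used in the expression...
        used = {n.id for n in ast.walk(ast.parse(expr)) if isinstance(n, ast.Name)}
        # ...and keep those that are the names of other steps.
        graph[name] = used.intersection(steps) - {name}
    # Steps without dependencies come out first.
    return list(TopologicalSorter(graph).static_order())


print(infer_order(steps))
# ['simulated_rbc_data', 'predictions', 'output_plot']</code></pre></div>
<p>The steps are declared in an arbitrary order, and the execution order falls out of the names they mention. Nobody has to write the equivalent of a <code>Makefile</code> rule by hand.</p>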
<p>Your scientific code now lives in pure functions, free of any command-line parsing or file I/O. You can focus entirely on the analysis.</p>
<p>The final tally for the Nix-based approach? <strong>Six files</strong>, and four of them (<code>gen-env.R</code>, <code>gen-pipeline.R</code>, and the two <code>functions</code> files) are simple, clean declarations of what you need and what you want to do. Setting up the environment and executing the pipeline takes 5 minutes in total on a standard GitHub Actions runner. That’s 2 minutes longer than the imperative approach, but I think it’s a small price to pay. Plus, you’re not setting up the environment from scratch each time you execute the pipeline, so subsequent executions will only take seconds.</p>
<p>The biggest difference isn’t just the simplicity; it’s the guarantee. The Docker approach gives you reproducibility <em>today</em>. But a year from now, if you rebuild the <code>Dockerfile</code>, mutable base images and shifting package dependencies mean you might get a subtly different environment. The underlying base Docker image will change and, after some years, will stop working altogether (Ubuntu 24.04, which is quite often used as the base image, will reach end of life in 2029).</p>
<p>The Nix approach, by pinning everything to a specific date, gives you <strong>temporal reproducibility</strong>. Your environment will build the exact same way today, next year, or five years from now, for as long as the <code>nixpkgs</code> GitHub repository stays online (we can hope for 1000 years if Microsoft doesn’t fuck it up). It’s a level of long-term stability that the traditional stack simply can’t match without a heroic amount of manual effort. But also, it’s just so much <em>simpler</em>!</p>


</section>

 ]]></description>
  <category>R</category>
  <category>Nix</category>
  <guid>https://b-rodrigues.github.io/posts/2025-10-29-imperative-vs-function.html</guid>
  <pubDate>Wed, 29 Oct 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Orchestrating Polyglot, Reproducible Data Science with Nix and {rixpress}</title>
  <link>https://b-rodrigues.github.io/posts/2025-10-23-rixpress_cran.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/polyglot-dag.png" style="width: 100%; height: auto;"> </a>
</p>
</div>
<p><em>TL;DR: <code>{rixpress}</code> lets you build multi-language data pipelines (R, Python, Julia) where each step runs in its own reproducible environment. Uses Nix under the hood. Now on CRAN, and there’s even a Python port on PyPI!</em></p>
<p><code>{rixpress}</code> is now on CRAN! As discussed in previous blog posts, <code>{rixpress}</code> is a package heavily inspired by <code>{targets}</code> that uses Nix as the underlying build automation tool to build reproducible data science pipelines.</p>
<p>But I also wanted <code>{rixpress}</code> to be a language-agnostic build automation tool: pipelines do get defined as an R list, but they can include R, Julia and Python <em>derivations</em> (think of a derivation as a build step).</p>
<p><code>{rixpress}</code> allows you to define and execute complex, multi-language pipelines where each step runs in its own perfectly reproducible, hermetically sealed environment.</p>
<p>Because installing stuff is so easy with Nix, the cost of using Python or Julia for a project is really low. Before Nix, I’d try my hardest to find equivalent R packages, just to avoid having to set up a Python environment, but now, if I really have to use Python, I don’t mind that much (also because I can delegate writing Python to an LLM).</p>
<p>Suppose you have a project that uses Julia, Python and R: without Nix, <code>{rix}</code> and <code>{rixpress}</code>, setting everything up and executing the code is going to be quite annoying. But with the aforementioned tools? Easy as pie.</p>
<p>Let’s consider an example from economics, where Julia is used to define a structural Real Business Cycle model (and simulate data from it), Python (with its package <code>xgboost</code>) is used to make predictions from the simulated data, and R is used to visualise the results with <code>{ggplot2}</code>. In truth, one could have used just one of these three languages, but for the sake of argument, let’s use them all.</p>
<p>With <code>{rixpress}</code>, this entire polyglot workflow is defined declaratively in a single R script. Each step is a function call, making the pipeline easy to read and manage.</p>
<p>Start the project with <code>rixpress::rxp_init()</code>, which generates two files, <code>gen-env.R</code> and <code>gen-pipeline.R</code>. In <code>gen-env.R</code>, you’ll define the environment you need:</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(rix)</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rix</span>(</span>
<span id="cb1-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Pin the environment to a specific date to ensure that all package</span></span>
<span id="cb1-5">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># versions are resolved as they were on this day.</span></span>
<span id="cb1-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2025-10-14"</span>,</span>
<span id="cb1-7"></span>
<span id="cb1-8">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 1. R Packages</span></span>
<span id="cb1-9">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We need packages for plotting, data manipulation, and reading arrow files.</span></span>
<span id="cb1-10">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We also include rix and rixpress themselves, and quarto for the final report.</span></span>
<span id="cb1-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">r_pkgs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb1-12">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ggplot2"</span>,</span>
<span id="cb1-13">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ggdag"</span>,</span>
<span id="cb1-14">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span>,</span>
<span id="cb1-15">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"arrow"</span>,</span>
<span id="cb1-16">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rix"</span>,</span>
<span id="cb1-17">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rixpress"</span>,</span>
<span id="cb1-18">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"quarto"</span></span>
<span id="cb1-19">  ),</span>
<span id="cb1-20"></span>
<span id="cb1-21">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2. Julia Configuration</span></span>
<span id="cb1-22">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We specify the Julia version and the list of packages needed</span></span>
<span id="cb1-23">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># for our manual RBC model simulation.</span></span>
<span id="cb1-24">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">jl_conf =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">jl_version =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lts"</span>,</span>
<span id="cb1-26">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">jl_pkgs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb1-27">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Distributions"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For creating random shocks</span></span>
<span id="cb1-28">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"DataFrames"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For structuring the output</span></span>
<span id="cb1-29">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Arrow"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For saving the data in a cross-language format</span></span>
<span id="cb1-30">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Random"</span></span>
<span id="cb1-31">    )</span>
<span id="cb1-32">  ),</span>
<span id="cb1-33"></span>
<span id="cb1-34">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 3. Python Configuration</span></span>
<span id="cb1-35">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We specify the Python version and the packages needed for the</span></span>
<span id="cb1-36">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># machine learning step.</span></span>
<span id="cb1-37">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">py_conf =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-38">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">py_version =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"3.13"</span>,</span>
<span id="cb1-39">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">py_pkgs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb1-40">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pandas"</span>,</span>
<span id="cb1-41">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"scikit-learn"</span>,</span>
<span id="cb1-42">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"xgboost"</span>,</span>
<span id="cb1-43">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pyarrow"</span>,</span>
<span id="cb1-44">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ryxpress"</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Python port of rixpress</span></span>
<span id="cb1-45">    )</span>
<span id="cb1-46">  ),</span>
<span id="cb1-47"></span>
<span id="cb1-48">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We set the IDE to 'none' for a minimal environment. You could change</span></span>
<span id="cb1-49">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this to "rstudio" if you prefer to work interactively in RStudio.</span></span>
<span id="cb1-50">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ide =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb1-51"></span>
<span id="cb1-52">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define the project path and allow overwriting the default.nix file.</span></span>
<span id="cb1-53">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">project_path =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"."</span>,</span>
<span id="cb1-54">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb1-55">)</span></code></pre></div>
<p>If you are on a system where Nix is available, you can drop into a temporary shell with R and <code>{rix}</code> available to generate the required <code>default.nix</code> (the Nix expression that, once built, provides the environment):</p>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">nix-shell</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-I</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb2-2">  nixpkgs=https://github.com/rstats-on-nix/nixpkgs/tarball/2025-10-20 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-p</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb2-3">  R rPackages.rix</span></code></pre></div>
<p>Then simply start R and run <code>source("gen-env.R")</code>. This will generate the <code>default.nix</code>. Next, leave R, leave the temporary shell (by typing <code>exit</code> or using <code>CTRL-D</code>) and build the environment with <code>nix-build</code>. Once it has finished, we can tackle the pipeline. I show the full script below, but you won’t be writing this in one go. Instead, you would add a derivation, build the pipeline, load the artefact into memory by using <code>rxp_load("artefact_name")</code>, look at it, play with it, and then continue. If you’re familiar with <code>{targets}</code> you should feel at ease.</p>
<p>Here’s the full script:</p>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This script defines and orchestrates the entire reproducible analytical</span></span>
<span id="cb3-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># pipeline using the {rixpress} package.</span></span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(rixpress)</span>
<span id="cb3-5"></span>
<span id="cb3-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb3-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># STEP 0: Define RBC Model Parameters as Derivations</span></span>
<span id="cb3-8">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This makes the parameters an explicit part of the pipeline.</span></span>
<span id="cb3-9">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Changing a parameter will cause downstream steps to rebuild.</span></span>
<span id="cb3-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_jl</span>(alpha, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>), <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Capital's share of income</span></span>
<span id="cb3-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_jl</span>(beta, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.01</span>), <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Discount factor</span></span>
<span id="cb3-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_jl</span>(delta, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.025</span>), <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Depreciation rate</span></span>
<span id="cb3-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_jl</span>(rho, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>), <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Technology shock persistence</span></span>
<span id="cb3-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_jl</span>(sigma, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>), <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Risk aversion (log-utility)</span></span>
<span id="cb3-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_jl</span>(sigma_z, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>), <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Technology shock standard deviation</span></span>
<span id="cb3-16"></span>
<span id="cb3-17">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># STEP 1: Julia - Simulate a Real Business Cycle (RBC) model.</span></span>
<span id="cb3-18">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This derivation runs our Julia script to generate the source data.</span></span>
<span id="cb3-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_jl</span>(</span>
<span id="cb3-20">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> simulated_rbc_data,</span>
<span id="cb3-21">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"simulate_rbc_model(alpha, beta, delta, rho, sigma, sigma_z)"</span>,</span>
<span id="cb3-22">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">user_functions =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/functions.jl"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The file containing the function</span></span>
<span id="cb3-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">encoder =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"arrow_write"</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The function to use for saving the output</span></span>
<span id="cb3-24">  ),</span>
<span id="cb3-25"></span>
<span id="cb3-26">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># STEP 2.1: Python - Prepare features (lagging data)</span></span>
<span id="cb3-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(</span>
<span id="cb3-28">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> processed_data,</span>
<span id="cb3-29">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"prepare_features(simulated_rbc_data)"</span>,</span>
<span id="cb3-30">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">user_functions =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/functions.py"</span>,</span>
<span id="cb3-31">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Decode the Arrow file from Julia into a pandas DataFrame</span></span>
<span id="cb3-32">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decoder =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"feather.read_feather"</span></span>
<span id="cb3-33">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Note: No encoder needed here. {rixpress} will use pickle by default</span></span>
<span id="cb3-34">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># to pass the DataFrame between Python steps.</span></span>
<span id="cb3-35">  ),</span>
<span id="cb3-36"></span>
<span id="cb3-37">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># STEP 2.2: Python - Split data into training and testing sets</span></span>
<span id="cb3-38">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(</span>
<span id="cb3-39">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> X_train,</span>
<span id="cb3-40">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"get_X_train(processed_data)"</span>,</span>
<span id="cb3-41">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">user_functions =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/functions.py"</span></span>
<span id="cb3-42">  ),</span>
<span id="cb3-43"></span>
<span id="cb3-44">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(</span>
<span id="cb3-45">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> y_train,</span>
<span id="cb3-46">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"get_y_train(processed_data)"</span>,</span>
<span id="cb3-47">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">user_functions =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/functions.py"</span></span>
<span id="cb3-48">  ),</span>
<span id="cb3-49"></span>
<span id="cb3-50">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(</span>
<span id="cb3-51">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> X_test,</span>
<span id="cb3-52">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"get_X_test(processed_data)"</span>,</span>
<span id="cb3-53">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">user_functions =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/functions.py"</span></span>
<span id="cb3-54">  ),</span>
<span id="cb3-55"></span>
<span id="cb3-56">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(</span>
<span id="cb3-57">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> y_test,</span>
<span id="cb3-58">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"get_y_test(processed_data)"</span>,</span>
<span id="cb3-59">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">user_functions =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/functions.py"</span></span>
<span id="cb3-60">  ),</span>
<span id="cb3-61"></span>
<span id="cb3-62">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># STEP 2.3: Python - Train the model</span></span>
<span id="cb3-63">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(</span>
<span id="cb3-64">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> trained_model,</span>
<span id="cb3-65">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train_model(X_train, y_train)"</span>,</span>
<span id="cb3-66">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">user_functions =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/functions.py"</span></span>
<span id="cb3-67">  ),</span>
<span id="cb3-68"></span>
<span id="cb3-69">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># STEP 2.4: Python - Make predictions</span></span>
<span id="cb3-70">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(</span>
<span id="cb3-71">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> model_predictions,</span>
<span id="cb3-72">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"make_predictions(trained_model, X_test)"</span>,</span>
<span id="cb3-73">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">user_functions =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/functions.py"</span></span>
<span id="cb3-74">  ),</span>
<span id="cb3-75"></span>
<span id="cb3-76">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># STEP 2.5: Python - Format final results for R</span></span>
<span id="cb3-77">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(</span>
<span id="cb3-78">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> predictions,</span>
<span id="cb3-79">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"format_results(y_test, model_predictions)"</span>,</span>
<span id="cb3-80">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">user_functions =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/functions.py"</span>,</span>
<span id="cb3-81">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We need an encoder here to save the final DataFrame as an Arrow file</span></span>
<span id="cb3-82">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># so the R step can read it.</span></span>
<span id="cb3-83">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">encoder =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"save_arrow"</span></span>
<span id="cb3-84">  ),</span>
<span id="cb3-85"></span>
<span id="cb3-86">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># STEP 3: R - Visualize the predictions from the Python model.</span></span>
<span id="cb3-87">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This final derivation depends on the output of the Python step.</span></span>
<span id="cb3-88">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb3-89">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> output_plot,</span>
<span id="cb3-90">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_predictions</span>(predictions), <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The function to call from functions.R</span></span>
<span id="cb3-91">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">user_functions =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/functions.R"</span>,</span>
<span id="cb3-92">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Specify how to load the upstream data (from Python) into R.</span></span>
<span id="cb3-93">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decoder =</span> arrow<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>read_feather</span>
<span id="cb3-94">  ),</span>
<span id="cb3-95"></span>
<span id="cb3-96">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># STEP 4: Quarto - Compile the final report.</span></span>
<span id="cb3-97">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_qmd</span>(</span>
<span id="cb3-98">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> final_report,</span>
<span id="cb3-99">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">additional_files =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_rixpress"</span>,</span>
<span id="cb3-100">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">qmd_file =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"readme.qmd"</span></span>
<span id="cb3-101">  )</span>
<span id="cb3-102">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-103">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_populate</span>(</span>
<span id="cb3-104">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">py_imports =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb3-105">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pandas =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"import pandas as pd"</span>,</span>
<span id="cb3-106">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pyarrow =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"import pyarrow.feather as feather"</span>,</span>
<span id="cb3-107">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sklearn =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"from sklearn.model_selection import train_test_split"</span>,</span>
<span id="cb3-108">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xgboost =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"import xgboost as xgb"</span></span>
<span id="cb3-109">    ),</span>
<span id="cb3-110">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">project_path =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"."</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The root of our project</span></span>
<span id="cb3-111">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">build =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set to TRUE to execute the pipeline immediately</span></span>
<span id="cb3-112">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">verbose =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb3-113">  )</span></code></pre></div>
<p><em>(helper functions are defined in separate scripts, inside the <code>functions/</code> folder which I don’t show here)</em></p>
<p>The magic here is twofold. First, <code>{rixpress}</code> seamlessly handles passing data between language environments, using efficient formats like Apache Arrow via <code>encoder</code> and <code>decoder</code> functions. Second, because each step is a Nix derivation, it runs in its own isolated environment. The Julia simulation can have its own dependencies, completely separate from the Python and R steps, eliminating “dependency hell” forever. Also, the artefacts built by the pipeline are actually children of the environment. This means that if you change the environment (for example, by adding a package), everything is invalidated and the whole pipeline gets rebuilt. This is quite useful, because changing the environment can sometimes break downstream artefacts in subtle ways. With classical build automation tools, the artefacts and the environment are not tied together, so a rebuild would not be triggered.</p>
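<p>To make the “artefacts are children of the environment” idea concrete, here is a minimal Python sketch of how such cache keys can behave. It is only an illustration and not how Nix or <code>{rixpress}</code> actually compute store paths: the <code>digest()</code> and <code>step_keys()</code> helpers, the step names and the environment strings are all made up for the example.</p>

```python
import hashlib


def digest(*parts):
    """Hash several strings into one short hexadecimal key."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(part.encode("utf-8"))
        hasher.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return hasher.hexdigest()[:12]


def step_keys(environment, steps):
    """Return one cache key per step.

    Each key depends on the environment description, the step's own code,
    and the keys of the steps it reads from.
    """
    keys = {}
    for name, code, upstream in steps:  # steps must be listed in dependency order
        upstream_keys = [keys[parent] for parent in upstream]
        keys[name] = digest(environment, code, *upstream_keys)
    return keys


# Hypothetical three-step pipeline: (name, code, upstream steps).
STEPS = [
    ("simulated_data", "simulate(n=1000)", []),
    ("predictions", "fit(simulated_data)", ["simulated_data"]),
    ("final_report", "render(predictions)", ["predictions"]),
]

before = step_keys("R 4.5.1 with dplyr", STEPS)

# Editing only the last step keeps the upstream keys (and so their cached results).
edited_steps = STEPS[:2] + [
    ("final_report", "render(predictions, theme='dark')", ["predictions"])
]
after_edit = step_keys("R 4.5.1 with dplyr", edited_steps)
print([name for name in before if before[name] != after_edit[name]])

# Adding a package to the environment changes every key: a full rebuild.
after_env_change = step_keys("R 4.5.1 with dplyr and arrow", STEPS)
print([name for name in before if before[name] != after_env_change[name]])
```

<p>In this sketch, editing the last step only changes the key of that step, while changing the environment string changes every key, which matches the rebuild behaviour described above.</p>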
<p>Once built, you can interactively explore artifacts:</p>
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># From R</span></span>
<span id="cb4-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_load</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"simulated_rbc_data"</span>)</span>
<span id="cb4-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_load</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"output_plot"</span>)</span></code></pre></div>
<div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Or from Python (using ryxpress)</span></span>
<span id="cb5-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> ryxpress <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rxp_make, rxp_load</span>
<span id="cb5-3">rxp_load(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"predictions"</span>)</span></code></pre></div>
<p>The pipeline automatically caches results, so changing one step only rebuilds what’s affected. <code>{rixpress}</code> (and <code>ryxpress</code>) will try their best to convert objects seamlessly from R to Python and vice versa. If you try to load an object built inside a Python environment, <code>{rixpress}</code> will use <code>{reticulate}</code> (if you’ve added it to the list of R packages) to convert it to an equivalent R object. From a Python session, if you added the <code>rds2py</code> Python package, the same will happen, but converting an R object into the equivalent Python object (since Python doesn’t have a native data frame implementation, use <code>biocframe</code> to convert from R data frames into Python <em>bioc</em>frames, which come with a method to convert to <code>pandas</code> or <code>polars</code> data frames).</p>
<p>You can find the code for this example <a href="https://github.com/b-rodrigues/rixpress_demos/blob/master/rbc/">here</a>.</p>
<p>If you’re primarily a Python user, I think that you could still find <code>{rixpress}</code> useful. Defining the pipeline as an R list shouldn’t be too much of an issue, and you can explore the pipeline and artefacts with the Python port, <code>ryxpress</code>. This Python port makes it easy to build the pipeline and load and explore artefacts from a Python session.</p>
<p>Another Python-related caveat is that while Nix’s package repository, <code>nixpkgs</code>, is vast, the Python ecosystem (PyPI) is famously heterogeneous. Not every Python package or specific version you might need is available directly in <code>nixpkgs</code>.</p>
<p>To solve this, it is possible to install <code>uv</code>, a modern and fast Python package manager, with Nix, and let <code>uv</code> handle the Python packages and Python interpreter, but let Nix handle everything else:</p>
<div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rix</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2025-10-20"</span>,</span>
<span id="cb6-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">r_pkgs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rix"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"chronicler"</span>),</span>
<span id="cb6-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_pkgs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"uv"</span>),</span>
<span id="cb6-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">project_path =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"."</span>,</span>
<span id="cb6-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div>
<p>This approach gives you the best of both worlds: you use <code>{rix}</code> to define the core, reproducible environment. This includes critical system libraries (like GDAL or HDF5) and all your R and Julia dependencies. This part of your environment is bit-for-bit reproducible. Then, within this Nix-managed environment, you use standard <code>uv</code> project commands (e.g., <code>uv add pandas</code>) to manage your Python packages. <code>uv</code> creates a <code>uv.lock</code> file that pins the exact versions and hashes of your Python dependencies, ensuring a reproducible Python package set.</p>
<p>While this hybrid model trades the full build-time determinism of a pure-Nix approach for Python packages, it offers immense flexibility and solves the issue of <code>nixpkgs</code> not mirroring PyPI.</p>
<p>I think that the biggest hurdle for <code>{rix}</code> and <code>{rixpress}</code> adoption for Python data scientists is their love of Jupyter Notebooks.</p>
<p>By the way, it’s possible to use an IDE alongside Nix, <code>{rix}</code> and <code>{rixpress}</code>. I think I’ll make a video about that, but for those of you who prefer reading, <a href="https://docs.ropensci.org/rix/articles/e-configuring-ide.html">read this</a>.</p>



 ]]></description>
  <category>R</category>
  <category>Nix</category>
  <guid>https://b-rodrigues.github.io/posts/2025-10-23-rixpress_cran.html</guid>
  <pubDate>Thu, 23 Oct 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Python needs its CRAN</title>
  <link>https://b-rodrigues.github.io/posts/2025-08-22-pypan.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/pkg_soyjak.png" style="width: 100%; height: auto;"> </a>
</p>
</div>
<p>How is it that in the year 2025 of our Lord installing a Python package is still such a gamble?</p>
<p>This post comes from someone that rarely uses Python, but consider the following:</p>
<ul>
<li>the rare times I need to use it, I’m often confronted with dependency hell (and if you think it’s a skill issue, hold that thought and keep reading);</li>
<li>I’m one of the maintainers of the R ecosystem for Nix, but also package some Python packages every once in a while for Nix.</li>
</ul>
<p>This last point is quite important, as I believe it gives me a good perspective on the issue this blog post is about. When it comes to R packages, we know we can simply mirror CRAN and Bioconductor, as the upstream CRAN team has already done a lot of curation work: we know packages work with each other. However, the same cannot be done for Python: the curation is on us.</p>
<p>If you use Python to analyse data (I’m sorry for you) you’ve probably hit this issue: you install one package that requires <code>numpy &lt; 2</code>, and another that requires <code>numpy &gt;= 2</code>. You’re <em>cooked</em>, as the youths say. The resolver can’t help you, because the requirements are literally incompatible. No one nor anything can help you. No amount of Rust-written package managers can help you. The problem is PyPI.</p>
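<p>Here is a small, self-contained Python sketch of why no resolver can help in that situation. It is a toy checker written for this illustration (the <code>satisfies()</code> and <code>installable_versions()</code> helpers are hypothetical, and real resolvers such as pip or uv handle far more cases), and the list of NumPy versions is only a sample.</p>

```python
import operator

# Comparison operators supported by this toy checker.
# Two-character operators are listed first so "<=" is not mistaken for "<".
OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text):
    """Turn '2' or '1.26.4' into a tuple of at least three integers."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version, constraint):
    """Check a version string against one constraint such as '>=2'."""
    for symbol, compare in OPERATORS.items():
        if constraint.startswith(symbol):
            bound = constraint[len(symbol):]
            return compare(parse_version(version), parse_version(bound))
    raise ValueError(f"unsupported constraint: {constraint}")


def installable_versions(available, constraints):
    """Keep only the versions that satisfy every constraint at once."""
    return [
        version
        for version in available
        if all(satisfies(version, constraint) for constraint in constraints)
    ]


# A sample of NumPy releases, used only for illustration.
AVAILABLE_NUMPY = ["1.24.4", "1.26.4", "2.0.2", "2.2.0"]

package_a_needs = "<2"   # hypothetical package A
package_b_needs = ">=2"  # hypothetical package B

print(installable_versions(AVAILABLE_NUMPY, [package_a_needs]))
print(installable_versions(AVAILABLE_NUMPY, [package_b_needs]))
print(installable_versions(AVAILABLE_NUMPY, [package_a_needs, package_b_needs]))
```

<p>Each constraint on its own leaves some versions to choose from, but the two together leave an empty list: there is simply nothing for a resolver to pick.</p>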
<section id="cran-doesnt-tolerate-this-nonsense" class="level2">
<h2 class="anchored" data-anchor-id="cran-doesnt-tolerate-this-nonsense">CRAN doesn’t tolerate this nonsense</h2>
<p>In R, this situation simply doesn’t happen. Why? Because CRAN enforces a system where packages are tested not only in isolation, but against their reverse dependencies. If <code>{ggplot2}</code> or <code>{dplyr}</code> changes in a way that breaks others, CRAN catches it. Package authors get a warning, and if they don’t fix things (within 2 weeks!), their package gets archived, which means that when users try to install it with <code>install.packages("foo")</code>, it won’t work. Which means that if a package is on CRAN, <code>install.packages("foo")</code> will work. Not “works if you’re lucky.” Not “works if you pin the right versions.” It just works (of course, as long as the right system-level dependencies are available if you need to compile it, which isn’t an issue if you’re installing binaries though). Actually, you can’t even publish a package that has constraints on the version of its dependencies. Your package has to work with all packages on CRAN forever and ever. Honestly, quite impressive for something that’s not even a real programming language, right? (this is sarcastic btw)</p>
<p>And CRAN manages this consistency across <strong>27000 packages</strong>. PyPI is much bigger, granted, but I doubt that many more than 30k packages get actually used frequently. In fact, probably a couple thousand, maybe even a couple hundred do (especially for data analysis).</p>
</section>
<section id="pypi-is-a-warehouse-not-an-ecosystem" class="level2">
<h2 class="anchored" data-anchor-id="pypi-is-a-warehouse-not-an-ecosystem">PyPI is a warehouse, not an ecosystem</h2>
<p>PyPI doesn’t do this. It’s a dumping ground for tarballs and wheels. No global checks, no compatibility guarantees, no consistency across the ecosystem. If package A and package B declare mutually exclusive requirements, PyPI shrugs and hosts them both.</p>
<p>We then spend enormous effort building tools to try to deal with this mess: Conda, Poetry, Hatch, uv, pipx and Nix (well Nix was not specifically made for Python, but it can also be used to set up virtual environments). They’re all great tools, but they can’t solve the core problem: if the constraints themselves are impossible, no resolver can save you. At best, these tools give you a way to freeze a working mess before it collapses. Just pray to whichever deity you fancy that adding a new package down the line doesn’t explode your environment.</p>
<p>This is not an ecosystem. It’s chaos with good packaging tools.</p>
<p>But Nix does help a bit more; at least with Nix, you can patch a package’s <code>pyproject.toml</code> to try to relax incompatible dependencies, like I did for <code>saiph</code>:</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode nix code-with-copy"><code class="sourceCode nix"><span id="cb1-1">postPatch = <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">''</span></span>
<span id="cb1-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  # Remove these constraints</span></span>
<span id="cb1-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  substituteInPlace pyproject.toml \</span></span>
<span id="cb1-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    --replace 'numpy = "^1"' 'numpy = "&gt;=1"' \</span></span>
<span id="cb1-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    --replace 'msgspec = "^0.18.5"' 'msgspec = "&gt;=0.18.5"'</span></span>
<span id="cb1-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">''</span>;</span></code></pre></div>
<p>This step relaxed the constraints directly in the <code>pyproject.toml</code>, but that might not be a good idea: these constraints might have been there for a good reason. Unit tests did pass though (more than 150 of them) so in this particular case I think I’m good. If PyPI were managed like CRAN, <code>saiph</code>’s authors would have had 2 weeks to make sure that <code>saiph</code> worked well with NumPy 2, which seems to be the case here. But patching packages is certainly not a solution for everything.</p>
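<p>To see what the patch actually changes, here is a short Python sketch that expands caret constraints into explicit ranges. It assumes that <code>^1</code> and <code>^0.18.5</code> are Poetry-style caret constraints, and <code>caret_to_range()</code> is a hypothetical helper written for this example, not part of any packaging tool.</p>

```python
def caret_to_range(spec):
    """Expand a caret constraint such as '^1' into an explicit range."""
    parts = [int(part) for part in spec.lstrip("^").split(".")]
    # The upper bound bumps the leftmost non-zero component
    # (or the last component if they are all zero).
    index = next(
        (i for i, part in enumerate(parts) if part != 0),
        len(parts) - 1,
    )
    upper = parts[:index] + [parts[index] + 1]
    lower_text = ".".join(str(part) for part in parts)
    upper_text = ".".join(str(part) for part in upper)
    return f">={lower_text},<{upper_text}"


for original in ("^1", "^0.18.5"):
    print(original, "means", caret_to_range(original))
```

<p>Under that reading, <code>numpy = "^1"</code> carries an implicit upper bound that excludes NumPy 2, and replacing it with <code>&gt;=1</code> keeps the lower bound while dropping the cap.</p>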
</section>
<section id="the-scale-myth" class="level2">
<h2 class="anchored" data-anchor-id="the-scale-myth">The scale myth</h2>
<p>“But Python is too big and diverse for CRAN-style governance!” I hear you yell. This is simply false. CRAN manages 27000 packages across domains as varied as bioinformatics, finance, web scraping, geospatial analysis, and machine learning, and this is without counting old packages that have been archived through the years. The R ecosystem isn’t small or homogeneous. It is smaller than PyPI in absolute numbers, yes, but honestly, I doubt there are more data analytics packages on PyPI than on CRAN, and if older unmaintained Python packages would get removed, the number of PyPI packages would also be much smaller. If anyone has hard statistics on it, I’d be happy to read them.</p>
<p>The difference isn’t technical capacity or ecosystem size. It’s <strong>governance philosophy</strong>. CRAN chose consistency over permissiveness. PyPI chose the opposite.</p>
</section>
<section id="and-no-conda-forge-isnt-enough" class="level2">
<h2 class="anchored" data-anchor-id="and-no-conda-forge-isnt-enough">And no, conda-forge isn’t enough</h2>
<p>Conda-forge is curated in that its builds are consistent, compilers are pinned, migrations are coordinated. That’s great, and it proves Python packaging can work at scale.</p>
<p>But if package A wants <code>numpy &lt; 2</code> and package B wants <code>numpy &gt;= 2</code>, conda-forge will host them both, and you’re still stuck. There’s no enforcement mechanism that forces the ecosystem to resolve contradictions. CRAN has that. Conda-forge doesn’t.</p>
<p>Conda-forge is a step in the right direction, but a tighter governance is needed.</p>
</section>
<section id="what-python-actually-needs-pypan" class="level2">
<h2 class="anchored" data-anchor-id="what-python-actually-needs-pypan">What Python actually needs: PyPAN</h2>
<p>Python needs a curated layer on top of PyPI that enforces consistency. Call it PyPAN: the Python Package Archive Network.</p>
<p>Here’s what PyPAN would do:</p>
<ul>
<li>Mirror packages from PyPI, but only those that pass ecosystem-wide checks</li>
<li>Test every package against its reverse dependencies, not just itself</li>
<li>Coordinate migrations for major breaking changes (e.g.&nbsp;<code>numpy 2.0</code>)</li>
<li>Archive packages that refuse to adapt</li>
<li>Publish consistent, installable snapshots of the entire ecosystem</li>
</ul>
<p>In other words: CRAN, but for Python.</p>
<p>If CRAN can maintain consistency across 27000 packages (with such a <a href="https://cran.r-project.org/CRAN_team.htm">small team</a>, by the way), Python can too. The question isn’t whether it’s technically possible but whether the Python community is willing to prioritize ecosystem stability over individual package autonomy.</p>
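<p>The reverse-dependency check is the heart of the proposal, so here is a rough Python sketch of the bookkeeping it implies. The dependency table is a simplified, hand-written example and <code>packages_to_recheck()</code> is a hypothetical helper; following reverse dependencies transitively is an assumption of this sketch, not a description of what CRAN does.</p>

```python
from collections import defaultdict, deque


def reverse_dependencies(depends_on):
    """Invert a 'package -> dependencies' table into 'package -> dependants'."""
    reverse = defaultdict(set)
    for package, dependencies in depends_on.items():
        for dependency in dependencies:
            reverse[dependency].add(package)
    return reverse


def packages_to_recheck(depends_on, updated):
    """List every package that directly or indirectly depends on `updated`."""
    reverse = reverse_dependencies(depends_on)
    seen = set()
    queue = deque([updated])
    while queue:  # breadth-first walk over the reverse dependency graph
        current = queue.popleft()
        for dependant in reverse.get(current, ()):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return sorted(seen)


# Simplified dependency table, for illustration only.
DEPENDS_ON = {
    "numpy": [],
    "pandas": ["numpy"],
    "scipy": ["numpy"],
    "scikit-learn": ["numpy", "scipy"],
    "matplotlib": ["numpy"],
    "requests": [],
}

print(packages_to_recheck(DEPENDS_ON, "numpy"))
print(packages_to_recheck(DEPENDS_ON, "scipy"))
print(packages_to_recheck(DEPENDS_ON, "requests"))
```

<p>A new release of a package near the root of the graph triggers checks for most of the table, while a leaf package triggers none. A curated archive would run the test suites of the listed packages before accepting the update.</p>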
</section>
<section id="why-developers-would-submit-to-pypan" class="level2">
<h2 class="anchored" data-anchor-id="why-developers-would-submit-to-pypan">Why developers would submit to PyPAN</h2>
<p>Why would a package author bother? Simple:</p>
<ul>
<li><strong>Visibility</strong>: users will prefer packages on PyPAN because they actually install and work</li>
<li><strong>Less support burden</strong>: fewer bug reports about broken installs or dependency hell</li>
<li><strong>Shared responsibility</strong>: migration effort spread across the ecosystem, not left to individual maintainers</li>
<li><strong>Credibility</strong>: “on PyPAN” becomes a mark of quality and stability — especially for scientific and industry projects</li>
</ul>
<p>If you don’t opt in, fine. But eventually, users will prefer packages that are part of the curated, consistent set. Just like people prefer CRAN packages in R and avoid installing from GitHub if possible.</p>
</section>
<section id="maybe-lets-start-small" class="level2">
<h2 class="anchored" data-anchor-id="maybe-lets-start-small">Maybe let’s start small</h2>
<p>CRAN’s model proves that ecosystem-wide consistency is achievable, and I’m of the opinion that it could also be achievable at Python’s scale. Conda-forge proves that curated Python packaging works.</p>
<p>Until Python has something like PyPAN, nothing changes. Dependency hell will keep developers up at night.</p>
<p>But we could start small. PyPAN could begin by focusing on data science, analysis, and statistics packages - the core scientific Python ecosystem. This subset is:</p>
<ul>
<li><strong>More manageable</strong>: ~500-1000 packages (I made up this range, it could be more or less; the point is, it’s not the 300000 PyPI packages) instead of the entire PyPI</li>
<li><strong>Highly interconnected</strong>: numpy, pandas, scikit-learn, matplotlib, scipy form a natural dependency graph</li>
<li><strong>Stability-focused</strong>: data scientists prioritize reproducible results over bleeding-edge features</li>
<li><strong>Community-minded</strong>: scientific Python already coordinates major migrations (Python 2→3, NumPy 2.0)</li>
<li><strong>Proven demand</strong>: these users already gravitate toward conda-forge for stability</li>
</ul>
<p>A PyPAN-DS (Data Science) could demonstrate the model works, build trust, and create momentum for broader adoption. Once people see that <code>pip install pandas</code> (or <code>uv</code> if you prefer) can work as reliably as <code>install.packages('dplyr')</code>, expanding to web frameworks and other domains becomes much easier to sell.</p>
<p>The scientific Python community has the cohesion, the need, and the precedent for this kind of coordination. They could be Python’s CRAN pilot program.</p>
<p>Soooo… who’s building this?</p>


</section>

 ]]></description>
  <category>R</category>
  <guid>https://b-rodrigues.github.io/posts/2025-08-22-pypan.html</guid>
  <pubDate>Fri, 22 Aug 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>You can outsource the grunt work to an LLM, not expertise</title>
  <link>https://b-rodrigues.github.io/posts/2025-07-03-llm_time.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/llm_nope_lmao.png" style="width: 100%; height: auto;"> </a>
</p>
</div>
<p>The more I use LLMs for programming, the more it seems to me that they can only be used successfully if you ask them to do things that you could do yourself.</p>
<p>This seems to be the case because:</p>
<ul>
<li>you know exactly what you want/need and thus can exactly describe it;</li>
<li>you know exactly if the LLM is actually delivering quality code or not;</li>
<li>you know exactly if something the LLM suggests that you hadn’t thought about actually makes sense;</li>
</ul>
<p>This reminds me of my consulting years, where it was quite easy to predict if a consulting project would be successful. If the client could do it themselves <em>if they had time</em>, the project would always be successful. They knew exactly what they needed and could describe it to us, and most importantly, there was a very tight feedback loop between our intermediary outputs and their review. But when we were brought in and clients didn’t even understand what their problem was (but thought they knew), this is where things were difficult.</p>
<p>It seems to me that as long as people cannot communicate their needs clearly, developers will keep their jobs.</p>
<p>Now, this doesn’t mean that you cannot do things outside of your expertise with LLMs, but you must then use the LLM to teach you enough (alongside more traditional methods), or the task must be so trivial, so low stakes, and done so many times before that you can blindly trust the output.</p>
<p>I’ve used an LLM recently to write code to parse JSON and XML files, which is something I’ve done in the past and which I’m quite happy to likely never have to do myself again. The output was quite good, and only required minor corrections before working. To help the LLM generate a correct output, I gave it one XML file as context.</p>
<p>Another thing I asked the LLM to do was to write code to get data from the OpenAlex API using the <code>{openalexR}</code> package. To help it, I gave it the package’s and the API’s documentation. Here again, the code worked flawlessly, and again, this is something I <em>could</em> have done myself, so my prompt was quite precise and I knew I had to give the LLM <em>something</em> to ensure it generated valid code.</p>
<p>Btw, I’ve been using Claude Sonnet 4 and it works quite well for R. But I also like Gemini because of its very large context window.</p>



 ]]></description>
  <category>R</category>
  <guid>https://b-rodrigues.github.io/posts/2025-07-03-llm_time.html</guid>
  <pubDate>Thu, 03 Jul 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>ggplot2 4.0.0 is coming and why ultimately it’s on YOU to ensure your environments are reproducible</title>
  <link>https://b-rodrigues.github.io/posts/2025-06-21-ggplot4.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/midnight.jpg" style="width: 50%; height: auto;"> </a>
</p>
</div>
<p>It looks like a major update to <code>{ggplot2}</code> is coming (version 4.0.0), where Posit is switching the internals from S3 to S7. This will break many reverse dependencies of <code>{ggplot2}</code> (a reverse dependency is a package that depends on <code>{ggplot2}</code>), and so Posit is following the recommendation of the CRAN policies, which state that they should give a heads-up to devs of reverse dependencies and give them enough time to fix their packages. Posit even goes beyond that and is opening PRs to offer fixes themselves, which I think is really great.</p>
<p>However, this seems to be a bit trickier in the case of R packages hosted on Bioconductor: my understanding of Bioconductor is that they have two releases per year, and packages cannot be updated in between releases. Now I’m not entirely sure if that is exactly the case, or if some exceptions can be made and packages can perhaps be fixed in between releases. That being said, it seems like this upgrade will cause some issues, and there is apparently quite a heated discussion on Bioconductor’s community chat (which I don’t have access to).</p>
<p>Whatever is going to happen, and whatever is going on in this discussion, and whatever you think of Posit, CRAN, or Bioconductor, as an end-user, there are not a million things that you can do to make sure that upgrading to the latest <code>{ggplot2}</code> (or whichever packages) won’t break projects you’re currently working on:</p>
<ul>
<li>only use dependency-free packages like those from the <a href="https://www.tinyverse.org/">tinyverse</a> or even just base R</li>
<li>use something like <code>{renv}</code> or <code>{groundhog}</code> to snapshot package versions, or better yet, Nix using my <code>{rix}</code> package</li>
<li>just don’t care and hope for the best.</li>
</ul>
<p>Ultimately, it is on YOU to ensure that your projects are reproducible, and that you can work with stable environments. Relying on infrastructure or upstream developers you don’t control is not a valid strategy.</p>



 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2025-06-21-ggplot4.html</guid>
  <pubDate>Sat, 21 Jun 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Multi-language pipelines with rixpress</title>
  <link>https://b-rodrigues.github.io/posts/2025-05-13-test_rixpress.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/polyglot_dag.png" style="width: 50%; height: auto;"> </a>
</p>
</div>
<p>
If you want to watch a 2-Minute video introduction to <code>{rixpress}</code>, click the image below:
</p>
<p><a href="https://www.youtube.com/watch?v=a1eNG9TFZ_o" target="_blank" rel="noopener noreferrer"> <img src="https://raw.githubusercontent.com/b-rodrigues/rixpress/refs/heads/main/video_thumbnail.png" alt="Video Thumbnail" style="width:100%; max-width:560px; height:auto; display:block; margin:0 auto;"> </a></p>
<p>In <a href="https://brodrigues.co/posts/2024-08-28-nix_for_r_part_12.html">August last year</a> I tried to see how one could use Nix as a build automation tool for data science pipelines, and in March this year, I started working on an R package that would make setting up such pipelines easy, which I already discussed in my <a href="https://brodrigues.co/posts/2025-03-20-announcing_rixpress.html">previous post</a>.</p>
<p>After some weeks of work, I think that <code>{rixpress}</code> is at a stage where it can already be quite useful to a lot of people. <code>{rixpress}</code> helps you set up your projects as a pipeline of completely reproducible steps. <code>{rixpress}</code> is a sister package to <code>{rix}</code> and together they make true computational reproducibility easier to achieve. <code>{rix}</code> makes it easy to capture and rebuild the exact computational environment in which the code was executed, and <code>{rixpress}</code> helps you move away from script-based workflows that can be difficult to execute and may require manual intervention.</p>
<p>When I first introduced <code>{rixpress}</code>, it was essentially a proof of concept. It could manage some basic R and Python interplay, but it was clearly in its early stages. I’ve since then added some features that I think really show why using Nix as the underlying build engine is a good idea.</p>
<p>Just like for its sister package <code>{rix}</code>, I’ve taken the step to submit <code>{rixpress}</code> for peer review by rOpenSci. <code>{rix}</code> really benefitted from rOpenSci’s peer review and I believe that it’ll be the same for <code>{rixpress}</code>.</p>
<section id="current-capabilities-of-rixpress" class="level2">
<h2 class="anchored" data-anchor-id="current-capabilities-of-rixpress">Current Capabilities of {rixpress}</h2>
<p>Here are the features currently available in <code>{rixpress}</code>:</p>
<ul>
<li><p>A key motivation was to simplify building pipelines where different steps might require different language environments. With <code>{rixpress}</code>, this is a central feature:</p></li>
<li><p>Define steps in R (<code>rxp_r()</code>, <code>rxp_r_file()</code>) or Python (<code>rxp_py()</code>, <code>rxp_py_file()</code>).</p></li>
<li><p>Importantly, each step can be configured to run in its own Nix-defined environment (for example, use <code>nix_env = "my-python-env.nix"</code> for a Python step, or <code>nix_env = "my-r-env.nix"</code> for an R step). These environments can be generated using my other package, <code>{rix}</code>.</p></li>
<li><p>Pass data between R and Python steps. <code>{rixpress}</code> manages the serialization, using <code>reticulate</code> by default for R/Python object conversion, and also allows custom functions for other formats like JSON or model-specific files.</p></li>
<li><p>Build Quarto (or R Markdown) documents using <code>rxp_quarto()</code> (and <code>rxp_rmd()</code>). These documents can access any artifact (<code>rxp_read("my_artifact")</code>) from preceding steps, regardless of the language used to generate it. Quarto rendering can also occur within its own dedicated Nix environment.</p></li>
<li><p>Every step in a <code>{rixpress}</code> pipeline is treated as a Nix derivation. This means hermetic builds, sandboxed execution, and content-addressable caching, leading to a high degree of reproducibility (as expected with Nix).</p></li>
<li><p>As pipelines grow, visualization is helpful. <code>rxp_ggdag()</code> (using <code>{ggdag}</code>) and <code>rxp_visnetwork()</code> (using <code>{visNetwork}</code>) provide a visual overview of dependencies. <code>dag_for_ci()</code> exports the DAG as an <code>{igraph}</code> dot file format, which can then be used for text-based visualisation on CI.</p></li>
<li><p>For CI, <code>rxp_ga()</code> can generate a GitHub Actions workflow to run the pipeline on each push. This workflow includes caching of Nix store paths between runs (using <code>export_nix_archive()</code> and <code>import_nix_archive()</code>) to avoid unnecessary rebuilds.</p></li>
<li><p>There is ample documentation, and even a vignette detailing how to use <code>{cmdstanr}</code> within a <code>{rixpress}</code> pipeline. <code>{cmdstanr}</code> works in a specific way, by compiling Stan models to C++, so this requires careful management of Stan model compilation and sampling within the Nix sandbox, and demonstrates that complex tools can be integrated.</p></li>
<li><p>It is possible to retrieve outputs from previous pipeline executions. <code>{rixpress}</code> maintains timestamped build logs. Functions like <code>rxp_list_logs()</code>, <code>rxp_inspect(which_log = "...")</code>, and <code>rxp_read("derivation_name", which_log = "...")</code> allow you to access the history of your pipeline’s execution and retrieve specific artifacts.</p></li>
</ul>
</section>
<section id="an-invitation-for-feedback" class="level2">
<h2 class="anchored" data-anchor-id="an-invitation-for-feedback">An Invitation for Feedback</h2>
<p>Considerable effort has gone into making <code>{rixpress}</code> robust and useful. A collection of examples is available at the <a href="https://github.com/b-rodrigues/rixpress_demos">rixpress_demos GitHub repository</a> to illustrate various use cases (R-only, Python-only, R/Python, Quarto, <code>{cmdstanr}</code>, and an XGBoost example).</p>
<p>I’m now looking for feedback from users:</p>
<ul>
<li>I encourage you to try it out. I recommend watching this <a href="https://youtu.be/IXKd5ySzzSU?si=D-AbU0JYdMP-iKvB">tutorial video</a> to get started quickly.</li>
<li>Install it, explore the examples, and perhaps apply it to one of your projects.</li>
<li>Any observations on what works well, what might be confusing, or any issues encountered would be helpful.</li>
<li>Your feedback would be very valuable. Please feel free to open an issue on the <a href="https://github.com/b-rodrigues/rixpress">{rixpress} GitHub repository</a> with bug reports, feature suggestions, or questions.</li>
</ul>
</section>
<section id="why-use-rixpress-instead-of-targets" class="level2">
<h2 class="anchored" data-anchor-id="why-use-rixpress-instead-of-targets">Why use {rixpress} instead of {targets}?</h2>
<p><code>{targets}</code> is a fantastic package, and the main source of inspiration for <code>{rixpress}</code>. If you have no need for multilanguage pipelines, then running <code>{targets}</code> inside of a Nix environment, as described <a href="https://docs.ropensci.org/rix/articles/z-advanced-topic-reproducible-analytical-pipelines-with-nix.html">here</a>, is perfectly valid. But I think that <code>{rixpress}</code> has its place if:</p>
<ul>
<li>you need to use multiple languages, as you don’t need to adapt Python code to work with <code>{reticulate}</code>,</li>
<li>you’re already convinced by Nix and use <code>{rix}</code>,</li>
<li>you want to use a simple pipeline tool with a smaller scope.</li>
</ul>


</section>

 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2025-05-13-test_rixpress.html</guid>
  <pubDate>Tue, 13 May 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Announcing rixpress</title>
  <link>https://b-rodrigues.github.io/posts/2025-03-20-announcing_rixpress.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/announcing_rixpress.png" style="width: 50%; height: auto;"> </a>
</p>
</div>
<p>As I’ve already discussed in <a href="https://docs.ropensci.org/rix/articles/z-advanced-topic-reproducible-analytical-pipelines-with-nix.html">this vignette of my {rix} package</a>, it is very easy to run a <code>{targets}</code> pipeline inside of a Nix environment for increased reproducibility. The main drawback of <code>{targets}</code>, though, is that it is not possible to compute one particular object in one particular environment, and another object in another environment. It is also not possible to compute a target using Python, for instance, unless you use <code>{reticulate}</code>.</p>
<p>But we can go a step further: you see, Nix is a very versatile tool, and the Nix programming language is a domain-specific language made to package software. If you assume that, say, a statistical or machine learning model is just software, then why not use Nix to build it? This thought is what made me want to write <code>{rixpress}</code>.</p>
<section id="rixpress-a-package-to-define-reproducible-analytical-pipelines" class="level2">
<h2 class="anchored" data-anchor-id="rixpress-a-package-to-define-reproducible-analytical-pipelines">rixpress, a package to define reproducible analytical pipelines</h2>
<p>The Nix programming language is a domain-specific language used to package and build software, and “software” can have a very broad definition. As I explored in <a href="../posts/2024-08-28-nix_for_r_part_12.html">this blog post</a>, Nix (the programming language) can be used to define a polyglot pipeline to build, for example, a Quarto report using R and Python. I have now built a package called <code>{rixpress}</code>, heavily inspired by <code>{targets}</code> (if you are not familiar with <code>{targets}</code>, I introduce it at the end of this blog post), to generate such pipelines and build them using Nix. Below is a complete example (you can find the code <a href="https://github.com/b-rodrigues/rixpress_pipeline_demo">here</a>). It starts by using Python and the Polars library to load a dataset, transforms it a bit, and converts the data to a Pandas dataframe. The data is then passed to R, and finally a Quarto document is compiled. The conversion to R is done via <code>reticulate::py_load_object()</code> under the hood, which is also why I had to convert the Polars dataframe to a Pandas dataframe first:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(rixpress)</span>
<span id="cb1-2"></span>
<span id="cb1-3">d0 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py_file</span>(</span>
<span id="cb1-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> mtcars_pl,</span>
<span id="cb1-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">path =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/mtcars.csv'</span>,</span>
<span id="cb1-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">read_function =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda x: polars.read_csv(x, separator='|')"</span>,</span>
<span id="cb1-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nix_env =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"py-env.nix"</span></span>
<span id="cb1-8">)</span>
<span id="cb1-9"></span>
<span id="cb1-10">d1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(</span>
<span id="cb1-11">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># reticulate doesn't support polars DFs yet, so need to convert</span></span>
<span id="cb1-12">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># first to pandas DF</span></span>
<span id="cb1-13">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> mtcars_pl_am,</span>
<span id="cb1-14">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">py_expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mtcars_pl.filter(polars.col('am') == 1).to_pandas()"</span>,</span>
<span id="cb1-15">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nix_env =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"py-env.nix"</span></span>
<span id="cb1-16">)</span>
<span id="cb1-17"></span>
<span id="cb1-18">d2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py2r</span>(</span>
<span id="cb1-19">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> mtcars_am,</span>
<span id="cb1-20">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> mtcars_pl_am</span>
<span id="cb1-21">)</span>
<span id="cb1-22"></span>
<span id="cb1-23">d3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-24">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> mtcars_head,</span>
<span id="cb1-25">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">my_head</span>(mtcars_am),</span>
<span id="cb1-26">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">additional_files =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions.R"</span></span>
<span id="cb1-27">)</span>
<span id="cb1-28"></span>
<span id="cb1-29">d4 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-30">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> mtcars_tail,</span>
<span id="cb1-31">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tail</span>(mtcars_head)</span>
<span id="cb1-32">)</span>
<span id="cb1-33"></span>
<span id="cb1-34">d5 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb1-35">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> mtcars_mpg,</span>
<span id="cb1-36">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(mtcars_tail, mpg)</span>
<span id="cb1-37">)</span>
<span id="cb1-38"></span>
<span id="cb1-39">doc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_quarto</span>(</span>
<span id="cb1-40">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> page,</span>
<span id="cb1-41">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">qmd_file =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"page.qmd"</span>,</span>
<span id="cb1-42">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">additional_files =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"content.qmd"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"images"</span>),</span>
<span id="cb1-43">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nix_env =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"quarto-env.nix"</span></span>
<span id="cb1-44">)</span>
<span id="cb1-45"></span>
<span id="cb1-46">rxp_list <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(d0, d1, d2, d3, d4, d5, doc)</span>
<span id="cb1-47"></span>
<span id="cb1-48"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rixpress</span>(rxp_list, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">project_path =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"."</span>)</span>
<span id="cb1-49"></span>
<span id="cb1-50"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_dag</span>()</span></code></pre></div>
</div>
<p>Let’s go through this code:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">d0 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py_file</span>(</span>
<span id="cb2-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> mtcars_pl,</span>
<span id="cb2-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">path =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data/mtcars.csv'</span>,</span>
<span id="cb2-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">read_function =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda x: polars.read_csv(x, separator='|')"</span>,</span>
<span id="cb2-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nix_env =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"py-env.nix"</span></span>
<span id="cb2-6">)</span></code></pre></div>
</div>
<p><code>rxp_py_file()</code> uses Python to load a local file. In this case, it’s the <code>mtcars.csv</code> dataset under the <code>data/</code> folder. The read function must be a function of only one parameter, the path to the data, so I use an anonymous function wrapping <code>polars.read_csv</code>, which allows me to set the separator to the Unix pipe <code>|</code>. This code is executed inside the environment defined by the <code>py-env.nix</code> file. This file can be generated by my other package, <code>{rix}</code>, and lists the Python packages needed (you’ll find it in the repo).</p>
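<p>If the “function of only one parameter” requirement feels abstract, here is a minimal sketch of the same idea in plain R. It doesn’t use <code>{rixpress}</code> at all: the helper name <code>read_pipe_csv()</code> and the three rows of data are made up for illustration. The only point is that the separator is fixed inside the wrapper, so the caller supplies nothing but the path:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Write a small pipe-separated file to a temporary location (illustrative data).
path &lt;- tempfile(fileext = ".csv")
writeLines(c("mpg|cyl|am", "21.0|6|1", "22.8|4|1", "18.7|8|0"), path)

# One-parameter wrapper (hypothetical name): the separator is set in the body,
# so the function can be called with the path alone.
read_pipe_csv &lt;- function(x) read.csv(x, sep = "|")

df &lt;- read_pipe_csv(path)
str(df)</code></pre></div>
</div>
<p>The Python lambda in the pipeline above plays the same role as <code>read_pipe_csv()</code> here.</p>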
<p>Then:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">d1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py</span>(</span>
<span id="cb3-2">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># reticulate doesn't support polars DFs yet, so need to convert</span></span>
<span id="cb3-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># first to pandas DF</span></span>
<span id="cb3-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> mtcars_pl_am,</span>
<span id="cb3-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">py_expr =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mtcars_pl.filter(polars.col('am') == 1).to_pandas()"</span>,</span>
<span id="cb3-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nix_env =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"py-env.nix"</span></span>
<span id="cb3-7">)</span></code></pre></div>
</div>
<p><code>rxp_py()</code> executes Python code and saves the output under the name passed to the <code>name</code> argument. In this case, I filter the Polars dataframe and convert it to a Pandas dataframe. This again happens inside the environment defined by <code>py-env.nix</code>: it’s a pure Python environment, so no <code>{reticulate}</code> is needed at this stage.</p>
<p>Then:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">d2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_py2r</span>(</span>
<span id="cb4-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> mtcars_am,</span>
<span id="cb4-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> mtcars_pl_am</span>
<span id="cb4-4">)</span></code></pre></div>
</div>
<p><code>rxp_py2r()</code> calls <code>reticulate::py_load_object()</code> to convert the Pandas dataframe to an R dataframe. We can now continue working with it in R! You’ll notice that no <code>nix_env</code> argument is passed to this function. When no argument is provided to <code>nix_env</code>, the default environment, <code>default.nix</code>, gets used. This one must always be present and in this case contains the required R packages for the pipeline.</p>
<p>Then:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">d3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_r</span>(</span>
<span id="cb5-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> mtcars_head,</span>
<span id="cb5-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">my_head</span>(mtcars_am),</span>
<span id="cb5-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">additional_files =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions.R"</span></span>
<span id="cb5-5">)</span></code></pre></div>
</div>
<p>This one uses an argument we don’t know yet, <code>additional_files</code>. It allows you to pass R scripts that define functions. In this case, <code>functions.R</code> contains the definition of <code>my_head()</code> which is used on <code>mtcars_am</code>.</p>
<p><code>d4</code> and <code>d5</code> are self-explanatory, so now let’s take a look at <code>rxp_quarto()</code>:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">doc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rxp_quarto</span>(</span>
<span id="cb6-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> page,</span>
<span id="cb6-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">qmd_file =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"page.qmd"</span>,</span>
<span id="cb6-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">additional_files =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"content.qmd"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"images"</span>),</span>
<span id="cb6-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nix_env =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"quarto-env.nix"</span></span>
<span id="cb6-6">)</span></code></pre></div>
</div>
<p>This compiles the <code>page.qmd</code> document, which requires additional files: <code>content.qmd</code>, which gets included into <code>page.qmd</code>, and the <code>images/</code> folder, which contains the images required to compile the document. The document is compiled using the <code>quarto-env.nix</code> environment.</p>
<p>Putting all these derivations into a list and passing it to <code>rixpress()</code> doesn’t build the pipeline just yet, but generates a <code>pipeline.nix</code> file which is the Nix expression that will build the output, in this case our Quarto document. You can also take a look at the DAG using <code>plot_dag()</code>:</p>
<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/rixpress_dag.png" width="100%">
</p>
</div>
<p>and it’s also possible to retrieve objects in an interactive session using <code>rxp_read()</code> (to read them) or <code>rxp_load()</code> (to load them in the global environment). When reading or loading Python objects, they get converted using <code>{reticulate}</code> on the fly.</p>
<p>To build the pipeline, run <code>rxp_make()</code>. Subsequent runs don’t build everything, as intermediary outputs are cached in the <em>Nix store</em>. So if you change only the Quarto document, only this one derivation gets built anew. It is also possible to export and import the outputs using <code>export_nix_archive()</code> and <code>import_nix_archive()</code>, pretty useful for CI!</p>
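<p>To give an intuition for why only the changed derivation gets rebuilt, here is a toy sketch in plain R. It is not how Nix actually computes store paths, and none of these names (<code>store</code>, <code>step_key()</code>, <code>build()</code>, <code>run()</code>) exist in <code>{rixpress}</code>. I’m simply assuming that each step gets a key made from a hash of its code and the keys of its inputs, and that a step is only run when its key isn’t in the store yet:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">store  &lt;- new.env()     # stands in for the Nix store: maps a key to a built value
builds &lt;- character()   # names of the steps that actually ran

# Key of a step: MD5 hash of its code plus the keys of its inputs.
step_key &lt;- function(code, input_keys = character()) {
  tmp &lt;- tempfile()
  on.exit(unlink(tmp))
  writeLines(c(code, input_keys), tmp)
  unname(tools::md5sum(tmp))
}

# Run a step only if its key is not in the store yet, then return the key.
build &lt;- function(name, code, input_keys = character()) {
  key &lt;- step_key(code, input_keys)
  if (!exists(key, envir = store, inherits = FALSE)) {
    assign(key, paste("output of", name), envir = store)
    builds &lt;&lt;- c(builds, name)
  }
  key
}

# A three-step toy pipeline; only the code of the report step varies.
run &lt;- function(report_code) {
  builds &lt;&lt;- character()
  k_data &lt;- build("data", "read_csv('data/mtcars.csv')")
  k_tail &lt;- build("tail", "tail(data)", k_data)
  build("report", report_code, k_tail)
  builds
}

first  &lt;- run("render v1")  # nothing cached yet: "data" "tail" "report"
second &lt;- run("render v2")  # only the report code changed: "report"</code></pre></div>
</div>
<p>Because a step’s key includes the keys of its inputs, changing an upstream step would change every key downstream of it, and those steps would be rebuilt as well.</p>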
</section>
<section id="caveats" class="level2">
<h2 class="anchored" data-anchor-id="caveats">Caveats</h2>
<p>This package is still in the prototype stage, so don’t use it for anything serious. There are still some things I need to work on: for now, debugging a faulty pipeline is really hard, because intermediary outputs are difficult to find if the pipeline wasn’t completely built.</p>
<p>Also, due to how Nix works, every computation happens in a completely isolated sandbox. This is why the <code>rxp_*()</code> functions have that <code>additional_files</code> argument: in case something external is required, Nix needs to copy it over into the sandbox. This also means that functions that require Internet access to work will fail. But I was able to work around that for <code>rxp_file()</code>: so if a resource is online, the function that reads it should be able to get to it.</p>
<p>Now, let me introduce <code>{targets}</code>, my main source of inspiration for this package.</p>
</section>
<section id="the-targets-package-my-source-of-inspiration-for-rixpress" class="level2">
<h2 class="anchored" data-anchor-id="the-targets-package-my-source-of-inspiration-for-rixpress">The targets package, my source of inspiration for rixpress</h2>
<p>I’m a huge fan of the <code>{targets}</code> package and think that it’s truly one of the best packages ever made. No other build/pipeline automation tool comes close in my opinion. Most of these tools require you to define your pipeline in another language (such as YAML) or force you to use some very specific syntax where you explicitly need to define the objects to compute, their inputs and outputs. But <code>{targets}</code> allows you to define your pipeline as a series of R calls:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># _targets.R file</span></span>
<span id="cb7-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(targets)</span>
<span id="cb7-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tarchetypes)</span>
<span id="cb7-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_source</span>()</span>
<span id="cb7-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_option_set</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">packages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"readr"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ggplot2"</span>))</span>
<span id="cb7-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb7-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(file, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data.csv"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"file"</span>),</span>
<span id="cb7-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(data, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_data</span>(file)),</span>
<span id="cb7-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(model, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_model</span>(data)),</span>
<span id="cb7-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(plot, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_model</span>(model, data))</span>
<span id="cb7-11">)</span></code></pre></div>
</div>
<p>This may look foreign to many R users, but if you look closely, you’ll realise that most of this code is boilerplate:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># _targets.R file</span></span>
<span id="cb8-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(targets)</span>
<span id="cb8-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tarchetypes)</span>
<span id="cb8-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_source</span>()</span>
<span id="cb8-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_option_set</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">packages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"readr"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ggplot2"</span>))</span>
<span id="cb8-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb8-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(....),</span>
<span id="cb8-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(....),</span>
<span id="cb8-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(....),</span>
<span id="cb8-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(....)</span>
<span id="cb8-11">)</span></code></pre></div>
</div>
<p>and what matters is defined inside the <code>tar_target()</code> functions. Remove the boilerplate, and you end up with essentially correct R code, after a few adjustments:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data.csv"</span></span>
<span id="cb9-2">data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_data</span>(file)</span>
<span id="cb9-3">model <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_model</span>(data)</span>
<span id="cb9-4">plot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_model</span>(model, data)</span></code></pre></div>
</div>
<p>but why go through the trouble of using <code>{targets}</code>? Well, the biggest reason is that <code>{targets}</code> figures out the dependencies between the objects you want to compute, and caches them. So in the example above, if you only change the code of the <code>fit_model()</code> function, only <code>model</code> and <code>plot</code> are re-computed. But if you change <code>file</code> and point the path to an updated <code>data.csv</code> file, then everything gets computed anew. Watch the <a href="https://books.ropensci.org/targets/walkthrough.html">intro video</a> from the official walkthrough for a visual explanation. But trust me, <code>{targets}</code> is in that class of tools that make you wonder how you could possibly have gotten anything done before using it.</p>
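<p>The dependency tracking described above can be sketched in a few lines of base R. This is only a toy to illustrate the idea, and certainly not how <code>{targets}</code> is implemented: the <code>pipeline</code> list and the <code>outdated()</code> helper are names I made up. Each target is stored as an unevaluated expression, its dependencies are the symbols that name other targets, and a change propagates to everything downstream:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Toy pipeline: each target is an unevaluated R expression.
pipeline &lt;- list(
  file  = quote("data.csv"),
  data  = quote(get_data(file)),
  model = quote(fit_model(data)),
  plot  = quote(plot_model(model, data))
)

# Direct dependencies: variables in each expression that name another target.
# all.vars() ignores function names such as get_data.
deps &lt;- lapply(pipeline, function(e) intersect(all.vars(e), names(pipeline)))

# Everything that must be recomputed when `changed` is invalidated:
# keep adding targets that depend on something already marked as outdated.
outdated &lt;- function(changed, deps) {
  out &lt;- changed
  repeat {
    hit &lt;- names(deps)[vapply(deps, function(d) any(d %in% out), logical(1))]
    new &lt;- setdiff(hit, out)
    if (length(new) == 0) break
    out &lt;- c(out, new)
  }
  out
}

outdated("model", deps)  # "model" "plot"
outdated("file", deps)   # "file" "data" "model" "plot"</code></pre></div>
</div>
<p>This matches the behaviour described above: touching <code>model</code> only invalidates <code>model</code> and <code>plot</code>, while touching <code>file</code> invalidates the whole pipeline.</p>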
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>I think that <code>{rixpress}</code> can become quite a useful package, so I will likely submit it for rOpenSci peer review in due time.</p>
<p>And thanks to <a href="https://grantmcdermott.com/">Grant McDermott</a> for suggesting the name “rixpress”!</p>


</section>

 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2025-03-20-announcing_rixpress.html</guid>
  <pubDate>Thu, 20 Mar 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Why we forked nixpkgs</title>
  <link>https://b-rodrigues.github.io/posts/2025-02-17-rstats-on-nix.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a> <img src="https://b-rodrigues.github.io/assets/img/fork.webp" style="width: 50%; height: auto;"> </a>
</p>
</div>
<section id="heres-why" class="level2">
<h2 class="anchored" data-anchor-id="heres-why">Here’s why</h2>
<p><code>nixpkgs</code> is a GitHub repository that contains tens of thousands of Nix expressions used by the Nix package manager to install software. By default, the Nix package manager will pull expressions from <code>NixOS/nixpkgs</code>, but when using <code>{rix}</code> our fork <code>rstats-on-nix/nixpkgs</code> is used instead.</p>
<p>Because forks can sometimes be a bit controversial, we decided a blog post was in order.</p>
<p>First of all, let’s make something clear: this doesn’t mean that we don’t contribute to upstream anymore, quite the contrary. But Nix is first and foremost the package manager of a Linux distribution, NixOS, and as such, the way it does certain things only makes sense in that context. For our needs, having a fork gives us more flexibility. Let me explain.</p>
<p>As you’ll know, if you’ve been using <code>{rix}</code> and thus Nix, it is possible to use a commit of the <code>nixpkgs</code> GitHub repository as the source for your packages. For example, the <code>6a9bda32519e710a0c0ab8ecfabe9307ab90ef0c</code> commit of <code>nixpkgs</code> will provide <code>{dplyr}</code> version 1.1.4 while this commit <code>407f8825b321617a38b86a4d9be11fd76d513da2</code> will provide version 1.0.7.</p>
<p>While it is technically possible for Nix to provide many versions of the same package (for example, you can install the latest Emacs by installing the <code>emacs</code> package, or Emacs 28 by installing <code>emacs28</code>) this ultimately depends on whether the maintainer wishes to do so, or whether it is practical. As you can imagine, with more than 20’000 CRAN and Bioconductor packages, that is not possible for us (by “us”, I mean the maintainers of the R ecosystem for Nix). So for a given <code>nixpkgs</code> commit, you won’t be able to <em>easily</em> install a specific version of <code>{dplyr}</code> that is not included in that particular <code>nixpkgs</code> commit. Instead, you can install it from source, and this is possible with <code>{rix}</code> by writing something like:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rix</span>(..., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">r_pkgs =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr@1.0.7"</span>, ...)</span></code></pre></div>
</div>
<p>but because this attempts to install the package from source, it can fail if that package needs Nix-specific fixes to work.</p>
<p>Also, it isn’t practical to update the whole of the R packages set on Nix every day: so while CRAN and Bioconductor get updates daily, the R packages set on Nix gets updated only around new releases of R. Again, this is a consequence of Nix being first and foremost the package manager of a Linux distribution with its own governance and way of doing things.</p>
<p>This is where the <code>rstats-on-nix</code> fork of <code>nixpkgs</code> is interesting: because it is a fork, we can afford to do things in a way that could not be possible or practical for upstream.</p>
<p>The first thing this fork allows us to do is offer a daily snapshot of CRAN. Every day, thanks to GitHub Actions, the R packages set gets updated, and the result is committed to a dated branch. This has been going on since the 14th of December 2024 (see <a href="https://github.com/rstats-on-nix/nixpkgs/tree/2024-12-14">here</a>). So when you set a date as in <code>rix(date = "2024-12-14", ...)</code> this is the fork that is going to get used. But this doesn’t mean that we recommend you use any date from the <code>rstats-on-nix/nixpkgs</code> fork: instead, each Monday, another action uses this fork and tries to build a set of popular packages on Linux and macOS, and only if this succeeds is the date added through a PR to the list of available dates on <code>{rix}</code>!</p>
<p>The reason this is done like this is to manage another <em>risk</em> of the upstream <code>nixpkgs</code>. As you know, <code>nixpkgs</code> is huge, and though the utmost care is taken by contributors and the PR review process is very strict, it can happen that updating packages breaks other packages. For example, RStudio was recently in a broken state due to an issue in one of its dependencies, <code>boost</code>. This is not the fault of anyone in particular: it’s just that packages get updated and packages that depend on them should get updated as well. But if that doesn’t happen quickly enough, the <code>nixpkgs</code> maintainer faces a conundrum. Either he or she doesn’t update the package because it breaks others, but not updating a package could be a security vulnerability, or he or she updates the package, but now other, perhaps less critical, packages are broken and need to be fixed, either by their upstream developers, or by the <code>nixpkgs</code> maintainer of said packages. In the case of RStudio a fix was proposed and promptly merged, but if you wanted to install RStudio during the time it took to fix it, you would have faced an error message, which isn’t great if all you want is to use Nix shells as development environments.</p>
<p>So for us, having a fork allows us to backport these fixes and so if you try to install RStudio using the latest available date, which is <code>"2025-02-10"</code>, it’s going to work, whereas if you tried to build it on that date using upstream <code>nixpkgs</code> you’d be facing an error!</p>
<p>We spent quite some time backporting fixes: we went back all the way to 2019. The way this works is that we start by checking out a <code>nixpkgs</code> commit on selected dates, then we “update” the R packages set by using the Posit CRAN and Bioconductor daily snapshots. Then, we backport as many fixes as possible, and ensure that a selection of popular packages work on both x86_64-linux (which includes Windows, through WSL) and aarch64-darwin (the M-series of Macs). Then we commit everything to a dated branch of the <code>rstats-on-nix/nixpkgs</code> fork. You can check out all the available dates by running: <code>rix::available_dates()</code>. We’re pretty confident that you should not face any issues when using Nix to build reproducible environments for R. However, should you face a problem, don’t hesitate to open an issue!</p>
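<p>If the date you have in mind is not in that list, one option is to fall back to the closest earlier date. Here is a minimal sketch, assuming that <code>rix::available_dates()</code> returns dates as <code>"YYYY-MM-DD"</code> strings; the vector below is a hypothetical subset used for illustration only, and <code>latest_on_or_before()</code> is a made-up helper that is not part of <code>{rix}</code>:</p>

```r
# Hypothetical subset of dates, for illustration only; in practice this
# vector would come from rix::available_dates()
available <- c("2025-01-27", "2025-02-03", "2025-02-10")

# Made-up helper (not part of {rix}): latest available date that is
# not later than the date we would like to use
latest_on_or_before <- function(wanted, available) {
  wanted <- as.Date(wanted)
  dates <- as.Date(available)
  candidates <- dates[dates <= wanted]
  if (length(candidates) == 0) {
    stop("No available date on or before ", format(wanted))
  }
  format(max(candidates))
}

chosen <- latest_on_or_before("2025-02-07", available)
chosen
```

<p>With the hypothetical vector above, <code>chosen</code> is <code>"2025-02-03"</code>, which could then be passed to the <code>date</code> argument of <code>rix()</code>.</p>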
<p>We now have packages and R versions working on Linux and macOS from March 2019 to now. See <a href="https://github.com/rstats-on-nix/daily_cran/blob/master/readme.md">this repository</a> that contains the scripts that allowed us to do it. Backporting fixes was especially important for Apple Silicon computers, as it took some time for this platform to work correctly on Nix. By backporting fixes, we can now provide older versions of these packages for Apple Silicon as well!</p>
<p>Using this approach, our fork now contains many more versions of working R packages than upstream. <code>{rix}</code> will thus likely keep pointing towards our fork in the future, and not upstream anymore. This should provide a much better user experience. An issue with our fork, though, is that by backporting fixes, we essentially create new Nix packages that are not included in upstream, and thus, these are not built by Hydra, Nix’s CI platform which builds binary packages. In practice this means that anyone using our fork will have to compile many packages from source. Now this is pretty bad, as building packages from source takes quite some time. But fear not, because thanks to <a href="https://www.cachix.org/">Cachix</a> we now also have a dedicated binary cache of packages that complements the default, public Nix cache! We provide instructions on how to use Cachix, and it’s very easy: you just run two additional commands after installing Nix. Using Cachix speeds up the installation process of packages tremendously. I want to give my heartfelt thanks to <a href="https://www.cachix.org/about">Domen Kožar</a> for sponsoring the cache!</p>
<p>Another thing we do with our fork is run an action every day at midnight, that monitors the <em>health</em> of the R packages set. Of course, we don’t build every CRAN package, merely a handful, but these are among the most popular or the most <em>at-risk</em> of being in a broken state. See <a href="https://github.com/rstats-on-nix/monitor_health/actions">here</a>.</p>
</section>
<section id="also-theres-a-new-rix-release-on-cran" class="level2">
<h2 class="anchored" data-anchor-id="also-theres-a-new-rix-release-on-cran">Also, there’s a new rix release on CRAN</h2>
<p><code>{rix}</code> now handles remote packages that have remote dependencies (themselves with remote dependencies) much better thanks to code by <a href="https://github.com/mihem">Michael Heming</a>.</p>
<p>We also spent quite some time making <code>{rix}</code> work better with IDEs and have also documented that in a <a href="https://docs.ropensci.org/rix/articles/e-configuring-ide.html">new vignette</a>. The difference from previous releases of <code>{rix}</code> is that now, when a user supplies an IDE name to the <code>ide</code> argument of the <code>rix()</code> function, that IDE will get installed by Nix, which was previously not the case. This only really affects VS Code, as before, setting <code>ide = "code"</code> would only add the <code>{languageserver}</code> package to the list of R packages to install. That was confusing, because if <code>ide = "rstudio"</code>, then RStudio would be installed. So we decided that if <code>ide = "some editor"</code>, then that editor should be installed by Nix. The vignette linked above explains in great detail how you can configure your editor to work with Nix shells.</p>
<p>If you decide to give <code>{rix}</code> a try, please let us know how it goes!</p>


</section>

 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2025-02-17-rstats-on-nix.html</guid>
  <pubDate>Mon, 17 Feb 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Using options() to inject a function’s internal variable for reproducible testing</title>
  <link>https://b-rodrigues.github.io/posts/2025-02-13-testthat.html</link>
  <description><![CDATA[ 




<p><em>No image this time</em></p>
<p>Imagine you have a function that does something complicated, and in the middle of its definition it generates a variable. Now suppose that you want to save this variable and re-use it for tests. What I mean is that you want your function to always reproduce this intermediate variable, regardless of the inputs you give it. This can be useful for testing if computing this intermediate variable is costly.</p>
<p>In my <code>{rix}</code> package, the <code>rix()</code> function generates valid Nix expressions from R input and these Nix expressions can then be used to build reproducible development environments that include R, R packages, development libraries, and so on. If you want a 5-minute intro to <code>{rix}</code>, click <a href="https://www.youtube.com/watch?v=OOu6gjQ310c">here</a>.</p>
<p>Anyways, sometimes, computing these expressions can take some time, especially if the user wants to include remote dependencies that themselves have remote dependencies. <code>rix()</code> will try to look for suitable GitHub commits to pin all the packages for reproducibility purposes, and this can imply quite a lot of API calls. Now for my tests, I wanted to use an already generated <code>default.nix</code> file (which contains the generated Nix expression) but I didn’t want to have to recompute it every time I ran the test and I couldn’t simply use it as is for the test either. You see, that <code>default.nix</code> was in an intermediary state, before <code>rix()</code> is supposed to do some post-processing to it, which is what I actually want to test (more precisely, I want to test the argument that makes <code>rix()</code> skip this post-processing step).</p>
<p>So suppose <code>rix()</code> looks like this:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb1-1">rix <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(a,b,c){</span>
<span id="cb1-2">  ... <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># lots of code</span></span>
<span id="cb1-3">  ... <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># lots of code</span></span>
<span id="cb1-4">  default.nix_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ... <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># it's generated here</span></span>
<span id="cb1-5">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Then a bunch of things happen to it</span></span>
<span id="cb1-6">  out <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(default.nix_file)</span>
<span id="cb1-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">writeLines</span>(out, path) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this is what's written</span></span>
<span id="cb1-8">}</span></code></pre></div>
</div>
<p>Now what I want is to be able to “overwrite” the <code>default.nix_file</code> variable on line 4 when testing, to provide what I want. This way, I can call <code>rix()</code> with some “easy” parameters that make the computations up to that point very quick. My goal is essentially to test <code>f()</code> (line 6), which raises the question: why not write <code>f()</code> as a separate function and test it? This would be the best practice. However, I don’t really have such an <code>f()</code>; rather, it’s a series of complicated steps that follow, and rewriting everything to make it easily testable would just take too much time.</p>
<p>Instead, I opted for the following:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb2-1">rix <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(a,b,c){</span>
<span id="cb2-2">  ... <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># lots of code</span></span>
<span id="cb2-3">  ... <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># lots of code</span></span>
<span id="cb2-4"></span>
<span id="cb2-5">  stub_default.nix <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">getOption</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"TESTTHAT_DEFAULT.NIX"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">default =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>)</span>
<span id="cb2-6"></span>
<span id="cb2-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(stub_default.nix)){</span>
<span id="cb2-8">    default.nix_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readLines</span>(stub_default.nix)</span>
<span id="cb2-9">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb2-10">    default.nix_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ... <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># it's generated here if not being tested</span></span>
<span id="cb2-11">  }</span>
<span id="cb2-12">  out <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(default.nix_file)</span>
<span id="cb2-13">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Then a bunch of things happen to it</span></span>
<span id="cb2-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">writeLines</span>(out, path) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this is what's written</span></span>
<span id="cb2-15">}</span></code></pre></div>
</div>
<p>On line 5, I get the option <code>"TESTTHAT_DEFAULT.NIX"</code> and if it doesn’t exist, <code>stub_default.nix</code> will be set to <code>NULL</code>. So if it’s <code>NULL</code> it’s business as usual, if not, then that <code>default.nix</code> file dedicated for testing gets passed further down. In a sense, I injected the variable I needed in the spot I needed.</p>
<p>Then, my test looks like this:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb3-1">testthat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">test_that</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"remove_duplicate_entries(), don't remove duplicates if skip"</span>, {</span>
<span id="cb3-2"></span>
<span id="cb3-3"></span>
<span id="cb3-4">  dups_entries_default.nix <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(</span>
<span id="cb3-5">    testthat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">test_path</span>(),</span>
<span id="cb3-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/testdata/default-nix_samples/dups-entries_default.nix"</span>)</span>
<span id="cb3-7"></span>
<span id="cb3-8">  tmpdir <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tempdir</span>()</span>
<span id="cb3-9"></span>
<span id="cb3-10">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This copies the file I need in the right path</span></span>
<span id="cb3-11">  destination_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tempdir</span>(), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">basename</span>(dups_entries_default.nix))</span>
<span id="cb3-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.copy</span>(dups_entries_default.nix, destination_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb3-13"></span>
<span id="cb3-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">on.exit</span>(</span>
<span id="cb3-15">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlink</span>(tmpdir, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">recursive =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">force =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb3-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">add =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb3-17">  )</span>
<span id="cb3-18"></span>
<span id="cb3-19">  removed_dups <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(destination_file) {</span>
<span id="cb3-20"></span>
<span id="cb3-21">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set the option to the file path and clean the option afterwards</span></span>
<span id="cb3-22">    op <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">options</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"TESTTHAT_DEFAULT.NIX"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> destination_file)</span>
<span id="cb3-23">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">on.exit</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">options</span>(op), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">add =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">after =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb3-24"></span>
<span id="cb3-25">    out <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rix</span>(</span>
<span id="cb3-26">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2025-02-10"</span>,</span>
<span id="cb3-27">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">project_path =</span> tmpdir,</span>
<span id="cb3-28">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb3-29">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">skip_post_processing =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># &lt;- this is actually what I wanted to test</span></span>
<span id="cb3-30">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(destination_file)</span>
<span id="cb3-31">  }</span>
<span id="cb3-32"></span>
<span id="cb3-33"></span>
<span id="cb3-34">  testthat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expect_snapshot_file</span>(</span>
<span id="cb3-35">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">path =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">removed_dups</span>(destination_file),</span>
<span id="cb3-36">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skip-dups-entries_default.nix"</span>,</span>
<span id="cb3-37">  )</span>
<span id="cb3-38">})</span></code></pre></div>
</div>
<p>On line 22, I set the option and on line 23 I write code to remove that option once the test is done, to not mess up subsequent tests. This is a snapshot test, so now I can take a look at the resulting file, and indeed make sure that post-processing was skipped, as expected.</p>
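<p>For readers who want to experiment with the pattern outside of <code>{rix}</code>, here is a self-contained toy version that only uses base R. All names in it (<code>make_report()</code>, <code>with_stub()</code> and the option <code>MYPKG_STUB_LINES</code>) are hypothetical and only serve the illustration:</p>

```r
# Toy stand-in for a function with a costly intermediate step
make_report <- function(n) {
  # Hypothetical option name: path to a file holding the intermediate result
  stub <- getOption("MYPKG_STUB_LINES", default = NULL)
  if (!is.null(stub)) {
    lines <- readLines(stub)          # testing: use the injected file
  } else {
    lines <- paste("row", seq_len(n)) # normal use: (pretend) costly step
  }
  toupper(lines)                      # the post-processing we want to test
}

# Helper that sets the option, runs `fn()`, then restores the old value
with_stub <- function(path, fn) {
  old <- options(MYPKG_STUB_LINES = path)
  on.exit(options(old), add = TRUE)
  fn()
}

stub_file <- tempfile(fileext = ".txt")
writeLines(c("alpha", "beta"), stub_file)

injected <- with_stub(stub_file, function() make_report(n = 1000))
normal <- make_report(n = 2)          # option is unset again at this point
```

<p>Here <code>injected</code> is built from the contents of the stub file, whatever the value of <code>n</code>, while <code>normal</code> goes through the regular code path. If you already depend on <code>{withr}</code>, <code>withr::local_options()</code> is another way to set an option for the duration of a test and have it reset automatically.</p>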
<p>How would you have done this?</p>



 ]]></description>
  <category>R</category>
  <category>data-science</category>
  <guid>https://b-rodrigues.github.io/posts/2025-02-13-testthat.html</guid>
  <pubDate>Thu, 13 Feb 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>New year, new blog</title>
  <link>https://b-rodrigues.github.io/posts/2025-01-31-new_blog.html</link>
  <description><![CDATA[ 




<div style="text-align: center;">
<p>
<a href="https://www.youtube.com/watch?v=n__GJuqLb00"> <img src="https://b-rodrigues.github.io/assets/img/shadow.png" style="width: 40%; height: auto;"> </a>
</p>
</div>
<p>Happy new year! The blog has a new look! Well it’s not that different on the surface. But under the hood, it is quite different indeed!</p>
<p>My previous setup was: GitHub to host the code, on each push the build process would get started on Netlify and then it would be hosted there. The engine was Hugo.</p>
<p>This blog now still uses GitHub to host the code, but now also uses GitHub pages for hosting and the engine is <a href="https://quarto.org/docs/websites/website-blog.html">Quarto</a>. The blog also gets built on GitHub Actions inside of a Nix environment: so I just need to push and everything gets built! <a href="https://github.com/b-rodrigues/blog/blob/master/.github/workflows/build_publish.yaml">Here’s the workflow that achieves this</a>.</p>
<p>What’s really amazing with Nix is that I can preview my blog locally using <em>exactly</em> the same environment as the one that will be used for building it on GitHub Actions. So if it <em>works on my machine</em> it’s going to <em>work anywhere</em>.</p>
<p>You’ll notice that the last step uses the <code>rstats-on-nix/quarto-nix-actions/publish@main</code> action that is a fork of the <a href="https://github.com/quarto-dev/quarto-actions">quarto-dev/quarto-actions</a> actions that just makes them work inside of a Nix shell! This fork is hosted on the <code>rstats-on-nix</code> organization: I have a lot to say about this organization, but that’s for a future blog post!</p>
<p>Migrating the pages was a rather long process, as I needed to make sure everything was rendering correctly: because the folder structure of Quarto blogs is different than the structure of Hugo blogs, I had to update many paths. This was quite tedious and I didn’t want to use a script for this as I also wanted to take this opportunity to make some adjustments, such as centering images properly and correcting some typos if I saw some. It was also quite interesting to re-read some of my old blog posts.</p>
<p>One neat thing about Quarto is the possibility to use pre- and post-render scripts that can be written in R. I’m using one to correctly sort the blog posts in the main page, as for some reason they weren’t being sorted properly. <a href="https://github.com/b-rodrigues/blog/blob/master/order_posts.R">Here’s the post-render script in question.</a></p>
<p>Now I can go back to working on <a href="https://docs.ropensci.org/rix/">rix</a>.</p>



 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2025-01-31-new_blog.html</guid>
  <pubDate>Fri, 31 Jan 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Reproducible data science with Nix, part 13 – {rix} is on CRAN!</title>
  <link>https://b-rodrigues.github.io/posts/2024-09-27-nix_part_13.html</link>
  <description><![CDATA[ 




<div data-align="center">
<p>
<a href="https://docs.ropensci.org/rix"> <img src="https://b-rodrigues.github.io/assets/img/rix-logo.png" width="100%" height="auto"> </a>
</p>
</div>
<p>
<em>Simplifies the creation of reproducible data science environments using the ‘Nix’ package manager, as described in Dolstra (2006) <a href="https://dspace.library.uu.nl/handle/1874/7540">&lt;ISBN 90-393-4130-3&gt;</a>. The included ‘rix()’ function generates a complete description of the environment as a ‘default.nix’ file, which can then be built using ‘Nix’. This results in project specific software environments with pinned versions of R, packages, linked system dependencies, and other tools. Additional helpers make it easy to run R code in ‘Nix’ software environments for testing and production.</em>
</p>
<p>
After 15 months of coding, 1364 commits, 143 closed issues, 175 closed PRs, an rOpenSci pre-review, an rOpenSci review, <code>{rix}</code> is finally on <a href="https://cran.r-project.org/web/packages/rix/index.html">CRAN</a>!
</p>
<p>
You can now install <code>{rix}</code> using good old <code>install.packages()</code>. Soon, <code>{rix}</code> will also be included into the <code>nixpkgs</code> collection of packages, meaning that you will be able to install <code>{rix}</code> with Nix.
</p>
<p>
Important sidenote: as it so happens, there is currently a bug in the released CRAN version that we thought we had solved, but we had only solved it partially. When running <code>rix::rix()</code> two files should be generated: a <code>default.nix</code> and an <code>.Rprofile</code> for your project. It turns out that the <code>.Rprofile</code> can be empty. If it is, run <code>rix::rix_init(rprofile_action = "overwrite")</code> to generate a proper <code>.Rprofile</code>. This is important, especially on Mac or if you have a system-wide library of packages! We will submit a fix asap.
</p>
<p>
If you want to watch a 5-minute video introduction:
</p>
<div data-align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/OOu6gjQ310c?si=tQ-s9ZgEBxak8k8G" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="">
</iframe>
</div>
<p>
Btw, here is what <a href="https://github.com/boyter/scc">scc</a> has to say about the estimated cost of the project:
</p>
<p>
<code>scc --format=html-table --avg-wage 100000 .</code>
</p>
<div data-align="center">
<table class="table">
<colgroup>
<col width="15%">
<col width="11%">
<col width="11%">
<col width="11%">
<col width="13%">
<col width="10%">
<col width="16%">
<col width="11%">
</colgroup>
<thead>
<tr class="header">
<th>
<strong>Language</strong>
</th>
<th align="right">
<strong>Files</strong>
</th>
<th align="right">
<strong>Lines</strong>
</th>
<th align="right">
<strong>Blank</strong>
</th>
<th align="right">
<strong>Comment</strong>
</th>
<th align="right">
<strong>Code</strong>
</th>
<th align="right">
<strong>Complexity</strong>
</th>
<th align="right">
<strong>Bytes</strong>
</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>
YAML
</td>
<td align="right">
61
</td>
<td align="right">
2798
</td>
<td align="right">
320
</td>
<td align="right">
174
</td>
<td align="right">
2304
</td>
<td align="right">
0
</td>
<td align="right">
69187
</td>
</tr>
<tr class="even">
<td>
R
</td>
<td align="right">
33
</td>
<td align="right">
4515
</td>
<td align="right">
483
</td>
<td align="right">
1225
</td>
<td align="right">
2807
</td>
<td align="right">
389
</td>
<td align="right">
153288
</td>
</tr>
<tr class="odd">
<td>
Nix
</td>
<td align="right">
10
</td>
<td align="right">
781
</td>
<td align="right">
95
</td>
<td align="right">
0
</td>
<td align="right">
686
</td>
<td align="right">
32
</td>
<td align="right">
18644
</td>
</tr>
<tr class="even">
<td>
Markdown
</td>
<td align="right">
5
</td>
<td align="right">
1371
</td>
<td align="right">
339
</td>
<td align="right">
0
</td>
<td align="right">
1032
</td>
<td align="right">
0
</td>
<td align="right">
63758
</td>
</tr>
<tr class="odd">
<td>
JSON
</td>
<td align="right">
1
</td>
<td align="right">
147
</td>
<td align="right">
0
</td>
<td align="right">
0
</td>
<td align="right">
147
</td>
<td align="right">
0
</td>
<td align="right">
4637
</td>
</tr>
<tr class="even">
<td>
Plain Text
</td>
<td align="right">
1
</td>
<td align="right">
41
</td>
<td align="right">
0
</td>
<td align="right">
0
</td>
<td align="right">
41
</td>
<td align="right">
0
</td>
<td align="right">
2269
</td>
</tr>
<tr class="odd">
<td>
<strong>Total</strong>
</td>
<td align="right">
<strong>111</strong>
</td>
<td align="right">
<strong>9653</strong>
</td>
<td align="right">
<strong>1237</strong>
</td>
<td align="right">
<strong>1399</strong>
</td>
<td align="right">
<strong>7017</strong>
</td>
<td align="right">
<strong>421</strong>
</td>
<td align="right">
<strong>311783</strong>
</td>
</tr>
</tbody>
</table>
</div>
<p>
Estimated Cost to Develop (organic) $371,264 - Estimated Schedule Effort (organic) 7.59 months - Estimated People Required (organic) 2.45
</p>
<p>
Don’t hesitate to give <code>{rix}</code> a try and let us know how it goes!
</p>



 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2024-09-27-nix_part_13.html</guid>
  <pubDate>Fri, 27 Sep 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Reproducible data science with Nix, part 12 – Nix as a polyglot build automation tool for data science</title>
  <link>https://b-rodrigues.github.io/posts/2024-08-28-nix_for_r_part_12.html</link>
  <description><![CDATA[ 




<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/guess_we_doin_pdfs.png" width="60%">
</p>
</div>
<p>
Nix is not only a package manager, but also a build automation tool, and you can use it to build polyglot data science pipelines in a completely reproducible way.
</p>
<p>
For example, suppose that you need to mix Python, R and maybe some other tools for a project (by the way, some believe this will become the norm in the coming years; use your favourite search engine to look for “polyglot data science” and you’ll see), and suppose that you want to define your project as a nice reproducible pipeline, and not simply a series of scripts. What are the options available to you?
</p>
<p>
One option would be to use the <code>{targets}</code> package for R, which allows you to lay out your project as a pipeline. But as amazing as <code>{targets}</code> is, it only works with R. If you also need Python, you would then need to also use the <code>{reticulate}</code> package to interface with it. But what do you do if you need some other command line tools? Well, you could wrap them in an R function using <code>system()</code> or <code>system2()</code>. But what if you need yet another language, like Julia? There might be a way to call Julia from R, but as you see, the more diverse tools you need, the more complex it gets. And it doesn’t really matter if you switch from <code>{targets}</code> to another such package that exists for, say, Python: you would always need to write wrappers or use packages that allow you to call the other programming languages that you need.
</p>
<p>
Another possibility is to use good old <code>make</code>. <code>make</code> is a tool from the GNU project that allows you to define <em>targets</em>, which would be the outputs of a script or call to some cli tool, by writing so-called <code>Makefiles</code>. For an example of a <code>Makefile</code> in research, take a look at <a href="https://github.com/grantmcdermott/skeptic-priors/blob/master/Makefile">this one</a> from a <a href="https://link.springer.com/article/10.1007/s10584-021-03089-x">paper</a> by <a href="https://mastodon.social/@gmcd">Grant McDermott</a>. You can use <code>make</code> to orchestrate several programming languages or cli tools, but you will need to write code to pass data from one script to the other. <code>{targets}</code> deals with that transparently by serialising all the targets’ outputs using <code>saveRDS()</code>, but this only works because only R is supported. If you’re trying to make R, Python, and whatever else work together, you will need to deal with this manually and find a common interface to pass data around.
</p>
<p>
Despite this, using <code>make</code>, or some other tool on top of the required programming languages (and not tied to any one of them), is likely the best solution, and it turns out that Nix can be used just like that! But why use Nix and not <code>make</code> then? Well, using Nix guarantees that whatever you produce will be completely reproducible. With <code>make</code>, you would need to either run it inside a Docker image or… inside a development environment built with Nix! I did something similar in <a href="../posts/2023-07-19-nix_for_r_part2.html">this blog post</a> where I ran a <code>{targets}</code> pipeline inside a Nix environment to make the analysis reproducible.
</p>
<p>
But if I’m already defining a reproducible development environment using Nix, why not go all the way and build a complete project using Nix? After all, Nix allows you to package <em>software</em> and what is <em>software</em> but 0’s and 1’s? And what is a trained model, a paper or report in the PDF format, predictions exported into a CSV file, etc, if not 0’s and 1’s?
</p>
<p>
Just like with any other build automation tool, Nix will only rebuild the project if something changes, and will only rebuild the parts that need to be rebuilt. So if you change a file somewhere, only whatever depends on this file will get rebuilt, just like with <code>{targets}</code>, or <code>make</code>.
</p>
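<p>
To make the “only rebuild what changed” idea concrete, here is a minimal Python sketch. It is not how Nix is implemented; it only illustrates the principle that a step is identified by a hash of its inputs, so a change propagates to everything downstream and nothing else. The step and file names (<code>download</code>, <code>clean</code>, <code>plot</code>, <code>data.csv</code>, <code>clean.py</code>) are hypothetical.
</p>

```python
import hashlib

def fingerprint(step_name, inputs):
    """Hash a step's name together with the fingerprints of everything it depends on."""
    h = hashlib.sha256(step_name.encode())
    for item in inputs:
        h.update(item.encode())
    return h.hexdigest()

def plan(sources):
    """Fingerprint each step of a hypothetical three-step pipeline."""
    raw = fingerprint("download", [sources["data.csv"]])
    clean = fingerprint("clean", [raw, sources["clean.py"]])
    plot = fingerprint("plot", [clean])
    return {"download": raw, "clean": clean, "plot": plot}

# Same data, but the cleaning script was edited between the two runs
before = plan({"data.csv": "v1", "clean.py": "v1"})
after = plan({"data.csv": "v1", "clean.py": "v2"})

# Steps whose fingerprint changed are the only ones that need rebuilding
stale = [step for step in before if before[step] != after[step]]
print(stale)
```

<p>
Here the download step keeps its fingerprint because none of its inputs changed, while the cleaning step and everything that depends on it are marked as stale.
</p>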
<p>
In the <a href="https://github.com/b-rodrigues/nixbat/tree/master">following repository</a> you can find an example of this.
</p>
<p>
This is a very simple project: two functions are defined in the <code>python_functions.py</code> script. These functions are nothing special, and could be used interactively. One function reads a <code>.csv</code> file from the Internet and returns it, the other does some basic cleaning. Here are these two functions included in the <code>python_functions.py</code> file:
</p>
<pre><code>from pandas import read_csv

def download_iris(iris_csv_url):
    # Read the CSV file
    df = read_csv(iris_csv_url)

    return df

def process_iris(iris_csv_path):
    # Read the CSV file
    df = read_csv(iris_csv_path)

    # Replace the species numbers with their corresponding names
    species_mapping = {0: "setosa", 1: "virginica", 2: "versicolor"}
    df['species'] = df['species'].replace(species_mapping)

    return df</code></pre>
<p>
Then, I want to use <code>{ggplot2}</code> to plot this data. You will notice the lack of R script in the repo. I did this on purpose, because I wanted to show how you could directly write R code inside of a Nix expression. But in practice, it is better to have Python code in a Python script, R code in an R script, and then use Nix to orchestrate the whole thing. But I just wanted to show you that you could, if you wanted to, have a completely self-contained Nix expression that encapsulates the business logic as well.
</p>
<p>
There’s also a <code>.Qmd</code> file: this is the file that will get compiled into a PDF document, and is the output of the whole project. It could be anything else! As I stated above, this is just 0’s and 1’s so it could very well be some other output, it doesn’t really matter.
</p>
<p>
Let’s now take a look at the <code>default.nix</code> that builds the whole thing. Let’s start with the top-level definitions:
</p>
<pre><code>let
  pkgs =
    import
      (fetchTarball "https://github.com/NixOS/nixpkgs/archive/27285241da3bb285155d549a11192e9fdc3a0d04.tar.gz")
      { };

  tex = (
    pkgs.texlive.combine {
      inherit (pkgs.texlive) scheme-small;
    }
  );

  # Because building happens in sandbox that cannot connect to the internet
  # we need to download assets beforehand
  iris_path = pkgs.fetchurl {
    url = "https://raw.githubusercontent.com/b-rodrigues/nixbat/7c319bcdbe15e7f7182e7685b8de176a40d0bde9/iris.csv";
    hash = "sha256-2H6THCXKxIt4yxnDDY+AZRmbxqs7FndCp4MqaAR1Cpw=";
  };

  # Common python dependencies to use in my intermediary inputs
  pythonEnv = pkgs.python312.withPackages (ps: with ps; [ pandas ]);

  # Common python sources
  pythonSrc = pkgs.lib.fileset.toSource {
    root = ./.;
    fileset = ./python_functions.py;
  };</code></pre>
<p>
Some variables are defined there:
</p>
<ul>
<li>
<code>pkgs</code>: this is the set of Nix packages to be used. All the dependencies of the project will get built using the Nix expressions available in the <code>nixpkgs</code> Github repository at a specific commit. This ensures that the output of this expression will always be exactly the same.
</li>
<li>
<code>tex</code>: defines the set of LaTeX packages I need to compile the PDF.
</li>
<li>
<code>iris_path</code>: the Python function I use to load the data takes a path, or url, to read the iris dataset. Because building a derivation happens in a sandbox, I need to download assets beforehand. This is what the <code>fetchurl</code> function does. I can then refer to the file path using <code>${iris_path}</code> later on.
</li>
<li>
<code>pythonEnv</code>: This lists the dependencies I will need to run my Python functions.
</li>
<li>
<code>pythonSrc</code>: Defines the path to the <code>python_functions.py</code> file.
</li>
</ul>
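<p>
A note on the <code>hash</code> passed to <code>fetchurl</code>: as far as I understand it, this is a hash in the SRI format, that is the name of the algorithm, a dash, and the base64-encoded digest of the file. The Python sketch below shows how such a string can be computed from raw bytes; the function name <code>sri_sha256</code> is made up for this illustration, and in practice you would let Nix tell you the expected hash.
</p>

```python
import base64
import hashlib

def sri_sha256(data):
    """Return an SRI-style string: 'sha256-' followed by the base64 of the raw digest."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")

# Hash of an empty input, just to show the shape of the result
print(sri_sha256(b""))
```

<p>
If the downloaded file doesn’t match the declared hash, the build fails, which is what makes it safe to fetch assets ahead of the sandboxed build.
</p>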
<p>
Then, I want to call each of my functions separately, and I want them to produce a single output. So for this, I now build a derivation, one per output. I start with the first one:
</p>
<pre><code>downloadCsv = pkgs.stdenv.mkDerivation {
  name = "download-csv";
  buildInputs =  [ pythonEnv ];
  src = pythonSrc;
  buildPhase = ''
      python -c "
import pandas as pd
from python_functions import download_iris

iris_raw = download_iris('${iris_path}')

iris_raw.to_csv('iris_raw.csv', index=False)
      "
    '';
  installPhase = ''
    mkdir -p $out
    cp iris_raw.csv $out/
  '';
  };</code></pre>
<p>
At first sight, it might seem that a lot is going on, but let’s take a closer look:
</p>
<ul>
<li>
first I give it a name: <code>name = "download-csv"</code>
</li>
<li>
second, I list its dependencies in <code>buildInputs</code>. This is what’s required to build the target!
</li>
<li>
then, I provide the source, in this case the <code>python_functions.py</code> file
</li>
</ul>
<p>
Then, I need to run the code, and this is what happens in the <code>buildPhase</code>. This is exactly the code you would write if you were using a script to glue your functions together. See how I use <code>${iris_path}</code> to refer to the path to the file defined above. Finally, in the <code>installPhase</code> I copy the <code>.csv</code> file to <code>$out/</code>, which essentially copies the file into the Nix store, making it available for the next derivations.
</p>
<p>
In the next derivation, I now use the second Python function to clean the data:
</p>
<pre><code>cleanCsv = pkgs.stdenv.mkDerivation {
    name = "clean-csv";
    buildInputs =  [ pythonEnv ];
    src = pythonSrc;
    buildPhase = ''
      python -c "
import pandas as pd
from python_functions import process_iris

iris = process_iris('${downloadCsv}/iris_raw.csv')

iris.to_csv('iris.csv', index=False)
      "
    '';
    installPhase = ''
      mkdir -p $out
      cp iris.csv $out/
    '';
  };</code></pre>
<p>
This is not very different from what I did before. Just notice how I refer to the output of the first derivation: <code>${downloadCsv}/iris_raw.csv</code>.
</p>
<p>
Now comes the last intermediary derivation, the one that uses R to create a plot:
</p>
<pre><code>generatePlot = pkgs.stdenv.mkDerivation {
    name = "generate-plot";
    buildInputs = with pkgs; [
      R
      rPackages.ggplot2
      rPackages.janitor
    ];
    dontUnpack = true;
    buildPhase = ''
            Rscript -e "

      library(ggplot2)
      library(janitor)

      iris &lt;- read.csv('${cleanCsv}/iris.csv') |&gt;
        clean_names() |&gt;
        transform(species = as.character(species))

      p &lt;- ggplot(iris,
                  aes(x = sepal_length, y = sepal_width, color = species)) +
          geom_point(size = 3) +
          labs(title = 'Sepal Length vs Sepal Width',
               x = 'Sepal Length',
               y = 'Sepal Width') +
          theme_minimal() +
          theme(plot.title = element_text(hjust = 0.5))


      ggsave('plot.png', plot = p, width = 6, height = 4, dpi = 300)

      "
    '';
    installPhase = ''
      mkdir -p $out
      cp plot.png $out/
    '';
  };</code></pre>
<p>
As I said above, to make this better, it would need to be a function defined in its own R script, as this way there’s a nice separation of concerns. On one hand, there’s the business logic in Python and R scripts, and on the other there’s the orchestration in Nix. Putting R code in the Nix expression makes this less flexible, but I wanted to show you that this is also a possibility!
</p>
<p>
Now comes the last part of the Nix expression, the actual thing I want to build, a PDF that uses the generated plot as an input:
</p>
<pre><code>in
# Derivation to generate the PDF report from Markdown
pkgs.stdenv.mkDerivation {
  name = "generate-report";
  buildInputs = [
    pkgs.quarto
    tex
  ];
  src = pkgs.lib.fileset.toSource {
        root = ./.;
        # Only include report.Qmd in the source
        fileset = ./report.Qmd;
  };
  buildPhase = ''

    cp ${generatePlot}/plot.png .

    # Deno needs to add stuff to $HOME/.cache
    # so we give it a home to do this
    mkdir home
    export HOME=$PWD/home
    quarto render report.Qmd --to pdf

  '';

  installPhase = ''
    mkdir -p $out
    cp report.pdf $out/
  '';
}</code></pre>
<p>
Notice the dependencies of this derivation: <code>quarto</code> and <code>tex</code> (<code>tex</code> is the variable I defined right at the beginning that lists LaTeX packages). I then need to specify <code>report.Qmd</code> as the source of this derivation, and copy the plot generated before in R into the working/build directory. There’s also an idiosyncrasy where a dependency of Quarto, Deno, needs to have a directory to save some stuff in. Nix being Nix, we need to manually define such a home directory for reproducibility purposes. If it used the home directory on my machine, this wouldn’t be reproducible! We finish the <code>buildPhase</code> by rendering the document, and then <em>install</em> it into <code>$out/</code>. To build this project, you need to have Nix installed and then type <code>nix-build</code>, or alternatively, <code>nix-build -Q</code> which hides all the output of the build phases (so you don’t see any warnings or messages thrown by either Python or R).
</p>
<p>
This will build the PDF, which you can then find in the Nix store. You’ll notice a symlink called <code>result</code> appear next to all your other files from the project. In a terminal, call <code>readlink result</code> and this will show you the path in the Nix store that contains the generated PDF, which you can now read!
</p>
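<p>
If you’d rather locate the output from a script than from the terminal, the same lookup can be sketched in Python. This is only an illustration: the demo below uses a stand-in directory instead of a real Nix store path, and <code>resolve_result</code> is a hypothetical helper name.
</p>

```python
import os
import tempfile

def resolve_result(link_path):
    """Return the absolute target of a symlink such as the `result` link left by nix-build."""
    return os.path.realpath(link_path)

# Demo with a stand-in directory playing the role of the Nix store output
with tempfile.TemporaryDirectory() as tmp:
    fake_store = os.path.join(tmp, "fake-store-output")
    os.mkdir(fake_store)
    link = os.path.join(tmp, "result")
    os.symlink(fake_store, link)

    is_link = os.path.islink(link)
    resolved = resolve_result(link)
    print(resolved)
```

<p>
In a real project you would call <code>resolve_result("result")</code> from the project folder and look for <code>report.pdf</code> inside the returned directory.
</p>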
<p>
In conclusion, I think that this is a really useful way to orchestrate code written in different programming languages, but I would not use this for monolingual projects. For R, I’ll keep using <code>{targets}</code> together with a Nix shell to ensure reproducibility. Also, to really benefit from this, your code needs, ideally, to be written as a series of functions, each outputting a single object. Instead, if you write a script to orchestrate the whole thing in R or Python, and then put a Nix expression on top of it, I’m not sure it’s really worth it. Might as well just use a Nix shell then and execute your scripts in it.
</p>
<p>
Also, let me state that this is my first attempt at using Nix for such a purpose, and there might be a better/more elegant way of doing it, so if you have any input, don’t hesitate!
</p>
<p>
<em>Thanks to <a href="https://discourse.nixos.org/t/derivation-gets-always-rebuilt/51246/3">the amazing Nix community for helping out!</a></em>
</p>



 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2024-08-28-nix_for_r_part_12.html</guid>
  <pubDate>Wed, 28 Aug 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Reproducible data science with Nix, part 11 – build and cache binaries with Github Actions and Cachix</title>
  <link>https://b-rodrigues.github.io/posts/2024-04-04-nix_for_r_part_11.html</link>
  <description><![CDATA[ 




<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/own_cache.jpg" width="60%">
</p>
</div>
<section id="intro" class="level2">
<h2 class="anchored" data-anchor-id="intro">
Intro
</h2>
<p>
I have this package on CRAN called <code>{chronicler}</code> and last month I got an email from CRAN telling me that building the package was failing, and I had two weeks to fix it.
</p>
<p>
I immediately thought that some dependency of my package got updated, and somehow broke something. But when I checked the results of the build, I was surprised, to say the least:
</p>
<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/chronicler_check_results.png" width="80%">
</p>
</div>
<p>
How come my package was only failing on Fedora? Now that was really weird. There was no way this was right. Also, I couldn’t reproduce this bug on my local machine… but I could reproduce it on Github Actions, on Ubuntu (but it was ok on CRAN’s Debian which is really close to Ubuntu!), and I couldn’t reproduce it on Windows either! What was going on? So I started digging, and my first idea was to look at the list of packages that got released on CRAN on that day (March 12th 2024) or just before, and I saw something that caught my eye: a new version of <code>{tidyselect}</code> had just been released and even though my package doesn’t directly depend on it, I knew that this package was likely a dependency of some direct dependency of <code>{chronicler}</code>. So I looked into the release notes, and there it was:
</p>
<pre><code>* `eval_select()` out-of-bounds errors now use the verb "select" rather than
  "subset" in the error message for consistency with `dplyr::select()` (#271).</code></pre>
<p>
I knew this was what I was looking for, because the unit test that was failing to pass was a test that should error because <code>dplyr::select()</code> was being used on a column that didn’t exist. So the success of that test was defined as <em>finding the following error message in the log</em>, which contained the word <em>subset</em> but now it should be <em>select</em>.
</p>
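<p>
The general lesson is that a test that matches the exact wording of an upstream error message is brittle. The Python sketch below illustrates this with two checks: a strict one that only knows the old verb, and a more tolerant one that accepts both. The message texts are assumed wordings based on the release note above, not copied from <code>{tidyselect}</code>, and this is not the actual fix I pushed to <code>{chronicler}</code>.
</p>

```python
import re

# Assumed wordings, before and after the upstream change of verb
OLD_MESSAGE = "Can't subset columns that don't exist."
NEW_MESSAGE = "Can't select columns that don't exist."

def strict_check(message):
    """Pass only if the message uses the old verb."""
    return "Can't subset columns" in message

def tolerant_check(message):
    """Pass if the message uses either verb."""
    return re.search(r"Can't (subset|select) columns", message) is not None

print(strict_check(OLD_MESSAGE), strict_check(NEW_MESSAGE))
print(tolerant_check(OLD_MESSAGE), tolerant_check(NEW_MESSAGE))
```

<p>
The strict check starts failing as soon as the dependency changes its wording, even though the behaviour being tested (an error on a missing column) is unchanged.
</p>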
<p>
But why was this failing only on Fedora on CRAN and on Ubuntu on Github Actions (but ok on Debian on CRAN)? And why couldn’t I reproduce the bug on my OpenSuse Linux computer, even though I was building a bleeding edge development environment using Nix?
</p>
<p>
And then it hit me like my older brother used to.
</p>
<p>
When building packages, CRAN doesn’t seem to use pre-compiled binaries on Fedora, so packages get built from source. This means that it takes longer to test on Fedora, as packages have to be built from source, but it also means that only the very latest releases of packages get used. On other platforms, pre-compiled binaries get used if available, and because <code>{tidyselect}</code> had just come out that very day, older binaries of <code>{tidyselect}</code> were being used on these platforms, but not on Fedora. And because these older binaries didn’t include this change, the unit test was still passing successfully on there.
</p>
<p>
On Github Actions, code coverage was computed using <code>covr::codecov()</code> which installs the package in a temporary directory and seems to pull its dependencies directly from CRAN. Because CRAN doesn’t offer Linux binaries, packages got compiled from source, which is why the test was failing there, as the very latest version of <code>{tidyselect}</code> was being used (btw, use Dirk Eddelbuettel’s <a href="https://github.com/eddelbuettel/r2u">r2u</a> if you need binaries for Ubuntu).
</p>
<p>
And on my local machine, even though I was using the latest commit of <code>nixpkgs</code> to have the most bleeding edge packages for my environment, I had forgotten that the R packages on <code>nixpkgs</code> always lag behind the CRAN releases.
</p>
<p>
This is because R packages on <code>nixpkgs</code> tend to get updated alongside a new release of R, and the reason is to ensure a certain level of quality. You see, the vast majority of CRAN (and Bioconductor) packages are made available through <code>nixpkgs</code> in a fully automated way. But some packages do require some manual intervention to work on Nix. And we only know this if we try to build these packages, but building packages requires quite a lot of resources. I go into more detail <a href="../posts/2024-02-29-nix_for_r_part_10.html">here</a>, but in summary we can’t build CRAN packages every single day to see if everything works well, so we only rebuild the whole tree whenever there’s a new release of R. Packages get built on a CI infrastructure called <em>Hydra</em>, and then get cached on <code>cache.nixos.org</code> so whenever someone wants to install a package, a pre-built binary gets pulled from the cache instead of getting installed from source. For packages that don’t need compiling this is not that big of a time save, but for packages that do need to get compiled it is huge. Depending on which packages you want to install, if you had to build everything from source, it could potentially take hours, but if you can install pre-built binaries it’s just a matter of how quick your internet connection is.
</p>
<p>
Anyways, I went back to my fork of <code>nixpkgs</code> and updated the expression defining the CRAN packages myself and installed the latest versions of packages from my fork.
</p>
<p>
Before the update, this was the error message I was testing against:
</p>
<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/cant_subset.png" width="80%">
</p>
</div>
<p>
and this was on version 1.2.0 of <code>{tidyselect}</code>:
</p>
<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/tidyselect_120.png" width="50%">
</p>
</div>
<p>
but after the update, this was the error message:
</p>
<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/cant_select.png" width="80%">
</p>
</div>
<p>
on version 1.2.1 of <code>{tidyselect}</code>:
</p>
<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/tidyselect_121.png" width="50%">
</p>
</div>
<p>
so I found the issue, and updated my unit testing accordingly, and pushed the update to CRAN. All is well that ends well, but… this made me think. I needed to have an easy way to have bleeding edge packages on hand from Nix at all moments, and so I started working on it.
</p>
</section>
<section id="github-actions-to-the-rescue" class="level2">
<h2 class="anchored" data-anchor-id="github-actions-to-the-rescue">
Github Actions to the rescue
</h2>
<p>
As described in my <a href="../posts/2024-02-29-nix_for_r_part_10.html">previous blog post</a>, updating the Nix expressions defining the R packages on <code>nixpkgs</code> involves running an R script that generates a Nix expression which then builds the R packages when needed. So what I did was create a Github action that would run this R script every 6 hours, and push the changes to a branch of my <code>nixpkgs</code> fork. This way, I would always have the possibility to use this branch if I needed bleeding edge packages. Because this can be of interest to others, <a href="https://github.com/philipp-baumann">Philipp Baumann</a> started a Github organisation hosting this fork of <code>nixpkgs</code> that gets updated daily which you can find <a href="https://github.com/rstats-on-nix">here</a>. Because this action needs to run several times a day, it should be on a schedule, but actions on a schedule can only run from master/main. But that’s not what we wanted, so instead, we are using another action, on another repository, that pushes a random file to the target repository to get the action going. You can find this repository <a href="https://github.com/b-rodrigues/trigger-r-updates">here</a> with complete instructions. So to summarise:
</p>
<ul>
<li>
An action on schedule runs from b-rodrigues/trigger-r-updates and pushes a file to rstats-on-nix/nixpkgs on the <code>r-daily-source</code> branch
</li>
<li>
This triggers an action that updates all of <code>nixpkgs</code>, including R packages, and pushes all the updates to the <code>r-daily</code> branch (you can find it <a href="https://github.com/rstats-on-nix/nixpkgs/blob/r-daily-source/.github/workflows/r-daily.yml">here</a>)
</li>
<li>
We can now use the <code>r-daily</code> branch to get bleeding edge R packages on Nix!
</li>
</ul>
<p>
This happens without any form of testing though, so packages could be in a broken state (hey, that’s the definition of bleeding edge, after all!), and also, if anyone would like to use this fork to build a development environment, they’d have to rebuild a lot of packages from source. Again, this is because these packages are defined in a fork of <code>nixpkgs</code> and they don’t get built on Hydra to populate the public cache that Nix uses by default. So while this fork is interesting because it provides bleeding edge packages, using it on a day-to-day basis can be quite tedious.
</p>
<p>
And this is where <a href="https://www.cachix.org/">Cachix</a> comes into play.
</p>
</section>
<section id="setting-up-your-own-binary-cache-on-cachix" class="level2">
<h2 class="anchored" data-anchor-id="setting-up-your-own-binary-cache-on-cachix">
Setting up your own binary cache on Cachix
</h2>
<p>
<a href="https://www.cachix.org/">Cachix</a> is an amazing tool that makes it incredibly easy to set up your own cache. Simply build the packages once, and push the binaries to the cache. As long as these packages don’t get updated, they’ll get pulled from the cache instead of getting rebuilt.
</p>
<p>
So now, here is what I do with my packages: I define a <code>default.nix</code> file that defines a development environment that uses my fork of <code>nixpkgs</code> as the source for packages. For example, <a href="https://github.com/b-rodrigues/rix/blob/master/default.nix">here</a> is this file that defines the environment for my <code>{rix}</code> package. I can use this environment to work on my package, and make sure that anyone else that wants to contribute, contributes using the same environment. As you can see on line 2, the <code>rstats-on-nix</code> bleeding edge fork gets used:
</p>
<pre><code> pkgs = import (fetchTarball "https://github.com/rstats-on-nix/nixpkgs/archive/refs/heads/r-daily.tar.gz") {};</code></pre>
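<p>
It’s worth pausing on that URL. It points to a <em>branch</em>, so the tarball changes every time the branch is updated, whereas the kind of URL used to pin <code>nixpkgs</code> in a fully reproducible expression points to a <em>commit</em>. The Python sketch below simply builds both forms of URL; the helper names are hypothetical and only there to show the difference.
</p>

```python
def branch_tarball(owner, repo, branch):
    """URL of the tarball for the current tip of a branch (a moving target)."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.tar.gz"

def commit_tarball(owner, repo, commit):
    """URL of the tarball for one specific commit (always the same content)."""
    return f"https://github.com/{owner}/{repo}/archive/{commit}.tar.gz"

print(branch_tarball("rstats-on-nix", "nixpkgs", "r-daily"))
print(commit_tarball("NixOS", "nixpkgs", "27285241da3bb285155d549a11192e9fdc3a0d04"))
```

<p>
For a bleeding edge development environment the moving target is exactly what I want; for an analysis that must stay reproducible, pin a commit instead.
</p>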
<p>
Then, still on <code>{rix}</code>’s repository, I define a new action that builds this environment periodically, but using the binary cache I set up with Cachix. You can find this action <a href="https://github.com/b-rodrigues/rix/blob/master/.github/workflows/cachix-dev-env.yml">here</a>. So the <code>r-daily</code> branch of our <code>nixpkgs</code> fork gets updated every 6 hours and this environment gets updated every 12 hours, 30 minutes past the hour.
</p>
<p>
Now, every time I want to work on my package, I simply use <code>nix-build</code> on my computer to update the development environment. This is what I see:
</p>
<pre><code>copying path '/nix/store/0l0iw4hz7xvykvhsjg8nqkvyl31js96l-r-stringr-1.5.1' from 'https://b-rodrigues.cachix.org'...
copying path '/nix/store/cw3lc7b0zydsricl5155jbmldm1vcyvr-r-tibble-3.2.1' from 'https://b-rodrigues.cachix.org'...
copying path '/nix/store/y32kpp09l34cdgksnr89cyvz6p5s94z8-r-tidyselect-1.2.1' from 'https://b-rodrigues.cachix.org'...
copying path '/nix/store/sw24yx1jwy9xzq8ai5m2gzaamvyi5r0h-r-rematch2-2.1.2' from 'https://b-rodrigues.cachix.org'...
copying path '/nix/store/z6b4vii7hvl9mc53ykxrwks1lkfzgmr4-r-dplyr-1.1.4' from 'https://b-rodrigues.cachix.org'...</code></pre>
<p>
as you can see, packages get pulled from my cache. Packages that are already available from the usual, public, <code>cache.nixos.org</code> don’t get rebuilt nor cached in mine; they simply continue getting pulled directly from there. This makes using the development environment very easy, and guarantees I’m always mirroring the state of packages released on CRAN. The other interesting thing is that I can use that cache with other actions. For example, <a href="https://github.com/b-rodrigues/rix/blob/master/.github/workflows/tests-r-via-nix.yaml">here</a> is the action that runs the unit tests included in the package in an environment that has Nix installed on it (some unit tests need Nix to be available to run). On line 25 you can see that we install Nix and set our fork as the repository to use:
</p>
<pre><code>nix_path: nixpkgs=https://github.com/rstats-on-nix/nixpkgs/archive/refs/heads/r-daily.tar.gz</code></pre>
<p>
and just below, we set up the cache:
</p>
<pre><code>- uses: cachix/cachix-action@v14
  with:
    name: b-rodrigues # this is the name of my cache</code></pre>
<p>
By using my cache, I make sure that the test runs with the freshest possible packages, and don’t run the risk of having a test succeed on an outdated environment. And you might have noticed that I am not authenticating to Cachix: to simply pull binaries, no authentication is needed!
</p>
<p>
Cachix has a free plan of up to 5 GB, which is more than enough to set up several development environments like this. It is really, really easy to set up, and it works on your computer and on Github Actions, as shown. If you want to use this development environment to contribute to <code>{rix}</code>, check out the instructions in the <a href="https://github.com/b-rodrigues/rix/blob/master/CONTRIBUTING.md#development-environment">Contributing.md</a> file.
</p>
<p>
You can use the same approach to always have development environments ready for your different projects, and I will likely add the possibility to use this fork of <code>nixpkgs</code> with my <code>{rix}</code> package.
</p>
<p>
<em>Thanks to <a href="https://github.com/philipp-baumann">Philipp Baumann</a> for nudging me into the direction of using Cachix and showing the way!</em>
</p>


</section>

 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2024-04-04-nix_for_r_part_11.html</guid>
  <pubDate>Thu, 04 Apr 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Reproducible data science with Nix, part 10 – contributing to nixpkgs</title>
  <link>https://b-rodrigues.github.io/posts/2024-02-29-nix_for_r_part_10.html</link>
  <description><![CDATA[ 




<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/nix_parents.jpg" width="60%">
</p>
</div>
<p>
I’ve very recently started contributing to the <code>nixpkgs</code> repository of packages, which contains all the packages you can install from the Nix package manager. My contributions are fairly modest: I help fix R packages that need some tweaking to make them successfully build for Nix. Most of these fixes are very simple one-liners.
</p>
<p>
Most users of any free and open source tool rarely contribute to the development of this tool: I don’t think it is due to lack of skills and/or time or interest, but mostly because starting to contribute to a tool requires some knowledge that is rarely written down (even more so for an entire ecosystem). These tools and ecosystems grow organically, and if you’re not in the right spot at the right time or are not lucky enough to have kind people taking time to explain things to you, contributing might feel completely overwhelming.
</p>
<p>
Thankfully, I was very lucky to have found the small but very active community of R contributors to <code>nixpkgs</code> on <a href="https://matrix.to/#/#r:nixos.org">Matrix</a> which very kindly took the time to bring me up to speed!
</p>
<p>
I wanted to share my experiences in this blog post: but this blog post is not just going to be about me contributing to <code>nixpkgs</code> from the perspective of an R user (and giving you some pointers on how to start yourself), but also about how I built a report (let’s call it like that) to keep track of which R packages got fixed. This report is built using R, Nix, Github Actions and lists all the failed R package builds from Hydra (more on this later). The report gets updated every day automatically at midnight, and is accessible <a href="https://raw.githack.com/b-rodrigues/nixpkgs-r-updates-fails/targets-runs/output/r-updates-fails.html">here</a>. I also used a very minimalistic approach to build this: no <code>{tidyverse}</code> packages, and no Quarto. Why? Mostly just to keep dependencies at a minimum to accelerate CI/CD, but also for fun. And honestly, I must admit that base R is more than capable on its own and had forgotten that.
</p>
<section id="contributing-to-nixpkgs" class="level2">
<h2 class="anchored" data-anchor-id="contributing-to-nixpkgs">
Contributing to nixpkgs
</h2>
<p>
As explained in <a href="../posts/2023-12-19-nix_for_r_part_8.html">part 8</a>, <code>nixpkgs</code> is “nothing but” a huge GitHub repository containing thousands of Nix expressions. These expressions are then used to actually build the software that then gets installed by Nix. For example, <a href="https://github.com/NixOS/nixpkgs/blob/nixpkgs-unstable/pkgs/development/libraries/quarto/default.nix">this is the expression for Quarto</a>. As you can see, it starts by downloading the pre-compiled binary, and then applying “patches”. Essentially making sure that Quarto installed by Nix is able to find the other pieces installed by Nix that Quarto needs (Deno, Pandoc, Typst and so on). It then continues by installing Quarto itself (because we’re downloading a pre-compiled binary, <em>installation</em> consists in moving files in the right spot), finally some tests are executed (<code>quarto check</code>) and then some metadata is defined. Not every package is defined like this, with a single Nix expression, though. For example, individual R packages are not defined like this. Instead, every package from CRAN and Bioconductor gets built using only a handful of files that can be found <a href="https://github.com/NixOS/nixpkgs/tree/nixpkgs-unstable/pkgs/development/r-modules">here</a>.
</p>
<p>
(By the way, you can look for packages and find their associated Nix expressions on the <a href="https://search.nixos.org/packages?channel=unstable&amp;from=0&amp;size=50&amp;sort=relevance&amp;type=packages&amp;query=quarto">NixOS package search</a>).
</p>
<p>
The way this works is that the <a href="https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/r-modules/generate-r-packages.R"><code>generate-r-packages.R</code></a> script is run periodically and generates the <a href="https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/r-modules/cran-packages.nix"><code>cran-packages.nix</code></a> file (and the equivalent Bioconductor files). For each package on CRAN, a line gets written in that file with the package’s name, its current version on CRAN, and very importantly its dependencies. For example, here is the line for <code>{dplyr}</code>:
</p>
<pre><code>dplyr = derive2 { name="dplyr"; version="1.1.4";
   sha256="1jsq8pj12bngy66xms486j8a65wxvyqs944q9rxkiaylsla08wyg";
   depends=[cli generics glue lifecycle magrittr pillar R6 rlang tibble tidyselect vctrs]; };</code></pre>
<p>
These dependencies are actually the packages that can be found in the <a href="https://github.com/tidyverse/dplyr/blob/main/DESCRIPTION"><code>DESCRIPTION</code></a> file under <code>Imports</code>. <a href="https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/r-modules/cran-packages.nix"><code>cran-packages.nix</code></a> (and the same goes for the Bioconductor equivalents, <code>bioc-packages.nix</code>, <code>bioc-annotation-packages.nix</code> and <code>bioc-experiment-packages.nix</code>) gets imported in the <a href="https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/r-modules/default.nix"><code>default.nix</code></a> file. In it, another file, <code>generic-builder.nix</code>, also gets imported, which contains a function that will attempt building the package. Most of the time this succeeds, but some packages require further tweaks. Packages whose <code>NeedsCompilation</code> field is set to <code>yes</code> in their DESCRIPTION files are usually candidates for further tweaking: these packages require system-level dependencies, which are often listed under <code>SystemRequirements</code> (but not always, which complicates matters). For example, the <code>{terra}</code> package has these system requirements listed in its DESCRIPTION file:
</p>
<pre><code>SystemRequirements:  C++17, GDAL (&gt;= 2.2.3), GEOS (&gt;= 3.4.0), PROJ (&gt;= 4.9.3), sqlite3</code></pre>
<p>
so these also need to be added if we want to build them on Nix. But if we look at the line for <code>{terra}</code> in <code>cran-packages.nix</code>, this is what we see:
</p>
<pre><code>terra = derive2 { name="terra"; version="1.7-65"; 
  sha256="0m9s5am8l6il1q0skab614cx0qjsb1i9xcv6nm0sdzj7p9lrzkfl"; 
  depends=[Rcpp]; };</code></pre>
<p>
Only <code>{Rcpp}</code> is listed, which is a dependency, yes, but an R package dependency, not a system-level requirement. System-level requirements need to be added in the <code>default.nix</code> file manually. In the <code>default.nix</code>, you’ll find a long list of packages called <code>packagesWithNativeBuildInputs</code> and <code>packagesWithBuildInputs</code>. <em>NativeBuildInputs</em> and <em>BuildInputs</em> are Nix jargon for dependencies the package needs, at compile-time and then at run-time specifically. For example, <code>{Rcpp}</code> is a <em>BuildInput</em> of <code>{terra}</code>, while the system-level requirements are <em>NativeBuildInputs</em> (in the context of R packages though, this rarely matters. If you want more details, refer to <a href="https://gist.github.com/b-rodrigues/c677b59126d05d43347ed9623ddd5b0c">this Gist</a> I’ve forked).
</p>
<p>
For <code>{terra}</code>, this means that we need to add this line to the list <code>packagesWithNativeBuildInputs</code> (I simplified the syntax here a bit):
</p>
<pre><code>terra = [ gdal proj geos ];</code></pre>
<p>
<code>gdal</code>, <code>proj</code> and <code>geos</code> are the system requirements that need to be added for <code>{terra}</code> to build successfully on Hydra.
</p>
</section>
<section id="hydra" class="level2">
<h2 class="anchored" data-anchor-id="hydra">
Hydra
</h2>
<p>
<em>Hydra is a tool for continuous integration testing and software release that uses a purely functional language to describe build jobs and their dependencies</em> (source: <a href="https://hydra.nixos.org/build/248007843/download/1/hydra/#introduction">the Hydra Manual</a>)
</p>
<p>
If you’re coming from R, think of Hydra as <a href="https://builder.r-hub.io/">R-hub</a>, which will check and build your R package before submitting to CRAN. Hydra periodically tries to rebuild packages. If a package fails to build, the log gets hosted. When it comes to R packages, we can check which packages built successfully or not <a href="https://hydra.nixos.org/jobset/nixpkgs/r-updates">here</a>.
</p>
<p>
As of writing, the latest evaluation was in mid-January. A new version of R is going to be released on the 29th of February (or maybe was already released, I’m not sure when this blog post is going to get posted), and this is when new evaluations will likely be executed. Evaluations are the processes by which Nix expressions get… evaluated and used to actually build packages. So if we look into the results of the evaluation of the 17th of January, we see that 757 jobs failed:
</p>
<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/hydra_failing_jobs.jpg" width="80%">
</p>
</div>
<p>
One job doesn’t strictly correspond to one package though: packages get built for different architectures, and each architecture gets its own build process. If we look into the details of the first package whose build failed, <code>{AIUQ}</code>, we see this:
</p>
<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/hydra_failed.jpg" width="80%">
</p>
</div>
<p>
From the log we see that what actually failed is one of its dependencies, <code>{SuperGauss}</code>, so fixing <code>{SuperGauss}</code> will likely fix <code>{AIUQ}</code> (I say likely because maybe another needed dependency also fails). So we could try to fix <code>{SuperGauss}</code> first. Let’s see why <code>{SuperGauss}</code> failed, by clicking on <code>raw</code>:
</p>
<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/hydra_failed_raw.jpg" width="80%">
</p>
</div>
<p>
Here is what we see:
</p>
<pre><code>Running phase: unpackPhase
unpacking source archive /nix/store/615bdvjchxrd7wp5m7dhg4g04yv7ncza-SuperGauss_2.0.3.tar.gz
source root is SuperGauss
setting SOURCE_DATE_EPOCH to timestamp 1645735202 of file SuperGauss/MD5
Running phase: patchPhase
Running phase: updateAutotoolsGnuConfigScriptsPhase
Running phase: configurePhase
Running phase: buildPhase
Running phase: checkPhase
Running phase: installPhase
* installing *source* package 'SuperGauss' ...
** package 'SuperGauss' successfully unpacked and MD5 sums checked
** using staged installation
checking for gcc... /nix/store/xq8920m5mbd83vdlydwli7qsh67gfm5v-gcc-wrapper-13.2.0/bin/cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether /nix/store/xq8920m5mbd83vdlydwli7qsh67gfm5v-gcc-wrapper-13.2.0/bin/cc accepts -g... yes
checking for /nix/store/xq8920m5mbd83vdlydwli7qsh67gfm5v-gcc-wrapper-13.2.0/bin/cc option to accept ISO C89... none needed
checking for pkg-config... no
checking for FFTW... configure: error: in `/build/SuperGauss':
configure: error: The pkg-config script could not be found or is too old.  Make sure it
is in your PATH or set the PKG_CONFIG environment variable to the full
path to pkg-config.

Alternatively, you may set the environment variables FFTW_CFLAGS
and FFTW_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details.

To get pkg-config, see &lt;http://pkg-config.freedesktop.org/&gt;.
See `config.log' for more details
ERROR: configuration failed for package 'SuperGauss'
* removing '/nix/store/jxv5p85x24xmfcnifw2ibvx9jhk9f2w4-r-SuperGauss-2.0.3/library/SuperGauss'</code></pre>
<p>
This is essentially what we would see if we tried to install <code>{SuperGauss}</code> on Linux. The error message is quite clear here: a system-level dependency, <code>pkg-config</code> is missing. Looks like we found our first package to fix!
</p>
</section>
<section id="fixing-a-package" class="level2">
<h2 class="anchored" data-anchor-id="fixing-a-package">
Fixing a package
</h2>
<p>
The first step is to fork and clone the <code>nixpkgs</code> GitHub repository to your computer (be patient, the repository is huge so the download will take some time):
</p>
<pre><code>git clone git@github.com:b-rodrigues/nixpkgs.git</code></pre>
<p>
It’s also a good idea to add the original <code>nixpkgs</code> as an <code>upstream</code>:
</p>
<pre><code>git remote add upstream https://github.com/NixOS/nixpkgs</code></pre>
<p>
This way, you can pull changes from the original <code>nixpkgs</code> repository into your fork easily with:
</p>
<pre><code>git fetch upstream master
git merge upstream/master</code></pre>
<p>
These two commands synchronize your local copy of the repository with upstream. So now we can create a new branch to try to fix <code>{SuperGauss}</code>:
</p>
<pre><code>git checkout -b fix_supergauss</code></pre>
<p>
and then we should try to build <code>{SuperGauss}</code> locally. This is because it might have been fixed in the meantime by someone else, so let’s try to build it with (run the following command in a terminal at the root of your local copy of the <code>nixpkgs</code> repository):
</p>
<pre><code>nix-build -A rPackages.SuperGauss</code></pre>
<p>
but I often prefer to use this instead, because this will build the package and drop me into a shell where I can start R, load the package, and try it by running some of its examples:
</p>
<pre><code>nix-shell -I nixpkgs=/path/to/my/nixpkgs -p rPackages.SuperGauss R</code></pre>
<p>
If any of the commands above fail with the same error message as on Hydra, we know that it hasn’t been fixed yet. So the fix consists in opening <code>pkgs/development/r-modules/default.nix</code> and adding the following line:
</p>
<pre><code>SuperGauss = [ pkg-config ];</code></pre>
<p>
in either the <code>packagesWithBuildInputs</code> or the <code>packagesWithNativeBuildInputs</code> list (as explained above, it doesn’t really matter). Trying to rebuild <code>SuperGauss</code> again will result in a new error message. Another dependency needs to be added:
</p>
<pre><code>SuperGauss = [ pkg-config fftw.dev ];</code></pre>
<p>
Then, building succeeds! We can now commit, push, and open a pull request. Commit messages need to be formatted in a certain way, as per <code>nixpkgs</code> <a href="https://github.com/NixOS/nixpkgs/blob/master/CONTRIBUTING.md">contributing guide</a>, so:
</p>
<pre><code>git add .
git commit -m "rPackages.SuperGauss: add dependencies"</code></pre>
<p>
also, there should only be one commit per fix. So if in the process of fixing a package you committed several times, you will need to use <code>git rebase</code> to squash all the commits into one. Once you open the pull request, a maintainer will get pinged, and merge the PR if everything is alright (which is usually the case for these one-liners). You can see the PR for <code>{SuperGauss}</code> <a href="https://github.com/NixOS/nixpkgs/pull/287209">here</a>.
</p>
<p>
The process is relatively simple once you’ve done it once or twice, but there are some issues: there is no easy way to find out which packages we should focus on. For example, is <code>{SuperGauss}</code> really that important? The fix was very simple, so it’s ok, but if it took more effort, should we spend the limited time we have on it, or should we focus on another package? Also, if someone has already opened a PR to fix a package, but that PR hasn’t been merged yet, the package would still fail to build if I tried to build it locally. So I might think that no one is taking care of it, and waste time duplicating efforts instead of either focusing on another package, or reviewing the open PR to accelerate the process of merging.
</p>
<p>
Discussing this with other contributors, <a href="https://fosstodon.org/deck/@kupac@functional.cafe">László Kupcsik</a> suggested we could use <code>{packageRank}</code> to find out which packages are getting a lot of downloads from CRAN, and so we could focus on fixing these packages first. This is a great idea and it gave me the idea to build some kind of report that would do this automatically for us, and also list opened and merged PRs so we wouldn’t risk duplicating efforts.
</p>
<p>
This report can be found <a href="https://raw.githack.com/b-rodrigues/nixpkgs-r-updates-fails/targets-runs/output/r-updates-fails.html">here</a> and now I’ll explain how I built it.
</p>
</section>
<section id="which-packages-to-fix-and-keeping-track-of-prs" class="level2">
<h2 class="anchored" data-anchor-id="which-packages-to-fix-and-keeping-track-of-prs">
Which packages to fix and keeping track of PRs
</h2>
<p>
So the main idea was to know which packages to focus on. So essentially, we wanted this table:
</p>
<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/hydra_failing_jobs.jpg" width="80%">
</p>
</div>
<p>
but with <code>{packageRank}</code> added to it. So the first step was to scrape this table, using <code>{rvest}</code>. This is what you can find on lines 11 to 63 of this <a href="https://github.com/b-rodrigues/nixpkgs-r-updates-fails/blob/0fe273dd234f0d32e5fae86630173ff42cce2d9f/_targets.R">{targets} workflow</a> (alongside some basic cleaning). I won’t go too much into detail, but if something’s not clear, ping me on <a href="https://twitter.com/brodriguesco">twitter</a> or <a href="https://fosstodon.org/@brodriguesco">Mastodon</a> or even open an issue on the report’s <a href="https://github.com/b-rodrigues/nixpkgs-r-updates-fails/issues">repository</a>.
</p>
<p>
Next, I also get the reason the package failed to build. So in the example from before, <code>{AIUQ}</code> failed because <code>{SuperGauss}</code> failed. On Hydra, you would need to click through to see this, but here I scrape it automatically as well, and add this information in a column called <code>fails_because_of</code>. This is what you can read on lines <a href="https://github.com/b-rodrigues/nixpkgs-r-updates-fails/blob/0fe273dd234f0d32e5fae86630173ff42cce2d9f/_targets.R#L65">65 to 77</a>. I use a function called <code>safe_get_failed_deps()</code>, which you can find in the <code>functions.R</code> script <a href="https://github.com/b-rodrigues/nixpkgs-r-updates-fails/blob/0fe273dd234f0d32e5fae86630173ff42cce2d9f/functions.R#L41C1-L68C2">here</a>. <code>safe_get_failed_deps()</code> wraps the main function, <code>get_failed_deps()</code>, with <code>tryCatch()</code>. This is because if anything goes wrong, I want my function to return <code>NULL</code> instead of an error, which would crash the whole pipeline.
</p>
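<p>
To make the pattern concrete, here is a minimal sketch of such a wrapper. It is not the code from the repository: <code>risky_lookup()</code> and <code>safe_lookup()</code> are hypothetical names used purely for illustration, and the failing condition is made up.
</p>
<pre class="r"><code># Hypothetical stand-in for a step that may fail (e.g. a scraping function)
risky_lookup &lt;- function(x) {
  if (!is.character(x)) stop("expected a character string")
  toupper(x)
}

# Wrapper: any error raised by risky_lookup() is turned into NULL
safe_lookup &lt;- function(x) {
  tryCatch(risky_lookup(x), error = function(e) NULL)
}

ok &lt;- safe_lookup("abc")   # the call succeeds, so the result is passed through
failed &lt;- safe_lookup(42)  # the call errors, so NULL is returned instead</code></pre>
<p>
With this shape, one failing element no longer aborts a loop or a pipeline step; the <code>NULL</code> results can be filtered out or converted to <code>NA</code> afterwards.
</p>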
<p>
Next, I add the packages’ rank using a function that wraps <code>packageRank::packageRank()</code> called <code>safe_packageRank()</code> on <a href="https://github.com/b-rodrigues/nixpkgs-r-updates-fails/blob/0fe273dd234f0d32e5fae86630173ff42cce2d9f/_targets.R#L97">line 97</a>.
</p>
<p>
<code>safe_packageRank()</code> uses <code>tryCatch()</code> to return <code>NULL</code> in case there’s an error. This is needed because <code>packageRank()</code> will only work on CRAN packages, but Hydra also tries to build Bioconductor packages: when these packages’ names get passed to <code>packageRank()</code>, an error gets returned because these are not CRAN packages:
</p>
<pre class="r"><code>packageRank("haha")
Error: haha: misspelled or not on CRAN/Archive.</code></pre>
<p>
but instead of an error that would stop the pipeline, I prefer that it simply returns <code>NULL</code>, hence <code>tryCatch()</code>. Also, I compute the rank of the package listed under the <code>fails_because_of</code> column and not the <code>package</code> column. If we go back to our example from before, <code>{AIUQ}</code> failed because <code>{SuperGauss}</code> failed, so I’m actually interested in the rank of <code>{SuperGauss}</code>, and not <code>{AIUQ}</code> (which is why I went to all the trouble to scrape the failing dependency).
</p>
<p>
So, for now, when comparing to the table on Hydra, we have two further columns with the dependency that actually fails (or not, if the package fails on its own and not because of a dependency), and the rank of either the dependency that fails or the package itself.
</p>
<p>
Next, I’d like to see if PRs have already been opened and merged. For this, I use the <code>gh</code> tool, which is a command line tool to interact with GitHub repositories. I wrote the <code>get_prs()</code> wrapper around <code>gh</code> to list the opened or the merged PRs of the <code>nixpkgs</code> repository. This is what it looks like (and is defined <a href="https://github.com/b-rodrigues/nixpkgs-r-updates-fails/blob/0fe273dd234f0d32e5fae86630173ff42cce2d9f/functions.R#L8C1-L21C2">here</a>):
</p>
<pre><code>get_prs &lt;- function(state){

  output_path &lt;- paste0(state, "_prs.json")

  # Run the command
  system(paste0(
    "gh pr list --state=", state,
    " --search=rPackages -R NixOS/nixpkgs --json title,updatedAt,url &gt; ",
    output_path
  ))

  # Return path for targets
  output_path
}</code></pre>
<p>
Because the PRs follow the contributing guidelines, I can easily process the PR titles to get the name of the package (I essentially need to go from the string “rPackages.SuperGauss: fixing build” to “SuperGauss”) using regular expressions. This is what happens in the <code>clean_prs()</code> function <a href="https://github.com/b-rodrigues/nixpkgs-r-updates-fails/blob/0fe273dd234f0d32e5fae86630173ff42cce2d9f/functions.R#L23">here</a>.
</p>
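<p>
As an illustration of the idea (and not the actual body of <code>clean_prs()</code>), the extraction could be sketched in base R like this. The helper name <code>extract_pkg()</code> and the second title are assumptions made up for the example.
</p>
<pre class="r"><code># Hypothetical helper: pull the package name out of PR titles that follow
# the "rPackages.NAME: message" convention, dropping titles that don't match
extract_pkg &lt;- function(titles) {
  is_r_pr &lt;- grepl("^rPackages\\.[^:]+:", titles)
  sub("^rPackages\\.([^:]+):.*$", "\\1", titles[is_r_pr])
}

titles &lt;- c(
  "rPackages.SuperGauss: fixing build",
  "some unrelated title"  # made-up title that does not follow the convention
)

pkg_names &lt;- extract_pkg(titles)  # only the matching title yields a name</code></pre>
<p>
Filtering with <code>grepl()</code> first matters because <code>sub()</code> returns its input unchanged when the pattern does not match, which would otherwise let unrelated titles through as if they were package names.
</p>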
<p>
Most of what follows is merging the right data frames and ensuring that I have something clean to show. Finally, an <code>.Rmd</code> document gets compiled, which you can find <a href="https://github.com/b-rodrigues/nixpkgs-r-updates-fails/blob/0fe273dd234f0d32e5fae86630173ff42cce2d9f/r-updates-fails.Rmd">here</a>. This will get compiled to an <code>.html</code> file which is what you see when you click <a href="https://raw.githack.com/b-rodrigues/nixpkgs-r-updates-fails/targets-runs/output/r-updates-fails.html">here</a>.
</p>
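<p>
The kind of merge involved can be sketched with base R’s <code>merge()</code>. The data below is entirely made up, and the column names <code>fixed_package</code> and <code>state</code> are assumptions for the example; the real pipeline’s data frames are in the repository linked above.
</p>
<pre class="r"><code># Made-up failing builds: the package, and the dependency that actually fails
fails &lt;- data.frame(
  package = c("AIUQ", "terra"),
  fails_because_of = c("SuperGauss", "terra")
)

# Made-up PR data: which package a PR fixes, and the state of that PR
prs &lt;- data.frame(
  fixed_package = "SuperGauss",
  state = "merged"
)

# Left join: keep every failing build, add PR info where a PR exists
report &lt;- merge(
  fails, prs,
  by.x = "fails_because_of", by.y = "fixed_package",
  all.x = TRUE
)</code></pre>
<p>
Setting <code>all.x = TRUE</code> keeps the failing builds that have no matching PR, with <code>NA</code> in the <code>state</code> column, and these are precisely the rows that tell us nobody is working on a fix yet.
</p>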
<p>
This runs every day at midnight using GitHub Actions (<a href="https://github.com/b-rodrigues/nixpkgs-r-updates-fails/blob/0fe273dd234f0d32e5fae86630173ff42cce2d9f/.github/workflows/compile_table.yaml">the workflow is here</a>) and then I use <a href="https://raw.githack.com/">raw.githack.com</a> to serve the rendered HTML file. So every time I push, or at midnight, the action runs, computes the package rank, checks if new PRs are available or have been merged, and the rendered file is immediately available. How’s that for serverless CI/CD?
</p>
<p>
If you are interested in using Nix to make your analyses reproducible, check out <a href="https://b-rodrigues.github.io/blog/index.html#category=nix">the other blog posts in this series</a> and join our small but motivated community of R contributors to <code>nixpkgs</code> on <a href="https://matrix.to/#/#r:nixos.org">Matrix</a>. If you are interested in the history of Nix, check out this super interesting <a href="https://economicsfromthetopdown.com/2024/02/17/nixing-technological-lock-in/">blog post</a> by <a href="https://mastodon.online/@blair_fix">Blair Fix</a>.
</p>
<p>
If you’re interested in using project-specific, reproducible development environments, give <code>{rix}</code> and Nix a try! Learn more about <code>{rix}</code> on its GitHub repository <a href="https://github.com/b-rodrigues/rix">here</a> or <a href="https://docs.ropensci.org/rix/index.html">website</a>. We wrote many vignettes that are conveniently numbered, so don’t hesitate to <a href="https://docs.ropensci.org/rix/articles/a-getting-started.html">get started</a>!
</p>
<p>
<em>Thanks to the colleagues of the Matrix nixpkgs R channel for the fruitful discussions that helped shape this blog post and for proof-reading.</em>
</p>


</section>

 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2024-02-29-nix_for_r_part_10.html</guid>
  <pubDate>Thu, 29 Feb 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Reproducible data science with Nix, part 9 – rix is looking for testers!</title>
  <link>https://b-rodrigues.github.io/posts/2024-02-02-nix_for_r_part_9.html</link>
  <description><![CDATA[ 




<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/kick_rix.png">
</p>
</div>
<p>
After 5 months of work, <a href="https://github.com/philipp-baumann">Philipp Baumann</a> and I are happy to announce that our package, <code>{rix}</code>, is getting quite close to being in a state we consider “done” (well, at least, for a first release). We plan to submit it first to <a href="https://ropensci.org/software-review/">rOpenSci</a> for review, and later to CRAN. But in the meantime, if you could test the package, we’d be grateful! We are especially interested to see if you find the documentation clear, and if you are able to run the features that require an installation of Nix, the <code>nix_build()</code> and <code>with_nix()</code> functions. And I would truly recommend you read this blog post to the end, because I guarantee you’ll have your mind blown! If that’s not the case, send an insult my way on social media.
</p>
<section id="what-is-rix" class="level2">
<h2 class="anchored" data-anchor-id="what-is-rix">
What is rix?
</h2>
<p>
<code>{rix}</code> is an R package that leverages Nix, a powerful package manager focusing on reproducible builds. With Nix, it is possible to create project-specific environments that contain a project-specific version of R and R packages (as well as other tools or languages, if needed). You can use <code>{rix}</code> and Nix to replace renv and Docker with one single tool. Nix is an incredibly useful piece of software for ensuring reproducibility of projects, in research or otherwise, or for running web applications like Shiny apps or plumber APIs in a controlled environment. The advantage of using Nix over Docker is that the environments that you define using Nix are not isolated from the rest of your machine: you can still access files and other tools installed on your computer.
</p>
<p>
For example, here is how you could use <code>{rix}</code> to generate a file called <code>default.nix</code>, which can then be used by Nix to actually build that environment for you:
</p>
<pre class="r"><code>library(rix)

path_default_nix &lt;- tempdir()

rix(r_ver = "latest",
    r_pkgs = c("dplyr", "ggplot2"),
    system_pkgs = NULL,
    git_pkgs = NULL,
    ide = "code",
    shell_hook = NULL,
    project_path = path_default_nix,
    overwrite = TRUE,
    print = TRUE)</code></pre>
<pre><code>## # This file was generated by the {rix} R package v0.5.1.9000 on 2024-02-02
## # with following call:
## # &gt;rix(r_ver = "5ad9903c16126a7d949101687af0aa589b1d7d3d",
## #  &gt; r_pkgs = c("dplyr",
## #  &gt; "ggplot2"),
## #  &gt; system_pkgs = NULL,
## #  &gt; git_pkgs = NULL,
## #  &gt; ide = "code",
## #  &gt; project_path = path_default_nix,
## #  &gt; overwrite = TRUE,
## #  &gt; print = TRUE,
## #  &gt; shell_hook = NULL)
## # It uses nixpkgs' revision 5ad9903c16126a7d949101687af0aa589b1d7d3d for reproducibility purposes
## # which will install R version latest
## # Report any issues to https://github.com/b-rodrigues/rix
## let
##  pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/5ad9903c16126a7d949101687af0aa589b1d7d3d.tar.gz") {};
##  rpkgs = builtins.attrValues {
##   inherit (pkgs.rPackages) dplyr ggplot2 languageserver;
## };
##    system_packages = builtins.attrValues {
##   inherit (pkgs) R glibcLocales nix ;
## };
##   in
##   pkgs.mkShell {
##     LOCALE_ARCHIVE = if pkgs.system == "x86_64-linux" then  "${pkgs.glibcLocales}/lib/locale/locale-archive" else "";
##     LANG = "en_US.UTF-8";
##     LC_ALL = "en_US.UTF-8";
##     LC_TIME = "en_US.UTF-8";
##     LC_MONETARY = "en_US.UTF-8";
##     LC_PAPER = "en_US.UTF-8";
##     LC_MEASUREMENT = "en_US.UTF-8";
## 
##     buildInputs = [  rpkgs  system_packages  ];
##       
##   }</code></pre>
<p>
You don’t need to have Nix installed to use <code>{rix}</code> and generate this expression! This is especially useful if you want to generate an expression that should then be used in a CI/CD environment for example.
</p>
<p>
But if you do have Nix installed, then you can use two great functions that Philipp implemented, which we are really excited to tell you about!
</p>
</section>
<section id="nix_build-and-with_nix" class="level2">
<h2 class="anchored" data-anchor-id="nix_build-and-with_nix">
nix_build() and with_nix()
</h2>
<p>
When you have a <code>default.nix</code> file that was generated by <code>rix::rix()</code>, and if you have Nix installed on your system, you can build the corresponding environment using the command line tool <code>nix-build</code>. But you can also build that environment straight from an R session, by using <a href="https://b-rodrigues.github.io/rix/reference/nix_build.html"><code>rix::nix_build()</code></a>!
</p>
<p>
But the reason <a href="https://b-rodrigues.github.io/rix/reference/nix_build.html"><code>nix_build()</code></a> is really useful is that it gets called by <a href="https://b-rodrigues.github.io/rix/reference/with_nix.html"><code>with_nix()</code></a>. <a href="https://b-rodrigues.github.io/rix/reference/with_nix.html"><code>with_nix()</code></a> is a very interesting function, because it allows you to evaluate a single function within a so-called subshell. That subshell can have a whole other version of R and R packages than your main session, and you can use it to execute an arbitrary function (or a whole, complex expression), and then get the result back into your main session. You could use older versions of packages to get a result that might not be possible to get in a current version. Consider the following example: on a recent version of <code>{stringr}</code>, <code>stringr::str_subset(c("", "a"), "")</code> results in an error, but older versions would return <code>"a"</code>. Returning an error is actually what this should do, but hey, if you have code that relies on that old behaviour you can now execute that old code within a subshell that contains that older version of <code>{stringr}</code>. Start by creating a folder to contain everything needed for your subshell:
</p>
<pre class="r"><code>path_env_stringr &lt;- file.path(".", "_env_stringr_1.4.1")</code></pre>
<p>
Then, it is advised to use <a href="https://b-rodrigues.github.io/rix/reference/rix_init.html"><code>rix::rix_init()</code></a> to generate an <code>.Rprofile</code> for that subshell, which sets a number of environment variables. This way, when the R session in that subshell starts, we don’t have any interference between that subshell and the main R session, as the R packages that must be available to the subshell are only taken from the Nix store. The Nix store is where software installed by Nix is… stored, and we don’t want R to be confused and go look for R packages in the user’s library, which could happen without this specific <code>.Rprofile</code> file:
</p>
<pre class="r"><code>rix_init(
  project_path = path_env_stringr,
  rprofile_action = "overwrite",
  message_type = "simple"
)</code></pre>
<pre><code>## 
## ### Bootstrapping isolated, project-specific, and runtime-pure R setup via Nix ###
## 
## ==&gt; Created isolated nix-R project folder:
##  /home/cbrunos/six_to/dev_env/b-rodrigues.github.com/content/blog/_env_stringr_1.4.1 
## ==&gt; R session running via Nix (nixpkgs)
## * R session not running from RStudio
## ==&gt; Added `.Rprofile` file and code lines for new R sessions launched from:
## /home/cbrunos/six_to/dev_env/b-rodrigues.github.com/content/blog/_env_stringr_1.4.1
## 
## * Added the location of the Nix store to `PATH` environmental variable for new R sessions on host/docker RStudio:
## /nix/var/nix/profiles/default/bin</code></pre>
<p>
We now generate the <code>default.nix</code> file for that subshell:
</p>
<pre class="r"><code>rix(
  r_ver = "latest",
  r_pkgs = "stringr@1.4.1",
  overwrite = TRUE,
  project_path = path_env_stringr
)</code></pre>
<p>
Notice how we use the latest version of R (we could have used any other), but pin <code>{stringr}</code> to version 1.4.1. Finally, we use <code>with_nix()</code> to evaluate <code>stringr::str_subset(c("", "a"), "")</code> inside that subshell:
</p>
<pre class="r"><code>out_nix_stringr &lt;- with_nix(
  expr = function() stringr::str_subset(c("", "a"), ""),
  program = "R",
  exec_mode = "non-blocking",
  project_path = path_env_stringr,
  message_type = "simple"
)</code></pre>
<pre><code>## * R session not running from RStudio
## ### Prepare to exchange arguments and globals for `expr` between the host and Nix R sessions ###
## * checking code in `expr` for potential problems:
##  `codetools::checkUsage(fun = expr)`
## 
## * checking code in `expr` for potential problems:
## 
## * checking code in `globals_exprs` for potential problems:
## 
## ==&gt; Running deparsed expression via `nix-shell` in non-blocking mode:
## 
## 
## ==&gt; Process ID (PID) is 19688.
## ==&gt; Receiving stdout and stderr streams...
## 
## ==&gt; `expr` succeeded!
## ### Finished code evaluation in `nix-shell` ###
## 
## * Evaluating `expr` in `nix-shell` returns:
## [1] "a"</code></pre>
<p>
We can now check whether the result really is <code>"a"</code>:
</p>
<pre class="r"><code>identical("a", out_nix_stringr)</code></pre>
<pre><code>## [1] TRUE</code></pre>
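<p>
For contrast, here is a minimal sketch of the same call evaluated directly in the main session rather than in the subshell. It assumes the main session has a recent version of <code>{stringr}</code>, and <code>main_result</code> is just an illustrative name. <code>tryCatch()</code> turns the error into its message so that a script keeps running:
</p>
<pre class="r"><code># Same call as above, but in the main session (assumed: recent {stringr}).
# tryCatch() returns the error message instead of stopping the script.
main_result &lt;- tryCatch(
  stringr::str_subset(c("", "a"), ""),
  error = function(e) conditionMessage(e)
)

# If the call failed, main_result holds the error message, so this is FALSE.
# With an old {stringr} in the main session, it would be TRUE.
identical(main_result, "a")</code></pre>
<p>
Running both versions side by side like this is one way to confirm that a difference in results really comes from the package version and not from the code itself.
</p>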
<p>
<code>with_nix()</code> should work whether or not you installed your main R session using Nix. We’re not sure this is true on Windows (or rather, WSL2), though: we don’t have a Windows license to test with, so if you use WSL2 on Windows and want to try this, we would be very happy to hear from you!
</p>
<p>
If you’re interested in using project-specific and reproducible development environments, give <code>{rix}</code> and Nix a try! Learn more about <code>{rix}</code> on its GitHub repository <a href="https://github.com/b-rodrigues/rix">here</a> or on its <a href="https://docs.ropensci.org/rix/">website</a>. We wrote many vignettes that are conveniently numbered, so don’t hesitate to <a href="https://docs.ropensci.org/rix/articles/a-getting-started.html">get started</a>!
</p>


</section>

 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2024-02-02-nix_for_r_part_9.html</guid>
  <pubDate>Fri, 02 Feb 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Reproducible data science with Nix, part 8 – nixpkgs, a tale of the magic of free and open source software and a call for charity</title>
  <link>https://b-rodrigues.github.io/posts/2023-12-19-nix_for_r_part_8.html</link>
  <description><![CDATA[ 




<div style="text-align:center;">
<p>
<img src="https://b-rodrigues.github.io/assets/img/santa_tux.jpg" width="100%">
</p>
</div>
<p>
<em>This is part 8 of a series of blog posts about Nix. Check out the other parts <a href="https://b-rodrigues.github.io/blog/index.html#category=nix">here</a>. TLDR: free and open source software is one of the most important common goods, with enormous positive externalities: if you want to help fund it, keep reading!</em>
</p>
<p>
I wanted to quickly discuss <code>nixpkgs</code>, which is the collection of packages that can be installed using Nix. Why are projects like Nix and <code>nixpkgs</code> important, even if you don’t use Nix? You may not realise it, but you very much benefit from projects like Nix even if you don’t use them. Let me explain.
</p>
<p>
<code>nixpkgs</code> is “just” a GitHub repository containing thousands upon thousands of Nix expressions. When installing a package, these expressions get evaluated, and the package in question gets installed. What <em>installed</em> means can vary: sometimes the package gets built from source, sometimes a pre-compiled binary package for your operating system gets downloaded and installed.
</p>
<p>
For example, <a href="https://github.com/NixOS/nixpkgs/blob/dce218f4f35440622d2056f93ddc335351763bb4/pkgs/development/libraries/quarto/default.nix">here</a> is the Nix expression that downloads and installs Quarto. This expression downloads the pre-compiled Quarto package from Quarto’s own GitHub repository, and then <em>installs</em> it. In this case, the installation process essentially consists in making sure that Quarto is able to find its dependencies, which also get installed by Nix, along with some R and Python packages that make Quarto work well with both languages.
</p>
<p>
Because Nix packages are “nothing but” Nix expressions hosted on GitHub, contributing to Nix is as simple as opening a PR. For example, <a href="https://github.com/NixOS/nixpkgs/pull/263108">here</a> is a draft PR I opened to prepare for the imminent release of Quarto <code>1.4</code>. My goal when I opened this draft PR was to get used to contributing to <code>nixpkgs</code> (this was my second or third PR to <code>nixpkgs</code>, and I made some rookie mistakes when opening my first ones) and also to make the latest version of Quarto available on Nix as quickly as possible. But this PR had an unexpected consequence: through it, we found a bug in Quarto, which was then fixed before the actual release of the next version!
</p>
<p>
You see, the way these things work is that when software gets released, operating system-specific packages get built downstream. In the case of Quarto, this is not entirely true, though: the developers of Quarto themselves release pre-compiled packages for Windows, macOS and several Linux distributions. But they don’t do so for many other operating systems (which is entirely normal: there are just too many, so releasing pre-built binaries for the main operating systems is more than enough), so the maintainers of these other operating systems (or package managers) have to package the software themselves. In the case of scientific software like Quarto, this usually means that it must get packaged for the Conda package manager (popular among Python users) and for Nix, and there are certainly other package managers out there that provide Quarto for other <em>exotic</em> systems. (Note: in the case of Quarto, I think the Quarto devs themselves also package it for Conda.)
</p>
<p>
It turns out that when trying to package the pre-releases of Quarto for Nix, we discovered a regression in the upstream code that would affect packaging not only for Nix, but also for other package managers. We opened an issue on <a href="https://github.com/quarto-dev/quarto-cli/issues/7344">Quarto’s issue tracker</a> and after some discussion, the bug was identified and addressed in a matter of hours. And now everyone gets to enjoy a better version of Quarto!
</p>
<p>
This type of thing happens quite a lot in the background of open source development. My mind always gets blown when I think about the enormous number of hours that hobbyists and paid developers put into open source, and how well everything works. Truly a Christmas miracle (but one that happens all year round)!
</p>
<p>
But it’s not all good and perfect. Some software is more complex to package and requires much more work. The RStudio IDE is one example: it’s a complex piece of software with many dependencies, and while it is available on Nix, it can only be installed on Windows (through WSL2) and Linux. If you’re a Nix user on macOS, you unfortunately won’t be able to install RStudio. And if you install RStudio using the usual macOS installer, it won’t be able to find any version of R or any R packages installed with Nix. This is because RStudio needs to be patched to make it work nicely with Nix (just like we have to patch and prepare Quarto to play well with Nix). And packaging RStudio for Nix on macOS requires some expertise and hardware that we R users and contributors to Nix don’t all have access to.
</p>
<p>
This is where I appeal to your generosity: I have contacted a company called Numtide, which offers a packaging service. You tell them which software you want on Nix, they write the expression and open a PR to <code>nixpkgs</code>. But this costs money, so I started a GoFundMe, which you can find <a href="https://www.gofundme.com/f/package-rstudio-for-nix-on-macos-platforms">here</a>, to fund this. The goal is 4500€, which would cover the work, plus GoFundMe fees and interest rate risk. I stated in the GoFundMe that if the goal was not reached by the end of the year, I would donate all the money to the R Foundation, but I might extend the deadline to the end of January 2024 instead.
</p>
<p>
So here is my ask: if you want to help make free and open source software better, consider donating to this GoFundMe! As explained above, even if you don’t use Nix, everyone can benefit from work that is done by everyone, be it upstream or downstream. And if the goal is not met, your donation will go to the R Foundation anyway!
</p>
<p>
The link to the GoFundMe is <a href="https://www.gofundme.com/f/package-rstudio-for-nix-on-macos-platforms">here</a>.
</p>
<p>
I hope you can help out with this and make free and open source software available and better for everyone.
</p>
<p>
Many thanks, merry Christmas and happy new year!
</p>



 ]]></description>
  <category>R</category>
  <category>nix</category>
  <guid>https://b-rodrigues.github.io/posts/2023-12-19-nix_for_r_part_8.html</guid>
  <pubDate>Tue, 19 Dec 2023 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>
