Transitioning from x86 to arm64 on macOS - experiences of an R user

Note: To avoid (questionable) third-party discussion tools, please post your thoughts and comments in an issue at https://codeberg.org/pat-s/pat-s.me/issues.

With the release of the M1 Pro and M1 Max chips and the new MacBook Pros, many more people are transitioning to Apple Silicon chips and, with that, to a new platform architecture.

For years, x86_64 was the architecture that most systems used. Most of these were driven by Intel CPUs, some by AMD ones. Apple’s new chips are based on a different architecture referred to as arm64 (with the 64 in both terms referring to the “bit” identifier). This change is substantial and one of the major reasons for the improved performance and battery life of Apple’s new chips.

As you might infer from this little introduction, this also causes some changes with respect to software. More specifically, everything needs to be rebuilt for the arm64 architecture and also clearly distinguished from the x86_64 one.

This post is mainly devoted to macOS users, as arm64 is not yet really popular among other operating systems in the wild, even though there are also Windows and Linux installations which are able to run on this architecture.

R

Make sure to install R for the arm64 architecture: there are two installers available, one for x86_64 and one for arm64. To prevent issues, I recommend using homebrew (see more information below) and executing brew install --cask r, which will install the arm64 version. You can verify this by looking at the startup message, which should include aarch64-apple-darwin20.

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20 (64-bit)
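The same check can be scripted. The following is only a minimal sketch: the function name r_arch_from_platform is made up for this post, and it simply classifies the platform string that R reports.

```shell
# Classify an R platform string (the "Platform:" part of the startup message).
# r_arch_from_platform is a hypothetical helper name, not part of R or homebrew.
r_arch_from_platform() {
  case "$1" in
    aarch64-*|arm64-*) echo "arm64" ;;
    x86_64-*)          echo "x86_64" ;;
    *)                 echo "unknown" ;;
  esac
}

# Example with the platform string from the startup message above
r_arch_from_platform "aarch64-apple-darwin20"

# With R installed, the string can be taken from R itself:
#   r_arch_from_platform "$(Rscript -e 'cat(R.version$platform)')"
```

If this reports x86_64 on an Apple Silicon machine, the Intel build of R is installed and runs through Rosetta.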

Homebrew

If you’re on a Mac, I highly recommend using homebrew for all installations, system libraries (brew install <library>) and GUI installations (brew install --cask <app>). If you have already done so - great! Now, when transitioning to arm64, homebrew no longer installs everything into /usr/local but instead uses /opt/homebrew to distinguish arm64 from x86_64 installations.

The new /opt/homebrew path is somewhat non-standard as most software is not aware of it (in contrast to the old x86_64 path in /usr/local). What software sees, i.e., which programs are available, is usually determined by the $PATH environment variable.

If you are installing brew on an arm64 Mac, the homebrew installer will ask you to add the following to your shell startup file (e.g., .profile):

eval "$(/opt/homebrew/bin/brew shellenv)"

(Note: .profile is read by login shells such as sh and bash, the latter only if neither .bash_profile nor .bash_login exists. Other shells use their own startup files instead, e.g., .zprofile for zsh or config.fish for fish.)

This call executes the following (here for the fish shell, it will look different for other shells, e.g. bash):

set -gx HOMEBREW_PREFIX "/opt/homebrew";
set -gx HOMEBREW_CELLAR "/opt/homebrew/Cellar";
set -gx HOMEBREW_REPOSITORY "/opt/homebrew";
set -gx HOMEBREW_SHELLENV_PREFIX "/opt/homebrew";
set -q PATH; or set PATH ''; set -gx PATH "/opt/homebrew/bin" "/opt/homebrew/sbin" $PATH;
set -q MANPATH; or set MANPATH ''; set -gx MANPATH "/opt/homebrew/share/man" $MANPATH;
set -q INFOPATH; or set INFOPATH ''; set -gx INFOPATH "/opt/homebrew/share/info" $INFOPATH;

Here, the /opt/homebrew paths are added to your $PATH variable such that other programs are able to find brew installations.

Sometimes this is not picked up by some applications or they are using their own PATH environment variable instead of looking at the user-defined one. If you are facing issues at some point, check this setting and ensure /opt/homebrew/bin is defined in your path env var, e.g., by executing echo $PATH in the terminal.
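Instead of scanning the output of echo $PATH by eye, the check can be wrapped in a small function. This is a sketch only; path_contains is a hypothetical helper name and takes an optional second argument so that a PATH copied from another program (e.g., RStudio) can be tested as well.

```shell
# Return success if directory $1 is an entry of the colon-separated list $2
# (defaults to the current $PATH). path_contains is a hypothetical helper name.
path_contains() {
  dir=$1
  path_value=${2-$PATH}
  case ":$path_value:" in
    *":$dir:"*) return 0 ;;
    *)          return 1 ;;
  esac
}

if path_contains /opt/homebrew/bin; then
  echo "/opt/homebrew/bin is on PATH"
else
  echo "/opt/homebrew/bin is missing from PATH"
fi
```

Wrapping both sides in colons ensures that only whole entries match, so /opt/homebrew/bin is not confused with a longer path that merely starts with it.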

RStudio

RStudio does not source the contents of ~/.profile when starting. Hence, $PATH looks as follows:

Sys.getenv("PATH")
[1] "/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/Library/TeX/texbin:/opt/X11/bin:/Applications/RStudio.app/Contents/MacOS/postback"

You notice that our new /opt/homebrew/bin is missing in this list. While this is not necessarily a problem for most R operations, sometimes R tries to look for some binaries in this path (e.g., when using ccache for faster source installations).

Hence, we need to somehow force-add /opt/homebrew/bin to RStudio’s $PATH. There are multiple ways to do so; I’ll share my favorite one here: using the {startup} package from Henrik Bengtsson. This package gives you a lot of power with respect to the R startup. In this case, we want to add the path mentioned above, but only if we’re running R on an arm64 macOS installation.

This can be done by adding a file ~/.Renviron.d/sysname=Darwin,machine=arm64/path. This file will only be executed if the system name evaluates to “Darwin” (which is the common identifier for macOS systems) and is running on an arm64 architecture.

In path we set

PATH="/opt/homebrew/bin:${PATH}"

The last step is to tell R to make use of the startup package when starting R, i.e., one needs to add

startup::startup()

into ~/.Rprofile.
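The conditional file from the steps above can also be created from a terminal. The snippet is a sketch under a few assumptions: write_renviron_path is a made-up function name, it overwrites an existing file of the same name, and it takes the base directory as an argument so it can be tried in a scratch directory first.

```shell
# Create <base>/.Renviron.d/sysname=Darwin,machine=arm64/path with the PATH entry.
# write_renviron_path is a hypothetical helper; it overwrites an existing file.
write_renviron_path() {
  base=${1:-$HOME}
  target_dir="$base/.Renviron.d/sysname=Darwin,machine=arm64"
  mkdir -p "$target_dir"
  # Single quotes keep ${PATH} literal so that R expands it at startup
  printf 'PATH="/opt/homebrew/bin:${PATH}"\n' > "$target_dir/path"
  echo "$target_dir/path"
}

# Dry run in a temporary directory instead of the real home directory
demo_home=$(mktemp -d)
created=$(write_renviron_path "$demo_home")
cat "$created"
rm -rf "$demo_home"
```

Calling write_renviron_path without an argument writes below $HOME, which is where the {startup} package looks for ~/.Renviron.d.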

So, checking again, the output in RStudio now looks as desired

Sys.getenv("PATH")
[1] "/opt/homebrew/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/Library/TeX/texbin:/opt/X11/bin:/Applications/RStudio.app/Contents/MacOS/postback"

R packages - source installations

Usually R packages are installed via binaries on macOS. Doing so is fast and works for most packages. However, when one wants to install from GitHub or other places than CRAN, packages need to be installed from source.

This is not an issue for packages with only R code. However, when a package requires compilation of C/C++/Fortran code or needs to link against local system libraries, it gets tricky.

Usually, when a required system library is missing and packages are installed from binaries, this causes an issue during runtime, i.e., when trying to load the package. Instead, when installing from source, this already causes an issue during installation. I prefer the latter as this ensures that my system is always able to install packages from source if needed.

So we have already learned that with the new arm64 platform things are located in new places. And yes, this might/will cause issues when installing from source.

For example, when installing the package jpeg, you’ll see something like

install.packages("jpeg")
Installing package into ‘/Users/pjs/Library/R/arm64/4.1/library’
(as ‘lib’ is unspecified)
trying URL 'https://stat.ethz.ch/CRAN/src/contrib/jpeg_0.1-9.tar.gz'
Content type 'application/x-gzip' length 18596 bytes (18 KB)
==================================================
downloaded 18 KB

* installing *source* package ‘jpeg’ ...
** package ‘jpeg’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
ccache clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -falign-functions=64 -Wall -g -O2  -c read.c -o read.o
In file included from read.c:1:
./rjcommon.h:11:10: fatal error: 'jpeglib.h' file not found
#include <jpeglib.h>

This is because R is not able to find the jpeglib headers, which are needed during compilation. You see that it looks in -I/opt/R/arm64/include but cannot find it there. Instead, these are located in /opt/homebrew/include. Hence, one needs to add the following to ~/.R/Makevars, which controls where R looks for libraries:

CFLAGS=-I/opt/homebrew/include

So now, let’s try again:

install.packages("jpeg")
Installing package into ‘/Users/pjs/Library/R/arm64/4.1/library’
(as ‘lib’ is unspecified)
trying URL 'https://stat.ethz.ch/CRAN/src/contrib/jpeg_0.1-9.tar.gz'
Content type 'application/x-gzip' length 18596 bytes (18 KB)
==================================================
downloaded 18 KB

* installing *source* package ‘jpeg’ ...
** package ‘jpeg’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
ccache clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -I/opt/homebrew/include -c read.c -o read.o
read.c:21:44: warning: implicit conversion from enumeration type 'boolean' to different enumeration type 'Rboolean' [-Wenum-conversion]
    R_RegisterCFinalizerEx(dco, Rjpeg_fin, TRUE);
    ~~~~~~~~~~~~~~~~~~~~~~                 ^~~~
1 warning generated.
ccache clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -I/opt/homebrew/include -c reg.c -o reg.o
ccache clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -I/opt/homebrew/include -c write.c -o write.o
write.c:31:44: warning: implicit conversion from enumeration type 'boolean' to different enumeration type 'Rboolean' [-Wenum-conversion]
    R_RegisterCFinalizerEx(dco, Rjpeg_fin, TRUE);
    ~~~~~~~~~~~~~~~~~~~~~~                 ^~~~
1 warning generated.
ccache clang -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/opt/R/arm64/lib -o jpeg.so read.o reg.o write.o -ljpeg -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
ld: library not found for -ljpeg
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Oh no, another error! This time the linker can’t find the jpeg library (-ljpeg). Okay, so more path work is needed. In this case, the following addition to ~/.R/Makevars helps us

LDFLAGS+=-L/opt/homebrew/opt/jpeg/lib

This tells R to look in /opt/homebrew/opt/jpeg/lib when searching for libraries to link against. Finally, the installation of jpeg succeeds

install.packages("jpeg")
Installing package into ‘/Users/pjs/Library/R/arm64/4.1/library’
(as ‘lib’ is unspecified)
trying URL 'https://stat.ethz.ch/CRAN/src/contrib/jpeg_0.1-9.tar.gz'
Content type 'application/x-gzip' length 18596 bytes (18 KB)
==================================================
downloaded 18 KB

* installing *source* package ‘jpeg’ ...
** package ‘jpeg’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
ccache clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -I/opt/homebrew/include -c read.c -o read.o
read.c:21:44: warning: implicit conversion from enumeration type 'boolean' to different enumeration type 'Rboolean' [-Wenum-conversion]
    R_RegisterCFinalizerEx(dco, Rjpeg_fin, TRUE);
    ~~~~~~~~~~~~~~~~~~~~~~                 ^~~~
1 warning generated.
ccache clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -I/opt/homebrew/include -c reg.c -o reg.o
ccache clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -I/opt/homebrew/include -c write.c -o write.o
write.c:31:44: warning: implicit conversion from enumeration type 'boolean' to different enumeration type 'Rboolean' [-Wenum-conversion]
    R_RegisterCFinalizerEx(dco, Rjpeg_fin, TRUE);
    ~~~~~~~~~~~~~~~~~~~~~~                 ^~~~
1 warning generated.
ccache clang -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/opt/R/arm64/lib -L/opt/homebrew/opt/jpeg/lib -L/opt/homebrew/opt/libpng/lib -o jpeg.so read.o reg.o write.o -ljpeg -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
installing to /Users/pjs/Library/R/arm64/4.1/library/00LOCK-jpeg/00new/jpeg/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (jpeg)

NB: This is just one example where explicit path additions to CFLAGS and LDFLAGS are needed. A lot of this can also be controlled by the package to look in these places by default. Hence, you do not need to do this for all packages that make use of some system library. However, when you are facing troubles, this might be helpful as a pointer.
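For other system libraries the two Makevars lines follow the same pattern as for jpeg. The sketch below only prints them; makevars_flags_for is a hypothetical helper name, and it assumes the homebrew layout seen above (headers in <prefix>/include, libraries in <prefix>/opt/<formula>/lib), which may not hold for every formula.

```shell
# Print the Makevars lines used above for a given homebrew formula.
# makevars_flags_for is a hypothetical helper; the directory layout is an assumption.
makevars_flags_for() {
  formula=$1
  prefix=${2:-/opt/homebrew}
  printf 'CFLAGS=-I%s/include\n' "$prefix"
  printf 'LDFLAGS+=-L%s/opt/%s/lib\n' "$prefix" "$formula"
}

makevars_flags_for jpeg
```

The output is meant to be reviewed and pasted into ~/.R/Makevars by hand rather than appended blindly, since an existing CFLAGS line would be overridden.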

gfortran

Our friend gfortran is also creating some troubles. Since the official brew cask has been deprecated and integrated into the gcc formula, there were some issues with respect to R being able to locate gfortran and make use of it. See also my previous post on gfortran and macOS from March 2021.

Now gfortran is again at a new place, this time /opt/homebrew/bin/gfortran or /opt/R/arm64/gfortran, depending on if you want to go with the homebrew gfortran or the one from CRAN.

Hence, when installing a package which requires gfortran, e.g. glmnet, the installation first errors with the following

ld: warning: directory not found for option '-L/opt/R/arm64/gfortran/lib/gcc/aarch64-apple-darwin20.2.0/11.0.0'
ld: warning: directory not found for option '-L/opt/R/arm64/gfortran/lib'
ld: library not found for -lgfortran

There are two options to get gfortran support on arm64 macOS:

  1. Using the gfortran build that CRAN provides for the arm64 R installation
  2. Using the homebrew gfortran installation

For 1. we first need to manually download and install a static gfortran bundle:

curl -O https://mac.r-project.org/libs-arm64/gfortran-f51f1da0-darwin20.0-arm64.tar.gz
sudo tar fvxz gfortran-f51f1da0-darwin20.0-arm64.tar.gz -C /

For 2. we need to tell R explicitly where to look for gfortran. Hence, the following additions to ~/.R/Makevars are needed:

# homebrew
FLIBS   =-L/opt/homebrew/opt/gfortran/lib
F77     = /opt/homebrew/bin/gfortran
FC      = /opt/homebrew/bin/gfortran

CFLAGS   = -I/opt/homebrew/include
CPPFLAGS = -I/opt/homebrew/include
CXXFLAGS = -I/opt/homebrew/include

Which approach is better, you ask? Hard to say! While the CRAN asset is well tested within the CRAN build chain, it requires a manual download and does not update itself (which can also be a feature in some cases). The homebrew installation is dynamic and requires manual linking in ~/.R/Makevars instead. Both have their pros and cons, and the choice is yours.
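To see which of the two gfortran installations is actually present on a machine, a small lookup can help. This is a sketch: first_executable is a made-up helper name, and the location of the CRAN binary below /opt/R/arm64/gfortran (bin/gfortran) is an assumption based on the directory named above.

```shell
# Print the first argument that is an executable file; fail if none is.
# first_executable is a hypothetical helper name.
first_executable() {
  for candidate in "$@"; do
    if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# The second path is an assumption about where the CRAN bundle puts the binary
first_executable /opt/homebrew/bin/gfortran /opt/R/arm64/gfortran/bin/gfortran ||
  echo "no gfortran found in the assumed locations"
```

The order of the arguments decides which installation wins if both are present.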

BLAS

The arm64 version of R also comes with a new Basic Linear Algebra Subprograms (BLAS) library which seems to speed up numerical calculations by up to 3200%. To make use of this new BLAS library, do the following:

cd /Library/Frameworks/R.framework/Resources/lib/

# create a symbolic link pointing libRblas.dylib to the optimized BLAS implementation
ln -s -i -v libRblas.vecLib.dylib libRblas.dylib

If you ever want to revert this, do

cd /Library/Frameworks/R.framework/Resources/lib/
ln -s -i -v libRblas.0.dylib libRblas.dylib
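Before or after switching, it can be useful to check which BLAS the link currently points to. A minimal sketch, assuming libRblas.dylib is a symbolic link as in the commands above; current_blas is a hypothetical helper name, and the directory can be overridden for testing.

```shell
# Print the target of the libRblas.dylib symlink in the given R lib directory.
# current_blas is a hypothetical helper name.
current_blas() {
  lib_dir=${1:-/Library/Frameworks/R.framework/Resources/lib}
  if [ -L "$lib_dir/libRblas.dylib" ]; then
    readlink "$lib_dir/libRblas.dylib"
  else
    echo "libRblas.dylib is not a symbolic link in $lib_dir" >&2
    return 1
  fi
}

current_blas || true
```

A result of libRblas.vecLib.dylib means the optimized library is active, while libRblas.0.dylib is the reference BLAS.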

This gem was first shared on the r-sig-mac mailing list, specifically in this thread.

rJava

The most trouble-free Java installation might still be Java11. To install it via homebrew, do

brew install openjdk@11

Next, it is important to execute the suggested command from the post-install message:

sudo ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-11.jdk

Otherwise most libraries won’t be able to find Java in the expected location (with rJava being one of them).
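Whether the link from the post-install message is in place can be checked quickly. This is a sketch; jdk_link_ok is a made-up helper name, and the default path is the one from the command above.

```shell
# Succeed if the given path is a symlink that resolves to a directory.
# jdk_link_ok is a hypothetical helper name.
jdk_link_ok() {
  link=${1:-/Library/Java/JavaVirtualMachines/openjdk-11.jdk}
  [ -L "$link" ] && [ -d "$link" ]
}

if jdk_link_ok; then
  echo "openjdk-11.jdk link is in place"
else
  echo "openjdk-11.jdk link is missing or broken"
fi
```

The second test fails for a dangling link, which can happen after homebrew has removed or moved the JDK.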

If you want to be able to install rJava from source, you need to download the custom pcre2 library from the libs-arm64 repository and move it to /opt/R/arm64/lib:

# version will most likely change in the future
curl -s -O https://mac.r-project.org/libs-arm64/pcre2-10.34-darwin.20-arm64.tar.gz
tar xzf pcre2-10.34-darwin.20-arm64.tar.gz
cp opt/R/arm64/lib/lib* /opt/R/arm64/lib/

# clean
rm -rf opt/
rm pcre2-10.34-darwin.20-arm64.tar.gz

OpenMP

Support for OpenMP parallelization in R packages which support it (e.g. data.table, or fst) can be enabled as follows:

First, install libomp via brew:

brew install libomp

Next, add some options to ~/.R/Makevars:

LDFLAGS += -L/opt/homebrew/opt/libomp/lib -lomp
CPPFLAGS += -Xclang -fopenmp

This will enable OpenMP support using the libomp formula via brew, which auto-updates itself. I favor this approach over the manual downloads of static OpenMP builds from https://mac.r-project.org/openmp/.
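When adding such lines from a script, it helps to avoid duplicates if the script is run more than once. The following is a sketch; append_once is a hypothetical helper name, and the demo writes to a temporary file rather than to ~/.R/Makevars.

```shell
# Append a line to a file unless an identical line is already present.
# append_once is a hypothetical helper name.
append_once() {
  file=$1
  line=$2
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # -x: match whole lines, -F: treat the line as a fixed string, -q: quiet
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo on a temporary file; use "$HOME/.R/Makevars" for the real thing
demo_file=$(mktemp)
append_once "$demo_file" "LDFLAGS += -L/opt/homebrew/opt/libomp/lib -lomp"
append_once "$demo_file" "LDFLAGS += -L/opt/homebrew/opt/libomp/lib -lomp"
append_once "$demo_file" "CPPFLAGS += -Xclang -fopenmp"
cat "$demo_file"
rm -f "$demo_file"
```

The second call is a no-op, so the demo file ends up with exactly the two lines shown above.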

Virtual Machines / Parallels

Parallels is probably the most widely used VM software on macOS. When transitioning from an x86_64 installation to a new arm64 installation, VMs cannot be ported due to the architecture mismatch. Instead, one needs to reinstall and remap/copy the existing data into the new instances. See the official “how to” doc from Parallels.

This also requires the architecture of the guest operating systems to be arm64, which is not so easy: Windows does not yet provide “official” installers for arm64 and one needs to register for the Insiders program to get a working ISO image.

However, there is a workaround. https://uupdump.net/ provides bundles to create any Windows ISO you can imagine, no matter which OS you are running. You can use it to create a copy of Windows 11 Home for the arm64 architecture and use it to install a Parallels VM.

Of course you can also use Windows 10 instead of 11 - however, directly installing Windows 11 prevents you from going through update troubles - and Windows 11 is already waiting around the corner.

Switching between x86_64 and arm64 R installations

To switch between multiple R versions on macOS including support for different architectures, you can check out the following tools:

Kudos to Apple

The user experience on the new machines is astonishing. I’ve upgraded from a fairly recent machine (MBP 13" early 2020) and the difference is remarkable. There are many components which have been updated, but my two favorites are the reduced heat on the body and the increased battery life - followed by the amazingly fast CPU and SSD speeds.