May 02, 2012

Installing pcalg

Attention conservation notice: Boring details about getting finicky statistical software to work; or, please read the friendly manual.

Some of my students are finding it difficult to install the R package pcalg; I share these instructions in case others are also in difficulty.

  1. For representing graphs, pcalg relies on two packages called RBGL and graph. These are not available on CRAN, but rather are on the other R software repository, BioConductor. To install them, follow the instructions on the BioConductor website; to summarize, run this:
    source("http://bioconductor.org/biocLite.R")
    biocLite("RBGL")
    (Since RBGL depends on graph, this should automatically also install graph; if not, run biocLite("graph"), then biocLite("RBGL").)
  2. Now install pcalg from CRAN, along with the packages it depends on. You will get a warning about not having the Rgraphviz package. However, you will be able to load pcalg and run it. You should be able to step through the example labeled "Using Gaussian Data" at the end of help(pc), though it will not produce any plots.

    You can still extract the graph by hand from the fitted models returned by functions like pc --- if one of those objects is fit, then fit@graph@edgeL is a list of lists, where each node has its own list, naming the other nodes it has arrows to (not from). If you are doing this for the final in ADA, you don't actually need anything beyond this to do the assignment.

  3. Rgraphviz is what pcalg relies on for drawing pictures of causal graphs. Its installation is somewhat tricky, so there is a README file, which you should read.
    The key point is that Rgraphviz itself relies on a non-R suite of programs called graphviz. You will want to install these. Go to graphviz.org, and download and install the software. (If you use a Mac, the standard download also includes Graphviz.app, which is a nice visual interface to the actual graph-drawing functions, and what I use for drawing the DAGs in the lecture notes.)
  4. You have to make sure that your operating system will let other software (like R) call on graphviz. The way to do this is to add the directory (or folder) where you installed graphviz to the list of places your computer recognizes as containing executable programs --- the system's "command path". The README for installing Rgraphviz explains what you have to add to the path. (If you are a Windows user and do not know how to alter the command path, look up how to edit the PATH environment variable in your version of Windows.)
  5. If you have R open, close it. (If you do not, it will probably not know about the new software you've just gotten the system to recognize.) Re-open R, and install Rgraphviz. The basic installation command is just
    source("http://bioconductor.org/biocLite.R")
    biocLite("Rgraphviz")
    The README for Rgraphviz gives some checks which you should be able to run if everything is working; try them.
  6. You should now be able to generate pictures of DAGs with pc and the other functions in pcalg; try stepping through all the examples at the end of help(pc).
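If you are taking the by-hand route from step 2, a minimal sketch of turning fit@graph@edgeL into a plain table of arrows follows. It assumes, as I understand the graphNEL class from the graph package, that each node's entry in edgeL is a list with a component called `edges` holding the nodes it points to, possibly stored as integer positions rather than names. The helper `edgeL_to_df` and the toy list are hypothetical, made up so the example runs without pcalg installed; with a real fitted object you would call `edgeL_to_df(fit@graph@edgeL)`. The `both_ways` column flags arrows whose reverse is also present, which is how I would expect an undirected edge in the estimated graph to show up, but check that against help(pc) before relying on it.

```r
# Hypothetical helper: flatten an edgeL-style list (one entry per node, each
# holding an `edges` component listing the nodes it has arrows *to*) into a
# data frame with one row per arrow.
edgeL_to_df <- function(edgeL) {
  node_names <- names(edgeL)
  rows <- lapply(node_names, function(from) {
    targets <- edgeL[[from]]$edges
    if (length(targets) == 0) return(NULL)  # node with no outgoing arrows
    # Targets may be stored as integer positions; translate them to node names.
    if (is.numeric(targets)) targets <- node_names[targets]
    data.frame(from = from, to = targets, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)  # rbind() skips the NULL entries
  if (is.null(out)) {
    # No arrows at all: return an empty table with the same columns.
    return(data.frame(from = character(0), to = character(0),
                      both_ways = logical(0)))
  }
  # Flag arrows whose reverse also appears (assumed to mean "undirected").
  out$both_ways <- paste(out$to, out$from) %in% paste(out$from, out$to)
  out
}

# Toy stand-in for fit@graph@edgeL, made up for illustration:
# A and B point at each other, and both point at C.
toy_edgeL <- list(
  A = list(edges = c(2L, 3L)),
  B = list(edges = c(1L, 3L)),
  C = list(edges = integer(0))
)
edges_df <- edgeL_to_df(toy_edgeL)
print(edges_df)
```

Keeping the result as a data frame means you can sort, filter, or tabulate the arrows with ordinary R tools, which is all the ADA final needs.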

When I installed pcalg on my laptop two weeks ago, it was painless, because (1) I already had graphviz, and (2) I knew about BioConductor. (In fact, the R graphical interface on the Mac will switch between installing packages from CRAN and from BioConductor.) To check these instructions, I just now deleted all the packages from my computer and re-installed them, and everything worked; elapsed time, ten minutes, mostly downloading.

Update, 30 April 2013: Some readers report problems getting Rgraphviz to run (especially from the binary package) when the installed version of graphviz differs from the one Rgraphviz expects. It may be necessary to install an older release of graphviz rather than the latest one.
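To see whether R can find graphviz at all (step 4), and which version it finds, one option is to look for graphviz's `dot` program on the command path from inside R. This is a minimal sketch, not something taken from the Rgraphviz README; `check_graphviz` is a hypothetical helper built only on base R's `Sys.which()` and `system2()`. If it reports that nothing was found, R is not seeing the path change (perhaps R was not restarted, as in step 5); if it does find `dot`, you can compare the version it prints against the one your Rgraphviz needs.

```r
# Hypothetical helper: report whether graphviz's `dot` program is on the
# command path that R sees, and if so, which version it is.
check_graphviz <- function(program = "dot") {
  path <- Sys.which(program)  # returns "" when the program is not on the path
  if (!nzchar(path)) {
    message(program, " was not found on the command path.")
    return(invisible(NA_character_))
  }
  # `dot -V` writes its version string to stderr, so capture both streams.
  version <- system2(path, "-V", stdout = TRUE, stderr = TRUE)
  message("Found ", path, ": ", paste(version, collapse = " "))
  invisible(version)
}

check_graphviz()
```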

Advanced Data Analysis from an Elementary Point of View

Posted at May 02, 2012 21:30

Three-Toed Sloth