Tuesday, May 3, 2011

To attach() or not attach(): that is the question

R objects that reside in other R objects can require a lot of typing to access. For example, to refer to a variable x in a dataframe df, one could type df$x. This is no problem when the dataframe and variable names are short, but can become burdensome when longer names or repeated references are required, or objects in complicated structures must be accessed.

The attach() function in R can be used to make objects within dataframes accessible in R with fewer keystrokes. As an example:

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
names(ds)
attach(ds)
mean(cesd)
[1] 32.84768

The search() function can be used to list attached objects and packages. Let's see what is there, then detach() the dataset to clean up after ourselves.

search()
[1] ".GlobalEnv" "ds" "tools:RGUI" "package:stats"
[5] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
[9] "package:methods" "Autoloads" "package:base"
detach(ds)

As noted in section B.4.5, users are cautioned that if there is already a variable called cesd in the local workspace, then after issuing attach(ds) the name cesd will refer to the workspace variable rather than to ds$cesd, because the workspace is searched before attached objects. Name conflicts of this type are a common problem with attach(), and care should be taken to avoid them.
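The conflict is easy to reproduce. The sketch below uses a small made-up data frame rather than the HELP data, and assumes it is run at the top level of a fresh R session, so that the workspace (.GlobalEnv) comes before the attached data frame on the search path:

```r
# Toy data frame standing in for ds (values are made up for illustration)
ds <- data.frame(cesd = c(10, 20, 30))

# A variable with the same name already exists in the workspace
cesd <- c(1, 2, 3)

# attach() places ds after the workspace on the search path,
# and R prints a message that cesd is masked by .GlobalEnv
attach(ds)

masked_mean <- mean(cesd)      # uses the workspace cesd, not ds$cesd
actual_mean <- mean(ds$cesd)   # explicit reference to the column

detach(ds)                     # clean up the search path
```

Here masked_mean is computed from the workspace vector, while actual_mean is computed from the data frame column, so the two values differ even though both lines appear to ask for the mean of cesd.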

The help page for attach() notes that attach can lead to confusion. The Google R Style Manual gives clear advice on this point:
"The possibilities for creating errors when using attach are numerous. Avoid it."


After being burned by this one too many times, we concur.

So what options exist for those who decide to go cold turkey?

  1. Reference variables directly (e.g. lm(ds$y ~ ds$x))

  2. Specify the dataframe for commands which support this (e.g. lm(y ~ x, data=ds))

  3. Use the with() function, which returns the value of whatever expression is evaluated (e.g. with(ds, lm(y ~ x)))

  4. (Also note the within() function, which is similar to with(), but returns a modified object.)


Some examples may be helpful.

> # fit a linear model
> lm1 = lm(cesd ~ pcs, data=ds)

> mean(ds$cesd[ds$female==1]) # these next three are equivalent
[1] 36.88785
> with(ds, mean(cesd[female==1]))
[1] 36.88785
> with(subset(ds, female==1), mean(cesd))
[1] 36.88785
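The within() function from option 4 does not appear in these examples. As a minimal sketch, using a made-up data frame in place of the HELP data (the variable cesd_high and its cutoff of 16 are hypothetical, chosen only for illustration), within() returns a modified copy of the data frame, while with() returns just the value of the expression:

```r
# Toy data frame standing in for ds (values are made up for illustration)
ds <- data.frame(cesd = c(10, 20, 30), female = c(1, 0, 1))

# with() evaluates the expression inside ds and returns its value
female_mean <- with(ds, mean(cesd[female == 1]))

# within() evaluates the expression inside ds and returns a modified copy
ds2 <- within(ds, cesd_high <- cesd > 16)

names(ds2)   # includes the new column; ds itself is unchanged
```

Neither call touches the search path, so there is nothing to detach() afterwards.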

In short, there's never an actual need to use attach(). Using it can lead to confusion or errors, and alternatives exist that avoid the problems. We recommend against it.

In SAS, all procedures use the most recent data set or must reference a data set explicitly. Very roughly speaking, using attach() in R is like relying on the implicit use of the most recent data set. Our recommendation against attach() thus mirrors our use of the data= option throughout our books.

7 comments:

Henrik Bengtsson said...

To copy the elements of a data.frame, an environment or a list to the current environment (either locally inside a function or the global env), see attachLocally() of R.utils. It does not mess with the search() path.

My $.02

AJ Cann said...

OK, you've convinced me to get rid of attach() on StatsBytes (http://www.microbiologybytes.com/statsbytes). But:

> datatoplaywith = read.csv("http://www.microbiologybytes.com/statsbytes/datatoplaywith.csv", header=TRUE, sep=",")
> datatoplaywith
really boring data
1 1 1 0.752
2 2 17 0.860
3 3 289 0.266
4 4 4913 0.932
5 5 83521 0.629
6 6 1419857 0.831
7 7 24137569 0.131
8 8 410338673 0.383
9 9 6975757441 0.249
> names(datatoplaywith)
[1] "really" "boring" "data"
> min(really, data=datatoplaywith)
Error: object 'really' not found
> min(really, data="datatoplaywith")
Error: object 'really' not found
>
> # But:
>
> attach(datatoplaywith)
> min(really)
[1] 1

How do I get this to work without attach?

Ken Kleinman said...

Not all functions accept the data= option. But AFAIK all will work with the with() function:

> with(datatoplaywith, min(really))
[1] 1

AJ Cann said...

That's a shame. For the students I'm aiming at the syntax of with() is going to be pretty confusing, and telling them to use one syntax for some commands and another for others is a non-starter. I may have to stick with attach().

Ken Kleinman said...

I agree attach() is a lot simpler. And I don't even bother with data=, myself.

I don't think with() should be too hard, though, really.

richierocks said...

Don't forget the transform function!

I wrote about this issue last week; apologies for shameless self-promotion:

http://4dpiecharts.com/2011/04/29/friday-function-triple-bill-with-vs-within-vs-transform/

The time taken to learn how to use with or within or transform is less than the time taken to debug the mess you make by using attach.

Ken Kleinman said...

Hi Richie-- Funny coincidence that we posted on this so shortly after you had. (FTR, Nick started this post a couple of months ago.)

Do you think there's any practical advantage to using transform() instead of within()?