Some large datasets are pushing both memory and some functions I’m writing to their limits. I have a few questions about subsetting, of matrices and arrays in particular:
- Does defining a variable as a subset of another lead to a copy being made? For instance:
x <- matrix(rnorm(20*30), nrow=20, ncol=30)
y <- x[, 1:10]
Some exploration with object_size from pryr seems to indicate that a copy is made when y is created, but I’d like to be sure.
- If I pass a subset of a matrix/array as an argument to a function, is it copied before the function runs? For instance, in
x <- matrix(rnorm(20*30), nrow=20, ncol=30)
y <- dnorm(0, mean=x[,1:10], sd=1)
I wonder if the data in x[,1:10] are copied and then given as input to dnorm.
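For the first question, the kind of check I have in mind looks roughly like the sketch below. It assumes pryr is installed and relies on object_size counting shared memory only once when given several objects; z is just a hypothetical extra variable added for comparison.

```r
library(pryr)  # assumed to be installed

x <- matrix(rnorm(20 * 30), nrow = 20, ncol = 30)
y <- x[, 1:10]  # the subset in question
z <- x          # plain assignment, added only for comparison

object_size(x)     # size of x alone
object_size(x, z)  # equals the size of x alone if z shares x's memory
object_size(x, y)  # larger than x alone if y occupies its own memory
```

If the combined size of x and y exceeds the size of x alone, then y does not share storage with x, which is what makes me think the subset is copied.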
I’ve heard that data.table allows one to work with subsets without copies being made (unless necessary), but it seems that one is constrained to two dimensions only (no arrays) that way.
Cheers!