On the Gory Loops in R | /en/2010/10/on-the-gory-loops-in-r/

yihui 2022-12-16 19:38:14

https://yihui.org/en/2010/10/on-the-gory-loops-in-r/

5 Comments

giscus-bot 2022-12-16 19:38:15

Guest *Eric* @ 2010-11-16 15:29:19 originally posted:

Another solution:

x2 = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 
     1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 
     1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 
     1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0)
m = 5
stopifnot(m <= length(x2))  # stop if m > length(x2)
# handle the case where length(x2) is not a multiple of m: drop the trailing remainder
M = matrix(x2[1:(floor(length(x2) / m) * m)], ncol = m, byrow = TRUE)
table(apply(M, 1, paste, collapse = ""))

Eric

giscus-bot 2022-12-16 19:38:22

Guest *Li Yi* @ 2011-04-11 14:50:49 originally posted:

This will not give the desired result. What you get is only part of the result, since patterns that span consecutive rows are not counted.
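
A minimal sketch of counting the patterns that the matrix approach misses, assuming the goal is to tabulate every overlapping window of length m (the short toy vector and the names `starts`, `windows`, and `counts` are made up for illustration):

```r
# Toy input, shorter than the x2 above so the result is easy to check by hand
x2 <- c(1, 1, 0, 0, 1, 0, 0, 0, 0, 1)
m <- 3

# One window starts at every position that still leaves m elements
starts <- seq_len(length(x2) - m + 1)

# Paste each window into a string such as "110"
windows <- vapply(
  starts,
  function(i) paste(x2[i:(i + m - 1)], collapse = ""),
  character(1)
)

# Frequency of each pattern, including those that cross a multiple of m
counts <- table(windows)
counts
```

There are length(x2) - m + 1 windows in total, so the counts should sum to that number. Splitting into non-overlapping rows only ever sees floor(length(x2) / m) of them.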

giscus-bot 2022-12-16 19:38:15

Guest *An Tran Duy* @ 2010-11-16 16:55:48 originally posted:

Good post! I would suggest that the students read this:
Uwe Ligges and John Fox. R Help Desk: How can I avoid this loop or make it faster? R News, 8(1):46-50, May 2008. URL: http://cran.r-project.org/doc/Rnews/Rnews_2008-1.pdf

For the function that counts 0 and 1 in a vector, I think it would be better to compare the loop with the vectorized approach. Below is an example test.

freq.tab <- function(x) {
  vec <- c(sum(x == 0), sum(x == 1))
  names(vec) <- c("0", "1")
  vec
}

> system.time(replicate(100000, freqTable01(x2)))
  user     system   elapsed 
  73.33    0.35      73.70 

> system.time(replicate(100000, freq.tab(x2)))
   user    system  elapsed 
   1.28    0.00      1.30

yihui 2022-12-16 19:38:19

Thanks! This is a nice reference.

Originally posted on 2010-11-16 21:17:13

giscus-bot 2022-12-16 19:38:20

Guest *An Tran Duy* @ 2010-11-16 22:14:03 originally posted:

Correction: the function for counting 0 and 1 in a vector that I submitted was as follows (I guess it was modified when copying and pasting)

freq.tab <- function(x) {
  vec <- c(sum(x == 0), sum(x == 1))
  names(vec) <- c("0", "1")
  vec
}

Thanks!

yihui 2022-12-16 19:38:21

Fixed. Thanks!

Originally posted on 2010-11-16 23:01:59

giscus-bot 2022-12-16 19:38:16

Guest *Guest* @ 2010-11-16 22:20:11 originally posted:

You are right, not vectorizing is one of the primary hurdles to gaining R proficiency. I've used R for years, and if I find myself writing a loop, I know that my code is most likely inefficient, and I start thinking about a better way to get the same result. I also know I'm probably not doing things quite right when I find myself sending scalars to functions that take vectors or data frames, or when I try to keep track of an indexing variable manually (often wishing that R had ++ and -- operators). So students should realize that this isn't only a problem for beginners; more experienced users are simply more adept at immediately recognizing these patterns in their own and others' code.

That said, loops offer some benefits, even when the operation could be vectorized:

For small datasets, if I can't immediately see how to vectorize the operation, I'll just write a loop in the interest of getting the solution done so that I can get on with the analysis. As you showed above, there isn't much of a penalty when there are only a few rows of data.
What you call "elegant" can also seem arcane and terse to non-R users, especially for multi-function, subscripted statements. When dealing with colleagues (especially SPSS- or Stata-using co-authors), it is sometimes preferable to write loops so that they can better understand what is going on. Most of my co-authors use, or at least can read, R now, so this is less of an issue.
Especially for beginners, loops are often less error-prone than vectorized operations. They will naturally find incentives to minimize explicit looping when they analyze their first 100K+ row dataset.

giscus-bot 2022-12-16 19:38:17

Guest *Marc Taylor* @ 2010-12-18 11:05:31 originally posted:

Another interesting aspect of loops in R: I have noticed a huge difference in the speed of a loop depending on how the results are recorded. Perhaps this is just a problem for an inexperienced programmer, but I used to record the results of some loops as a vector that increased in length with each iteration:

results <- c()  # start with an empty vector
for (i in 1:number_of_iterations) {
  results <- c(results, 2 * i)  # builds a new, longer vector on every iteration
}

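However, this means that the amount of memory required for the vector "results" changes after each iteration: R has to allocate a new, longer object and copy the old values into it every time, which increases the computing time a lot. A better loop allocates the full-length vector once and fills it in place: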

results <- rep(NA_real_, number_of_iterations)  # allocate the full length once
for (i in seq_len(number_of_iterations)) {
  results[i] <- 2 * i  # fill in place
}
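
A rough way to check the difference between the two loops above, offered as a sketch rather than a benchmark: the function names `grow`, `prealloc`, and `vectorized` and the size `n` are made up for illustration, and timings will vary by machine and R version.

```r
n <- 1e4  # illustrative size; increase it to make the gap more visible

# Grow the result by one element per iteration (copies the vector each time)
grow <- function(n) {
  results <- c()
  for (i in seq_len(n)) results <- c(results, 2 * i)
  results
}

# Allocate the full length once, then fill in place
prealloc <- function(n) {
  results <- numeric(n)
  for (i in seq_len(n)) results[i] <- 2 * i
  results
}

# For this particular computation no loop is needed at all
vectorized <- function(n) 2 * seq_len(n)

# All three approaches should return the same values
same_loops <- identical(grow(n), prealloc(n))
same_vectorized <- isTRUE(all.equal(prealloc(n), vectorized(n)))

system.time(grow(n))
system.time(prealloc(n))
system.time(vectorized(n))
```

The growing version does work roughly proportional to n squared because of the repeated copying, so its disadvantage gets worse as n increases.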

giscus-bot 2022-12-16 19:38:18

Guest *tobiy* @ 2011-03-05 15:34:13 originally posted:

in regard to the task of writing a function that counts the number of '1' in a vector:

vec <- c(1, 0, 1, 1, 0, 0) # create test vector
length(which(vec == 1)) # count number of '1'

giscus-bot 2022-12-16 19:38:23

Guest *Henning Reetz* @ 2012-01-10 00:14:31 originally posted:

If it is only counting '0' and '1' in a vector, why not use the sum function?

vec <- c(1, 0, 1, 1, 0, 0) # create test vector
sum(vec) # gives the number of '1's
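
A small sketch comparing the counting approaches from this thread. On a strict 0/1 vector they agree, and they only differ once other values or NA appear (the vector `other` and the variable names are hypothetical, added just to show the difference):

```r
vec <- c(1, 0, 1, 1, 0, 0)  # the test vector from above

# On a strict 0/1 vector the three approaches give the same count
by_compare <- sum(vec == 1)
by_which   <- length(which(vec == 1))
by_sum     <- sum(vec)

# Hypothetical vector with a value other than 0/1 and a missing value
other <- c(1, 0, 2, NA)

# Adds up the values, so the 2 counts double
sum_values <- sum(other, na.rm = TRUE)

# Counts only elements equal to 1; NA must be dropped explicitly
count_ones <- sum(other == 1, na.rm = TRUE)

# which() ignores NA on its own
count_which <- length(which(other == 1))
```

So sum(vec) is the shortest option when the data is known to contain only 0 and 1, while the comparison-based versions are safer for anything else.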