Do You Have to get() Objects? | /en/2023/04/get-objects/

giscus 2023-04-13 13:11:10

Last month I saw an interesting question on Stack Overflow, in which the OP wanted to print a series of data frames as tables, and tried double loops, which did not work:
for (i in c("CP", …

https://yihui.org/en/2023/04/get-objects/

4 Comments

jepusto 2023-04-16 13:43:26

Nice insight. So if you go back and refactor your code as you suggest, would that be “DRY cleaning” it?

yihui 2023-04-17 03:04:18

I didn't hear back from the OP, so I'm not sure if my guess was correct, i.e., there might be a full data frame from which he created the individual *_*_comb1 data frames. The code is definitely going to be much cleaner if he has the full data frame.

jsinnett 2023-04-17 22:27:12

I happen to use get() all the time to "get" a column whose name is stored in a variable in data.table. In my experience, this is necessary when writing functions that process data.tables; otherwise data.table has no clue which column I want to use.

Rudimentary example below:

library(data.table)

ret_prop = function(dta, col_name, by_cols) {
  # get() resolves the column named by the string in col_name
  dta[, .(prop = round((sum(get(col_name)) / .N) * 100, 0)), by = by_cols]
}

I'm curious though, would you suggest a different way of doing this? I'm aware of the 'env' arg in data.table but it's only available in the dev version.

yihui 2023-04-17 23:10:52

Sorry, my knowledge of data.table is quite limited, so I don't know if there is a better way. I think your case is different from the one I mentioned in the post. You didn't create col_name as a global variable, so I think that should be a valid way to do this task.
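
One possible alternative that avoids get(), offered as a sketch rather than a recommendation: pass the column name through .SDcols and refer to the selected column as .SD[[1]]. This assumes a released version of data.table without the env argument. The function name ret_prop_sd and the toy data are hypothetical.

```r
library(data.table)

# Hypothetical variant of ret_prop() that selects the column via .SDcols
# instead of looking it up with get().
ret_prop_sd <- function(dta, col_name, by_cols) {
  # .SD holds only the column named in col_name, so .SD[[1]] is that column
  dta[, .(prop = round(sum(.SD[[1]]) / .N * 100, 0)),
      by = by_cols, .SDcols = col_name]
}

# Toy data, made up for illustration
toy <- data.table(group = c("a", "a", "b", "b"), flag = c(1, 0, 1, 1))
res <- ret_prop_sd(toy, "flag", "group")
print(res)
```

Whether this reads better than get(col_name) is a matter of taste; it mainly keeps the column lookup inside data.table's own machinery.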

iamyingzhou 2023-04-23 18:14:50

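What if I need to read several very large files (CSV files of about 5 GB each) into the environment? It seems that later code can only access them with get(), and re-reading them every time would take a long time. Is there a good solution for this situation?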

yihui 2023-04-23 20:23:44

I don't understand: why can't you use the variable names directly instead of get()?

iamyingzhou 2023-04-23 20:35:45

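Because I need to run this inside a custom function. For example, I have five large CSV data sets: data_liver, data_heart, data_brain, data_lung, and data_kidney. I wrote a function:

fun = function(X) {
  data = get(paste0("data_", X))
  # some code that uses data
  write.csv(result, paste0(X, ".csv"))
}
lapply(c("liver", "heart", ...), fun)

This example could also be rewritten with a list of the five data sets, but I feel that is less intuitive than the way I wrote it.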

yihui 2023-04-23 22:37:58

There is nothing seriously wrong with using a series of global variables like this, but I personally prefer a list:

all_data <- list(liver = ..., heart = ..., brain = ..., lung = ..., kidney = ...)
lapply(names(all_data), function(name) {
  result <- all_data[[name]]
  # process the data
  write.csv(result, paste0(name, '.csv'))
})
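
As a rough sketch of how such a list might be filled when the data come from files, each file can be read once and stored under its name. The temporary directory, the data_<organ>.csv file naming, and the tiny data frames below are assumptions made so the example can run without the real 5 GB files.

```r
# Hypothetical setup: write two tiny CSV files to a temporary directory
csv_dir <- tempfile("organs_")
dir.create(csv_dir)
organs <- c("liver", "heart")
for (organ in organs) {
  write.csv(
    data.frame(id = 1:3, organ = organ),
    file.path(csv_dir, paste0("data_", organ, ".csv")),
    row.names = FALSE
  )
}

# Read every file once and keep the data frames in one named list
all_data <- lapply(organs, function(organ) {
  read.csv(file.path(csv_dir, paste0("data_", organ, ".csv")))
})
names(all_data) <- organs

# Later code looks up a data set by name, without get()
heart_rows <- nrow(all_data[["heart"]])
```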

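The advantages of using a list are:

The code is safer: as mentioned in this post, get() has pitfalls, and you have to handle arguments such as envir / mode / inherits carefully to make sure you really get the object you want. all_data[[name]] carries no such risk and is 100% guaranteed to give you the data object you need.
You don't need to repeat the data names in the first argument of lapply(); you can obtain them dynamically with names().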
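
To make the first point concrete, here is a small sketch of one way get() can return the wrong object. The accidental local variable and the function names fun_get and fun_list are hypothetical, invented only to show the shadowing.

```r
data_liver <- data.frame(organ = "liver", value = 1:3)
all_data <- list(liver = data_liver)

fun_get <- function(X) {
  # Hypothetical accidental local variable with the same name as the data
  data_liver <- "a temporary string"
  # By default get() starts its search in the calling function's frame,
  # so it finds the local string before the data frame defined outside
  get(paste0("data_", X))
}

fun_list <- function(X) {
  data_liver <- "a temporary string"
  # The list lookup is unaffected by local variables of the same name
  all_data[[X]]
}

class_from_get <- class(fun_get("liver"))
class_from_list <- class(fun_list("liver"))
```

Passing envir, mode, or inherits to get() can work around this, but the list lookup does not need any of them.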

iamyingzhou 2023-04-24 08:41:00

Thank you for the reply. I learned a lot.