8 Comments

giscus-bot giscus-bot 2022-12-16 05:38:35
Guest *五脚星* @ 2007-08-11 12:41:19 originally posted:

😎 sofa!

giscus-bot giscus-bot 2022-12-16 05:38:36
Guest *rtist* @ 2007-08-13 09:57:29 originally posted:

Did you try anything on the Netflix data?
I ran into a lot of trouble with it, since it is such a big data set.
Creating a table in MySQL seems to take more than a day of noisy hard-disk "clicking"...

yihui yihui 2022-12-16 05:39:24

I haven't tried that data yet... But I don't think my method above will be effective for the Netflix data, because I find there are only three or four variables. However, ignoring the customer ID (the IDs are randomly assigned) might save some time.

BTW, I didn't check the data carefully... It seems that there are only two predictors? ❓ MovieID & Date?

Originally posted on 2007-08-13 15:22:13

giscus-bot giscus-bot 2022-12-16 05:38:37
Guest *rtist* @ 2007-08-13 19:48:12 originally posted:

My question is naive: how would you build a database 'efficiently' from those 17770 text files before using any of your methods? There are three or four predictors: movieID, customerID, ratingDate, and movieReleaseDate. But the reference method only used movieID and customerID. The customerID is an important predictor.

yihui yihui 2022-12-16 05:39:25

It took me less than an hour to merge these 17770 files into a single CSV file (on my Windows XP machine with 512 MB of memory). Are you just reading these files and writing them into a MySQL database? I haven't tried that.

Originally posted on 2007-08-14 09:24:01
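
The merge step described in the reply above could look roughly like the R sketch below. It is not the code actually used. It assumes each per-movie file starts with a line of the form `<movieID>:` followed by `customerID,rating,date` rows, a layout the thread does not spell out. The function name `merge_ratings` and the paths in the usage line are hypothetical.

```r
# Sketch only: merge many per-movie rating files into a single CSV.
# ASSUMED file layout (not stated in the thread):
#   line 1:      "<movieID>:"
#   other lines: "customerID,rating,date"
merge_ratings <- function(in_dir, out_file) {
  files <- list.files(in_dir, pattern = "\\.txt$", full.names = TRUE)
  out <- file(out_file, open = "w")
  on.exit(close(out))
  writeLines("movieID,customerID,rating,date", out)
  for (f in files) {
    lines <- readLines(f)
    lines <- lines[nzchar(lines)]   # drop empty lines
    if (length(lines) < 2) next     # header only: nothing to write
    movie_id <- sub(":$", "", lines[1])
    # Prepend the movie ID to every rating row of this file
    writeLines(paste(movie_id, lines[-1], sep = ","), out)
  }
  invisible(length(files))
}
```

A call such as `merge_ratings("training_set", "ratings.csv")` processes one file at a time, so only a single movie's ratings are held in memory at once, which matters on a machine with limited RAM.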

giscus-bot giscus-bot 2022-12-16 05:38:38
Guest *littlesas* @ 2008-11-23 20:25:51 originally posted:

Brother Xie, you are amazing. I am learning R and trying to follow your example.

yihui yihui 2022-12-16 05:39:26

I'm just so-so... Don't follow my example; I'm no role model...

Originally posted on 2008-11-24 14:45:13

giscus-bot giscus-bot 2022-12-16 05:38:39
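Guest *myli* @ 2009-01-22 12:27:22 originally posted:

I came across your blog by accident and learned a lot.
I don't have a statistics background, but since I work at a data company, I'd really like to master statistics so I can use it in my work.
I really hope we can exchange ideas more often 👍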

@licw58

giscus-bot giscus-bot 2022-12-16 05:38:39
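Guest *king64* @ 2009-03-24 06:38:32 originally posted:

Yihui, I learn a lot from reading your posts!

What should I do if the database is in Access format? I tried modifying the statements below, but without success!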
channel = odbcConnect("Text Files")
print(sqlColumns(channel, "x.csv")[, c(1, 3, 4, 6)])

yihui yihui 2022-12-16 05:39:26

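RODBC supports Access databases out of the box; see the odbcConnectAccess function. The text data in this post is a special kind of "database", so I set up a special ODBC connection for it myself.

Originally posted on 2009-03-24 11:08:39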
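
For readers with the same question, a minimal sketch of the Access route mentioned in the reply above might look like this. The file name `mydata.mdb` and table name `mytable` are hypothetical placeholders. `odbcConnectAccess` is only available on Windows and needs a matching Access ODBC driver installed.

```r
library(RODBC)

# Hypothetical file and table names; replace with your own.
# Windows only: requires an Access ODBC driver that matches your R build.
channel <- odbcConnectAccess("mydata.mdb")

# List the tables in the database, then inspect one table's columns
# (same column subset as in the question above)
print(sqlTables(channel))
print(sqlColumns(channel, "mytable")[, c(1, 3, 4, 6)])

odbcClose(channel)
```

For the newer `.accdb` format, RODBC also provides `odbcConnectAccess2007`, which is called the same way.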

giscus-bot giscus-bot 2022-12-16 05:38:40
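Guest *Stivensen295* @ 2012-08-20 08:59:09 originally posted:

I've learned a lot about applying R from you and benefited greatly. With the help of your materials I also finished my graduate studies without trouble. I hope you will publish more great work.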

yihui yihui 2022-12-16 05:39:27

Thanks! Everyone is graduating one after another, and I'm the only one still idling away my days at school.

Originally posted on 2012-08-21 03:31:26

giscus-bot giscus-bot 2022-12-16 05:39:28
Guest *zilhua 裴* @ 2012-08-29 09:23:08 originally posted:

Newcomer here, still learning.
