I agree it's a tricky problem. As you said, the time duration can be stochastic, so you can't really do an accurate test. On the other hand, it's also tricky to set an appropriate baseline, i.e., the target against which the performance of the new code is compared. Benchmarking in general is hard because there are often too many factors that affect performance in different ways. Take the string-processing functions in this post as an example: their performance depends a lot on the size of the input. For small documents, I don't think there will be much difference. To benchmark fairly and meaningfully, we would need to know the common size of knitr input documents (well, a "common size" may not exist, and perhaps I should say the "statistical distribution of sizes").
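Just to illustrate what I mean by the stochastic timing and the size dependence, here is a minimal sketch (in Python rather than R, and `strip_comments` is a made-up stand-in for a string-processing function, not anything from knitr): repeat the measurement several times and report a summary, because any single run can be noisy, and do it at several input sizes, because the ranking of two implementations may change with size.

```python
import statistics
import timeit

def strip_comments(lines):
    # Hypothetical string-processing step: drop comment lines.
    return [line for line in lines if not line.lstrip().startswith("#")]

for n in (10, 1_000, 100_000):
    doc = ["x <- 1  # a comment", "y <- x + 1"] * (n // 2)
    # Each run's duration is stochastic, so repeat the measurement
    # and report the minimum and the median rather than a single number.
    runs = timeit.repeat(lambda: strip_comments(doc), number=5, repeat=7)
    print(f"n={n:>6}: min={min(runs):.6f}s  median={statistics.median(runs):.6f}s")
```

Even this toy setup shows the problem: the numbers for n=10 are dominated by noise, so a "speedup" measured only on small inputs may not mean much.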
BTW, you may have heard this from Donald Knuth:
[...] premature optimization is the root of all evil (or at least most of it) in programming.