Hi guys,
today I'm going to explain how to compare YafaRay speed between two code branches, to test whether the changes slow down or speed up rendering and by how much.
My wife (Cecilia Scarinzi) is helping me out on this because she has a PhD in statistics, so thank you Cecilia!
Usually I see people comparing the averages obtained from two sets of timings, but this procedure is not always correct from a statistical point of view and can lead to wrong conclusions. We need a sound method to tell whether one branch is faster or slower than another.
Obtain data sets
First of all you need a good data set of timings to compare. To get one you have to minimize the noise (confounding variables, in statistical terms) caused by other software that can randomly use the processor during testing.
Some recommendations are:
- Use the xml file format (this removes the modeling software from the workflow) and render it using yafaray-xml from the command line;
- Do not use scenes with errors (black dots, white dots, whatever) because they interfere with the calculations and give you bogus timings;
- Do not use a file that takes only a few seconds to render;
- Do not use the machine while performing tests, and close as many running programs as possible;
- Do not reboot your machine while testing a branch; it's better to run all the renders for at least one branch in a row;
- Take at least 30 render times from each branch you'd like to test (100 would be better but is not always comfortable).
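The recommendations above can be automated with a small script. This is only a sketch in Python (used here purely for illustration): time_runs is a hypothetical helper, and the scene.xml in the usage comment is a placeholder, not a real file. It measures wall-clock time around the whole process, so scene loading is included in each timing.

```python
import subprocess
import sys
import time


def time_runs(command, runs=30):
    """Run `command` `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts the whole series if a run fails, so a broken
        # render never ends up in the data set.
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical usage (the scene file name is a placeholder):
#   for seconds in time_runs(["yafaray-xml", "scene.xml"], runs=30):
#       print(f"{seconds:.3f}")
```

Running every render of one branch inside a single call keeps them "in a row", as suggested in the list.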
So far there is nothing new. You should end up with something like:
master Branch comparing Branch
359.096 349.386
345.314 350.702
345.824 348.253
346.044 355.043
350.024 346.722
352.892 349.888
346.351 347.519
349.442 347.665
354.111 348.630
347.200 352.521
346.909 353.778
351.875 345.577
347.462 379.539
346.754 345.517
347.308 345.280
352.646 347.573
349.509 349.363
350.690 349.305
350.516 346.348
345.589 351.596
350.948 348.439
346.399 346.796
352.833 349.303
352.157 349.570
347.919 346.693
352.787 352.425
356.976 350.538
350.710 345.471
348.593 347.679
352.092 361.098
346.223 352.349
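If you keep the timings in a plain text file laid out like the table above, loading them for analysis takes a few lines. A minimal sketch in Python, assuming two whitespace-separated columns and a header line; parse_timings is a hypothetical name.

```python
def parse_timings(text):
    """Split a two-column text table into two lists of timings."""
    master, comparing = [], []
    for line in text.splitlines():
        fields = line.split()
        try:
            a, b = (float(f) for f in fields)
        except ValueError:
            # Header, blank line, or anything that is not exactly two numbers.
            continue
        master.append(a)
        comparing.append(b)
    return master, comparing


# First rows of the table above, used as a short excerpt.
excerpt = """master Branch comparing Branch
359.096 349.386
345.314 350.702
345.824 348.253
"""
master, comparing = parse_timings(excerpt)
print(len(master), len(comparing))
```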
Validate data sets
Each YafaRay run does pretty much the same calculations on a given scene. This means that, without interference from other software running on the same machine, every run would take the same time.
But you do have other software running on your test machine (some of it critical to the operation of the machine itself), and it gives you noise. That noise increases the render time by a certain amount. If you reduce it by following the basic rules from the previous section, it adds only a slight delay to each run, depending on how often the other software gets executed.
In short, if you did things right you should end up with an exponential distribution, so now we check whether our distributions are exponential.
NOTE: In some environments, or sometimes even on the same machine, the distribution is best fitted by the lognormal [http://en.wikipedia.org/wiki/Lognormal_distribution] case. I'll explain why this happens later, if necessary, but in practice the exponential distribution still fits these cases well enough for our purposes.
To check if a distribution is exponential [http://en.wikipedia.org/wiki/Exponential_distribution] you look at the p-value [http://en.wikipedia.org/wiki/P-value] of the method you choose and check whether it is greater than a reference value. If it is, the test does not reject the exponential fit. This reference value is usually 0.05, so we use this value from now on.
Here I'm using three methods you can find in SAS. Later I will use R, sorry, blame Cecilia for that!
master Branch
Goodness-of-Fit Tests for Exponential Distribution
Test                  ---Statistic----     -----p Value-----
Kolmogorov-Smirnov    D      0.15251910    Pr > D       0.189
Cramer-von Mises      W-Sq   0.13540035    Pr > W-Sq    0.159
Anderson-Darling      A-Sq   0.75495381    Pr > A-Sq    0.171
As you can see, every method gives a p-value greater than the 0.05 reference value we are using, so we can treat this distribution as exponential.
comparing Branch
Goodness-of-Fit Tests for Exponential Distribution
Test                  ---Statistic----     -----p Value-----
Kolmogorov-Smirnov    D      0.11343368    Pr > D       >0.500
Cramer-von Mises      W-Sq   0.08772650    Pr > W-Sq    >0.250
Anderson-Darling      A-Sq   0.55847087    Pr > A-Sq    >0.250
Here too, all the methods give a p-value greater than the 0.05 reference value, so we can treat the second distribution as exponential as well.
Finally we know the data sets are good to analyze, because both pass the exponential distribution tests. If they had not, something would have gone wrong during the tests and you would need to collect the data again.
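If you don't have SAS at hand, the idea behind the check can be sketched in plain Python. This is not the SAS procedure and its numbers should not be expected to match the output above. It assumes a shifted exponential whose threshold is the fastest run (my assumption, the fitting details are not stated here), uses only the Kolmogorov-Smirnov distance, and estimates the p-value by simulating samples from the fitted model. All function names are hypothetical.

```python
import math
import random


def fit_shifted_exponential(times):
    """Estimate the threshold (fastest run) and scale (mean delay above it)."""
    threshold = min(times)
    scale = sum(t - threshold for t in times) / len(times)
    return threshold, scale


def ks_statistic(times):
    """Kolmogorov-Smirnov distance between the sample and its fitted model."""
    threshold, scale = fit_shifted_exponential(times)
    ordered = sorted(times)
    n = len(ordered)
    d = 0.0
    for i, t in enumerate(ordered):
        cdf = 1.0 - math.exp(-(t - threshold) / scale)
        # Compare the model CDF with the empirical step just before and after t.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d


def bootstrap_p_value(times, simulations=2000, seed=0):
    """Share of simulated exponential samples that fit worse than the data."""
    rng = random.Random(seed)
    observed = ks_statistic(times)
    threshold, scale = fit_shifted_exponential(times)
    n = len(times)
    worse = 0
    for _ in range(simulations):
        sample = [threshold + rng.expovariate(1.0 / scale) for _ in range(n)]
        if ks_statistic(sample) >= observed:
            worse += 1
    return worse / simulations
```

Call bootstrap_p_value on each branch's list of timings and compare the result with the 0.05 reference value. The sketch needs at least two different timings, otherwise the scale is zero.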
Analyze data sets
Now we know our data sets are good, but we still need to check whether there is a statistically significant difference between them, that is, between their population averages.
To do this we apply the Savage-score [http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_npar1way_a0000000200.htm] test to our distributions, because the distributions are exponential. We also need to use the exact method, because we have few values [http://www.ats.ucla.edu/stat/sas/library/exact.pdf]. If the Chi-Square p-value from the Savage-score test is lower than our reference value (remember, it is 0.05) we can assume we have two different distributions and we can safely compare them later.
Running the Savage-score test on our example we get:
Savage One-Way Analysis
Chi-Square          0.0026
DF                  1
Pr > Chi-Square     0.9595
Well, here we have a Chi-Square p-value of 0.9595, far above 0.05, so we cannot tell the two distributions apart. This means there is no difference between them that we can measure.
Only if the Chi-Square p-value is less than our reference value (0.05) can we finally and safely check the differences between the data sets.
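For those who want to see what the Savage-score test does under the hood, here is a rough sketch in Python. It is not SAS's implementation: it replaces the exact enumeration with random relabelling (a Monte Carlo approximation), it assumes there are no tied timings, and the function names are hypothetical. Expect its p-value to be close to, but not identical to, the SAS figure.

```python
import random


def savage_scores(n):
    """Savage (exponential) scores for ranks 1..n; they sum to zero."""
    scores = []
    partial = 0.0
    for rank in range(1, n + 1):
        partial += 1.0 / (n - rank + 1)
        scores.append(partial - 1.0)
    return scores


def savage_permutation_test(sample_a, sample_b, permutations=10000, seed=0):
    """Two-sided Monte Carlo p-value for the Savage score sum of sample_a."""
    pooled = [(t, 0) for t in sample_a] + [(t, 1) for t in sample_b]
    pooled.sort(key=lambda pair: pair[0])
    scores = savage_scores(len(pooled))
    labels = [group for _, group in pooled]
    # Under the null hypothesis the score sum of either group is expected
    # to be zero, so its absolute value measures how far apart they are.
    observed = abs(sum(s for s, g in zip(scores, labels) if g == 0))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        shuffled = abs(sum(s for s, g in zip(scores, labels) if g == 0))
        if shuffled >= observed - 1e-12:
            extreme += 1
    return extreme / permutations
```

Pass the two lists of timings as sample_a and sample_b; a result below 0.05 corresponds to the case where the data sets can be compared.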
Note that you should compare the medians, not the averages, of the distributions, because they are positively skewed.
Finally, you can compare the medians to measure the amount of speed improvement or worsening.
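The median comparison itself is simple arithmetic. A minimal sketch in Python, where median_change_percent is a hypothetical helper and the two short lists are only the first five rows of the table, not the full data set:

```python
from statistics import median


def median_change_percent(baseline, candidate):
    """Relative change of the candidate median versus the baseline median.

    Positive means the candidate branch renders slower, negative faster.
    """
    base = median(baseline)
    return (median(candidate) - base) / base * 100.0


# Excerpt only: use all the timings of each branch in a real comparison.
master = [359.096, 345.314, 345.824, 346.044, 350.024]
comparing = [349.386, 350.702, 348.253, 355.043, 346.722]
print(f"{median_change_percent(master, comparing):+.2f}%")
```

Remember that this number only means something when the Savage-score test has already found a significant difference.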
If we had compared the averages of our data sets with the old method (which is not statistically valid), we would have got this:
Branch               Mean
master_Branch:       349.780
comparing_Branch:    350.341
Speed worsening:     0.16%
But as we now know, this difference is not statistically significant, so we have to treat the two branches as having the same speed.
Cheers,
Michele Castigliego