View topic - Black Dots ghost

Black Dots ghost

Post Sun Apr 26, 2009 12:35 pm

Hi all,
after days of debugging I have come to a conclusion about black dots, or at least about one kind of them.

My first result was that with double precision throughout the code they do not happen here, and when unsafe-math-optimizations is omitted from the compiler switches the black dots are gone as well.

So, what's the problem?
unsafe-math-optimizations lets the compiler assume that the arguments and results of mathematical functions are valid, so range violations are not checked. This pointed me to the following:
With increased precision the yafaray code does not make range errors, so those errors must happen very close to the limits of the function ranges.
Let me explain this with sqrt. sqrt has a real solution for any number greater than or equal to zero, but what happens when I compute sqrt of -1e-6? I get a range violation. So without unsafe-math-optimizations the code is checked against this kind of thing.
Where is the yafaray code wrong, then? Mainly where it makes an assumption whose result is not correct if the precision is insufficient and we are near a range limit.
For instance, the cosine of the angle between two vectors is:
cos_theta = (vector_a*vector_b) / (|vector_a|*|vector_b|)
but if vector_a and vector_b are unit vectors we can simplify this to:
cos_theta = vector_a*vector_b
This is true in theory, but in code the assumption may not hold in some cases near the range boundaries.
In fact, if you check it, you will see that cos_theta can be strictly > 1 in some cases due to limited precision.
So how can we solve this?
The solution may vary: checking that cos_theta is in range could be one, writing a cos_vectors function could be another. The real matter is checking the range boundaries.
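
As a rough sketch of the second option (a cos_vectors-style function), something like the following could work. The Vec3 type and the names dot and clampedCosine are made up for illustration; they are not YafaRay's own classes or functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type, for illustration only
// (not YafaRay's own vector class).
struct Vec3 {
    float x, y, z;
};

inline float dot(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Cosine of the angle between two vectors that are assumed to be
// (approximately) unit length. Rounding can push the raw dot product
// slightly outside [-1, 1], so the result is forced back into range.
inline float clampedCosine(const Vec3 &a, const Vec3 &b) {
    const float c = dot(a, b);
    return std::min(1.0f, std::max(-1.0f, c));
}
```

A vector whose length is a hair above 1 (for example x = 1.0000001f, y = z = 0) is enough to make the raw dot product with itself exceed 1, while clampedCosine stays inside the valid range.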

A practical example:
In the source we have the OrenNayar function, which does:
PFLOAT cos_to = N*wo;
here we are assuming that N and wo are unit vectors...
later in the code:
if (cos_to<=0.f) cos_to=0.f;
here we are checking for the lower limit of the valid range, and later:
tan_beta = sqrt(1.f - cos_to*cos_to) / cos_to;
we are making an error because, due to limited precision, cos_to could be > 1 and we have not checked for that.
In this case, if cos_to is > 1 we have sqrt(negative number), which results in a NaN!
These near-boundary problems are not common, but they can happen, as we have seen in the black dots posts.
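
One possible way to guard both ends of the range is sketched below. The helper name safeTanFromCos is hypothetical, and returning 0 when cos_to <= 0 is only a placeholder policy I picked for the sketch; the real OrenNayar code may need something different there.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Tangent of the angle whose cosine is cos_to, guarded at both ends of
// the valid range. Hypothetical helper, not the actual OrenNayar code.
inline float safeTanFromCos(float cos_to) {
    // Upper bound: rounding can leave cos_to slightly above 1.
    cos_to = std::min(cos_to, 1.0f);
    // Lower bound: at or below 0 the division is not usable.
    // Returning 0 here is an assumption made for this sketch.
    if (cos_to <= 0.0f) return 0.0f;
    // The radicand is clamped as well, so it can never be negative.
    const float sin2 = std::max(0.0f, 1.0f - cos_to * cos_to);
    return std::sqrt(sin2) / cos_to;
}
```

With this, an input such as 1.0000002f gives 0 instead of a NaN, and ordinary inputs are unaffected.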

What remains is to choose the best way to check the validity of the mathematical function ranges we use.

Michele Castigliego

Re: Black Dots ghost

Post Thu May 07, 2009 1:40 am

I was curious whether there are any new developments with this bug.

Thanks :)

Re: Black Dots ghost

Post Sun Jun 03, 2012 7:07 pm


I found some interesting things which could be related to my previous post.

If you look at around line 413 you will find:

dir[0] += (0.01-cos_wi_Ng)*Ng;

which is obviously a trick. Now if I change that to the more correct form (removing that ugly trick):

dir[0] -= cos_wi_Ng*Ng;

I get black dots on a scene I'm testing here.

So? Don't you get it? With that trick you just moved the dots from cos_wi_Ng==0 to cos_wi_Ng==0.01!
That's terribly wrong.

Remove the tricks (they are all over the code) and find the REAL domain problem.
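
To hunt for the real domain problem, one option is a debug wrapper that reports bad arguments instead of silently producing a NaN. This is only a sketch: checkedSqrt and g_domainViolations are names I made up, and nothing like this exists in the code as far as this thread shows.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Number of out-of-domain arguments seen so far (debug aid only).
static unsigned long g_domainViolations = 0;

// sqrt wrapper that reports a negative argument instead of silently
// producing a NaN. Hypothetical helper, not part of YafaRay.
inline float checkedSqrt(float x, const char *where) {
    if (x < 0.0f) {
        ++g_domainViolations;
        std::fprintf(stderr, "sqrt domain violation at %s: x = %g\n", where, x);
        return 0.0f;  // clamp so the render can continue
    }
    return std::sqrt(x);
}
```

Calling it with a label for the call site, e.g. checkedSqrt(1.f - c*c, "OrenNayar"), would show where the out-of-range values actually come from.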

Michele Castigliego

Re: Black Dots ghost

Post Fri Mar 15, 2013 12:26 pm

This is a small repository of scenes which are known to produce black dots, so you can test your builds and code changes against them:


Re: Black Dots ghost

Post Wed Apr 03, 2013 1:25 pm

Fixes for Black and White Dots

For your information, I have found the cause of many of the black and white dots problems. I've tested this with the "horror gallery" scenes and it seems to work fine on both 32-bit and 64-bit Linux.

I suppose these fixes will not solve all the black/white dots problems, but I hope they help in most cases.

I've sent the fixes in this Pull Request:

I hope this helps!

Technical details:

The problem is located in the acos() function, which is used in several places in the program. acos() is the arccosine, and its valid domain is [-1.0, +1.0].

However, sometimes it receives values "slightly" outside the domain, as for example +1.000000000000000001

That causes the acos() function to generate an invalid NaN result that is propagated over all the math until you get the black NaN dots in the render.

I have created a custom fAcos() function which does a domain check before calling the actual acos(), ensuring that the values are within the domain. If a value is outside the domain, it's "forced" inside.
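
The idea can be sketched roughly like this. It is not the code from the pull request: the name clampedAcos is hypothetical, and the real fAcos() may differ in type and details.

```cpp
#include <cassert>
#include <cmath>

// acos with a domain check: arguments outside [-1, 1] are forced to the
// nearest bound before the library call. A sketch of the idea only; the
// real fAcos() may differ. A NaN argument is not caught by either
// comparison and is passed through to std::acos unchanged.
inline double clampedAcos(double x) {
    if (x <= -1.0) return std::acos(-1.0);  // pi
    if (x >= 1.0) return 0.0;               // acos(1)
    return std::acos(x);
}
```

Values slightly above +1 then give 0 and values slightly below -1 give pi, instead of a NaN.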

The underlying causes of the "values outside the domain" are quite complex and surprising. They also explain why the behaviour differs between 32-bit and 64-bit Linux, for example.

Some of the causes:

The compilation flags are "unsafe". This is good for speed, but many floating point operations are no longer guaranteed to follow strict IEEE 754 behaviour. However, we have to keep the unsafe flags or yafaray would be much slower, so we do the extra checks manually in yafaray.

Linux x86 (32 bits) compiles by default using the CPU's x87 floating point operations. Those operations do not behave like strict 32-bit or 64-bit IEEE 754 arithmetic: the x87 internal registers ALWAYS hold 80 bits. That's why even with 32-bit float numbers you may get some "garbage" digits where you would not expect them. Doing the manual domain checks should mitigate this, or even resolve it.

Linux x86_64 (64 bits) compiles by default using the CPU's SSE floating point operations and NOT the x87 ones. These are IEEE 754 compliant and their registers work as expected (32 bits for floats and 64 bits for doubles). Therefore, floating point operations may behave differently on 64-bit Linux than on 32-bit Linux, causing the strange differences in the Horror Gallery. In any case, even with SSE you still sometimes get values outside the valid domain, so the manual domain checks should mitigate or resolve this.
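
If you want to see which of the two behaviours a given build has, the standard FLT_EVAL_METHOD macro from <cfloat> is one way to check. The function name floatEvalDescription below is just an illustration, and the "typical for" remarks in the strings are the usual gcc defaults as far as I know, not a guarantee.

```cpp
#include <cassert>
#include <cfloat>
#include <cstdio>

// Returns a short description of how the compiler evaluates
// floating-point intermediates, based on the standard
// FLT_EVAL_METHOD macro.
inline const char *floatEvalDescription() {
    switch (FLT_EVAL_METHOD) {
        case 0: return "each type evaluated in its own precision (typical for SSE)";
        case 1: return "float and double evaluated as double";
        case 2: return "all types evaluated as long double (typical for x87)";
        default: return "indeterminate or implementation-defined";
    }
}
```

Printing the result with std::puts(floatEvalDescription()) in a test build shows which case applies to your compiler flags.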

WHITE DOTS (Inf or very big values)

In the pdf1D class in sample_utils.h, an array index was sometimes -1, causing an out-of-bounds array access and eventually invalid values, NaNs, Infs, etc. This is the line causing the problem:

int index = (int) (ptr-cdf-1);

So I have added this fix:

int index = (int) (ptr-cdf-1);
if(index<0) index=0; // Fix: make sure the index never goes negative, outside the array.

However, this should be investigated in the future to see why the index goes to -1 in the first place.
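
One plausible way the index can reach -1 is sketched below. It assumes that ptr comes from a std::lower_bound search over the CDF array, which is not stated above, so treat it as a guess. The function findInterval is hypothetical and only stands in for the pdf1D lookup.

```cpp
#include <algorithm>
#include <cassert>

// Finds the CDF interval containing u. cdf has count + 1 entries with
// cdf[0] == 0 and cdf[count] == 1. Hypothetical stand-in for the
// lookup in pdf1D; the real code may obtain ptr differently.
inline int findInterval(const float *cdf, int count, float u) {
    // lower_bound returns the first element that is >= u. For u == 0
    // that is cdf itself, so (ptr - cdf - 1) evaluates to -1.
    const float *ptr = std::lower_bound(cdf, cdf + count + 1, u);
    const int index = static_cast<int>(ptr - cdf - 1);
    // Clamp to the valid interval range [0, count - 1].
    return std::max(0, std::min(index, count - 1));
}
```

Under that assumption a sample value of exactly 0 is enough to trigger the problem, and a value above the last CDF entry would overflow at the other end, which is why the sketch clamps both sides.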

Another type of white dots problem happens when the CMake option FAST_MATH=ON is used.

One of the reasons is that the math simplifications were written only for 32-bit float numbers, but elsewhere in the program they were sometimes used on 64-bit double numbers, resulting in bad calculations and white dots.

Also, the fSin() function sometimes returned values outside its valid range [-1.0, +1.0], causing other white dots.

I have created 64-bit double versions of all the optimizations. I have also added checks to make sure the fSin() return value is in the correct range.
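
To illustrate both points (one version per precision, plus a range check on the result), here is a minimal sketch using a well-known parabolic sine approximation. It is not YafaRay's fSin(): the name fastSin and the constants are my own choice for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Parabolic sine approximation, templated so that float and double
// callers each get constants and arithmetic in their own precision.
// Hypothetical sketch; not YafaRay's fSin().
template <typename T>
inline T fastSin(T x) {
    const T pi = static_cast<T>(3.14159265358979323846);
    const T twoPi = static_cast<T>(2) * pi;
    // Range reduction to roughly [-pi, pi).
    x -= twoPi * std::floor((x + pi) / twoPi);
    // First parabola, then a correction pass to reduce the error.
    T y = (static_cast<T>(4) / pi) * x
        - (static_cast<T>(4) / (pi * pi)) * x * std::fabs(x);
    y = static_cast<T>(0.225) * (y * std::fabs(y) - y) + y;
    // The approximation can overshoot slightly, so clamp to sine's range.
    return std::min(static_cast<T>(1), std::max(static_cast<T>(-1), y));
}
```

fastSin(0.5f) and fastSin(0.5) then pick the float and double versions automatically, and neither can return a value outside [-1.0, +1.0].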

However, I think we need to refine the 64-bit double versions of the optimizations a bit further, although they seem to work fine for now.

For your reference, it's possible to force the compilation to use x87 or SSE instructions. For example, on 32-bit Linux you can get the same results as on 64-bit Linux by compiling with the gcc flags: -msse2 -mfpmath=sse


Best regards! David Bluecame.
