Monday 19 October 2015

Video update for October 2015

I've posted a new video to YouTube showing some of the speed improvements I've made recently.



Since the last video in February I've moved the entire voxel -> mesh pipeline (except for the seam mesh generation) to the GPU via OpenCL and reworked how the volume is processed. I'm planning a blog post to explain the changes, but the main thing is that I no longer always sample the field at a 1:1 resolution. E.g. for a LOD1 node I now sample a single 128x128x128 volume instead of 8 volumes, and for a LOD2 node a single volume instead of 64. This has had a huge impact on performance, which has helped with other features, like the dragging operations you can see in the video.
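
To give the gist ahead of that post: each clipmap node gets one fixed-resolution sample grid, and only the sample spacing grows with the LOD level, so a coarse node costs the same to sample as a fine one. A minimal sketch of the idea (exact resolutions aside; the names and constants here are illustrative, not lifted from my code):

    #include <vector>

    float Density_Func(float x, float y, float z);   // the scalar field

    // One fixed-size grid per clipmap node: the node's extent doubles with
    // each LOD level, so the sample spacing doubles too, instead of the
    // sample count multiplying by 8 per level.
    std::vector<float> SampleNode(float minX, float minY, float minZ, int lodLevel)
    {
        const int   res      = 64;                        // same at every level
        const float nodeSize = 64.0f * (1 << lodLevel);   // world-space extent
        const float spacing  = nodeSize / res;

        std::vector<float> samples(res * res * res);
        for (int z = 0; z < res; z++)
        for (int y = 0; y < res; y++)
        for (int x = 0; x < res; x++)
        {
            samples[(z * res + y) * res + x] = Density_Func(
                minX + x * spacing, minY + y * spacing, minZ + z * spacing);
        }
        return samples;
    }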

14 comments:

  1. It seems you are some steps ahead of me. Part of the reason might be because I lost the will to work on anything for like a whole month but life is life.

    I have yet to move the voxel -> mesh pipeline entirely to the GPU. Seam generation is still kind of broken, and I've only gotten a start on changing the sampling resolution so it's no longer 1:1.

    What I have done is implement volumetric placement of resources (textures) (however limited that feature currently is). And next, while this might perhaps be a bad move, I figured out I can test whether a chunk of any resolution is empty by sampling it at a rather low resolution. Even if it's a 64 * 64 * 64 chunk, I can test it with a 4 * 4 * 4 resolution check to see if it's empty. I originally implemented this change on a 1:1 ratio, but had it handle the empty checks ahead of the actual chunk filling operations, so that I could maximize the number of empties it could test before it needed to voxelize an actual filled chunk. I did this because I could not run the second half of the voxelization (which I handle in a secondary thread) at the same time as I run a GPU operation. Why? Because Unity (I'm fairly sure it's Unity's fault at least).

    Anyway... The 1:1 ratio thing is what I did at first. But then I figured: hmm, if I can test even high-res chunks at a low res to check if they're empty, why not increase the scale of the test proportionally, covering more space at the same overall sampling resolution with fewer individual GPU accesses? And using an octree-based restructuring, that's exactly what I did (rough sketch below)...
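
    In sketch form the whole thing is roughly this (hypothetical names, not my actual Unity code, and it assumes the density function is a signed distance bound so that a sparse probe can actually prove a region empty):

      #include <cmath>

      float Density_Func(float x, float y, float z);   // the field being voxelized

      // Coarse empty-space test: probe a low-res grid over a (possibly huge)
      // region. If every sample is farther from the surface than its probe
      // cell's half-diagonal, the surface can't pass through the region.
      bool RegionMaybeHasSurface(float minX, float minY, float minZ, float size)
      {
          const int   res     = 4;                       // fixed 4x4x4 probe
          const float spacing = size / res;
          const float safe    = spacing * 0.8660254f;    // half of a cell's diagonal

          for (int z = 0; z < res; z++)
          for (int y = 0; y < res; y++)
          for (int x = 0; x < res; x++)
          {
              const float d = Density_Func(minX + (x + 0.5f) * spacing,
                                           minY + (y + 0.5f) * spacing,
                                           minZ + (z + 0.5f) * spacing);
              if (std::fabs(d) <= safe)
                  return true;               // surface could be inside this cell
          }
          return false;
      }

      // Octree-style driver: one cheap probe can discard a huge region, and
      // only regions that might contain surface get subdivided down to chunks.
      void TestRegion(float minX, float minY, float minZ, float size, float chunkSize)
      {
          if (!RegionMaybeHasSurface(minX, minY, minZ, size))
              return;                                    // whole region empty/full
          if (size <= chunkSize)
          {
              // QueueChunkVoxelization(minX, minY, minZ);   // hypothetical hook
              return;
          }
          const float h = size * 0.5f;
          for (int i = 0; i < 8; i++)
              TestRegion(minX + (i & 1) * h,
                         minY + ((i >> 1) & 1) * h,
                         minZ + ((i >> 2) & 1) * h, h, chunkSize);
      }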

    However, that actually brings me to where I am now. I have a couple of leftover bugs. One: I haven't been able to get modification of empty chunks working right now that I've restructured how they consume space. In general I think it's simply a matter of figuring out where I made the modification, then subdividing down to chunk-size space at the place I made the modification. And for when I completely empty out a chunk, I believe I need to subdivide to account for the newly emptied space.

    I'm probably just getting the position wrong somehow... I initially made a mistake in my refactored empty space testing code in which I wasn't feeding the GPU the proper sampling position (I needed to offset the position by half the size of the empty pocket of space in question).

    There also seems to be a problem with my chunk deletion right now, but no other new bugs besides that.

    Other than that, I made it so that the voxel scale of the world can be less than 1 by multiplying a few things by a floating point value. It was actually easier to implement than I thought it would be. And I implemented Space Colonization-based tree generation code that I haven't really expanded on nearly enough yet (see the sketch below). I do know that the way I'm actually drawing the trees into the world needs to be completely refactored, as drawing each tree as an array of small individual primitive shapes that all need to be sampled during voxelization is WAY too slow. I haven't implemented a true procedural generation system for trees and other such objects yet, but I think I came up with an idea that has some potential quite recently.
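
    For context, the core loop of Space Colonization (after Runions et al.) is pretty small. A rough illustrative sketch, not my actual implementation:

      #include <cmath>
      #include <vector>

      struct Vec3 { float x = 0, y = 0, z = 0; };
      static Vec3  Sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
      static float Len(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }
      static Vec3  Norm(Vec3 v) { const float l = Len(v); return { v.x / l, v.y / l, v.z / l }; }

      struct Branch { Vec3 pos; int parent; };   // parent index forms the tree

      // One growth step: each attraction point pulls its nearest branch node
      // (within influenceRadius); every pulled node grows a new node toward
      // the averaged pull; attractors within killRadius of a node die off.
      void GrowStep(std::vector<Branch>& tree, std::vector<Vec3>& attractors,
                    float influenceRadius, float killRadius, float stepSize)
      {
          std::vector<Vec3> pull(tree.size());
          std::vector<int>  pullCount(tree.size(), 0);

          for (const Vec3& a : attractors) {
              int nearest = -1;
              float nearestDist = influenceRadius;
              for (size_t i = 0; i < tree.size(); i++) {
                  const float d = Len(Sub(a, tree[i].pos));
                  if (d < nearestDist) { nearestDist = d; nearest = int(i); }
              }
              if (nearest < 0) continue;
              const Vec3 dir = Norm(Sub(a, tree[nearest].pos));
              pull[nearest].x += dir.x; pull[nearest].y += dir.y; pull[nearest].z += dir.z;
              pullCount[nearest]++;
          }

          const size_t oldCount = tree.size();
          for (size_t i = 0; i < oldCount; i++) {
              if (pullCount[i] == 0 || Len(pull[i]) < 1e-6f) continue;
              const Vec3 dir = Norm(pull[i]);
              tree.push_back({ { tree[i].pos.x + dir.x * stepSize,
                                 tree[i].pos.y + dir.y * stepSize,
                                 tree[i].pos.z + dir.z * stepSize }, int(i) });
          }

          // Attractors that a branch node has reached are consumed.
          for (size_t i = 0; i < attractors.size(); ) {
              bool reached = false;
              for (const Branch& b : tree)
                  if (Len(Sub(attractors[i], b.pos)) < killRadius) { reached = true; break; }
              if (reached) attractors.erase(attractors.begin() + i); else i++;
          }
      }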

    Other than fixing up those leftover bugs, my next ideas for optimizing chunk loading include: sampling multiple chunks of the same LOD all in one go (which will also save some time in my empty space testing), and eventually moving the rest of the voxel -> mesh process to the GPU so that I can finally get it off that secondary thread, which should double the speed of loading chunks that actually do get a mesh in them. And MAYBE there is something I can do about a better solution for generating collision detection. I'm not really sure about it; I know it can take 30 to 40 ms to generate the collision mesh for a 64 * 64 * 64 res chunk using Unity's collision mesh generation.

    Replies
    1. > And for when I completely empty out a chunk, I believe I need to subdivide to account for the newly emptied space.

      I believe I need to UNsubdivide

    2. This comment has been removed by the author.

    3. I have no idea why my comments doubled.

    4. Ah crud, just realized I should have used that other comment to extend my message, as I had something else to say regarding trees, and I should post a few screenshots of what I've been doing.

      First I'll post my update screenshots:
      http://i.imgur.com/XATqhZ1.jpg
      I implemented a few updates up to this screenshot. I implemented volumetric resources such that you can find grass on the top layer of the planet, dirt below, and rock somewhat below that. And even though it's not visible here, I also made a single-octave noise field for fluctuating the surface height of the planet. And I implemented trees using Space Colonization, with a placement method along the surface of the planet that I can best describe as NOT actual procedural generation.

      http://i.imgur.com/GCNCUEB.jpg
      Here I implemented the voxel scale factor that allows me to have voxels smaller than a meter. The showcased scale in this screenshot is 0.25 meter voxels. Also, I'm not sure if I noted this before at any point, but I'm a brony. The pony in the screenshot doesn't really have much to do with anything regarding the project, but I already had a pony solution from my other project and I wanted to pony it up a bit.

      http://i.imgur.com/c1bKDTW.jpg
      No changes here, just showcasing a tree drawn at 0.25 meter voxel scale. Note you can actually make out the spheres used to draw it at this detail.

      And these two are regarding my latest change, the restructuring of how I test empty space:
      http://i.imgur.com/2t6guK4.png
      The first shows a 2 kilometer scene of 0.5 meter voxels, loaded in somewhat over a minute.

      http://i.imgur.com/UUgwZot.png
      And this shows a 2 kilometer scene of 1 meter voxels, but the point I wanted to show in this case is the Octree of cubes representing each individual empty space test.


      What I have to say regarding trees, and potentially a lot of other things later on, has to do with how I should draw them into the world.

      My issue, as I mentioned before, is that I'm drawing the trees as an array of individual spheres. That's quite a lot of spheres for each voxel in a chunk to test against. And well... I'm not sure how to implement my only ideas for lowering the cost.

      My ideas basically consist of somehow implementing density algorithms that generate customizable lines and curves (which I'm not sure how to do) and telling them to sample shapes along those lines/curves.

      My actual idea for how I tell it to use a shape gets its basis from this:
      http://i.imgur.com/9qesAq1.png
      This is a screenshot from further back than my empty space octree update, but I wanted to show this one here specifically. Basically, the shape you see here is a grid of toruses wrapped around the contours of a sphere. I manage this by sampling only a sphere and a single torus, with just the tiny additional cost of running a modulus operation on the x, y, and z to repeat the torus shape. And because of the sphere, the grid of toruses contours along the sphere's shape. Technically I sample a second sphere to erase the inside of the generated sphere, because the full 3D grid of toruses was still being generated throughout the whole sphere, so I needed to erase the inside to get the effect shown in the screenshot.

      The point still stands that I managed to get all these toruses drawn with a single sampling of the torus shape by using modulus-based repetition. IF I can somehow mix this concept with the shape of the curve being a line or a spline or something, I will be able to lower the sampling cost of drawing a tree, and potentially lots of other shapes in the future, by quite a margin.
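
      To make the trick concrete, here's a rough sketch of that kind of density function (the cell size and radii are made-up values, not the ones from the screenshot):

        #include <cmath>

        // A torus in the XZ plane and a sphere, as classic signed distance
        // functions.
        float SdTorus(float x, float y, float z, float major, float minor) {
            const float q = std::sqrt(x * x + z * z) - major;
            return std::sqrt(q * q + y * y) - minor;
        }

        float SdSphere(float x, float y, float z, float r) {
            return std::sqrt(x * x + y * y + z * z) - r;
        }

        // One torus, sampled once but repeated infinitely by folding space
        // with a modulus, then clipped to a thin spherical shell so the grid
        // hugs the sphere's surface (the "second sphere" erases the inside).
        float GridOfToriOnSphere(float x, float y, float z) {
            const float cell = 4.0f;                       // repetition period
            auto rep = [cell](float v) {                   // fold into [-cell/2, cell/2)
                return v - cell * std::floor(v / cell + 0.5f);
            };
            const float tori  = SdTorus(rep(x), rep(y), rep(z), 1.2f, 0.3f);
            const float outer = SdSphere(x, y, z, 20.0f);  // inside the big sphere
            const float inner = -SdSphere(x, y, z, 18.0f); // outside the small one
            const float shell = std::fmax(outer, inner);   // intersection = thin shell
            return std::fmax(tori, shell);                 // tori clipped to the shell
        }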

    5. That's quite a lot to reply to but I'll do my best!

      > It seems you are some steps ahead of me. Part of the reason might be because I lost the will to work on anything for like a whole month but life is life.

      I think that's quite a natural thing if you're working on a project like this by yourself for an extended period. E.g. a couple of times over the course of this I've taken a break of a month or more from it & every time I've come back with more enthusiasm than I had before. It's better to take a break occasionally than to get burnt out.

      > 4x4x4 check versus the 64x64x64

      Have you got that bit on the GPU? I did some experiments with using a heightmap to reduce the number of voxels I'd need to check based on their Y value against the heightmap, and found that it hardly made any difference. I.e. it didn't cost very much to actually process the empty voxels.

      > However, that actually brings me to where I am now. I actually have a couple leftover bugs. One: I haven't been able to get modification of empty chunks working right ...

      I was looking at this problem myself recently. Originally I had the clipmap octree pruned of any nodes that don't contain the surface, but I found that was a pain in the arse when the CSG ops caused the (now deleted) node to contain the surface. Instead I now just have a "full" octree where a leaf node is created for every chunk, and if the mesh generation doesn't produce a mesh I mark the node "empty". CSG operations can then touch a node and mark it "not empty", in which case the mesh generation is tried again on the next update. That seems a bit wasteful but it makes life a lot simpler.
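
      In sketch form the bookkeeping is tiny (illustrative names, with GenerateMesh standing in for the real pipeline):

        enum class NodeState { Unknown, Empty, Meshed };

        struct ChunkNode {
            NodeState state = NodeState::Unknown;
            // bounds, children, mesh handle, ...
        };

        bool GenerateMesh(ChunkNode& node);   // the actual voxel -> mesh work

        // Leaf nodes are never deleted: a node that produced no mesh is just
        // flagged Empty so it can be skipped until something touches it.
        void UpdateNode(ChunkNode& node) {
            if (node.state != NodeState::Unknown)
                return;                       // Empty or Meshed, nothing to do
            node.state = GenerateMesh(node) ? NodeState::Meshed : NodeState::Empty;
        }

        // A CSG op just resets every node it touches; mesh generation is
        // retried on the next update.
        void TouchNode(ChunkNode& node) {
            node.state = NodeState::Unknown;
        }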

      > I do know that the way I'm actually drawing the trees into the world needs to be completely refactored, as drawing each tree as an array of small individual primitive shapes that need to all be sampled by voxelization is WAY too slow. I haven't implemented a true procedural generation system for trees and other such objects yet, but I think I came up with an idea that has some potential quite recently.

      This sounds pretty cool, it's something I've been meaning to get started with (procedural gen of something other than terrain) for a while but not got around to. Interestingly, I had to tackle the 'apply lots of CSG operations at once' problem to fix the LOD switching after changing the way the data was stored. Previously I always had the LOD0 density field available and just wrote the CSG operations to that, but now only the currently visible LOD node's density field is loaded. That meant you could do a CSG op, then move the camera, and the op would not be present on the mesh when the LOD switched. I'll cover that in the blog post about how this all works now.

      > MAYBE there is something I can do about making a better solution to generating collision detection. I'm not really sure about it really, I know that it can take like 30 to 40 ms to generate the collision mesh for a 64 * 64 * 64 res chunk using Unity's collision mesh.

      Again, something I've just tackled! :) What I do is generate a 3D texture (or really, a 2D image array) recording the signed distance field values for the terrain and any CSG ops. That means I can use sphere tracing to quickly cast rays into the volume, which is what I use for positioning the CSG ops in the video. Unfortunately that comes with quite a bit of overhead since I'm only tracing 1 ray and doing it on the GPU, but I should be able to add collision detection for a lot of entities quite easily in future (and it's a lot better than using Bullet, which is what I did before!).
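
      The tracing loop itself is only a few lines. A minimal CPU-style sketch (illustrative; the real thing is an OpenCL kernel sampling that image array):

        #include <cmath>

        struct Vec3 { float x, y, z; };

        float SampleSdf(const Vec3& p);   // stand-in for the 3D texture lookup

        // Sphere tracing: an SDF value is a guaranteed-safe step size, so the
        // ray can always leap forward by the distance sampled at its current
        // position.
        bool TraceRay(Vec3 origin, Vec3 dir /* unit length */, float maxDist, Vec3* hitPos) {
            float t = 0.0f;
            for (int i = 0; i < 128 && t < maxDist; i++) {   // iteration cap
                const Vec3 p = { origin.x + dir.x * t,
                                 origin.y + dir.y * t,
                                 origin.z + dir.z * t };
                const float d = SampleSdf(p);
                if (d < 0.001f) {             // close enough, call it a hit
                    *hitPos = p;
                    return true;
                }
                t += d;                       // the sphere-trace leap
            }
            return false;
        }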

      > CSG gen stuff

      That all looks pretty cool! I've not really got to that stuff yet so I don't have much input (yet!).

  2. This comment has been removed by the author.

  3. (4th try to post here)
    Thanks for the DC CPU example... it differs slightly from dc.java; I'm not sure if the QEF works the same, because of torn edges at subtracted geometry.

    About empty space skipping, there may be an optimization:

    void sample_grid_spherical_opt(i32_t x, i32_t y, i32_t z) {
        // Sample the field once; for a true SDF, |value| is the radius of a
        // sphere around (x,y,z) that contains no surface.
        float sgn = Density_Func(x, y, z), rad = fabs(sgn);
        grid[x][y][z] = sgn, grid_set[x][y][z] = true;
        if (rad < 3) return; // too close to surface - better to compute real values
        sgn = sign(sgn);

        // Clamp the empty/full sphere to the grid bounds.
        const vec3i cs(max(0.0f, x - rad), max(0.0f, y - rad), max(0.0f, z - rad)),
                    ce(min(rootsize + 1, x + rad), min(rootsize + 1, y + rad), min(rootsize + 1, z + rad));

        loopi(cs.x, ce.x)
        loopj(cs.y, ce.y)
        loopk(cs.z, ce.z) if (!grid_set[i][j][k]) {
            const float val = length(vec3(x, y, z) - vec3(i, j, k));
            if ((val + 3) < rad) {
                // (rad - val) is a conservative triangle-inequality bound on
                // the SDF at (i,j,k), not an exact value.
                grid[i][j][k] = (rad - val) * sgn, grid_set[i][j][k] = true;
            }
        }
    }
    Call it recursively until size 8 is reached. With that, I got a boost from 13.66 sec to 6.59 sec for a tree root size of 256 (about 50% of the time).
    It saves 16,656,995 SDF sample calls (3,081,360 instead of 19,738,355) and does interpolation instead. This is actually a rough approximation, because the SDF doesn't always return the radius of an empty sphere.
    How much time do you spend on one block's generation? CPU | GPU?

    Replies
    1. Hi,

      I think the problem you mention (torn edges) might be related to your optimisations. If you use an approximation of the density func rather than the func directly then you get problems with discontinuities in the surface. Although the CPU implementations of the noise functions are very slow, so I understand why you're trying to do this :)

      Not got any timings for the CPU but it was very slow compared to the GPU version. The first part of the pipeline I moved over was the density field generation and the speed up was amazing. I think it used to take a couple of mins to generate about 8 64^3 voxel chunks.

    2. Just look at your example app, aren't the edges torn for you?
      http://2.bp.blogspot.com/-zlqHzbv-h0s/VF7DHYgJCNI/AAAAAAAAAFs/JUYW7TR85AA/s1600/scene.jpg
      http://imgur.com/GtA7kMg
      That's why I think the QEF might move vertices way off, in a non-manifold way (http://faculty.cs.tamu.edu/schaefer/research/dualsimp_tvcg.pdf), or something else

      So, how many msec does the GPU version take for some cubical space? 256^3 for example.
      Thanks

    3. That's not what I imagined you meant by "torn". I think the problem in my sample is due to the QEF impl needing to be tweaked (use the one from my QEF repo on github instead) & a separate problem with how the normals are calculated for some vertices. E.g. if you think of the vertex on the corner of a cube, it will get an average of the normals of the 3 faces it belongs to, which can make the edges look soft/wrong depending on how they are shaded. In my latest video I'm using normals calculated in the OpenGL shaders to help fix that and make the edges of cubes etc. sharper.

      I can't build a 256^3 volume as it stands (I get some errors that I've not looked into), but to go from nothing to a mesh for a 64^3 volume (which is my current size) is about 10-15 ms I think. Creating an empty/full SDF that has no surface takes about 5-7 ms at that size. The interesting thing with the OpenCL code is that typically the bottlenecks come from having to read some data back to the CPU to drive the next phase of the OpenCL work, rather than the GPU being really busy and taking time to complete tasks. I'm running all this on a GTX 970 but it ran pretty well on my old 9790.

  4. This comment has been removed by the author.

  5. That's an awesome video, Nick! Love the progress!

    How are you doing the LOD changes? I mean, I know the basics that you've been over before, but how do you tell each chunk which LOD to render? I've been toying with having each OctreeNode contain a Surface class and an LOD id, so that you can tell each parent octree what LOD should be displayed and it can dive down... but doing that, or having each octree node independently look up its distance from the player... SO expensive! At this point I'm struggling to get a system running that doesn't use more CPU than the DC component. I would LOVE to see a blog post on that as well!

    Replies
    1. Thanks :)

      The current method is a bit hacky, which is why I've not written it up. I run the update a few times every second (I think at 5 Hz currently) and each time the whole clipmap octree is traversed top-down. I have a set of constant distances associated with each size of clipmap node (i.e. LOD level), and as the octree is traversed I measure the distance from the camera to the node and compare that against the constant distance. If the measured distance is less than the constant distance *and* none of the parent nodes are active then the node becomes active and a mesh generation event is queued. Traversing top-down allows the clean-up of the old active nodes to be queued too.
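
      Boiled down, the traversal is roughly this (an illustrative sketch, not lifted from my code, with the active/parent bookkeeping simplified):

        struct Vec3 { float x, y, z; };
        float Distance(const Vec3& a, const Vec3& b);    // assumed helper

        struct ClipmapNode {
            Vec3 centre;
            int lodLevel;                  // 0 = finest; sketch assumes 5 levels
            ClipmapNode* children[8];
            bool active = false;
        };

        // One constant distance per LOD level: being inside a level's range
        // means the camera deserves finer detail than that level provides.
        const float LOD_DISTANCES[] = { 64.f, 128.f, 256.f, 512.f, 1024.f };

        void SelectLod(ClipmapNode* node, const Vec3& camera) {
            const bool needFiner = node->lodLevel > 0 &&
                Distance(camera, node->centre) < LOD_DISTANCES[node->lodLevel];

            if (!needFiner) {
                // if (!node->active) QueueMeshGeneration(node);  // hypothetical hook
                node->active = true;
                return;    // top-down order: stale descendants get cleaned up here
            }

            // if (node->active) QueueMeshRelease(node);          // hypothetical hook
            node->active = false;
            for (ClipmapNode* child : node->children)
                if (child)
                    SelectLod(child, camera);
        }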

      This is not a solution I'm particularly happy with, which is why I've not written it up (it can get pretty slow when the volume size is large). I do have an idea for a better approach which I might try soon; if that works I'll write that up.
