Adaptive Algorithm for Accelerating Direct Isosurface Rendering on GPU

2018-10-16 07:53SergeyBelyaevPavelSmirnovVladislavShubnikovNataliaSmirnova

Sergey Belyaev | Pavel Smirnov | Vladislav Shubnikov | Natalia Smirnova*

Abstract—Direct isosurface volume rendering is the most prominent modern method for medical data visualization.It is based on finding intersection points between the rays corresponding to pixels on the screen and isosurface. This article describes a two-pass algorithm for accelerating the method on the graphic processing unit (GPU). On the first pass, the intersections with the isosurface are found only for a small number of rays, which is done by rendering into a lower-resolution texture. On the second pass, the obtained information is used to efficiently calculate the intersection points of all the other. The number of rays to use during the first pass is determined by using an adaptive algorithm, which runs on the central processing unit (CPU) in parallel with the second pass of the rendering. The proposed approach allows to significantly speed up isosurface visualization without quality loss. Experiments show acceleration up to 10 times in comparison with a common ray casting method implemented on GPU. To the authors’knowledge, this is the fastest approach for ray casting which does not require any preprocessing and could be run on common GPUs.

1. Introduction

Volume rendering techniques are important for medical data visualization. They provide a crucial means for performing visual analysis of a patient’s anatomy for diagnosis, preoperative planning, and surgical training. Direct methods do not require intermediate representation generating. They include direct volume rendering and direct isosurface rendering. The first method evaluates the volume integral for all pixels of the final image; this method is very convenient for exploring 3-dimensional (3D) data sets. The second one visualizes an isosurface level selected by the user. It requires less computational power, since to visualize the isosurface, one only needs to find intersection points of the rays with it. This method is based on the algorithm called raycasting. Under this algorithm, the intersection is found by moving through the data set along a ray passing through a given pixel with some step length until a value exceeding the isosurface level is found.

Raycasting is usually accelerated using preprocessing, wherein auxiliary data structures are constructed for culling the empty blocks, shortening the distance traveled along rays, or varying the step length along the rays,during runtime. The proposed method, on the other hand, does not require preprocessing. It constructs the necessary auxiliary data on the first step of the rendering process for every frame. The less pixels the first step involves, the faster it is; however, if too few pixels are used, artifacts begin to appear. To find out the minimum required number of pixels needed to prevent the formation of artifacts, an adaptive algorithm is used that runs on the central processing unit (CPU) in parallel with the second step of the rendering process.

2. Related Works

There are two approaches for visualizing 3D arrays. Some of them involve rasterizing the polygons obtained from the array, e.g. marching cubes[1]. A detailed overview and comparison of those algorithms were presented in [2]. The advantage of this approach is that polygonal models can be rendered very quickly. The disadvantage,however, is that the raw data has to be preprocessed before visualizing. This can be a problem if the transfer function changes or the raw data is to be plotted dynamically. Although there were works[3],[4]dedicated to fast triangulation on the graphic processing unit (GPU), the other approach—direct isosurface or volume rendering—is preferred in those cases.

The most flexible and popular method for direct isosurface rendering is raycasting, which is also used in direct volume rendering. GPU-based raycasting was presented in [5]. It made use of the cube proxy geometry (the dataset’s bounding box) to determine the start and the end points of the ray. This method does not require preprocessing and can be used to visualize raw data directly. Unfortunately, it is very slow. For every pixel, it requires finding the intersection of the ray and isosurface by tracing the ray from start to end. A significant amount of time is wasted on processing rays that do not intersect the isosurface at all. In order to accelerate the finding of intersections and cull the rays that do not intersect the isosurface, most authors fall back on preprocessing.

In [6], the isosurface was covered in a set of cuboids during preprocessing, which allowed one to cull the empty blocks during runtime and to reduce the ray’s length. In [7], these blocks were assembled into an octree structure. In [8], a min-max octree structure was introduced. Other variants of octree structures were examined in[7], [9] to [16]. An alternative structure that serves the same purpose is the kd-tree[17]-[19].

Recently, a lot of effort has been spent on reducing the preprocessing time by making use of new GPU functions[15]-[23]. These works use a histogram pyramid structure, an analog of the mipmap structure. A histogram pyramid lets one convert a 3D volume into a point cloud entirely on the graphics hardware[24]. The algorithm reduces a highly sparse matrix with N elements to a list of its M active entries in O(N+Mlog N) steps. It should be noted that this structure is also used to optimize triangulation on GPU[3]. The method that was presented in [25] is the peak-finding method, which accelerates direct volumetric rendering by determining the isosurfaces in advance. In this case, it is desirable to not only determine the ray’s starting point, but also all the points where it intersects the isosurface. Of the works examined above, this can be done with the methods from [22] and[23]. Other approaches to solving this problem were examined in [21] and [26]. In [21], the approach presented is the pre-integration approach[27],[28], in which it is the isosurfaces, and not the calculated integrals, that are saved. In[26], a triangular net is built during preprocessing, and is later rendered into an A-buffer[29]. The lists in the A-buffer were sorted by the distance to the viewer. Various implementations of the A-buffer on GPU using the K- and G-buffers can be found in [30].

Some of the methods listed above can be used efficiently for visualizing large amounts of data that does not fit fully into the GPU memory. An overview of processing big data sets can be found in [31]. The presented algorithm is independent of the input size.

Another class of problems that use direct isosurface raycasting is multifield data visualization. Such data is very common in medicine, for instance that in [32]. From the viewpoint of raycasting, they are special in the regard that several isosurfaces need to be handled simultaneously. In [33], where fiber surfaces were visualized, triangular nets constructed during preprocessing were used for this purpose.

Works dedicated to the acceleration of raycasting without using preprocessing data include [25], [34], and [35].They used occlusion frustums as proxy geometry, obtained from a previously rendered frame. Unfortunately, this approach can handle only coherent changes in view parameters, and is unusable for cases where the transfer function changes rapidly or the input data changes completely. It also performs poorly on multifield data visualization, as it only registers the first intersection of a ray and an isosurface.

An alternative approach to increasing the frame rate regardless of whether preprocessing is done was presented in [36] to [40]. They attempted to keep up the required frame rate by worsening the quality of the resulting image to the least extent possible. To do that, [36], [37], and [40] used analytical performance modeling(regression, genetic algorithms, and performance skeletons), whereas [38], [39], and [41] used machine learning approaches.

3. Algorithm Description

The work in [5] outlines the idea of using a bounding box that bounds the given volume to determine the start and end points of rays. In Fig. 1, these rays are denoted with ri, and their start and end points are denoted with fiand li, respectively.

The runtime of the raycasting algorithm depends on the step length along the rays and the number of rays. The main idea of the proposed algorithm is to reduce the number of rays. In accordance with it, the ray-isosurface intersections are found for a smaller number of rays and then obtained information is used to predict where the remaining rays will intersect the isosurface.

The idea is implemented using a two-pass algorithm. On the first pass, the rendering is done into an auxiliary texture T, which has a resolution n times lower than the screen in both directions.

The proposed algorithm is described in details as below. To simplify the explanation, let us examine the case of a 1-dimensional texture, letting n=2 without loss of generality, where n is the ratio between resulting image size and the size of the auxiliary texture T. Fig. 2 depicts the various ways that two neighboring texels’ depths may be related.

Fig. 1. Raycasting: The volume is sampled at regular intervals between the starting (f0 to f4) and ending (l0 to l4) points obtained via rasterization.

Fig. 2. Isosurface sampling: (a) concave, (b) convex, (c) object outline with no intersection, (d) intersected object outline,and (e) no isosurface.

The z-axis begins at the points where the rays enter the bounding box. In Fig. 1, these are denoted with fi. In Fig. 2, the z-values for the texels (in T) on the isosurface are shown with squares, and those for the pixels on the screen are shown with circles. The pixel currently being handled is denoted with p. Then by examining the z-values for T, the algorithm calculates the raytracing starting point in such a way that it is as close as possible to the isosurface. It is clear that in the case shown in Fig. 2 (a), where the isosurface is concave, one may start the raytracing for p at z=min(d1, d2).

However, if the isosurface is convex (Fig. 2 (b)),movement along the ray from that point in the same direction results in missing the isosurface entirely. Thus,before starting to trace the ray, the direction of movement should be chosen, as shown in Fig. 3. The choice is done by evaluating the transfer function at p. If the value is below the threshold, then movement direction coincides with the ray direction; if it is above the threshold, the opposite direction is used.

The cases shown in Figs. 2 (c) and (d) happen on the outlines of objects. In Fig. 4, the pixels where these cases happened are shown in red and white. The number of raytracing steps done for these pixels may be very big if they do not intersect the isosurface. This is especially relevant for red pixels, which lie at the boundary between the object and the background.Fortunately, there are very few such pixels.

In the case shown in Fig. 2 (e), the pixel may be culled, as the texels surrounding it show that there is no intersection with the isosurface. Raycasting is not done for these pixels. The culled pixels are shown in blue in Fig. 4.

The lower the resolution of the auxiliary texture, the larger the red and white regions. Handling these pixels takes a lot of time, so at some point increasing n leads to a decrease in performance. Experiments have shown that from the viewpoint of the algorithm’s runtime, the optimal value for n is around 5. However, for scenes containing fine detail or surfaces with large curvature, the value of n should be lower than the optimal resolution to prevent any artifacts.

Fig. 3. Choosing the tracing direction.

Fig. 4. Colored points show pixels where the ray might not intersect the current isosurface. In this case, the ray may either intersect the next isosurface (white), or there might be no intersection at all (red).

Fig. 5 shows the results of visualizing the isosurface for different threshold values of the transfer function. Some small isolated details can be noticed in Fig. 5 (b) and examined more closely in Figs. 5 (c) and (d). The size of auxiliary texture should be chosen carefully to prevent such pixels from being lost.

Fig. 6 shows the results of the raycasting algorithm for different values of n. Particularly, Figs. 6 (e) to (g)highlight the data lost due to choosing auxiliary textures of various resolutions. For n=10, the artifacts are significant, whereas for n=2 there are almost none.

Since the image of the model changes significantly depending on the view point and visualization parameters,and the goal is to maximize the performance without sacrificing quality, the optimal resolution for texture T must be chosen for each frame separately. A presented adaptive algorithm chooses the resolution using information from the texture T at frame i to automatically select the best value of n for frame i+1. The diagram in Fig. 7 shows the algorithm workflow for one frame.

Fig. 5. Results of visualizing the same volumetric data with different threshold values for the transfer function: (a) smooth isosurface for a low threshold value and (b) spiky isosurface for a higher threshold; enlarged isolated small details: (c)“tooth” and (d) “lone” texels.

Fig. 6. Results for different values of n: (a) full-sized auxiliary texture for n=1; reduced auxiliary texture: (b) n=2, (c) n=5,and (d) n=10; the difference between the results for n=1 and the respective higher values of n: (e) n=2, (f) n=5, and (g)n=10.

Fig. 7. Diagram of the adaptive algorithm.

The auxiliary texture is analyzed by counting the number of lone and tooth texels. A lone texel is the one that contains the isosurface, but its neighboring texels do not; a tooth texel is the one that contains the isosurface and has only one neighbor that also contains the isosurface. Fig. 8 shows the examples of lone and tooth texels.

The image is analyzed as follows: The ratio,denoted with S, of the number of lone and tooth texels to the total number of texels in T is calculated. Then the n-value for the auxiliary texture on the next frame can be determined by using

To increase performance, the auxiliary texture analysis is done on CPU in parallel with the second step of the rendering process. The time taken to construct one frame is thus increased only by the time taken to load the auxiliary texture data from the GPU memory to that of CPU.

4. Experiments

To determine which n-value leads to the best performance, the presented algorithm was tested with different sparse and dense data sets. Fig. 9 demonstrates the used models. The results of experiments are listed in Table 1,where the factors of raycasting acceleration are presented as a function of n for each model.

As can be seen from the table, the best performance is achieved when the n-value is close to 5. The average value of the acceleration factor over all the models is approximately equal to 8, but it is different for every model—those with less contour lines and more empty blocks get accelerated more.

5. Conclusion and Future Work

A approach for accelerating volume raycasting was presented. It used an auxiliary low-resolution texture,worked for any number of isosurfaces, and did not require preprocessing. The condition for the absence of any artifacts has been determined. As experiments shown, the algorithm accelerated the raycasting process by a factor of 8 on average (depending on the scene) if the ratio of the resolutions of the screen and auxiliary texture was close to 3. Presented adaptive algorithm changes the ratio at every frame.

Fig. 8. Particular cases where the texture resolution is important: (a) “lone” and (b) “tooth” texels.

Fig. 9. Models the algorithm was tested on: (a) head, (b) transparent head, (c) skull, (d) brain, (e) transparent brain, (f)electron density (near), (g) electron density (far), (h) chest, and (i) blood vessels.

Table 1: Relationship between n and the raycasting acceleration factor

It should be noted that acceleration by a similar factor can be achieved by using preprocessing-based methods(for example, the one in [23]). Authors do not contrapose the proposed algorithm to other methods; in fact, if a situation allows preprocessing, the proposed approach may be used as an additional means of increasing the overall productivity.

In future work, authors will examine methods that allow to avoid productivity regression for large n-values.