Visible Human Project at PSC - Implementation
A number of browsers have been developed for viewing 3D datasets. The majority of these implementations provide either web-based or standalone views of the original Visible Human transverse cutting planes, and of reorganized data forming coronal and sagittal planes. Many others provide specific segmented and labeled planes, or pre-rendered views of segmented anatomical structures. There is substantial diversity in the methods used to implement browsers that display previously prepared imagery. Implementing systems to provide arbitrary cutting planes, as the Visible Human Browser does, is much more difficult, since cutting plane tilt, viewer rotation, and field of view must be combined to dynamically extract the particular viewing plane from the volume data.
The PSC Volume Browser is implemented in C and OpenGL and is optimized for maximum speed. This is particularly important because our approach off-loads most computational responsibilities from the server to the client.
One implementation model for arbitrary cutting planes uses a central server, housing the entire 3D volume, to produce cutting plane images based on user viewing parameters. Each completed image is then transmitted across the web for viewing. Because of the size of the Visible Human datasets, implementations based on this model have used disk storage as the primary operating medium. Roger Hersch and his colleagues at the EPFL in Lausanne, Switzerland, have produced a very usable and well-known web-accessible system, which was originally based on this design. It is able to supply a total aggregate rate of up to 5 slices per second by using a parallel disk farm, initially consisting of 60 separate disks, to provide independent overlapped seeks and high bandwidth. More recent technology has reduced the number of disks to 16. Further refinements have been incorporated into their current system, deployed at http://visiblehuman.epfl.ch/ .
Although this model is adequate when users can carefully select one image at a time and pause between successive images, it does not scale well to large numbers of simultaneous highly interactive users. This is because the total image slice production rate is limited by disk seek rates and high server loading. This approach exposes all of the network and processing latencies to the user, so that it cannot provide the screen update rates of at least 10 per second per user needed for smooth continuous visual movement through the dataset. Our project goal of 40 simultaneous users therefore demands an aggregate rate of at least 400 slices per second (40 users × 10 updates per second), which is 80 times the reported 5 slices per second. Additionally, since each slice image is formed independently at the server, there is no easy opportunity for data to be reused at the client. Even when a new image substantially overlaps a previous image, the entire content of the new image is transmitted anew. Finally, now that a single disk easily holds the entire dataset, it is not attractive to maintain large numbers of drives to produce adequate aggregate seek rates.
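The aggregate requirement can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the paragraph above; the variable names are illustrative and not part of any PSC software.

```python
# Figures quoted in the text; the variable names are illustrative only.
users = 40              # project goal: simultaneous interactive users
updates_per_user = 10   # minimum screen updates per second for smooth motion
reported_rate = 5       # slices per second reported for the disk-farm server

# Aggregate slice rate a central server would have to sustain.
required_rate = users * updates_per_user

# Factor by which the reported rate falls short at the minimum update rate.
speedup = required_rate / reported_rate

print(required_rate, speedup)
```

Because 10 updates per second is a lower bound for smooth motion, any higher per-user update rate raises the required factor proportionally.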
Figure 1 — Central Server
The second implementation model places the complete dataset locally on a PC or workstation. This works well for constrained volumes that can be entirely memory resident. It was quite effective for the early implementation of Edgewarp3D by Bookstein and Green, which ran on SGI hardware.
Figure 2 — Standalone System
However, with very large datasets, the approach suffers severely on 32-bit architectures that limit addressable memory to 2-4 Gbytes, and from slow disk I/O. When a dataset does not fit into memory one is forced to use disk not just to initially load bulk data, but interactively during use. This has the same disk seek limitations as the first model. Even high performance striped-disk systems do not substantially reduce this problem, since they are designed for high data bandwidths but do not substantially improve seek rates. Additionally, disk-bound systems suffer from directional sensitivities, where access to particular directions in the data can be extremely slow even while some others are much faster. Reorganizing and duplicating data can produce performance adequate for a single user but not good enough for multi-user server applications.
We chose a hybrid approach (Figure 3) for the UMVH application, using the increased memory addressability of a 64-bit server to link with multiple inexpensive PC clients over the Internet. This approach can off-load nearly all computation from the server to client computers, achieving aggregate levels of performance that could not be produced by any reasonably priced server-based architecture. Low computational demands and lack of dynamic disk activity at the server greatly increase the number of simultaneous users that can be supported by a single large-memory high-performance server.
In the hybrid model, the clients request sub-volumes of the data from the server, and then manipulate the data to produce the final displays. Voxel data is initially sent in coarse resolution to provide fast screen updates during rapid user movements. When navigation slows, finer resolution data is retrieved and displayed. This approach takes advantage of continuous user navigation rather than individual position jumps, so cached and overlapping data can be reused. The client computer requests data as needed, and the server merely needs to satisfy client requests, and does not manipulate the data in any way. Thus, the overall design takes advantage of increases in desktop computing power to enable real-time user navigation through very large datasets that would be unfeasible to place directly on client machines. Letting many low-cost clients share access to the more expensive large-memory server optimizes the overall price to performance ratio. Network and server latencies are hidden, since coarse resolution and cached data are quickly available to provide smooth and immediate user response. This approach is described in our Visible Human Project Conference paper and is used to drive both PSC clients and current versions of Edgewarp3D.
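The coarse-first behavior described above can be sketched as a small client-side planning step. This is a minimal illustration, not the PSC client code: the key layout `(level, x, y, z)`, the convention that level 0 is the finest resolution, and the function names `cube_chain` and `plan` are all assumptions made for the example.

```python
def cube_chain(ix, iy, iz, levels):
    """Cache keys covering finest-level cube (ix, iy, iz), coarsest first.

    Assumes each coarser level halves the resolution, so the covering cube
    at level lv has the finest-level index shifted right by lv bits.
    """
    return [(lv, ix >> lv, iy >> lv, iz >> lv)
            for lv in range(levels - 1, -1, -1)]


def plan(cache, ix, iy, iz, levels, moving_fast):
    """Return (key to display now, keys to request from the server)."""
    chain = cube_chain(ix, iy, iz, levels)
    cached = [key for key in chain if key in cache]
    missing = [key for key in chain if key not in cache]
    # Display the finest resolution already held locally, if any.
    show = cached[-1] if cached else None
    if moving_fast:
        # During rapid movement, ask only for the coarsest level.
        missing = [key for key in missing if key[0] == levels - 1]
    return show, missing


cache = {}
# Rapid movement with an empty cache: request the coarsest cube only.
show_fast, requests_fast = plan(cache, 5, 2, 7, 4, moving_fast=True)
# Pretend the coarsest cube has arrived, then slow down and refine.
cache[(3, 0, 0, 0)] = b"coarse voxel data"
show_slow, requests_slow = plan(cache, 5, 2, 7, 4, moving_fast=False)
```

Keeping the requests ordered from coarse to fine means something can be drawn as soon as the first reply arrives, while the cache lets overlapping cutting planes reuse cubes that were fetched earlier.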
The technologies developed at PSC and used by the UMVH project consist of several components:
- volume server, data structures, and compression
- network tuning using Web100 tools for measuring and adapting to dynamic changes in network conditions
- client-side viewing software (the volume browser)
- segmentation and meshing software to produce 3D surface models of anatomical structures
The data that currently can be served to clients include volume data, voxel identification, anatomic database entries (based on voxel identification), and surface meshes. While the Visible Human datasets consist of optical, CT, and MRI data, any type of volumetric data can be viewed. For example, we have been able to view static 3D MRM mouse volumes from the Center for In Vivo Microscopy. Our initial tests with 4D datasets have used 3D blood flow velocity fields over a series of time steps from detailed heart simulations performed at the PSC by Peskin and McQueen.
Figure 3 — PSC's Volume Browsing Client-Server Architecture
The physical server is home to four server programs: a volume server to provide the volumetric data, a web server to provide surface and related models, a collaborative server to allow multiple sessions to be driven from a single browser, and a voxel identification server to link specific voxels to anatomical structures. The volume server is described in Section A, the web server is described in Sections C and D, and the collaborative server is described in Section C.
Volume Server, Data Structures, and Compression
The PSC Volume Server (PVS) is responsible for efficiently and dynamically delivering volume data for interactive, uncorrelated navigation by up to 40 simultaneous users. The PVS operates exclusively from a pre-constructed hierarchical memory-resident data structure based on 8*8*8 volume cubes, with the finest cube formed by an 8*8*8 set of voxels, the next level representing 8*8*8 lowpass-filtered half-resolution voxels, and so forth to coarser and coarser levels. The data on the volume server is maintained at four levels of detail, corresponding to 2x zoom scales from 1 to 1/16th resolution. The client provides continuous zooming by sampling and interpolating from these base resolutions. For each client-requested cube, the server initially sends the coarsest cube, followed by the finer resolution cubes as time permits. Thus, multiple users can access the same memory-resident data, which provides relatively independent memory use and access speed even as the number of users grows. This approach avoids the directional sensitivity suffered by disk-based approaches; that is, movement in all three directions is equally fast. The only server disk access is to write log files. A further advantage of this sub-volume approach is that the client computer caches recently used cubes, which can be quickly accessed to form smooth navigation through newly constructed cutting planes.
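To give a feel for the size of such a hierarchy, the sketch below counts the 8*8*8 cubes needed at each level. The volume dimensions are hypothetical round numbers chosen for the example, not the actual Visible Human dimensions, and the function name `cubes_per_level` is likewise an assumption.

```python
def cubes_per_level(nx, ny, nz, levels, cube=8):
    """Number of cube*cube*cube blocks needed at each level, finest first.

    At level lv one cube edge spans cube * 2**lv voxels of the original
    volume, because each level halves the resolution of the previous one.
    """
    counts = []
    for lv in range(levels):
        span = cube << lv
        # Integer ceiling division, so partially filled edge cubes count.
        counts.append(-(-nx // span) * -(-ny // span) * -(-nz // span))
    return counts


# Hypothetical volume size, for illustration only.
counts = cubes_per_level(1024, 512, 2048, levels=4)
overhead = sum(counts) / counts[0]
print(counts, overhead)
```

Each coarser level holds roughly one eighth as many cubes as the level below it, so keeping every level resident costs only about 14% more memory than the finest level alone.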
Removing the ice from the 40 Gbytes of Visible Female cryosection data leaves 7 Gbytes of anatomical data, which barely fits into the 8 Gbytes of memory on the volume server. Such a system would not allow simultaneous support of multiple datasets (for example, the Visible Male or the 70 mm film scans). Thus, we have investigated a number of compression schemes that fit naturally with this data structure and support our basic requirements: fast random-access retrieval and decompression. Each cube is compressed using a self-contained representation that is dependent only on coarser levels of the hierarchy at the same position, and does not depend on the properties of adjacent cubes. Using this method, the Visible Human data is losslessly compressed by ~3:1 using integer wavelet methods (7 Gbytes → 2.2 Gbytes), and maintains high visual quality when compressed to ~30:1 using discrete cosine transform methods. Other researchers have employed similar solutions to the problems of representing the Visible Human and other large datasets for random access.
Figure 4 — Sending Different Levels of Detail
The approach used in both methods is similar to cube-splitting, but does not incorporate zero-tree encoding, so the same protocol can be used in both the lossy and lossless cases and the client does not have to represent multiple quality levels at a single resolution. Both compression methods take advantage of 3D correlations in the data volumes, which results in almost a 50% advantage over 2D methods at the same signal-to-noise ratio for the finest resolution cubes, and even more for the coarser levels of the data hierarchy. The process of compressing raw volume data and organizing it into the data structure used by the server is currently performed offline.
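The text does not say which integer wavelet is used. As an illustration of why integer wavelets allow exact, lossless reconstruction, the sketch below applies one level of the integer Haar transform (also called the S-transform) to a single row of samples. The choice of this particular wavelet and the function names are assumptions made for the example; a 3D version would apply the same step along each axis of a cube.

```python
def forward(samples):
    """One level of the integer Haar (S) transform on an even-length list.

    Returns (approx, detail): floored pairwise averages and pairwise
    differences. Both stay integers, so no rounding error is introduced.
    """
    approx, detail = [], []
    for a, b in zip(samples[0::2], samples[1::2]):
        approx.append((a + b) >> 1)   # floor of the average
        detail.append(a - b)          # difference keeps the lost parity bit
    return approx, detail


def inverse(approx, detail):
    """Exactly undo forward(), recovering the original integers."""
    out = []
    for s, d in zip(approx, detail):
        a = s + ((d + 1) >> 1)        # s + ceil(d / 2)
        out.extend((a, a - d))
    return out


row = [12, 14, 200, 201, 0, 255, 7, 7]
approx, detail = forward(row)
restored = inverse(approx, detail)
```

In smooth regions the detail values cluster near zero, which is what an entropy coder can exploit, while the approximation values play the role of the next coarser level.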
Network Tuning and Web100
The hybrid implementation depends on an interactive exchange of data between client and server. Each of the interaction protocols depicted in Figure 3 has different requirements and characteristics. The volume data service path is the most demanding of bandwidth and latency. Three of the servers use custom protocol socket programs while the fourth is a standard HTTP web server.
Internet2 potentially provides access to extremely high bandwidth, but without expert attention from network engineers, users are unlikely to achieve even 10 Mbps despite an end-to-end network infrastructure that can support data rates of 100 Mbps or more. This is often due to mistuned end hosts and a lack of effective network performance diagnostic tools. Systems as shipped from vendors are generally configured for low performance dial-up connections rather than the performance optimized configuration needed to take advantage of I2 facilities. Furthermore, optimal network settings are not static but need to track changes in the network characteristics including dynamic load, link outages, latency variations, and end application demands. Therefore, network tuning should be a continuous activity.
Web100 and the companion Net100 tool sets provide a user-transparent solution to these problems by incorporating automatic host tuning and advanced diagnostic tools into the operating system. These tools incorporate the results of experience in network performance tuning that previously required manual intervention. In the Visible Human project we have installed and used Web100 facilities on the PSC Volume Server host, a Compaq ES-40 running Linux. Web100 is also used to collect statistics from the client applications when they run on Linux machines. Application programs can query network characteristics in order to provide an additional level of adaptation. Since Web100 is OS kernel resident, its full power cannot be used with MS-Windows or Apple Macintosh clients, but the server-side facility can still be used to identify the most common network problems.
The client-side software consists of two separate sections: one that interacts with the server (the hop server), and one that is the graphical user interface. The hop server is written modularly, basically mirroring the volume server, and can support a number of user interface programs, including the PSC Volume Browser (PVB) and the University of Michigan's Edgewarp3D. Edgewarp3D and PVB provide similar slice views, but Edgewarp3D was developed for morphometrics, while PVB was developed for anatomical segmentation and labeling.
Figure 5 — Sending Compressed Data
The user interface determines which cubes intersect the defined cutting plane images and places requests to the hop server for these cubes; the hop server responds directly with any data it has cached but otherwise forwards these requests to the primary PSC volume server. The hop server decompresses any cube data received from the volume server and sends the decompressed volumes to the user interface. Finally, the user interface constructs the final 2D slice, including possible interpolations and display overlays.
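The first step, finding the cubes that a cutting plane touches, can be sketched as follows. This is a simplified illustration rather than the PVB implementation: it assumes the plane is given by an origin and two in-plane step vectors in voxel units, samples it once per output pixel, and uses the hypothetical function name `cubes_for_plane`. Sampling at pixel centers can miss a cube that the plane only clips between samples, so a real client would need a more careful test.

```python
import math


def cubes_for_plane(origin, u, v, width, height, cube=8):
    """Indices of the cube-sized blocks touched by a sampled cutting plane.

    origin: voxel coordinates of pixel (0, 0) on the plane.
    u, v:   voxel-space steps for one pixel along each image axis.
    """
    needed = set()
    for j in range(height):
        for i in range(width):
            # Voxel-space position of pixel (i, j).
            p = [origin[k] + i * u[k] + j * v[k] for k in range(3)]
            # The containing cube is the floor of each coordinate / cube size.
            needed.add(tuple(math.floor(c / cube) for c in p))
    return needed


# Axis-aligned transverse plane at z = 20, 16 by 8 pixels.
flat = cubes_for_plane((0, 0, 20), (1, 0, 0), (0, 1, 0), 16, 8)
# Tilted plane: the second image axis leans out of the transverse plane.
tilted = cubes_for_plane((0, 0, 0), (1, 0, 0), (0, 0.6, 0.8), 8, 16)
```

The resulting set is what the user interface would hand to the hop server, which answers from its cache where it can and forwards the remainder.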