Background

There's no denying that "thin client" has become a buzzword in the computing industry of late. For applications that demand a great deal of system resources, the ability to run those applications remotely from the "cold room" has been a desirable -- even necessary -- feature for quite some time. Even for lightweight applications, the ease of administration and security benefits of thin client computing are making it an increasingly popular computing model. However, whereas a host of commercial thin client solutions exist to turn run-of-the-mill desktop applications into on-demand services, most of those solutions either lack the ability to run 3D applications or force those applications to perform 3D rendering without hardware acceleration.

Scientists, researchers, and engineers in fields such as biosciences, energy exploration, and mechanical design rely on heavily interactive 3D applications as part of their daily workflow. These applications generally can only achieve usable performance if the 3D rendering is hardware-accelerated. Thus, organizations that would like to move toward a more centralized, managed model of application deployment have been constrained by their inability to move key 3D applications off the user's desktop.

The Old School Approach: Indirect Rendering

The problem of how to remotely display a 3D application with hardware-accelerated rendering is a thorny one. 3D applications that are built on Unix or Linux typically use the OpenGL application programming interface (API) to do the actual 3D rendering and the GLX API to manage the relationships between rendering contexts and application windows. GLX is an extension to the X-Windows protocol, and it can take advantage of that protocol's inherent remote display capabilities. In this mode of operation, referred to as "indirect rendering", the OpenGL commands are encapsulated inside of the X-Windows protocol stream and sent to an X-Windows server running on a remote machine.
The X server then passes the OpenGL commands to the local 3D rendering system, which may or may not be hardware-accelerated (Figure 1). So, the 3D rendering is still occurring on the desktop, even though the application is actually running on a machine located elsewhere.

FIGURE 1: Indirect OpenGL Rendering Using GLX

This works OK (not great) if the data being rendered is small and static, if display lists are used, and if the network has high bandwidth and low latency. For a variety of reasons, though, most applications do not use display lists. In some cases, the application programmers simply did not envision the application being displayed remotely, but usually an application avoids display lists because they are not suitable for the particular type of rendering that the application does. Many applications generate geometry dynamically, making display lists useless. Still others deal in large geometries, for which the overhead of building the display list on even a local display is undesirable.

When display lists are not used in an indirect rendering environment, then every vertex call has to be passed over the network on every frame. This could amount to millions of little packets of information being sent over the network just to render one frame. In this case, the application performance will be bound by the latency of the network connection, and even on the fastest networks, rendering geometries of any significant size will quickly become an untenable proposition.

The situation becomes even worse when textures enter the picture. Imagine passing a planar probe through a multi-gigavoxel volumetric dataset such as the Visible Human. The probe will be at least 1 megavoxel in size, meaning that the textures mapped to that probe will occupy at least 3 megabytes. These textures have to be regenerated on every frame with no reuse of texture data from frame to frame.
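
To put rough numbers on the texture case, the sketch below computes an upper bound on frame rate when 3 megabytes of texture data must cross the network for every frame. The 3-megabyte figure comes from the text (assuming 3 bytes per texel); the link speeds are nominal values chosen for illustration, and real throughput would be lower once protocol overhead is counted.

```python
def max_frame_rate(bytes_per_frame: int, link_bits_per_second: float) -> float:
    """Upper bound on frames/s if every frame's data must cross the link."""
    return link_bits_per_second / (bytes_per_frame * 8)


# From the text: a 1-megavoxel probe, assumed here to use 3 bytes per texel.
TEXTURE_BYTES = 1_000_000 * 3

# Assumed nominal link speeds (illustrative, not measured throughput).
LINKS = {"100 Mbit/s": 100e6, "1 Gbit/s": 1e9}

for name, bits_per_second in LINKS.items():
    fps = max_frame_rate(TEXTURE_BYTES, bits_per_second)
    print(f"{name}: at most {fps:.1f} frames/s")
# 100 Mbit/s: at most 4.2 frames/s
# 1 Gbit/s: at most 41.7 frames/s
```

These are best-case ceilings for the texture traffic alone, before any geometry is sent.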
So, whereas network latency is not as much of an issue in this case, bandwidth definitely is an issue. Even in the best case, a gigabit connection would be required to get anything close to a usable frame rate.

To further complicate matters, certain OpenGL extensions do not work in an indirect rendering environment. Some of these extensions require the ability to directly access the 3D graphics hardware and thus can never be made to work across a network. In other cases, either the server's OpenGL library or the client's X-Windows server does not provide explicit support for the extension, or the extension relies on a specific hardware configuration that is not present on the client machine.

Also, it goes without saying that, since the client must perform the 3D rendering, it must have a 3D accelerator and a decent amount of computing power. Thus, indirect OpenGL rendering does little to centralize an organization's computing resources. Indirect OpenGL rendering might best be termed "welterweight client", since the client is not really heavy but not really thin either.

Server-Side 3D Rendering

We begin to see that it is desirable for the 3D rendering to occur on the server machine, where there is a fast and direct link between compute, graphics, and storage resources. If the 3D rendering occurs on the server, then only the resulting 2D images must be sent to the client. Images can be delivered at the same frame rate regardless of the size of the 3D data used to generate them. So, performing 3D rendering on the server effectively converts the 3D performance problem into a 2D performance problem. The problem then becomes how to stream 1-2 megapixels of image data over a network at interactive frame rates, but this problem is already addressed by a variety of commodity technologies (HDTV, to name one).

There are generally only two approaches that have been used to implement server-side OpenGL rendering with hardware acceleration: screen scraping and GLX forking.

Screen Scraping

Screen scraping is fairly straightforward. A separate application (the "screen scraper") runs on the server machine and monitors its display for events (such as window expose events) that might cause the pixels in the display to change. As such events occur, the screen scraper reads back the affected regions of the display from the frame buffer, compresses them, and sends the compressed images to all connected clients. Conceptually, this is the same approach used by a host of off-the-shelf 2D remote display packages, including the Windows version of VNC. But such software packages generally don't work with hardware-accelerated 3D. The reason is that 3D accelerators use "direct rendering" to send OpenGL commands directly to the 3D hardware, thus bypassing the windowing system (Figure 2).

FIGURE 2: Direct OpenGL Rendering in an X-Windows Environment

The screen scraper would normally monitor communication between the application and the X server to determine when pixels in the display have changed. But when direct rendering is used, such communication never occurs. The rendered pixels go straight to the frame buffer, so neither the X server nor the screen scraper knows when a 3D application has finished rendering a frame. The upshot of this is that 3D applications often appear as solid black windows when run inside a screen scraper.

One solution is for the screen scraper to asynchronously read back the entire X display on a periodic basis, compare the current screen snapshot against the last, and send the differences to all connected clients. x11vnc appears to use this approach. An improvement upon this methodology would be to implement an inter-process communication mechanism to allow direct-rendered 3D applications to tell the screen scraper when the OpenGL region of a particular window has been updated. This would produce an architecture similar to Figure 3 and eliminate the need for asynchronous polling of the display.
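
For comparison with the notification scheme just described, the polling approach (periodic readback, compare against the last snapshot, send only the differences) can be sketched as a tile-by-tile comparison of two frame buffer snapshots. This is a minimal illustration, not how any particular screen scraper is implemented: the function name, the 16-pixel tile size, and the packed 3-bytes-per-pixel layout are all assumptions.

```python
def changed_tiles(prev, curr, width, height, tile=16, bpp=3):
    """Return (x, y, w, h) rectangles for tiles that differ between snapshots.

    prev and curr are packed pixel buffers of width * height * bpp bytes.
    """
    stride = width * bpp
    dirty = []
    for ty in range(0, height, tile):
        th = min(tile, height - ty)
        for tx in range(0, width, tile):
            tw = min(tile, width - tx)
            # Compare the tile one scanline at a time; stop at the first mismatch.
            for row in range(ty, ty + th):
                start = row * stride + tx * bpp
                end = start + tw * bpp
                if prev[start:end] != curr[start:end]:
                    dirty.append((tx, ty, tw, th))
                    break
    return dirty


WIDTH, HEIGHT = 64, 32
previous = bytearray(WIDTH * HEIGHT * 3)   # last snapshot (all black)
current = bytearray(previous)              # new snapshot
offset = (5 * WIDTH + 20) * 3              # pixel at x=20, y=5
current[offset:offset + 3] = b"\xff\x00\x00"

print(changed_tiles(previous, current, WIDTH, HEIGHT))  # [(16, 0, 16, 16)]
```

Only the dirty rectangles would then be compressed and sent. The cost of reading back and comparing the whole display on every poll is what the notification mechanism avoids.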

FIGURE 3: Screen Scraping with Direct Rendering

This creates a viable solution, at least, but screen scraping still has some obvious drawbacks: it is limited to one user per machine, and it isn't seamless. The user is forced to interact with their application in a remote desktop window, and while other users can share the display, they cannot use the machine to run other applications at the same time. Screen scraping is thus useful for accessing one's 3D workstation remotely, but it doesn't really solve the problem of delivering 3D applications on demand to multiple users.

GLX Forking

GLX forking is the method that VirtualGL uses to implement server-side 3D rendering. The idea of GLX forking has been around for a few years, and the definitive paper describing the approach was published by Stegmaier, Magallon, and Ertl (hereafter S/M/E) in the proceedings of the joint EuroGraphics/IEEE Symposium on Visualization, 2002. In a nutshell, GLX forking involves rerouting GLX commands to the server's (presumably hardware-accelerated) X display while leaving the rest of the X protocol stream to carry on its merry way.

GLX forking can generally be accomplished in two ways: (1) out of process, with an X proxy acting as the GLX forker, or (2) in process, with a GLX faker library loaded into the application.

Method 1 involves an architecture very similar to VNC for Unix, in which each user has their own personal "virtual" X display ("X proxy"). But unlike VNC for Unix, this virtual X display supports the GLX extension and can thus be used to display 3D applications. The 3D applications use indirect OpenGL rendering to send OpenGL commands to the X proxy, which then performs GLX forking as described by S/M/E to re-route all of the GLX commands to the server's hardware-accelerated X display (Figure 4).

FIGURE 4: GLX Forking with an X Proxy

Essentially, this is just a combination of indirect OpenGL rendering (Figure 1), a VNC-style X proxy architecture, and S/M/E's GLX forking technique. In this case, the X proxy acts as the GLX forker, marshalling GLX commands to the "real" X server while handling 2D drawing commands itself. Once a direct OpenGL rendering context has been established, the subsequent OpenGL commands can be sent directly to the hardware, and the X proxy reads back the resulting images directly from the hardware and composites them into the appropriate X window. Since the X proxy is in charge of marshalling 3D commands, it knows exactly when a 3D window has been updated, even though direct rendering was used to generate the pixels in that window.

The major problem with the above approach is that the application must still use indirect OpenGL rendering to send 3D commands to the X proxy. It is, of course, much faster to use indirect rendering over a local socket than over a remote socket, but there is still some overhead involved. Also, it is necessary for the X proxy to explicitly handle all OpenGL commands, including esoteric OpenGL extensions. OpenGL changes a lot more quickly than GLX, and this approach requires that the developer keep on top of those changes.
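
The division of labor in Method 1 can be pictured with a toy dispatcher: requests belonging to the GLX extension are forwarded to the 3D-capable X server, and everything else is handled by the proxy itself. This is only a conceptual sketch. The `Request`, `Server`, and `ForkingProxy` classes are hypothetical, requests are reduced to a name and an extension label, and no real X protocol encoding is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    extension: str  # "GLX" or "core" (hypothetical labels for this sketch)
    name: str


@dataclass
class Server:
    label: str
    log: list = field(default_factory=list)

    def handle(self, request: Request) -> None:
        # A real server would execute the request; here we only record it.
        self.log.append(request.name)


class ForkingProxy:
    """Toy X proxy that forks the request stream by extension."""

    def __init__(self, server_3d: Server):
        self.server_3d = server_3d            # the "real", accelerated X server
        self.local_2d = Server("X proxy 2D")  # the proxy's own 2D handling

    def dispatch(self, request: Request) -> str:
        target = self.server_3d if request.extension == "GLX" else self.local_2d
        target.handle(request)
        return target.label


proxy = ForkingProxy(Server("3D X server"))
stream = [
    Request("core", "CreateWindow"),
    Request("GLX", "glXCreateContext"),
    Request("core", "PolyFillRectangle"),
    Request("GLX", "glXSwapBuffers"),
]
for req in stream:
    print(f"{req.name} -> {proxy.dispatch(req)}")
```

Because every GLX request passes through `dispatch`, the proxy in this model always knows when a buffer swap has occurred, which mirrors the point made above about the proxy knowing when a 3D window has been updated.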

However, per S/M/E, if the application is dynamically linked with an OpenGL shared library, then a GLX faker library can be inserted in-process at run time using ...

FIGURE 5: In-Process GLX Forking with an X Proxy

Apart from marshalling GLX commands and managing Pbuffers, the GLX faker must also read back the rendered pixels at the appropriate time (usually by monitoring ...)

This approach is architecturally ideal, since it supports both multiple sessions on a single server and multiple clients for each session. However, the reality is that most off-the-shelf X proxies, such as VNC, are tuned to handle 2D applications with large areas of solid color, few colors, and few inter-frame differences. 3D applications, on the other hand, generate images with fine-grained, complex color patterns and much less correlation between subsequent frames. The workload generated by drawing rendered 3D images into an X window is essentially the same workload as a full-screen video player, and those who are familiar with VNC and its ilk know that these off-the-shelf X proxies are not very swift at playing videos. This is largely due to the lack of sufficient image compression/encoding performance.

Also, most existing X proxies do not provide a seamless application experience. Rather than each application window appearing as a separate client window, the user must interact with the entire remote desktop in a single window.

VirtualGL works around these issues in two ways: the VGL Image Transport and TurboVNC.

The VGL Image Transport

When using the VGL Image Transport, VirtualGL encodes or compresses the rendered 3D images in real time and sends the encoded images through a dedicated TCP socket to a VirtualGL Client application running on the client machine (Figure 6). The VirtualGL Client is responsible for decoding the images and re-compositing the pixels into the appropriate X window. Meanwhile, the 2D elements of the application's GUI are sent over the network using the standard remote X-Windows protocol. Since their original paper, S/M/E's remote rendering solution has been modified to include an architecture similar to the VGL Image Transport (see "Widening the Remote Visualization Bottleneck", ISPA 2003). Examples of this architecture exist in industry as well.

FIGURE 6: VirtualGL's Architecture (VGL Image Transport)

This approach definitely has drawbacks. It requires that an X server be present on the client machine, and it can be somewhat sensitive to network latency due to its reliance on the chatty X protocol for key/mouse interaction. Additionally, it does not inherently support collaboration (multiple clients per session), since the images are being pushed to the client rather than pulled from the server. In order to collaborate with a user who is using the VGL Image Transport, a separate screen scraper (such as WebEx or NetMeeting) must be installed on the user's desktop machine.

The primary advantage of the VGL Image Transport is that it provides a completely seamless application experience -- every application window appears as a separate window on the user's desktop. The VGL Image Transport can also send stereo image pairs, which allows true quad-buffered stereo to be used in a remote display environment.

TurboVNC

TurboVNC was developed to address the performance limitations of off-the-shelf VNC implementations and to provide an architecture more like that shown in Figure 5.
TurboVNC lacks the seamless application experience of the VGL Image Transport -- users are required to interact with a remote desktop in a single client window. But since TurboVNC does not rely on the remote X-Windows protocol, it is the fastest solution for high-latency or low-bandwidth networks. On high-speed networks, it performs similarly to the VGL Image Transport. It also provides rudimentary built-in collaboration capabilities.

All content on this web-site is licensed under the Creative Commons Attribution 2.5 License. Any works containing material derived from this web-site must cite The VirtualGL Project as the source of the material and list the current URL for the VirtualGL web-site.