Evaluation of Visualization Software

Al Globus, Sam Uselton
Report NAS-95-005, February 1995

Computer Sciences Corporation at NASA Ames Research Center [1]

Abstract

Visualization software is widely used in scientific and engineering research. But computed visualizations can be very misleading, and the errors are easy to miss. We feel that the software producing the visualizations must be thoroughly evaluated and the evaluation process as well as the results must be made available. Testing and evaluation of visualization software is not a trivial problem. Several methods used in testing other software are helpful, but these methods are (apparently) often not used. When they are used, the description and results are generally not available to the end user.

Additional evaluation methods specific to visualization must also be developed. We present several useful approaches to evaluation, ranging from numerical analysis of mathematical portions of algorithms to measurement of human performance while using visualization systems. Along with this brief survey, we present arguments for the importance of evaluations and discussions of appropriate use of some methods.

[1] This work is supported through NASA contract NAS2-12961.

Introduction

Visualization software is becoming widely used as shown by the large number of visualization environments available today. To gain benefit from using such a system one must be able to distinguish between aspects of the visualization which reflect the data and aspects due to the visualization process itself.

Is the visualization valid? That is, does it show what the user intended? Is the visualization software verifiable? That is, can one demonstrate that the software does what the programmer intended? And how can one evaluate a system’s capabilities and compare it with other systems? For the most part, the information necessary to answer these questions is not generated or, if it exists, is not public. Application scientists have sometimes addressed these questions [e.g. BUNI88], but mainly for their own software and application. Papers on the error characteristics of visualization techniques [e.g. DARM95, MARC94] are beginning to appear. Experiments using human subjects to get (relatively) hard data on visualization system performance are required for clinical use in medical fields [VANN94] and have begun for various other applications [e.g. BRYS95]. But much more work is needed.

This paper is a brief look at some of what has been done, and some of what needs to be done to ensure that scientific visualization software is valid and verifiable. We examine standardized test suites, the error characteristics of the visualization process, and human experimentation to test the basic hypothesis of visualization: that visualization improves insight and task performance.

Standardized test suites

One approach to verification and evaluation is to run visualization software on standard test suites that, by general agreement, cover the important characteristics of certain classes of data. Different test suites are necessary for different problem domains; computational fluid dynamics, geophysical simulations and radiological data, for example, present different challenges to visualization systems and therefore need different test suites. One can (and should!) use real scientific data sets to evaluate accuracy and performance, but these data sets are not sufficient. They were designed to investigate phenomena, not evaluate software. Thus, correct software behavior is difficult to verify and the various aspects of performance are difficult to assess. Verification suites and benchmarks specifically designed to test visualization systems would be of great value.

Correctness (the software does what it claims) and accuracy (it maintains appropriate precision) are vital for acceptance by critical users and become more important when visualization software provides quantitative data (e.g. volume of a tumor or clearance between moving parts). Other than blind faith, there is no reason to believe the results from visualization systems are more than approximately accurate most of the time. For example, one well known (but here nameless) visualization package uses Euler integration for streamlines, a technique well known to produce incorrect results in vortices! Worse, there is no accepted way to quantify error, which is always present to some degree. There are differences in the results using different packages, but no straightforward means of measuring how much the results vary, which is nearer the correct value, or if either is even close.
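
The weakness of Euler integration in a vortex is easy to demonstrate on a synthetic field. The sketch below is our illustration, not code from any of the packages discussed: it integrates the rotational field v(x, y) = (-y, x), whose exact streamlines are circles. Forward Euler spirals steadily outward, while a fourth-order Runge-Kutta integrator with the same step size stays very close to the circle.

```python
import math

def v(p):
    """Rotational field v(x, y) = (-y, x); its exact streamlines are circles."""
    x, y = p
    return (-y, x)

def euler_step(p, h):
    vx, vy = v(p)
    return (p[0] + h * vx, p[1] + h * vy)

def rk4_step(p, h):
    def offset(q, k, s):
        return (q[0] + s * k[0], q[1] + s * k[1])
    k1 = v(p)
    k2 = v(offset(p, k1, h / 2))
    k3 = v(offset(p, k2, h / 2))
    k4 = v(offset(p, k3, h))
    return (p[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            p[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

def final_radius(step, p0=(1.0, 0.0), h=0.1, n=628):
    """Integrate n steps (about ten revolutions); return the distance from the origin.

    The exact streamline through (1, 0) is the unit circle, so the correct value is 1.
    """
    p = p0
    for _ in range(n):
        p = step(p, h)
    return math.hypot(p[0], p[1])

if __name__ == "__main__":
    print("Euler:", final_radius(euler_step))  # spirals outward, radius grows to roughly 23
    print("RK4:  ", final_radius(rk4_step))    # stays very close to 1
```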

Furthermore, developers testing systems have no well established methods of determining the correctness of systems. Each development team creates its own ad hoc testing techniques. One may compare the performance of different packages on data sets from various application fields, but the results only apply to data with nearly identical characteristics. A well designed set of benchmarks will support independent characterization of various aspects of visualization package performance (e.g., very fast streamlines but slow isosurface generation). High performance computing has many years of experience designing and examining the results of benchmarks [e.g. BAIL94]. One could do worse than follow their lead.

Consider computational fluid dynamics (CFD). Correctness is particularly difficult in CFD visualization since computational grids often contain complex topologies including multiple, interpenetrating 3D blocks with some embedded points marked as not valid. Visualization software becomes complex to avoid computations using the masked data points and to perform properly in the presence of singularities, where an edge of a grid cell may have length of zero. Other applications have different obstacles.

CFD data sets are legendary for their size. The current state of the art in CFD simulations produces solutions over multiple three dimensional grids defined by several million points. Each grid point is represented by three floating point numbers (for location), and five floating point numbers per node are needed to represent the solution, resulting in tens or hundreds of megabytes to capture an instant of flow. But researchers are simulating unsteady flows, requiring a few thousand time steps. Current visualization environments have trouble with these large CFD data sets. As a general rule, limits on the size of the data set to be visualized are determined by the hardware available to the user. Developers may be able to give a minimum recommended configuration, but they cannot know what maximum to expect, and cannot test all possible data set sizes. A set of variable sized benchmarks could provide a means for developers to discover and communicate the size limitations of their products.
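
As a back-of-the-envelope illustration of these sizes, consider the sketch below. The point count, precision and step count are assumptions chosen for illustration, not figures from any particular simulation.

```python
# Rough sizing of an unsteady CFD solution. All inputs are illustrative assumptions.
points = 5_000_000         # grid points over all zones (assumed)
bytes_per_float = 4        # single precision storage (assumed)
time_steps = 1000          # unsteady run (assumed)

grid_bytes = points * 3 * bytes_per_float   # x, y, z per grid point
step_bytes = points * 5 * bytes_per_float   # five solution variables per point

print(f"grid geometry: {grid_bytes / 1e6:.0f} MB")                         # ~60 MB
print(f"one time step: {step_bytes / 1e6:.0f} MB")                         # ~100 MB
print(f"{time_steps} time steps: {time_steps * step_bytes / 1e9:.0f} GB")  # ~100 GB
```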

A good set of test data should have at least the following properties (in no particular order):

1. The data sets should be scalable to allow tailoring for hardware configurations and the time available for performing the tests.

2. Results should be easily, visually recognizable as correct or incorrect. Furthermore, quantitative results should be easily interpretable.

3. The variety of grids should encompass normal cases and all known pathologies.

4. Fields defined over the grids should also encompass normal cases and all known pathologies.

5. Grids and fields of all the relevant dimensionalities should be represented.

6. The suite should be easily distributable to most hardware and software platforms.

7. The size (in bytes) of the distribution should be minimized. In particular, it is unlikely that data sets themselves should be distributed, but rather the code to generate them (a minimal generator sketch follows this list).

8. It should be possible to run the suite in a reasonable amount of time.

9. The results should give unambiguous performance (as well as correctness) information.

10. The suite should challenge current systems.
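
To make properties 1 and 7 concrete, here is a minimal sketch, ours and not part of any existing suite, of a generator for a scalable synthetic data set: an analytically defined scalar field, sampled at a caller-chosen resolution, whose isosurfaces are all spheres and therefore easy to recognize by eye as correct or incorrect.

```python
import math

def radial_field(n):
    """Return an n*n*n scalar field (flat list, i fastest) sampled on [-1, 1]^3.

    The value at each sample is its distance from the center, so every
    isosurface is a sphere and errors are visually obvious. Because the
    field is defined analytically, only this generator needs to be
    distributed, and n can be scaled to the hardware and time available.
    """
    field = []
    for k in range(n):
        for j in range(n):
            for i in range(n):
                x, y, z = (2.0 * t / (n - 1) - 1.0 for t in (i, j, k))
                field.append(math.sqrt(x * x + y * y + z * z))
    return field

if __name__ == "__main__":
    small = radial_field(16)     # quick correctness check
    # large = radial_field(512)  # stress test; choose n for the platform under test
    print(len(small), min(small), max(small))
```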

In addition to test data sets, there is a need for a standard set of tests of particular visualization techniques and functions.

We examine a few ideas along these lines for techniques commonly used in CFD visualization:

Integral Curves (particle traces):

* Trace forward in time from a point, then trace backwards in time for the same total time from the end of the first trace. The distance between the start of the first trace and the end of the second is an error measure (a minimal sketch of this round-trip measure appears after these suggestions). Do this for many points in vector fields with various properties. Be sure that some of the traces pass near critical points of various types (saddles, nodes, etc.). Also, force traces through areas of rapid change.

* Change the grid system in a variety of ways without changing the vector field defined over the gridded space, and compare traces starting from the same point. A constant vector field is useful here. Be sure to force traces across grid boundaries and near grid singularities.

* In multi-zone data sets, start traces near grid boundaries and stop just after passing to the next grid zone to measure the time to pass between grid zones. Be sure to include a case with a grid singularity near the transition.

Integral Surfaces [HULT90]:

* Force surfaces around saddles. This is difficult because the surfaces tend to tear.

* Integrate surfaces into vortices. This is difficult due to twisting of the surface.

Isosurfaces:

* Generate isosurfaces on a field where all isosurfaces are sets of spheres. Change the grid such that isosurfaces must be generated for all marching cubes [LORE87] cases.

Cutting Planes:

* Examine results in overlapping grids to see how visualization of data in the overlap is handled.

* Include benchmarks for time dependent data with static grids. This will reward the developer who reuses rather than recomputes values when possible. For example, when the location of the cutting plane remains constant (but the scalar field changes), the vertices and the interpolation factors needed don’t change.

Interpolation:

* Design benchmarks to test interpolation time and precision, including non-linear interpolation.

Vector field topology [HELM91, GLOB91]:

* Place critical points in grid cells with grid singularities.

* Place critical points on computational boundaries.

These ideas are offered as initial suggestions, not final solutions. There are certainly other needs for other applications, and there are probably additional or even better tests for CFD data as well.
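
As an illustration of the first integral-curve test above, the following sketch (ours, with an analytic saddle field standing in for gridded CFD data) traces forward for a fixed time, traces backward for the same time from the end point, and reports the distance back to the seed as the error measure.

```python
import math

def v(p):
    """Analytic test field with a saddle critical point at the origin: v(x, y) = (x, -y)."""
    x, y = p
    return (x, -y)

def rk4_step(p, h):
    def offset(q, k, s):
        return (q[0] + s * k[0], q[1] + s * k[1])
    k1 = v(p)
    k2 = v(offset(p, k1, h / 2))
    k3 = v(offset(p, k2, h / 2))
    k4 = v(offset(p, k3, h))
    return (p[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            p[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

def round_trip_error(seed, total_time=2.0, steps=200):
    """Trace forward for total_time, then backward for the same time from the end
    of the forward trace; the distance back to the seed is the error measure."""
    h = total_time / steps
    p = seed
    for _ in range(steps):
        p = rk4_step(p, h)    # forward trace
    for _ in range(steps):
        p = rk4_step(p, -h)   # backward trace from the end point
    return math.hypot(p[0] - seed[0], p[1] - seed[1])

if __name__ == "__main__":
    # Seeds near the saddle stress the integrator most; repeat for many seeds in practice.
    for seed in [(1.0, 1.0), (0.1, 0.1), (0.01, 1.0)]:
        print(seed, round_trip_error(seed))
```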

Error characterization

Visualization, particularly of numerical results, can be thought of as constructing models of models of models. Even when visualizing experimental data the concept is the same; only the first few layers of models are different. Each model is an imperfect representation of the system from which it is derived, so error is introduced.

Consider the models constructed in the course of studying a physical system numerically with visualization in the loop. Each row below names the error introduced in deriving the next model in the chain from the one listed; the end of the chain, if all goes well, is insight.

    Model                    Error
    Physical system          simplifications, lack of knowledge
    Continuous math          discretization, approximations
    Discrete math            round off, bugs
    Software                 truncation, interpolation, resampling
    Data transformations     interpolation
    Continuous vis' model    linearization, approximations
    Vis' algorithms          screen space interpolation, jaggies
    Display effects          perceptual, cognitive effects
    Mind's eye               insight

The first step in numerical study of a physical system, say airflow over a wing, is selecting a continuous mathematical model, in this case the Navier-Stokes equations. In their usual form, these equations presume an ideal gas, but air is not exactly an ideal gas. The scientist must understand the error introduced to ensure that it does not invalidate the desired results.

Similarly, to solve continuous equations numerically on a computer often requires a discrete mathematical model. A given continuous mathematical model may be discretized in several ways, and at a variety of resolutions, each introducing different amounts and kinds of errors. Software implementing the discrete model introduces bugs, and execution of the software introduces round-off errors. All of these error sources are discussed in numerous textbooks, college courses and research articles. Scientists have substantial resources available to help understand the error and ensure that the simulation is faithful to the important aspects of the original physical system. Indeed, a researcher is expected to understand these issues and normally cannot publish in reputable journals if the sources of error are not directly and successfully addressed.

The story is quite different once the realm of visualization is entered. Even before the data reach a visualization system, several transformations may be applied. For example, at one facility, CFD data are routinely truncated to 32 bits before visualization. This is usually acceptable, but occasionally causes problems for the unwary. Similarly, data are often transformed from cell centered to the point centered schemes used by most visualization systems. These transformations introduce error, which must be understood to ensure it does not corrupt the analysis.

Some visualization techniques assume continuous data fields. These fields are simulated using interpolation between the points where data are available. Trilinear interpolation of hexahedral cells is a particular favorite. These interpolation schemes are seldom consistent with the interpolation used by the simulation software, introducing additional error.
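
For reference, here is a minimal sketch of the trilinear reconstruction just mentioned, over a single axis-aligned unit cell. The cell layout and names are ours; the point is that this reconstruction is generally not the scheme the simulation itself used.

```python
def trilinear(c, x, y, z):
    """Trilinearly interpolate within a unit hexahedral cell.

    c[i][j][k] holds the data value at the cell corner (i, j, k), with i, j, k
    in {0, 1}; (x, y, z) are local coordinates in [0, 1]^3.
    """
    # Interpolate along x on four cell edges...
    c00 = c[0][0][0] * (1 - x) + c[1][0][0] * x
    c01 = c[0][0][1] * (1 - x) + c[1][0][1] * x
    c10 = c[0][1][0] * (1 - x) + c[1][1][0] * x
    c11 = c[0][1][1] * (1 - x) + c[1][1][1] * x
    # ...then along y on two faces...
    c0 = c00 * (1 - y) + c10 * y
    c1 = c01 * (1 - y) + c11 * y
    # ...then along z.
    return c0 * (1 - z) + c1 * z

if __name__ == "__main__":
    # Corner values equal to i + j + k; trilinear interpolation reproduces this
    # linear field exactly, but will generally disagree with a higher-order scheme.
    corners = [[[i + j + k for k in (0, 1)] for j in (0, 1)] for i in (0, 1)]
    print(trilinear(corners, 0.25, 0.5, 0.75))  # 1.5
```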

Each visualization algorithm introduces its own set of errors. For example, particle tracers accumulate error since each point is generated from the previous point. Different particle tracing techniques will occasionally generate quite different results [KENW93].

Another set of errors comes from rendering. When a shaded polygon is rendered using the usual graphics shading techniques, interpolation is carried out in screen space. As a result, the color of a point on a surface is affected by the current viewing transformation! This may be acceptable for qualitative assessment, but the careful user must be aware of such problems.

Better understanding of the numerical error in the individual processing stages is not good enough. How the various errors compound, cancel or otherwise interact is also important but little investigated. Changing the order in which operations are performed can cause significant differences. Interpolating between grid node colors (as done by fast rendering hardware) often gives a different result than interpolating data values and then mapping to colors. To make a careful choice of appropriate visualization methods, a scientist needs the error characterization information. Since scientists will be investigating new phenomena, and applying visualization tools in unexpected ways, they must have access to the error analysis, not just its conclusions.
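
The order-of-operations effect can be seen with a few lines of arithmetic. The sketch below is our illustration, using a deliberately non-linear, hypothetical one-channel color map; with a linear map the two orders would agree.

```python
def colormap(t):
    """Hypothetical non-linear scalar-to-gray map on [0, 1] (illustration only)."""
    return t * t   # any non-linear map exposes the effect

def lerp(a, b, s):
    return a + s * (b - a)

d0, d1 = 0.0, 1.0   # data values at two grid nodes
s = 0.5             # sample point midway between the nodes

# Interpolate the data, then map to color.
data_then_map = colormap(lerp(d0, d1, s))             # colormap(0.5) = 0.25

# Map node values to colors, then interpolate the colors
# (what fast rendering hardware effectively does).
map_then_interp = lerp(colormap(d0), colormap(d1), s)  # 0.5

print(data_then_map, map_then_interp)  # 0.25 vs 0.5: the two orders disagree
```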

It behooves the visualization developer to understand the sources of error in the visualization process, thoroughly document the errors -- in the visualization itself if possible -- and take steps to ensure that the magnitude of the error is much less than the errors introduced by the modeling process. After all, users want to see their data, not visualization system induced error.

Finally, perceptual and cognitive effects color an investigator’s view of the pixels displayed on the screen. Hopefully, the user’s mind’s eye will help generate insight into the original physical system, spawning new hypotheses and experiments. How can we tell whether what is perceived illuminates the scientist’s questions?

Experiments with human subjects

The grand hypothesis of the visualization community is that scientific visualization improves human insight. Several methods have been traditionally used to prove this hypothesis:

* Proof by repeated assertion
* Proof by vigorous gesticulation
* Proof by pretty picture

(For extended comments relevant to Proof by Pretty Picture, see [GLOB94].) There are better approaches to proving the insight hypothesis. Most visualization practitioners have accumulated a great deal of anecdotal evidence. Such evidence could be, and should be, collected and carefully examined. Better yet, if one wants to prove that the state of a human being has changed, why not run a controlled experiment and measure the change of state?

Ideally, one measures the insight of an individual, shows them a visualization, measures the insight again and studies the differences. Unfortunately, there is no general way to measure insight. However, good teachers write test questions to reveal a student’s insight (or lack thereof), so it may be possible in certain specialized circumstances.

Although insight is difficult to measure, task performance can often be measured with some accuracy. Thus, experiments to determine the efficacy of visualization systems and techniques for particular tasks can, and should, be developed.

Although exploratory scientific visualization is difficult to break into specific tasks, engineering and medical visualization applications are fairly task specific and amenable to an experimental approach, as are other analytical visualization uses.

It should be noted that experiments comparing visualization systems are easier to undertake and interpret than experiments to evaluate or characterize a particular system. We therefore focus discussion on experimental comparison of visualization systems.

Consider comparing two visualization systems for task performance. In theory, each system might be characterized as a point or region in a large multi-dimensional space. Although some dimensions can be given, e.g., techniques available, data formats supported and memory usage, not all the relevant dimensions are known or easily measured. Therefore, we have an unknown, probably non-linear function f: system -> task-performance. Each domain/range pair of this function can only be discovered by laborious, controlled experiment. Interpolation is fraught with danger, and extrapolation ridiculous. Obviously, characterizing this function -- whose domain is not thoroughly understood -- is a daunting task. However, at present we have almost no experimental data points at all. Even a few widely spaced, but reasonably firm, data points in a single context and widely available would be of considerable value.

In particular, one would like to predict performance for other, related tasks and predict the effect of changes to the system(s) under study. As mentioned above, predicting performance of tasks other than those studied is difficult. Predicting the effect of changes in the system is also difficult, in part because different aspects of systems interact to affect end user performance.

A final word of warning for those considering controlled experiments on visualization systems: be careful to choose experimental subjects from the target user audience. For example, if insight into moderately experienced users is desired, don’t use programmers or novice users (or freshman psychology students) as experimental subjects.

Conclusion

The quality of a visualization system, method or a particular visualization is almost an unknown concept. System comparisons are based on ease of use, flexibility, speed of operations and the presence or absence of particular features. None of these criteria are really relevant unless one can rely on the images produced to reflect the data without distortion or confusion. After all, it’s easy to make wrong pictures at sixty frames per second.

Some of the work needed is just the direct application of numerical analysis techniques and dissemination of the results. Other important research will involve fundamental work on identifying all the relevant parameters of “visualization quality” and their relationships. Then the scattering of points in this space where the “quality” function has been evaluated may allow us to see trends or unexplored combinations of parameters.

Acknowledgments

This work was performed at the Numerical Aerodynamic Simulation Systems Division, NASA Ames Research Center under NASA contract NAS2-12961. The authors have benefited from discussions with many visualization researchers, developers and users, but especially with Geoff Dorn of ARCO Production and Technology, Joseph Gitlin of Johns Hopkins University Medical Institutions, Michael Vannier of Washington University School of Medicine, the audience for our Visualization ‘94 panel, and with Steve Bryson and many other colleagues at NASA Ames.

References

[BAIL94] D. Bailey, et al., “The NAS Parallel Benchmarks (94),” RNR Technical Report RNR-94-007, NAS Division, NASA Ames Research Center, March 1994.

[BRYS95] S. Bryson, S. Johan, A. Globus, T. Meyer, C. McEwen, “Initial User Reaction to the Virtual Windtunnel,” AIAA 95-0114, 33rd AIAA Aerospace Sciences Meeting, Reno, NV, January 1995.

[DARM95] D. L. Darmofal, R. Haimes, “An Analysis of 3D Particle Path Integration Algorithms,” (to appear) AIAA CFD Conference, San Diego, CA, June 1995.

[GLOB91] A. Globus, C. Levit, T. Lasinski, “A Tool for Visualizing the Topology of Three-Dimensional Vector Fields,” Proc. Visualization ‘91, IEEE Computer Society, San Diego, CA (1991).

[GLOB94] A. Globus, E. Raible, “Fourteen Ways to Say Nothing with Scientific Visualization,” IEEE Computer, vol. 27, no. 7 (July 1994), pp. 86-88.

[HELM91] J. L. Helman, L. Hesselink, “Visualizing Vector Field Topology in Fluid Flows,” IEEE Computer Graphics & Applications, vol. 11, no. 3 (May 1991), pp. 36-46.

[HULT90] J. P. M. Hultquist, “Interactive Numerical Flow Visualization Using Stream Surfaces,” Computing Systems in Engineering, vol. 1, no. 2-4, pp. 349-353.

[HULT91] J. P. M. Hultquist, personal communication.

[KENW93] D. N. Kenwright, Dual Stream Function Methods for Generating Three-Dimensional Streamlines, Ph.D. thesis, University of Auckland, New Zealand, August 1993.

[LORE87] W. E. Lorensen, H. E. Cline, “Marching Cubes: a High Resolution 3D Surface Construction Algorithm,” Computer Graphics, vol. 21, no. 4 (July 1987), pp. 163-169.

[MARC94] S. R. Marschner, R. J. Lobb, “An Evaluation of Reconstruction Filters for Volume Rendering,” Proc. Visualization ‘94, IEEE Computer Society, Washington, DC (1994).

[VANN94] M. Vannier, Position Statement for Panel: “Validation, Verification and Evaluation,” Proc. Visualization ‘94, IEEE Computer Society, Washington, DC (1994).
