Projects | Interactive Data Visualization

Chapter 1: Introduction

Using the vehicle data set and an existing visualization tool (e.g., Excel, Weka, Weave, or XmdvTool), perform the following tasks.
1. Read the full data set into the program.
2. Select a subset of the data that contains an obvious correlation (exploratory visualization).
3. State a hypothesis and confirm it using the full data set (confirmatory visualization).
4. Present your results in a PowerPoint slide (presentation visualization).
Write a scatterplot program from scratch, using the following steps.
1. Write a program that reads the data, stores that data internally, and identifies which records have missing values. Keep track of the minimum and maximum of each variable.
2. Select two of the variables as your axes. Draw coordinate axes and label them with the names of the variables.
3. Loop through all nonmissing data records and plot a circle or square at location (x,y) based on your selected variables. Skip any record that has missing values.
4. Additions to consider: color the square or circle by some other value; use size to represent yet another value of the record; have the user select which variables to use as the axes; handle missing values by replacing the missing data with some very large number, some very small number, or the average value for that variable.
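As a starting point for steps 1 to 3, the following is a minimal sketch in Python rather than a reference solution. It assumes the data arrive as comma-separated text with a header row, that every column is numeric, and that missing values appear as an empty cell, "?", or "NA"; the function names load_records and to_screen are hypothetical. Drawing is left to whatever graphics library you choose.

```python
import csv
import io


def load_records(text, missing_tokens=("", "?", "NA")):
    """Parse CSV text; return header, rows, per-row missing flags, and (min, max) per column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows, has_missing = [], []
    lo = [float("inf")] * len(header)
    hi = [float("-inf")] * len(header)
    for raw in reader:
        if not raw:  # skip blank lines
            continue
        values = []
        for i, cell in enumerate(raw):
            cell = cell.strip()
            if cell in missing_tokens:
                values.append(None)  # None marks a missing value
            else:
                v = float(cell)
                values.append(v)
                lo[i] = min(lo[i], v)
                hi[i] = max(hi[i], v)
        rows.append(values)
        has_missing.append(None in values)
    return header, rows, has_missing, list(zip(lo, hi))


def to_screen(value, vmin, vmax, pmin, pmax):
    """Linearly map a data value in [vmin, vmax] to a pixel coordinate in [pmin, pmax]."""
    if vmax == vmin:
        return (pmin + pmax) / 2
    return pmin + (value - vmin) / (vmax - vmin) * (pmax - pmin)
```

In the plotting loop, a record would be skipped when its has_missing flag is set, and each remaining (x, y) pair would be passed through to_screen using the stored range of its variable.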

Chapter 2: Data Foundations

Write a program that accepts as input a uniform, 3D scalar field (each record is an integer) whose dimensions are (height_i, width_i, depth_i) and that computes and outputs a resampled file with dimensions (height_j, width_j, depth_j). Assume the program is invoked with the command:
- resample file1 height1 width1 depth1 file2 height2 width2 depth2
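One possible core for the resampling step is sketched below. The project does not say which interpolation to use, so this example assumes nearest-neighbor lookup, which keeps the output values integers; trilinear interpolation would be a natural alternative. The file format is not specified in the text, so reading and writing files is omitted, and the function name resample_nearest is hypothetical.

```python
def resample_nearest(volume, new_shape):
    """Resample volume[h][w][d] (nested lists) to new_shape by nearest-neighbor lookup."""
    old_shape = (len(volume), len(volume[0]), len(volume[0][0]))

    def source_index(j, new_n, old_n):
        # Map the center of output cell j to the input cell that contains it.
        return min(old_n - 1, int((j + 0.5) * old_n / new_n))

    h2, w2, d2 = new_shape
    return [[[volume[source_index(i, h2, old_shape[0])]
                    [source_index(j, w2, old_shape[1])]
                    [source_index(k, d2, old_shape[2])]
              for k in range(d2)]
             for j in range(w2)]
            for i in range(h2)]
```

Resampling to the original dimensions returns an identical field, which is a convenient sanity check before trying larger or smaller targets.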
A common task when dealing with data is dividing it into categories, such as low, medium, and high. There are numerous strategies for performing this task, each of which has strengths and weaknesses. Write a program that reads in a list of integers and divides them into a given set of bins (this number can be passed into the program), using one or more of the following strategies:
- uniform bin width – the size of the range of values for each bin is the same;
- uniform bin count – as best as possible (without dividing a single number between multiple bins), each bin has about the same number of elements in it;
- best break points – start with everything in one bin. Search for the largest gaps, and divide at those locations. If no gaps exist, break at values with a low number of occurrences.
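The first two strategies can be sketched as follows. This is one reading of the descriptions above, not a definitive implementation: the function names are hypothetical, and "without dividing a single number between multiple bins" is interpreted here as keeping equal values in the same bin. The best-break-points strategy is left as part of the exercise.

```python
def uniform_width_bins(values, k):
    """Split values into k bins that each cover the same range of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1  # fall back to 1 when all values are equal
    bins = [[] for _ in range(k)]
    for v in values:
        idx = min(k - 1, int((v - lo) / width))  # the maximum goes in the last bin
        bins[idx].append(v)
    return bins


def uniform_count_bins(values, k):
    """Split sorted values into k bins of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    bins, start = [], 0
    for b in range(1, k + 1):
        end = max(start, (b * n) // k)
        # Keep equal values together rather than splitting them across bins.
        while 0 < end < n and ordered[end] == ordered[end - 1]:
            end += 1
        bins.append(ordered[start:end])
        start = end
    return bins
```

Running both functions on skewed data shows the trade-off: uniform width can leave some bins nearly empty, while uniform count can place very different values in the same bin.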
Normalization is a process in which one or more dimensions are processed so that the resulting values have a particular range and/or mean. This allows two or more dimensions with very different characteristic ranges (such as temperature and elevation) to be combined into a distance calculation. Given a list of floating point values, write a program that normalizes these values into one or more of the following ranges (you will see why this is useful when we start mapping to graphical attributes):
- all values fall between 0.0 and 1.0;
- the values are mapped such that the resulting set has a mean of 0.0 and standard deviation of 1.0;
- all values are integers between 0 and 255.
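The three target ranges above might be computed as in the sketch below. The function names are hypothetical, the standard deviation used is the population standard deviation (an assumption, since the text does not say which), and constant input is mapped to 0.0 to avoid dividing by zero.

```python
import statistics


def to_unit_range(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def to_z_scores(values):
    """Shift and scale values to mean 0.0 and (population) standard deviation 1.0."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def to_bytes(values):
    """Map values to integers between 0 and 255."""
    return [round(u * 255) for u in to_unit_range(values)]
```

For example, to_unit_range([2, 4, 6]) gives [0.0, 0.5, 1.0], and to_bytes on the same list can be used directly as grayscale intensities.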
Imputation for a single variable or column is rather straightforward. However, in situations where numerous values are missing in many columns, Schafer [308] has developed a model-based technique. Test Schafer’s technique on a data set having missing values, and then implement that technique and compare your results. The R project [282] is an open-source and free software environment for statistical computing and graphics where you can find an implementation of Schafer’s algorithm.

Chapter 3: Human Perception and Information Processing

Take the scatterplot code you’ve written. Consider some perceptual attribute you’ve read about and are interested in. Generate a display for a perceptual study, say a target (one object or a pattern) to be identified within an area of distractors. Ask a few classmates if they can easily identify the target.
Write a program to reproduce one of the perceptual experiments, varying either a single graphical attribute or multiple ones. Start with two or three values for a given attribute, and increase this number until you (or a willing friend) start making errors over a short sequence of samples. Describe what feature you are testing, whether you are testing for absolute or relative judgment, and what your results are.
Using the VIAT Windows-based software available on the book’s web site, design an experiment for some of the perceptual features described in this chapter.

Chapter 4: Visualization Foundations

Extend your scatterplot program to enable a third data dimension to be mapped to at least five of the seven remaining visual variables (not counting position).
Extend your scatterplot program to enable one of the dimensions to take on nominal values. Test different ways of displaying this dimension using the results of the previous project.
Extend M_exp to deal with different sets of information. For example, suppose A and B are sets of information from some Universe (U). Try to define M_exp(A ∩ B), M_exp(A ∪ B), M_exp(U − A), and so on.
Extend M_eff as in Project 3.

Chapter 5: Visualization Techniques for Spatial Data

Rewrite drawLineGraph() to instead draw a color bar, given a color ramp with a range (color_min, color_max). You can assume that the number of data points is less than the width of the screen in determining the width of the rectangle. Set the height of the rectangle to some user-specified constant.
Write a program that extends drawLineGraph() to subsample the data whenever the number of data points dataCount exceeds the number of pixels in the drawing area (xMax − xMin).
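One way to approach the subsampling project is sketched below. It does not reproduce drawLineGraph() itself; it only shows a hypothetical helper, subsample_min_max, that reduces the data to one (min, max) pair per pixel column so that peaks are not lost. Keeping every k-th point or averaging each bucket are simpler alternatives with different trade-offs.

```python
def subsample_min_max(data, pixel_count):
    """Reduce data to at most pixel_count columns, keeping each column's min and max."""
    n = len(data)
    if n <= pixel_count:
        # Nothing to reduce: every point gets its own column.
        return [(v, v) for v in data]
    columns = []
    for p in range(pixel_count):
        start = p * n // pixel_count
        end = (p + 1) * n // pixel_count  # end > start because n > pixel_count
        bucket = data[start:end]
        columns.append((min(bucket), max(bucket)))
    return columns
```

Each returned pair can be drawn as a short vertical line at its pixel column, with pixel_count set to xMax − xMin.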
Write a program that reads in a three-dimensional volume data set and displays a user-selected slice. Assume a grayscale color map with 256 intensity levels.
Extend the above program to allow the user to specify an orientation for the slice (0 = aligned with x-axis, 1 = aligned with y-axis, and 2 = aligned with z-axis). Note that since the size of the data volume often differs for each dimension, the selected slice must be confined to a range that depends on the orientation.
Extend the above program to allow arbitrary orientation, as specified with a vector normal to the cutting plane plus center point for the cut plane that is within the data volume. Note that this project will require resampling of the data in almost all cases.
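For the axis-aligned slice projects above, the extraction step might look like the following sketch. It assumes the volume is stored as nested lists indexed volume[x][y][z] and interprets orientation k as holding the coordinate along axis k fixed; both are assumptions, as is the function name extract_slice. The arbitrary-orientation case, which needs resampling, is not covered.

```python
def extract_slice(volume, orientation, index):
    """Return a 2D slice of volume[x][y][z] with the given axis held at index."""
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    if orientation not in (0, 1, 2):
        raise ValueError("orientation must be 0, 1, or 2")
    # The valid index range depends on which axis is held fixed.
    if not 0 <= index < dims[orientation]:
        raise IndexError("slice index outside the volume for this orientation")
    nx, ny, nz = dims
    if orientation == 0:
        return [[volume[index][y][z] for z in range(nz)] for y in range(ny)]
    if orientation == 1:
        return [[volume[x][index][z] for z in range(nz)] for x in range(nx)]
    return [[volume[x][y][index] for y in range(ny)] for x in range(nx)]
```

With 256 intensity levels, each value in the returned slice can be used directly as a gray level once the data have been scaled to integers between 0 and 255.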

Chapter 6: Visualization Techniques for Geospatial Data

Use the TIGER-System (Topologically Integrated Geographic Encoding and Referencing), a provided geographic polygon data set from the U.S. Census (TGR06001.RT2), to write a script that converts the polygon data into the following format:
- −121764253| + 37160714
- −121746453| + 37611800
- −121746709| + 37611300
- NA – NA
- NA – NA marks the end of a polygon.
Use the R-project function “polygon” to draw the extracted polygons of Project 1.
Write a program to visualize the “quakes” data from Exercise 3 using the Google Maps API. Think about how to visualize the attributes of depth, magnitude, and stations for each data point in the map with a suitable glyph visualization.
- Hint: Create an image for each of the three attributes, and scale it according to the data value. Place them side by side on the Google Map, so that they look like one glyph.
- Sources: quakes.xml (10% Sample of the “quakes” data; depth, stations, magnitude are normalized) http://www.google.com/apis/maps/ (Google Maps API).
Write a program that distorts map regions along the two Euclidean dimensions x and y. The distortion operations should be done by computing a histogram with a given number of bins in the two dimensions x and y to determine the distribution of the geospatial data points in these dimensions. The distortion depends on the number of data points that are geographically located in the bins.

Chapter 7: Visualization Techniques for Multivariate Data

Write a program to display a data set using a choice of three or more of the glyph types described in this chapter. Test it on a data set with a modest number of records (less than 300) and dimensions (less than 10). Which glyph do you think is most effective? Why?
Write a program that will draw multiple line plots (one for each variable of a data set). The program should have two options: juxtaposing the plots (e.g., by slicing the screen horizontally and drawing one plot per slice) and superimposing the plots (e.g., drawn on top of each other). Test it with three color schemes:
- randomly selected hue, saturation, and value
- evenly spaced hues, with full saturation and value
- a perceptually designed color map, such as those described by Cindy Brewer (http://www.colorbrewer.org)
Comment on the effectiveness of the various color schemes and the two different layouts.
Write a program to generate a heat map from a table of values. Each cell should be a square or rectangle whose color is proportional to the value. Use a standard color ramp, such as grey scale or yellow to red. Make sure you normalize the values first to make best use of the full range of colors. Now write a function for reordering the columns of the table such that the sum of the absolute differences between adjacent columns is minimized (if you have a modest number of dimensions, you can test all possible orderings of columns to find the minimum; otherwise you should use a heuristic search strategy to find a local minimum). Note the patterns that emerge in the final view. What does it tell you about the relationships between dimensions/columns?
Extend the previous program to reorder the rows based on the same or similar distance measures and search strategies.
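The exhaustive column search described in the heat map project can be sketched as below. The function names are hypothetical, the table is assumed to be a list of equal-length rows of numbers, and trying every permutation is only practical for a handful of columns; for larger tables a heuristic search would replace the call to permutations.

```python
from itertools import permutations


def column_distance(table, a, b):
    """Sum of absolute differences between columns a and b over all rows."""
    return sum(abs(row[a] - row[b]) for row in table)


def best_column_order(table):
    """Return the column order that minimizes the total distance between neighbors."""
    n = len(table[0])
    # Precompute pairwise distances so each ordering is cheap to score.
    dist = [[column_distance(table, a, b) for b in range(n)] for a in range(n)]

    def cost(order):
        return sum(dist[order[i]][order[i + 1]] for i in range(n - 1))

    return min(permutations(range(n)), key=cost)
```

Applying the same functions to the transposed table, for example list(zip(*table)), gives a row ordering under the same distance measure.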

Chapter 8: Visualization Techniques for Trees, Graphs, and Networks

Write a program that reads in a graph in the following format:
- number_of_vertices number_of_edges
- edge1_start edge1_end
- edge2_start edge2_end
- ….
- edgeN_start edgeN_end
Add a very simple drawing function that places the vertices in random positions and connects the vertices based on the edge list. Run the program several times with a data set of your design (it should have more than 10 nodes and 20 edges). What conclusions can you draw from observing the output?
Modify the above program to place the vertices at equal angles around a circle. Again, run the program several times and describe your observations. From these observations, can you propose a vertex-ordering algorithm that will generally result in less cluttered displays?
Write a program that will determine if a graph entered in the above format is connected, i.e., if there is a path from every node to every other node.
Write a program that will determine if a graph entered in the above format is biconnected, i.e., if removal of a single node will not disconnect the graph.
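The two connectivity tests above can be sketched with a breadth-first search. The sketch assumes vertices are numbered 0 to number_of_vertices − 1 and that edges are undirected, neither of which the format states explicitly; the function names are hypothetical. The biconnectivity test simply removes each vertex in turn, which is easy to verify but slower than the standard depth-first approach based on articulation points.

```python
from collections import deque


def parse_graph(text):
    """Read 'number_of_vertices number_of_edges' followed by one edge per line."""
    tokens = text.split()
    n, m = int(tokens[0]), int(tokens[1])
    pairs = tokens[2:2 + 2 * m]
    edges = [(int(pairs[i]), int(pairs[i + 1])) for i in range(0, 2 * m, 2)]
    return n, edges


def is_connected(n, edges, skip=None):
    """True if every vertex (except the optional skipped one) is reachable from the first."""
    vertices = [v for v in range(n) if v != skip]
    if not vertices:
        return True
    adjacency = {v: [] for v in vertices}
    for a, b in edges:
        if a != skip and b != skip:
            adjacency[a].append(b)
            adjacency[b].append(a)
    seen = {vertices[0]}
    queue = deque(seen)
    while queue:
        for w in adjacency[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertices)


def is_biconnected(n, edges):
    """True if the graph is connected and stays connected after removing any one vertex."""
    return is_connected(n, edges) and all(
        is_connected(n, edges, skip=v) for v in range(n))
```

A triangle passes both tests, while a three-node path is connected but not biconnected because removing the middle node separates the ends.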
Assuming that the input graph represents a tree, and that all links are given in the order of (parent, child), write a program that will draw the tree as in Figure 8.5, where all nodes on the same level are evenly spaced. (Hint: in a single pass through the list of links, you should be able to assign each node to a level.)
Modify the above program to generate a radial layout, i.e., the layers are arranged as concentric circles with a radius proportional to the tree depth.
Modify either or both of the above programs to insert extra space between adjacent nonsibling nodes.
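The single-pass level assignment mentioned in the hint for the layered tree layout might look like the sketch below. It assumes that a node always appears as a child before it appears as a parent (except the root), which is what makes one pass sufficient; the function names and the normalized coordinate system are hypothetical, and drawing is left to your graphics library.

```python
def assign_levels(links):
    """Assign a depth to every node from (parent, child) links in one pass."""
    level = {}
    for parent, child in links:
        if parent not in level:
            level[parent] = 0  # a parent never seen as a child is treated as the root
        level[child] = level[parent] + 1
    return level


def layered_positions(links, width=1.0):
    """Return {node: (x, depth)} with the nodes on each level evenly spaced across width."""
    level = assign_levels(links)
    rows = {}
    for node, depth in level.items():
        rows.setdefault(depth, []).append(node)
    positions = {}
    for depth, nodes in rows.items():
        for i, node in enumerate(nodes):
            positions[node] = ((i + 1) * width / (len(nodes) + 1), depth)
    return positions
```

For the radial variant, the same levels could be reused by treating depth as a radius and the evenly spaced x value as a fraction of a full circle.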
Write a program that generates the adjacency matrix A using the same data as in Project 1 or some other graph data. Use R-project (or your own code) to compute A^2 and draw it, differentiating the values in the matrix using color (note that it may have values larger than 1). Explain what you see and the meaning of the numbers.
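If you choose to write your own code rather than use R, the matrix computation could be sketched as follows. The sketch assumes an undirected graph with vertices numbered from 0, and the function names are hypothetical; coloring the cells is left to your drawing code.

```python
def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix of an undirected graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1
    return a


def mat_mul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For the three-node path 0–1–2, mat_mul(A, A) has a 2 in the middle of the diagonal, which is a useful small case to reason about when explaining the meaning of the numbers.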

Chapter 9: Text and Document Visualization

Write a program that determines the distribution of words in a document.
Using the above, compute the tf-idf for that same document.
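The two projects above can be sketched together. Several tf-idf variants exist; this example assumes term frequency is the word count divided by the document length and inverse document frequency is the natural log of (number of documents / number of documents containing the word). The tokenizer, the function names, and the requirement that the document be part of the supplied corpus are all assumptions.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase words made of letters and apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def tf_idf(document, corpus):
    """Score each word of document against a corpus (a list of document strings)."""
    counts = word_counts(document)
    total = sum(counts.values())
    doc_sets = [set(word_counts(d)) for d in corpus]
    scores = {}
    for word, count in counts.items():
        df = sum(1 for s in doc_sets if word in s)
        # Words absent from the corpus get 0.0 rather than a division by zero.
        idf = math.log(len(corpus) / df) if df else 0.0
        scores[word] = (count / total) * idf
    return scores
```

A word that occurs in every document of the corpus scores 0.0 under this variant, which is why common words such as "the" drop out.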
Write a program that generates a tag cloud.
A common task when dealing with data is dividing it into categories, such as low, medium, and high. Write a program that reads in a document and divides the words into three classes: simple, complex, and those in between.
Implement the pseudocode of this chapter on a section of text, say one of your reports or on one of the smaller VAST-like data sets available on the book web site.
Explore Zipf’s Law on a few documents.
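One simple way to explore Zipf's Law is to tabulate rank against frequency, as in the sketch below. Zipf's Law predicts that frequency is roughly inversely proportional to rank, so the product rank × frequency should stay roughly constant for a large enough text. The function name and the tokenizer are hypothetical choices.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]
```

Plotting the logarithm of frequency against the logarithm of rank for each document makes it easy to see how closely the points follow a straight line.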
Download and install RapidMiner. Then use it on one of the smaller VAST-like data sets available on the book web site or, if you are ambitious, on one of the VAST data sets.

Chapter 10: Interaction Concepts

Programming projects dealing with interaction are included in the next chapter, which covers details of interaction techniques based on the concepts covered in this chapter.

Chapter 11: Interaction Techniques

Implement a screen space distortion that is shaped like a truncated pyramid, i.e., it is flat on top and has linear ramps on the edges. Note that such a distortion would be much more appropriate for viewing text than the more common lens effects.
Implement a set of data space transformations for a line plotting program. Make sure the resulting data values fall within the range of your display. Test the program on several 1D data sets.
Implement an attribute space transformation that sets the opacity of a glyph or scatterplot element based on how close one of its data dimensions is to a user-specified value. For example, if the value specified is 0.5 and the first data dimension is selected, then points for which that data dimension is at 0.0 or 1.0 should have an opacity of 0.5.
Modify the program above to enable a range of influence to be specified. This means that the opacity would be set to 0.0 for points whose value is farther from the selected value than the range of influence. Thus, distant points would be transparent, unless the range of influence is very large.
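The opacity mapping in the two projects above could be sketched as a single function. It assumes the data dimension has been normalized to the range 0.0 to 1.0 and that opacity falls off linearly with distance, reaching 0.0 at the range of influence; the linear ramp and the function name are assumptions. With the default range of 1.0 it reproduces the example given earlier (a target of 0.5 gives opacity 0.5 at 0.0 and 1.0).

```python
def opacity(value, target, influence=1.0):
    """Return an opacity in [0.0, 1.0] that decreases linearly with distance from target."""
    if influence <= 0:
        raise ValueError("influence must be positive")
    distance = abs(value - target)
    if distance >= influence:
        return 0.0  # outside the range of influence: fully transparent
    return 1.0 - distance / influence
```

Other falloff shapes, such as a Gaussian, can be substituted without changing how the function is called.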
Choose one type of distortion and implement it, along with controls for specifying focus, extents, and transformation. Focus can be controlled by the mouse or via a dialog box. Extents should just be one or more sliders that convey a size parameter. Transformation should be a list of possible types of transformation. This implies that you must implement at least two such transforms, so that you can switch among them.
Extend the above system so that smooth animation is used between the undistorted and distorted views. The user should be allowed to control the rate of the animation. What range of rates do you think is most effective or aesthetically pleasing?

Chapter 12: Designing Effective Visualizations

Choose three visualization programs that you’ve written for this course. For each, try to find at least three ways of improving them, based on the design guidelines of this chapter. Re-implement these programs with the improvements you’ve identified. Compete with your classmates to see who can create the most attractive, informative visualization.
Choose three visualization programs that you’ve written for this course (they can be the same three as used in the project above). For each, try to find at least three ways of making them worse by violating design guidelines of this chapter. Reimplement these programs with the negative improvements you’ve identified. Compete with your classmates to see who can create the ugliest, least informative visualization!

Chapter 13: Comparing and Evaluating Visualization Techniques

Design and carry out an evaluation of one of the visualizations that you implemented for this course. If the evaluation requires human subjects, try recruiting people with similar backgrounds (e.g., the students in this class).
Implement a minor variation on this visualization, for example, using a different color scheme, default layout, or other easily changed aspect. Design and carry out an evaluation that compares the original and modified versions. If there is a noticeable difference in performance or satisfaction levels, describe what you believe to be the likely cause.
Design and carry out an evaluation of a visualization implemented and evaluated by one of your classmates. You should NOT ask people how they evaluated their own program, or what the results were! Once the evaluation has been completed, you should compare the procedures, as well as the results. How were they similar or different? This might expose some biases that we often have when it comes to evaluating our own work – we generally want the results to come out well, while in evaluating the work of others we don’t usually have a preference as to how things work out.

Chapter 14: Visualization Systems

Download, install, and test at least one of the visualization systems described in this chapter. You should attempt to import a data set into the system from scratch, rather than using the ones provided with the system.
Download and install one of the toolkits described in this chapter. Follow the guidelines to create a simple application, such as a scatterplot or line graph, using the toolkit. Write a summary of your experience, including the difficulty/ease of creating an application, and your satisfaction level with the results.

Chapter 15: Research Directions in Visualization

Write a program that will generate a continuous set of numbers. For example, you could start with a parametric equation and then either randomly perturb the parameters or the values after they are generated. Now write a program that plots these points in real time (you may need to slow down the generator by putting sleep() calls in). What do you observe happening? Implement at least two distinct solutions to the scale problem.
Take one of the visualization programs you wrote earlier in this course and modify it so that it would work on a small display (e.g., 200×300 pixels). What design changes would be required? What functionality would you need to add to enable effective use of the results? Implement and test some of these changes to see how well you anticipated the effect of reduced scale.
Take one of the visualization programs you wrote earlier in this course and modify it so that it would run on a GPU (this will likely involve acquiring a book on GPU programming). Note that you may only be able to perform a subset of the processing on the GPU. Compare the performance with your CPU-based implementation.
Rewrite the scatterplot program for use by the elderly. Issues include readability and ease of use. Test your results on someone over the age of 70. Incorporate their feedback into a revised version of the program.