CUSP-GX-6006.001: Data Visualization SUMMER 2019
Lab 4 Vega-lite
Motivation: We have used D3.js and JavaScript to build our visualizations so far. Though powerful, D3 can be quite tedious to use: users have to get both the design and the implementation (i.e. the JavaScript code) right to produce the desired visualization. This has led to the development of many high-level visualization libraries, such as vega-lite.js. With vega-lite, users describe a visualization in the terminology of visualization design, e.g. marks, channels, and encodings. All of this can be specified in JSON, a data-exchange format, so there is less need to worry about JavaScript syntax.
In this task, we will reproduce one of the cuisine plots from HW1 (shown on the left) using vega-lite. Before proceeding, please take some time to go through the introduction slides from one of the library's authors. Please also take a look at their online tutorials for additional details.
Setup: For this task, we will be using the Vega editor to walk through the example. Please open your browser to: https://vega.github.io/editor/#/custom/vega-lite
The left side of the editor is the Vega-lite description in JSON. The right side is the visualization that reflects the specification on the left. We start with this simple snippet to build a bar plot from the cuisine data set.
Vega-lite Pipeline: From the code below, you should see a very long bar chart, similar to the tiny image on the bottom right. This is already very close to the visualization that we want, and it takes very little effort:
1. Set the language specification
2. Set the data source
3. Set the marks
4. Set the encoding and channels
NOTE: you can copy and paste this into the editor panel.
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "data": {
    "url": "https://raw.githubusercontent.com/hvo/datasets/master/nyc_restaurants_by_cuisine.json",
    "format": {"type": "json"}
  },
  "mark": {"type": "bar"},
  "encoding": {
    "y": {"field": "cuisine", "type": "ordinal"},
    "x": {
      "field": "total",
      "type": "quantitative",
      "axis": {"title": "Restaurants"}
    }
  }
}
In particular, we specify the language to be parsed by the editor using the $schema property. Here, we're using vega-lite version 3:
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
Then, we set the data source, and tell Vega that it is in JSON format:
"data": {
  "url": "https://raw.githubusercontent.com/hvo/datasets/master/nyc_restaurants_by_cuisine.json",
  "format": {"type": "json"}
},
The mark can then be specified through the mark property. For a bar chart, we use bar as the mark:
"mark": {"type": "bar"},
Finally, we set the encodings using the encoding property. Note that the channels are implicitly interpreted by Vega based on the mark type. In this case, since we're using bar marks, we're expected to use the position channel. Hence, we only need to set the x and y properties of the encoding for it to work. In each encoding, we need to specify the field, the type, and, if we want an axis label, the axis title:
"encoding": {
  "y": {"field": "cuisine", "type": "ordinal"},
  "x": {
    "field": "total",
    "type": "quantitative",
    "axis": {"title": "Restaurants"}
  }
}
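Since the whole specification is plain JSON, the four pipeline steps can also be assembled programmatically. The following is a minimal Python sketch of that idea, not part of vega-lite itself: the helper name build_bar_spec and its parameters are hypothetical, and the script only builds a dictionary and prints it as JSON.

```python
import json

def build_bar_spec(data_url, category_field, value_field, axis_title):
    """Assemble a vega-lite bar chart spec as a plain dict (hypothetical helper)."""
    spec = {}
    # 1. Set the language specification.
    spec["$schema"] = "https://vega.github.io/schema/vega-lite/v3.json"
    # 2. Set the data source and declare that it is JSON.
    spec["data"] = {"url": data_url, "format": {"type": "json"}}
    # 3. Set the mark.
    spec["mark"] = {"type": "bar"}
    # 4. Set the encoding: bars use the x and y position channels.
    spec["encoding"] = {
        "y": {"field": category_field, "type": "ordinal"},
        "x": {
            "field": value_field,
            "type": "quantitative",
            "axis": {"title": axis_title},
        },
    }
    return spec

spec = build_bar_spec(
    "https://raw.githubusercontent.com/hvo/datasets/master/nyc_restaurants_by_cuisine.json",
    "cuisine",
    "total",
    "Restaurants",
)
spec_json = json.dumps(spec, indent=2)
print(spec_json)  # the printed JSON can be pasted into the editor panel
```

The printed text should be equivalent to the snippet used in this lab, which can be handy when a spec has to be generated for several data sets.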
Ordering Data: However, our bar plot is sorted by the ordinal field by default. We would like to sort the bars by the total field instead. To do so, we add a sort description to the y field, so that it becomes the following:
"y": {
  "field": "cuisine",
  "type": "ordinal",
  "sort": {"field": "total", "op": "argmax"}
},
Our plot should now be sorted by the total field. Note that argmax is an operation applied to each group of records sharing a cuisine. This is needed since there is no guarantee that our data contain only unique cuisine names. For example, if there are multiple American entries, the argmax operation will select the one with the maximum total among them. We should now get the plot on the right.
Data Transformation: Our plot is still showing too many entries. We would like to shorten the list of cuisines to only the top 25 entries. A natural way of doing this is to sort the cuisines by total and pick the top 25. Though this could be done in Vega (the lower-level library that vega-lite builds on), it unfortunately requires some tricks in vega-lite. This is because the supported operations are not sufficient for computing the rank (using the total) and then filtering the top 25. In practice, we can either prepare the data beforehand, or add additional fields that can be used for filtering. In our case, we can obtain the top 25 by applying a filter that selects only cuisines with 3,300 or more restaurants. Please add the following right after the data section, and before the mark:
"data": {
  "url": "https://raw.githubusercontent.com/hvo/datasets/master/nyc_restaurants_by_cuisine.json",
  "format": {"type": "json"}
},
"transform": [
  {"filter": {"field": "total", "range": [3300, null]}}
],
"mark": {"type": "bar"},
In the range value, null indicates Infinity, since we do not have a cap on the upper limit. The expected result is shown on the right. It's worth noting again that limited flexibility in data transformation is one of the limitations of vega-lite and other high-level visualization libraries. Though this works, it requires knowing the value 3,300 beforehand. In the Selection section, we'll look at a workaround for this issue.
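To make the sorting and filtering semantics described so far more concrete, here is a minimal Python sketch that mimics them outside of vega-lite. It is an illustration only: the sample records, the helper names, and the ascending sort order are assumptions of this sketch, not something taken from the data set or from the vega-lite implementation.

```python
# Hypothetical sample records; the real data set has many more entries.
records = [
    {"cuisine": "American", "total": 6002},
    {"cuisine": "Pizza", "total": 1200},
    {"cuisine": "American", "total": 50},  # duplicate cuisine name
    {"cuisine": "Chinese", "total": 2400},
    {"cuisine": "Bakery", "total": 700},
]

def argmax_per_cuisine(rows):
    """For each cuisine name, keep the record with the largest total."""
    best = {}
    for row in rows:
        name = row["cuisine"]
        if name not in best or row["total"] > best[name]["total"]:
            best[name] = row
    return best

def sorted_cuisines(rows):
    """Order cuisine names by their largest total (ascending is assumed here)."""
    best = argmax_per_cuisine(rows)
    return sorted(best, key=lambda name: best[name]["total"])

def filter_range(rows, lower, upper=None):
    """Keep rows with lower <= total <= upper; None means no upper bound."""
    return [
        r for r in rows
        if r["total"] >= lower and (upper is None or r["total"] <= upper)
    ]

def threshold_for_top(rows, n):
    """Smallest total that still belongs to the n largest totals (n >= 1)."""
    totals = sorted((r["total"] for r in rows), reverse=True)
    if not totals:
        raise ValueError("rows must not be empty")
    return totals[min(n, len(totals)) - 1]

order = sorted_cuisines(records)
cutoff = threshold_for_top(records, 2)
kept = filter_range(records, cutoff)
print(order, cutoff, [r["cuisine"] for r in kept])
```

The last helper shows why a hard-coded bound is awkward: the cutoff has to be computed from the data before it can be written into the filter.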
Styling: In this part, to stay true to the original visualization, we will color all bars Light Grey and add a highlight capability that changes the bar color to Steel Blue on hover.
First, we add a color channel right at the beginning of the encoding to make all bars gray:
"encoding": {
  "color": {
    "value": "LightGrey"
  },
  "y": {
We also need to add a black border like we had in the homework. This is part of the styling process. With D3, we can use CSS. For vega-lite, we can either define a custom style property for the mark or simply add basic style properties, such as stroke, to the mark definition itself. Change the mark from:
"mark": {"type": "bar"},
to:
"mark": {
  "type": "bar",
  "stroke": "black"
},
Selection: By default, vega-lite already includes a highlighting feature. However, if we hover the mouse over an entry, it only shows a tooltip with detailed info about the datum, i.e. the fields associated with the record. You can try this with the existing code. In order to change the color when the mouse hovers over a mark, we need to add selections to our visualization. More info on selections in vega-lite can be found here. We would like our selection type to be single (selecting one mark at a time), triggered on the mouseover event, and with an empty selection, none of the marks will be highlighted (vega-lite also allows all to be selected with an empty selection). Let's add the following between the definitions of transform and mark:
"transform": [
  {"filter": {"field": "total", "range": [3300, 100000]}}
],
"selection": {
  "highlight": {
    "type": "single",
    "on": "mouseover",
    "empty": "none"
  }
},
"mark": {
The selection has been named highlight, which can be used to style our mark's color encoding, by adding the following condition to the color definition:
"color": {
  "condition": {
    "selection": "highlight",
    "value": "SteelBlue"
  },
  "value": "LightGrey"
},
Now you should see a visualization similar to the one on the right.
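The interplay between the condition, the fallback value, and the empty setting can be mimicked with a few lines of Python. This is only a sketch of the logic as described in this lab, not how vega-lite implements it; the function name bar_colors and the integer ids are hypothetical.

```python
def bar_colors(ids, hovered_id=None, empty="none"):
    """Return a color per bar id, given the id currently under the mouse."""
    colors = {}
    for bar_id in ids:
        if hovered_id is None:
            # Empty selection: "none" highlights nothing, "all" highlights everything.
            selected = (empty == "all")
        else:
            # Single selection: only the hovered bar is selected.
            selected = (bar_id == hovered_id)
        # The condition applies to selected bars; the plain value is the fallback.
        colors[bar_id] = "SteelBlue" if selected else "LightGrey"
    return colors

print(bar_colors([1, 2, 3], hovered_id=2))
print(bar_colors([1, 2, 3]))
```

With empty set to none, an idle mouse leaves every bar Light Grey, which matches the behavior we asked for above.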
Workaround for top-25 filter: As mentioned before, it's impractical to provide a value to filter out data outside of the top 25 values. This could be done if we had a row id for each record, and only kept the rows with ids not larger than 25 after sorting. Since vega-lite doesn't provide a row id, this cannot be achieved directly. However, once we add selections to our system, if you hover over a mark, you will notice that there's now a new field called _vgsid_, meaning vega selection id. This is indeed equivalent to the row id field that we're looking for. Thus, we can now change our filter to remove all records with _vgsid_ larger than 25 and keep only the top 25 items. Please change your filter to the following, and verify that it's still working.
Changing:
{"filter": {"field": "total", "range": [3300, null]}}
to either
{"filter": {"field": "_vgsid_", "range": [null, 25]}}
or simply:
{"filter": "datum._vgsid_<=25"}
User Interface: So far, we have built a static visualization with vega-lite. Next, we look at building a user interface for interactions. Let's say we would like to build a search box that allows users to enter partial text of a cuisine name, and we then highlight all the matching bars. For example, if a user types izza, both the Pizza and Pizza/Italian entries will be highlighted.
With vega-lite, selections are the basic building blocks for interactions. To achieve this, we need to create a selection that binds to a text input, and conditionally set our color based on this text. First, add a new selection, named search, as follows:
…
"transform": [{"filter": "datum._vgsid_<=25"}],
"selection": {
  "search": {
    "bind": {"input": "input"},
    "empty": "none",
    "on": "mouseover",
    "fields": ["term"],
    "type": "single"
  },
  "highlight": {"type": "single", "on": "mouseover", "empty": "none"}
},
…
After this, there should be a text input at the bottom of the visualization. This text can be referenced as search.term, the combination of the selection name and the field that we defined. Now, in order to highlight the right bars, we need to change the condition of the color encoding to reflect this logic. Our test expression is:
"(indexof(lower(datum.cuisine), lower(search.term))>=0) || (highlight._vgsid_==datum._vgsid_)"
This means that we convert the cuisine name of each entry to lower case, and call it a match if the typed text (also converted to lower case) can be found in the cuisine name. Note that we still allow highlighting on mouse hover; however, this will invalidate our search term whenever it occurs.
Changing:
"color": {
  "condition": {
    "selection": "highlight",
    "value": "SteelBlue"
  },
  "value": "LightGrey"
},
to:
"color": {
  "condition": {
    "test": "(indexof(lower(datum.cuisine), lower(search.term))>=0) || (highlight._vgsid_==datum._vgsid_)",
    "value": "SteelBlue"
  },
  "value": "LightGrey"
},
Your output should now look like the one on the right. Note also that indexof and lower are special functions for Vega expressions. Vega supports quite a few of these functions; however, they are not as extensive as those in JavaScript or Python. This is a trade-off in flexibility when using a high-level grammar such as vega-lite versus a low-level one like D3.js.
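For comparison, the matching logic of the test expression can be written in Python as follows. This is a rough equivalent for illustration, assuming the search term is always a string; the function name is_highlighted and the sample cuisine list are hypothetical, and the sketch does not model how hovering resets the search term in the browser.

```python
def is_highlighted(cuisine, datum_id, term, hovered_id=None):
    """Mimic the test: case-insensitive substring match OR hovered bar."""
    # str.find returns -1 when the text is absent, much like indexof.
    text_match = cuisine.lower().find(term.lower()) >= 0
    hover_match = hovered_id is not None and hovered_id == datum_id
    return text_match or hover_match

cuisines = ["Pizza", "Chinese", "Pizza/Italian"]  # hypothetical sample
matches = [
    name
    for row_id, name in enumerate(cuisines, start=1)
    if is_highlighted(name, row_id, "izza")
]
print(matches)
```

Typing izza selects both pizza entries, while a bar that does not match the text is still highlighted when it is the one being hovered.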
Export: Similar to our JS Bin labs, we can export vega-lite visualizations to a Gist and make them publicly accessible. This can be done by extracting the JSON that we built with the Vega editor and using the vega-embed library. Here are the steps to have your visualization published on bl.ocks.org:
1. Use your GitHub account to log in to GitHub Gist, and create a New Gist.
2. Add a new file named index.html as follows (using the vega-embed template on the link above). Note that this assumes our vega-lite JSON will be stored as DV_Lab4.vg.json in the same Gist: