Broad20

Role
UX/UI Design Lead

Collaboration Partners
Director of Engineering
1 Junior Engineer
1 Junior UX Designer

External Stakeholders
Broad Institute Leadership

Introduction

In 2004, the Broad Institute was founded with an ambitious goal: to leverage the newly sequenced human genome and fulfill the promise of genomic medicine. We celebrate 20 years of progress toward understanding the roots of disease and narrowing the gap between new biological insights and impact for human health.

One way of determining whether a research organization is successful is through the lens of scientific publications: "papers," the main channel through which scientists disclose and share new findings, developments, and discoveries. The number of papers an organization publishes across broad areas of research lends it credibility, which in turn can increase the amount of funding the organization receives.

(adapted from The Broad’s website and our Broad20 site)

Our Challenge

We wanted to create a visualization microsite that told the story of how Broad’s scientific efforts evolved over the past 20 years by leveraging publication data.

Along with visualizing the number and topics of research, it had to tell an engaging story and provide contextual information on why specific areas are important to advancing science.

The format would guide users through this narrative and demonstrate how to view and interact with the visualization. Since this was framed as an editorial visualization, we’d need to avoid unnecessary features that would give it the look and functionality of a tool.

Image

How it Functions to Tell our Story

Since we wanted this to act like a story that you can interact with, we chose a scrolly-telling format to help guide users through the narrative. Broad leadership selected 10 landmark scientific subjects to act as story points. We cleaned data from the National Library of Medicine's PubMed database in order to extract standardized Medical Subject Headings ("MeSH" terms) to populate the visualization as rows.
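As a rough illustration of how publication records can become visualization rows, here is a minimal sketch. It assumes the PubMed records have already been parsed into dictionaries with hypothetical `year` and `mesh_terms` keys; the function name and the choice to count each term once per paper are assumptions for illustration, not the project's actual pipeline.

```python
from collections import Counter, defaultdict


def count_terms_by_year(records):
    """Return {term: {year: paper_count}} from parsed publication records.

    Assumption: each record is a dict with a "year" (int) and a
    "mesh_terms" (list of str) key. These key names are hypothetical.
    """
    counts = defaultdict(Counter)
    for record in records:
        # Deduplicate within a paper so each paper counts a term once.
        for term in set(record["mesh_terms"]):
            counts[term][record["year"]] += 1
    # Convert to plain dicts so the result is easy to serialize.
    return {term: dict(years) for term, years in counts.items()}


# Made-up sample records, for illustration only.
sample_records = [
    {"year": 2004, "mesh_terms": ["Genomics", "Humans"]},
    {"year": 2004, "mesh_terms": ["Genomics"]},
    {"year": 2020, "mesh_terms": ["COVID-19", "Humans"]},
]

rows = count_terms_by_year(sample_records)
```

Each key in `rows` would correspond to one row of the visualization, with one count per publication year.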

Each story point would trigger the reloading animation that produced a refined interactive visualization showing only corresponding data for the subject, along with a text panel that provided contextual information (definition and links to corresponding papers).

Since our dataset is HUGE, the entire viz would have to be incredibly small to fit on a screen, therefore rendering it illegible. Showing the entire viz as part of this reloading animation helps give users a sense of the scale without forcing them to squint in order to reveal subjects.

The story culminates in a main interactive visualization that allows you to explore the entire dataset by selecting a term, or comparing sets of terms.

Visual Encodings and Design

Visual encodings give visualizations meaning by associating a visual quality (color, darkness, size, or shape) with a data value.

Our visual inspiration is a sample of DNA fragment base pairs viewed under electrophoresis. In the example we referenced, fragments that are larger in length have a darker and taller rectangle.

In our flat visualization, each row represents a MeSH keyword. The intensity of each bar reflects the number of times a keyword appeared in Broad-authored scientific papers published in a given year. The darker the bar, the more that keyword appeared in papers from Broad scientists.
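The count-to-darkness encoding described above could be sketched as a simple linear scale. The function name `bar_intensity`, the linear mapping, and the `floor` parameter (a minimum darkness so rare terms stay visible) are all assumptions for illustration; the actual site may use a different scale.

```python
def bar_intensity(count, max_count, floor=0.1):
    """Map a keyword count to a darkness value between 0.0 and 1.0.

    Assumptions: a linear scale, with `floor` as the minimum darkness
    for any non-zero count. Zero counts render as fully transparent.
    """
    if count <= 0 or max_count <= 0:
        return 0.0
    # Clamp so counts above max_count never exceed full darkness.
    share = min(count, max_count) / max_count
    return floor + (1.0 - floor) * share


# Example: a term that appeared 5 times in a year where the busiest
# term appeared 10 times gets a mid-range darkness.
mid_value = bar_intensity(5, 10)
```

A non-linear scale (for example, logarithmic) might suit highly skewed counts better; the linear version is used here only because it is the simplest to read.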

A three-dimensional view was discovered while we were all mobbing the code. While three-dimensional visualizations tend to be hard to navigate (don’t get me started on 3D pie charts), in an expository setting, this dimension can give a better idea of the number of times a term appeared.

Layout Design

Adhering to constraints established by the Broad styleguides (limit to Helvetica, colors must complement flagship brand colors), I styled the content sections to resemble medical journal layouts in order to preserve the look and feel of an editorial and avoid design patterns associated with analytical tools.

Image

Our scientists are interested in browsing datasets according to lineage and disease context. They requested that we build a visualization application that allowed them to answer these questions:

  • What data is available for my disease type of interest?

  • Given this disease type, are there any gene dependencies that show up more often in this disease?

  • Are there any drug sensitivities associated with these genes?

  • How strong are these dependencies and sensitivities?

At the time, DepMap only presented a single page for each disease context and lineage (for example, Ewing Sarcoma, which is associated with bone tissue), which contained two large tables. The first listed all of the cell lines associated with the lineage, but in order to find out if specific data was available, you had to visit the specific cell page. The second listed all drug sensitivity and CRISPR dependency enrichments, which also led to individual detail pages.

This was a messy experience, with a lot of page-hopping and abandoned research. Our goal was to create a visualization application that

Image

Sunburst vs Divided bar chart for the Overview Page

A sunburst is a standard way of visualizing data according to hierarchical organization. It appears frequently on browsable data portals.

However, a use case emerged where scientists were interested in the overlap of data types (does my context contain CRISPR and RNAi screens? If so, how many?). This would be difficult to show within a sunburst.

A divided bar chart can show both the number of cell lines and, by aligning the bars, where datasets are available across different screen categories.

Volcano Plot vs Scatter plot for Gene and Drug Sensitivity Pages

Streamlining Navigation

Oddly enough, figuring out the data was the easy part. Fine-tuning the navigation took up the majority of our time.

We needed a way to sequentially navigate from one story point to another. We also didn’t want to confine our users to having to go through the entire story each time they wanted to browse the main interaction.

Another use case that popped up after initial testing with a small group of scientists was that some were more interested in a specific subject (such as Genomics or COVID) and didn’t care much for the rest. So we needed to design a way for them to skip over story points and spend their time browsing their subjects of interest. The same went for those who did or didn’t want to see a list of corresponding publication links.

Ultimately, we decided on adding Back and Next buttons to the main content along with a side menu that divided the site into story points, with a direct link to the main visualization.
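The navigation model described above, sequential Back and Next steps plus direct jumps from a side menu, can be sketched as a small state holder. The class name `StoryNavigator`, its method names, and the section labels are hypothetical and only illustrate the behavior; they are not taken from the site's code.

```python
class StoryNavigator:
    """Track the current section of a scrolling story.

    Hypothetical sketch: supports sequential Back/Next steps and
    direct jumps, as a side menu would.
    """

    def __init__(self, sections):
        if not sections:
            raise ValueError("at least one section is required")
        self.sections = list(sections)
        self.index = 0

    @property
    def current(self):
        return self.sections[self.index]

    def next(self):
        # Stop at the last section instead of wrapping around.
        self.index = min(self.index + 1, len(self.sections) - 1)
        return self.current

    def back(self):
        # Stop at the first section instead of wrapping around.
        self.index = max(self.index - 1, 0)
        return self.current

    def jump_to(self, name):
        # Raises ValueError if the section name is unknown.
        self.index = self.sections.index(name)
        return self.current


# Placeholder section labels, for illustration only.
nav = StoryNavigator(["Intro", "Genomics", "COVID", "Main Visualization"])
nav.next()                         # moves to "Genomics"
nav.jump_to("Main Visualization")  # skips the remaining story points
```

Keeping the jump independent of the sequential steps is what lets a returning visitor go straight to the main visualization without replaying the story.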

Main Visualization and Infographics

The initial design for the main visualization contained an input/dropdown text box, along with an “enable multi-select” check box for term comparison, and a list of corresponding stats about the selected term.

After the initial testing round, users were confused about what the “enable multi-select” function did. Unchecking the box also reloaded the entire visualization, so they had to start over.

This is a common pitfall in scientific and visualization design: you assume your users know exactly what to do. However, this is a brand-new interface, and it still needs a degree of explanation and guidance to help them understand what they are navigating.

I redesigned this as a toggle and changed the language to “Lookup single term/Compare multiple terms” to help guide users through their search. We also added enhanced functionality to the multi-term search, such as the option to view the results as a histogram and filter options that help you sort by frequency or year.

The list of statistics was informationally dense, so I proposed dividing the stats into three interactive visualizations. After three iteration rounds with our junior engineer, we arrived at the final layout and functionality.

Image

Next Steps

As of December 1, 2024, Broad20 has been publicly deployed. We are currently collecting feedback from users on any sporadic bugs they encounter and on other scientific topics they would like to explore.

We are also collecting view and click statistics to measure how many active users visit the site and whether abandonment rates point to areas for improvement.

Since we don’t collect revenue, we measure success through section views, citations, and inbound links. If we can reach and maintain the industry standard of 10,000 visits per month, we can make a case for funding to build additional companion visualization sites.

Thanks for checking out this case study.