
PhenoCODE

PhenoCODE is a simple GUI-based tool for exploring phenotypes and linking them to genomic data for increased analysis power. Customers can link or ingest their own phenotype database or access pre-built PhenoCODE configurations such as the UK Biobank.

WHAT

Java application

WHERE

WuxiNextCODE

WHEN

2017-2018

Project duration 11 months

Project Overview

Pharmaceutical companies expressed interest in exploring the UK Biobank metadata in the context of genomic data.

WuxiNextCODE already had a language, SDL (Set Definition Language), which could explore phenotypic data, along with a preliminary UI component. The goal was to understand the market needs and the competition, and to release a consumer-ready product with real data to a customer.

My Contribution

  • Organized and conducted multiple interviews to understand the customer/user needs in detail

  • Defined user stories and prioritized them with Product Management (PM) and scientists

  • Designed a vision (architecturally and in detail), and a separate MVP solution tailored to development capacity

  • Directed the developers and Product Owners, and worked with multiple teams to deliver the highest-priority feature, search, in the first release of PhenoCODE

The Journey

We started by gathering insights from in-house stakeholders and from our potential customers. I led multiple interviews, semi-structured or unstructured depending on the interviewee. I created a comprehensive Confluence space to share all the findings in an organized manner. The strategy from the executives was high level, so it took a lot of work to get to the bottom of the problems in detail. We aligned with the developers to lay out a release plan, which was updated multiple times based on the customer's timeline.

We learned a lot (we generated more than 300 pages in Confluence). I cannot share the details, but here are some of the highlights:

  • From our data science team about the data:

    • UK Biobank has 502,616 participants (PNs in this case), supported by an equally large metadata structure that describes the phenotypic distribution of the PNs. It has approximately 3280 dimensions. A single dimension, such as the ICD10 codes, contains 8159 domain values. The whole data set has 1054 dimensions with a multilevel domain value structure. You can see some fun facts below (thanks to Eddi). While browsing hundreds of thousands of those relations (I have only 9 here; imagine doing this exercise 100,000 times), it is pretty hard to notice any glitches. You can see one example in the last picture below: the genetic sex deviates from the reported one, which is a hard-to-notice data quality issue, but an important one for scientific studies. Helping to find a needle in the haystack was one of my design challenges, and this insight is why I pushed search to the highest priority.
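The sex mismatch mentioned above is the kind of check that is easy to state but hard to spot by eye. Here is a minimal sketch of it in Python, assuming a simplified participant record; the `Participant` class, its field names, and the sample values are hypothetical and are not taken from PhenoCODE or from the UK Biobank schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pn: str            # participant identifier (hypothetical format)
    reported_sex: str  # sex as recorded for the participant
    genetic_sex: str   # sex inferred from the genotype data


def find_sex_mismatches(participants):
    """Return the participants whose reported and genetic sex disagree."""
    return [p for p in participants if p.reported_sex != p.genetic_sex]


# Made-up records, for illustration only
sample = [
    Participant("PN001", "F", "F"),
    Participant("PN002", "M", "F"),
    Participant("PN003", "M", "M"),
]

mismatches = find_sex_mismatches(sample)
print([p.pn for p in mismatches])  # prints ['PN002']
```

A scripted check like this only works once you know what to look for, which is part of why an exploratory search mattered so much to the scientists.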

What we have also learned

  • From our scientists about their approach:
    • They wanted to search for specific phenotypic traits, like asthma. They wanted to see all the results and decide based on the data source whether to use them or not. For example, self-reported BMI is less valuable than the BMI measured by a physician.

    • They built cohorts, analyzed, tweaked, analyzed again, refined, analyzed, etc. It was a very iterative process, and pretty chaotic.

    • Reproducibility in scientific research and exploration is a must.

    • They were typically not tech savvy, but writing a regular query was pretty standard for them. Some of them wrote their own ML algorithms.

    • AI is awesome, but they hate black boxes. Even ML is a black box. AI by definition is a black box.

  • From Pharma companies:

    • They are really protective of their data, information and even methodologies.

    • Even though a drug development trial can take 10 years, they want results now.

    • They already have tools and processes, which they want to leverage. Just mentioning a proprietary solution will lose their interest.

    • They have lots of money, but their investment pattern may not make sense for outsiders.

    • Data sharing between projects can be against their interest.

 

Beyond these lessons, we had to figure out how we would deliver, who would deliver what kind of artifact, and at what pace. We took a card sorting approach for that, which in this case also served the purpose of bringing the different teams together.

Architecture & Ideation

Based on the SDL capabilities and the code that already existed, I explored the directions the product could grow into. I designed multiple systems to fulfill the user needs. I involved field scientists for validation as well as developers for viability checks.

 

As I mentioned above, scientists can work in pretty chaotic ways, with lots of iterations. The information they need for specific decisions may or may not relate to disease, size of the data, well-known factors, already annotated variants, etc. In fact, many cases differed only because the scientists had solved them differently before. They had their small-scale Excel workflows, but those could not work with data the size of UK Biobank (see the size information above).

 

My desk looked like the picture below all the time, and I bugged scientists to walk through the flows countless times.

I had multiple product plan versions and phases, but I kept two from the beginning:

  1. The envisioned product. I wanted a clear vision of the product's future that aligned with the real user flow.

  2. The MVP proposal, which had less freedom and was limited to what could be developed on top of the existing code pieces.

 

The vision helped set the priorities for the MVP. It also let me plan multiple development phases, to see how we could eventually get there while bringing value in each phase.

 

I kept the vision realistic as well. I took the pieces we already had and reorganized them to fit the users' mental model and their flow. Instead of reinventing the wheel, I worked with other Product Managers on a plan to open up the API layer, so that existing R libraries could fit into our flow.

 

My decision making was based on design principles and on user and market insights, and it stayed aligned with the user flows throughout. From the beginning, I made workflow diagrams and user story maps in the detail needed to facilitate discussions. I used different diagrams for different conversations, but the latest version was always available in the product plan artifact (with links to the historic ones as well).

At this point, it is important to mention that managers and many decision makers leaned towards the belief that scientists can use complex software. "They are smart, they will figure it out," they said. We all know that if software is not aligned with the users' flow, it is just a potential place to make mistakes. Scientists are used to bad software, but they can really thrive on good software. Because of that, I organized a UX training course for developers and product owners. I spent enough time on important human factor learnings from the history of ergonomics, like the Three Mile Island accident, to dispel the myth that it is fine to give monstrous solutions to scientists just because they are smart.

Align with a Big Picture

Our leaders, along with the Product Managers, made sure that the offer fit into the whole product portfolio we planned to release to customers. PhenoCODE was a strong selling point because it was the only capability for exploring non-genomic data through a UI.

Detailed Concepts

To communicate, present, and validate the ideas, I built high fidelity prototypes, which I could use for presentations and which stakeholders could play with to explore the concept in detail. The prototype contained multiple concepts and development plans, which aligned in detail with the future vision of DiscoveryCODE, Phenotype Catalog (a planned v3 release concept) and PhenoCODE.

The winning concept for me (based on usability laws, IA, workflow analysis and a great deal of feedback) put each bigger task on its own page and let the scientists open as many of those pages as needed. For example, organizing phenotypes into a cohort happened on one page, which provided all the feedback needed to learn about that cohort. Multiple cohorts followed the simple browser tab metaphor. One could save a complex phenotype definition as a named cohort, which got its own page because newly sequenced data should be able to fall into saved cohorts. Analysis could run on cohorts, and the end result looked like an Excel table (because scientists were already used to that for small data). There was a separate plan for clinical research that aligned with this approach, so in the distant future the different products could meet.

MVP Development

When we got to the development phase, I was promoted to Product Manager of this project, so I worked with Product Owners, developers, and all the stakeholders to execute our plan and release the first version of PhenoCODE. After prioritization, the main selling point was the search function. It took a lot of persuasion to get the full search developed, because the development leader preferred a split search. A split search technically required the user to define the search domain first, which was not the right fit for the users, as the domains themselves numbered in the thousands. Product Owners generated Jira tickets from the user stories I wrote, so we could track the development. I made simple summary pages so that higher-level people could see the progress without needing to understand each and every ticket.
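The difference between the split search and the full search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CATALOG` contents and the function names `split_search` and `full_search` are hypothetical, and the real product searched through SDL rather than an in-memory dictionary.

```python
# Hypothetical catalog: dimension name -> its domain values
CATALOG = {
    "ICD10 diagnoses": ["J45 Asthma", "E11 Type 2 diabetes"],
    "Self-reported conditions": ["asthma", "hay fever"],
    "Body measurements": ["BMI (measured)", "BMI (self-reported)"],
}


def split_search(catalog, dimension, term):
    """Search inside one dimension, which the user has to pick first."""
    term = term.lower()
    return [v for v in catalog.get(dimension, []) if term in v.lower()]


def full_search(catalog, term):
    """Search every dimension and report where each hit came from."""
    term = term.lower()
    return [
        (dimension, value)
        for dimension, values in catalog.items()
        for value in values
        if term in value.lower()
    ]


# The split search only finds what lives in the chosen dimension
print(split_search(CATALOG, "ICD10 diagnoses", "asthma"))

# The full search returns every hit together with its source
print(full_search(CATALOG, "asthma"))
```

With thousands of dimensions, the split variant forces users to guess where a trait is stored before they can look for it. The full variant returns each hit with its source, which matches how the scientists wanted to judge results by data source.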


Finale

With the Field Application Scientist team, we defined minimum acceptance criteria, which the product passed, making it ready for release.

 

The MVP solution was cut back enough to let us build the product really fast and gather real feedback from our customers, which helped the company anticipate the future direction of our products.

 

Because some developers went the extra mile on the most important features, I made sure that HR and the C-level executives learned their names and that their extra work was properly recognized.
