Data Anonymization—digitization of a multi-step, machine-learning-supported workflow
🔒
This is a white-labelled preview of the project. I'm currently working on a password-protected full version.
Interaction Design
Pencil & Paper
Client
A global health IT leader providing data de-identification solutions for clinical studies through a risk-based approach
Nov 2020 → Nov 2021
This is a very special project to me, one of the largest clients I got to work with through Pencil & Paper. It was the perfect combination of complexity, high-stakes privacy concerns and creative freedom.
They entrusted us with conceiving a streamlined end-to-end digital experience for one of their core workflows. This client essentially offers a suite of solutions for governments, hospitals and pharmaceutical companies to de-identify datasets from clinical studies and measure the re-identification risk. These involve highly specialized processes executed by analysts —and we're talking cream-of-the-crop brains: masters in genetics, molecular biologists, PhDs in neuroscience, biochemists, etc.
tl;dr
In a nutshell, we started out by capturing as much subject-matter knowledge as we could from their team of experts: what the current workflow looks like, where the most friction lies, just how strict the privacy regulations are, and where the machine learning component plugs in.
After running user interviews and testing our evolving proposals, we landed on a streamlined platform able to guide and support analysts through every step of a project: from triaging the (incredibly) large documents, to creating samples of annotations for the machine to train on, to a seamless handover to the risk measurement model, all the way to the final client-facing report generation.
Challenge —what we needed to achieve
Our initial user interviews revealed significant workflow inefficiencies and a clear lack of project status visibility.
Analysts frequently relied on somewhat hack-y workarounds and manual cheatsheets to manage their workflows, leading to duplicated efforts and room for human error. Senior analysts spent a large part of their week training new team members, which speaks to the steepness of the learning curve at the time.
Managers and leads were forced to spend valuable time flipping between servers and multiple monitors just to piece together a project's current status. On top of that, the sensitive nature of the data they were handling often forced conversations onto the phone, further complicating workflows and hindering collaboration.
What pained us the most was observing this misalignment between the brilliance of the users —PhDs and experts in neuroscience, biochemistry, genetics, and molecular biology— and the daily friction they had to put up with.
These pain points confirmed the urgent need for a solution that respected the expertise and valuable time of these professionals while reflecting the seriousness of the task at hand, given the strict regulations around privacy and health data compliance.
The business goals we were tasked with:
Improve the ✨ demo-ability ✨ of their innovative solutions
Reduce the ⏱️ time ⏱️ required to complete a project
Increase 🤝 collaboration 🤝 opportunities
Allow for more 🤖 automation 🤖 in the workflow
Ease the 🧠 learning curve 🧠 for junior analysts
Status quo —what we started with
For a single project, analysts had to process dozens of the 100,000+ page documents that come out of a clinical study. The manual workflow created a lot of context switching, left room for human error, and required running scripts in the command line. “Collaboration” took the shape of thousands of annotations and highlights in PDF readers and enormous comment threads in Word documents. All this while adhering to the highly complex, intricate and strict compliance standards of the health data privacy sector.
They used to rely on Jira tickets to track progress on projects, with tasks intentionally passing through multiple team members to benefit from a “second pair of eyes.” While this collaborative approach had clear advantages, it definitely had its drawbacks.
Each team member had their own way of completing steps, since much of the process knowledge was undocumented and existed primarily in the minds of senior analysts. This eventually led them to include rigorous quality control at every stage, adding friction to the process. To compensate, the team relied heavily on manually maintained cheatsheets, which quickly became outdated and required constant upkeep.
To give you an idea, their standard was to complete 40 studies per month. And that was with a team of 30-40 analysts at the time.
Process —how we worked it out
Gathering the right context —or, gathering the context right
With a project of this kind, the only way to set yourself up for success is to capture a really precise context map. This method is part of the Discovery phase over at Pencil & Paper, and I don't know why it isn't more of a thing.
A context map is essentially a visualization of all the dynamics at play for a certain project. In our case, we needed to learn the ecosystem the client is part of. That includes the customers’ mindsets, industry trends, competitor landscape, internal hierarchy and structure (because in this case our end users were internal), and much more.
This is where we start homing in on the business objectives our design involvement will aim for, and on the definition of our scope (this is also typically a fruitful time to start capturing definitions of the internal terms and abbreviations we suddenly have to catch up on).
Mapping the full initial workflow
We ran quite a few interviews to observe the analysts' process in action. These sessions consisted of them screensharing and walking us through typical tasks, and us asking * infinite * questions. We would capture our learnings as we went in this workflow map. Once we'd consulted with people from each role, we were able to piece it all together.
User interviews & personas
We also held more conversational user interviews to get to know each role's responsibilities. We asked them things like "What does a typical day look like?" and "What does the ideal future workflow look like?". This is also when we started really uncovering the friction points mentioned earlier.
“Consistency is really really really important. If one person does one type of thing, but not the others, it’s possible for [Quality Control] to not spot it.”
— Junior Analyst
“Having a centralized place with an up-to-date bird's-eye view of current projects would be the best. ‘Good morning Mary, here's the status for November’, right?”
— Technical Lead
“A typical day? It's busy, to be polite.
I have a lot of hopes and dreams for [this software] to fix my life at work”
— Project Manager
At this point we had a pretty good grasp of the key personas we were going to have to optimize for, so we wrapped up our learnings in cute persona sheets.
Low-fidelity end-to-end solution
As all of the above was unfolding, we were inevitably already coming up with interaction and flow ideas. We eventually captured enough explorations to connect everything into end-to-end low-fidelity wireframes, which we then ran by the client team.
Solution —what made it to prod
After a few rounds of feedback and iteration, we developed the whole thing into a high-fidelity interactive prototype, a process which uncovered yet another layer of detail: naming conventions, data considerations, system statuses, etc. This always happens when end users start seeing real words on UI mockups. They'll bring up things like "These labels are typically 50 characters long so we need wider columns" or "This machine learning step can actually take up to 3 hours so we need to be able to get out of that screen momentarily".
Beyond those details, the reactions we were getting from analysts were just priceless. Feelings of amazement, excitement and relief were tangible through the video calls. We all felt we had landed on a solution that would considerably improve their work lives.
In the end, we reached a consensus that fit the scope, and here's how it went:
Analysts would start by triaging all the pages of each document in a project into high-level categories.
Large sections of the documents got assigned to predetermined categories to help the model know where to look for what.
Analysts then manually “tagged” specific words and strings by type, giving the machine learning model examples to train on (see the first sketch after this list).
After the model did its “pass”, a collaborative human review step happened to ensure perfect compliance.
Another machine-supported task was risk measurement (see the second sketch below). This was completely built in and allowed analysts to switch to other projects in the meantime.
The workflow had the reporting step built in, to get the project ready to send to the client all in one place.
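To make the tagging step concrete, here is a minimal sketch of the underlying idea: analyst-tagged spans of text become labelled training examples for the model. Everything below (the data model, the entity labels, the sample sentence) is my own illustrative assumption, not the client's actual system.

```python
# Illustrative sketch only: not the client's actual data model.
# Analysts mark character spans with an entity type (patient name, date,
# study ID, ...); each tagged span becomes a labelled training example
# for the de-identification model.
from dataclasses import dataclass

@dataclass
class Tag:
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends (exclusive)
    label: str   # entity type, e.g. "PATIENT_NAME", "DATE", "STUDY_ID"

@dataclass
class AnnotatedPage:
    text: str
    tags: list[Tag]

    def training_examples(self) -> list[tuple[str, str]]:
        """Pair each tagged surface string with its label."""
        return [(self.text[t.start:t.end], t.label) for t in self.tags]

page = AnnotatedPage(
    text="Subject J. Smith enrolled on 2019-03-14 under study AB-102.",
    tags=[
        Tag(8, 16, "PATIENT_NAME"),
        Tag(29, 39, "DATE"),
        Tag(52, 58, "STUDY_ID"),
    ],
)
print(page.training_examples())
# [('J. Smith', 'PATIENT_NAME'), ('2019-03-14', 'DATE'), ('AB-102', 'STUDY_ID')]
```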
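As for the risk measurement step, risk-based de-identification is commonly framed in terms of equivalence classes: the fewer records that share a given combination of quasi-identifiers (age band, sex, partial ZIP code, etc.), the easier those records are to re-identify. Here is a toy sketch of that general idea, again an assumption for illustration rather than the client's proprietary model.

```python
# Toy sketch of one common re-identification risk metric: the share of
# records whose quasi-identifier combination is shared by fewer than k
# records (small equivalence classes are the risky ones).
# Illustrative only: the client's actual risk model is proprietary.
from collections import Counter

def at_risk_fraction(records: list[dict], quasi_ids: list[str], k: int = 5) -> float:
    """Fraction of records in equivalence classes smaller than k."""
    classes = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    risky = sum(size for size in classes.values() if size < k)
    return risky / len(records)

records = [
    {"age_band": "40-49", "sex": "F", "zip3": "021"},
    {"age_band": "40-49", "sex": "F", "zip3": "021"},
    {"age_band": "70-79", "sex": "M", "zip3": "990"},  # unique, hence at risk
]
print(at_risk_fraction(records, ["age_band", "sex", "zip3"], k=2))  # ≈ 0.33
```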
By this point, I was fully embedded within the team’s sprints and ceremonies, supporting the implementation phase. Over the course of several months, I contributed by refining designs, addressing iterative feedback, and ensuring that the experience kept meeting both user needs and technical requirements. This involved close collaboration with developers to make adjustments, introduce versioning updates, and resolve the inevitable design challenges that arise during the build process.
Kind words from the client team
“The professional approach to engaging with the team is among the best I've seen. They communicated in a timely manner, were consistently upbeat and optimistic, and produced outstanding designs.”
🔐
Read the full case study here
Access the password-protected page (coming soon)