I was in a podcast recording with Juliana and Simo this week, and we talked about event data and how to design it.
And Simo asked an essential question: why can't we collect the events we need right now, name them correctly, and stick with that? That got me thinking. Because he is right, this is the better approach in some situations. But my gut feeling told me that these are exceptions. I thought a bit about it.
I explained it in this way: The agile approach works for quick insights, especially when you are one person or a very small and hands-on team. Beyond this, it fails.
It does not fail technically (ok, to be honest, at some point that will happen too), but it fails on user experience.
When people ask me why I recommend reducing the number of event definitions (names), dashboards, or anything else in front of the user, the answer is always the data user experience.
A product analytics example - you want to find out what retention looks like for new free trial accounts, where retained means coming back and doing one of three important activities in the product.
Imagine selecting these events from a catalog of 150 unique event definitions. Most likely, not all events are documented, and there is no fully consistent naming standard. Finding the right events can take up to a week. For me, this is a bad data user experience.
Let's dig deeper.
What is user experience?
"The user experience (UX) is how a user interacts with and experiences a product, system or service."
"It includes a person's perceptions of utility, ease of use, and efficiency"
Utility, ease of use, and efficiency are already good for deeper analysis. But maybe we can get more.
Let's look into Peter Morville's UX honeycomb:
Source: http://semanticstudios.com/user_experience_design/
And more here: https://www.interaction-design.org/literature/article/the-7-factors-that-influence-user-experience
Here we find:
Useful - If a product lacks utility or purpose, it will struggle to compete in a market filled with useful items. However, "usefulness" can be subjective and include non-practical benefits like entertainment or aesthetic value.
Usable - Usability focuses on enabling users to achieve their goals effectively and efficiently, and products with poor usability, like first-generation MP3 players, are less likely to succeed compared to more usable alternatives, such as the iPod.
Findable - Findability is crucial for a product's success and user experience, as it ensures that the product and its internal content are easy to locate, much like how a well-organized newspaper enhances readability.
Credible - Credibility is essential for a product's success, as users seek trustworthy options and are unlikely to give a second chance to products that fail to deliver on promises, impacting both user experience and business viability.
Desirable - Desirability, influenced by factors like branding and emotional design, can set similar products apart, as seen in the preference for a Porsche over a Skoda, and highly desirable products are more likely to generate word-of-mouth promotion.
Accessible - Accessibility in design is often overlooked. Still, it is crucial for reaching a broader audience, including the nearly 20% of people with disabilities. It not only benefits those with impairments but often makes products easier for everyone to use, and it is a legal requirement in many jurisdictions.
We will take these factors and apply them to typical data UX scenarios. But before we do that, we look at where data UX happens.
Where is data UX happening?
The frontends
Most obviously, we can find data UX when people are trying to use the data to find and get valuable insights. These can be:
Dashboards - luckily for this asset, we have a UX discipline. Plenty of books have been written about Dashboard design.
Analytics tools - Compared to dashboards, these focus more on data exploration. Some have very poor UX, while the leading ones enable users well. But all of them require training: you need to learn the concepts before you master them.
Monitoring & Alerting - Mikkel Dengsøe wrote an excellent piece about the challenges of getting alerts right and how to establish processes and standards to work with them.
https://medium.com/@mikldd/data-tests-and-the-broken-windows-theory-60185afaade9
Applying our UX factors for data frontends:
Useful - means in this context that the dashboard's insights bring value to the viewer.
Usable - means that the core functions of the dashboard are easy to understand: how to select a timeframe, how to add a filter, and how to see that there might be multiple pages.
Findable - Important one. Especially in bigger setups (15+ dashboards), it is often hard to find the right dashboard. And a search is often not a solution here. It needs a structure and a good discoverability approach.
Credible - Can we trust the data, and is it fresh? Good teams work with indicators for both on the dashboard.
Desirable - Most likely the most ignored aspect (only challenged by accessibility). Can a data frontend be made desirable? Of course, it can. I can't work with BI tools that produce poorly designed charts and dashboards (looking at you: Power BI). If you have two similarly useful and usable dashboards, users will likely pick the one with the better visual design.
Accessible - In the UX sense, this means making interfaces accessible for users with disabilities. This should be something you try as well. The tools still limit you here (I have yet to hear of good screen reader support for charts). But some basics, like contrast and accessible language, should be possible.
Another aspect is who can access the data. Data is often isolated and restricted from broader usage. Often for no real reason.
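The freshness indicator mentioned under Credible can be as simple as a label next to the dashboard title. Here is a minimal sketch in Python. The function name, the 24-hour threshold, and the example timestamps are my assumptions for illustration, not something a specific BI tool provides.

```python
from datetime import datetime, timedelta, timezone


def freshness_label(last_loaded_at: datetime, now: datetime,
                    max_age: timedelta = timedelta(hours=24)) -> str:
    """Return a short label a dashboard could show next to its title."""
    age = now - last_loaded_at
    if age <= max_age:
        return "fresh"
    # timedelta stores whole days and the remaining seconds separately
    return f"stale ({age.days}d {age.seconds // 3600}h old)"


# Hypothetical example values
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
print(freshness_label(datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc), now))
print(freshness_label(datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc), now))
```

The first call prints "fresh", the second "stale (2d 3h old)". The threshold should of course depend on how often the underlying data is expected to load.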
The data structure
People who work in dashboards or any analytics tool will care about something other than the data structure. However, anyone working to prepare the data for these systems will love an excellent data user experience.
How can we improve the data user experience here:
Naming of tables, models, and columns
Naming always sounds boring, but it has natural superpowers. A straightforward naming convention used everywhere can improve the data UX significantly. For example, if your timestamp fields are named load_time, ts, time_stamp, timestamp_, and so on, you need a lot more time to find them.
To solve this, you need two steps:
First, define a naming convention (e.g., timestamp is always ts when representing event data)
Second, set up a system that checks and enforces the naming convention. This is a lot harder; as far as I know, no tool can provide this.
This will improve the Usable and Findable aspects.
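Until a tool covers this, a small custom check can get you part of the way. The following is a minimal sketch, assuming the convention from the example above (event timestamps are always called ts) plus a snake_case rule that I added for illustration. The function name and the way column lists are passed in are hypothetical; in practice, you would read them from your warehouse's information schema or from your model files.

```python
import re

# Assumed convention: event timestamps are always named "ts".
# The variants below are the ones mentioned in the text.
FORBIDDEN_TIMESTAMP_NAMES = {"load_time", "time_stamp", "timestamp_"}
# Hypothetical extra rule: all column names are snake_case.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def naming_violations(table: str, columns: list[str]) -> list[str]:
    """Return one human-readable message per column that breaks a rule."""
    problems = []
    for col in columns:
        if col in FORBIDDEN_TIMESTAMP_NAMES:
            problems.append(f"{table}.{col}: use 'ts' for event timestamps")
        elif not SNAKE_CASE.match(col):
            problems.append(f"{table}.{col}: not snake_case")
    return problems


for message in naming_violations("events", ["ts", "user_id", "time_stamp", "EventName"]):
    print(message)
```

Running this in CI on every commit turns the convention from a document into something that is actually enforced.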
How you write SQL
Style guides for SQL are an excellent idea. Using SQL formatters (with your custom configuration) is a great way to establish a more straightforward UX. When a query or a dbt model always has a predictable structure, it is easy to find things and to work with the query.
As an example, here is the dbt style guide used by Gitlab:
https://about.gitlab.com/handbook/business-technology/data-team/platform/dbt-guide/
CTEs might cause performance issues, but are an excellent UX tool for making a model or query more readable and customizable.
This will improve the Usable, Desirable, and Accessible aspects.
Ownership
A schema has shifted, the value pattern in a column has changed, and something downstream breaks. Naturally, we are having discussions about data contracts. One thing in data contracts can be applicable much earlier: ownership.
Knowing who produces the data and whom to write to when something looks weird or breaks is already a significant improvement.
This will improve the Credible aspect.
Semantic continuous metadata
This is a broader term where ownership will also find its place. Specific goods in the manufacturing industry have bar codes; behind them sits an extensive history and metadata. We could have something similar for a column, with metadata that tells us:
the source
the ownership at the source
whether this is PII data and which level it has
This is where data catalogs can shine. I'm not sure how well this is supported there, but it is definitely something I will test more deeply in the future.
This will improve the Usable and Findable aspect.
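To make the idea of column-level metadata more concrete, here is a minimal sketch of what such a record could look like. All names, PII levels, and catalog entries are hypothetical; a real data catalog will have its own model for this.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PiiLevel(Enum):
    NONE = "none"
    PSEUDONYMOUS = "pseudonymous"
    DIRECT = "direct"


@dataclass(frozen=True)
class ColumnMetadata:
    table: str
    column: str
    source: str        # system that produces the value
    source_owner: str  # who to contact at the source
    pii: PiiLevel


# Hypothetical entries for illustration only.
CATALOG = [
    ColumnMetadata("events", "ts", "web_tracking", "team-frontend", PiiLevel.NONE),
    ColumnMetadata("events", "user_id", "web_tracking", "team-frontend", PiiLevel.PSEUDONYMOUS),
    ColumnMetadata("accounts", "email", "crm", "team-sales-ops", PiiLevel.DIRECT),
]


def describe(table: str, column: str) -> Optional[ColumnMetadata]:
    """Return the metadata for one column, or None if it is undocumented."""
    for entry in CATALOG:
        if (entry.table, entry.column) == (table, column):
            return entry
    return None


def pii_columns() -> list[str]:
    """List every documented column that carries some level of PII."""
    return [f"{e.table}.{e.column}" for e in CATALOG if e.pii is not PiiLevel.NONE]


print(describe("accounts", "email").source_owner)
print(pii_columns())
```

Even a flat list like this answers the two questions that come up most often: whom do I ask about this column, and am I allowed to use it?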
Architecture and Design
Plenty of problems can be avoided and solved by having an efficient architecture or design. A data model comprising 300 dbt models gives everyone a hard time in the daily data user experience. And guess what? This is also something a data catalog can't solve. Imagine you reduce this model to 30 dbt models. This would improve the UX significantly.
This will improve the Useful, Usable, and Desirable aspects.
The data itself
But what about the data itself?
Now, it gets more tricky. But there are some aspects we can look into:
The right form of the data
Most data types enforce the right form of data, like boolean, integer, or float. But others can become a mess. I am talking about string/varchar or their newer friends like JSON or OBJECT. They have their rightful place, but please handle and use them carefully and only where it makes sense.
Using the right type would be a Usable and Credible criterion for the user experience.
Consistency of the data
When we look at string values, there are at least two types. The first are columns with widespread variance, where the values are unique or nearly unique. Examples are email addresses or any user-generated content. This type of data is often useless for analysis in its original form. Then, there are string columns with enumerations, i.e., a defined list of values. These are great for analysis but can also be a pain when the enumerations are not checked and enforced with tests and rules. Ensure that enumerations and cardinalities are tested and monitored (cardinality, in this case, is the number of unique values in a string column).
Well-monitored string enumerations make the user experience Usable and Credible.
Avoiding string fields with unique or nearly unique values improves the Useful aspect of the user experience.
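A test for enumerations and cardinality does not need much. Below is a minimal sketch in plain Python; the function name, the allowed values, and the threshold are assumptions for illustration. In a real setup, the values would come from a query against the column.

```python
def check_enumeration(values, allowed, max_cardinality):
    """Compare the observed values of a string column with an allowed list."""
    observed = {v for v in values if v is not None}  # NULLs are handled separately
    return {
        "unexpected_values": sorted(observed - set(allowed)),
        "cardinality": len(observed),
        "cardinality_ok": len(observed) <= max_cardinality,
    }


# Hypothetical column with one value that slipped in with different casing
plan_values = ["free", "pro", "Pro", "free", None, "enterprise"]
report = check_enumeration(plan_values, allowed={"free", "pro", "enterprise"}, max_cardinality=3)
print(report)
```

Here the report flags "Pro" as unexpected and shows a cardinality of 4 instead of the expected 3, which is exactly the kind of silent drift that makes an enumeration column painful to use.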
The NULLs. NULLs happen, and in many cases, they don't hurt. But sometimes they do. Make a conscious decision about when to replace NULL values with something meaningful like "not provided" or "not applicable" - anything that helps the data user understand the value immediately.
Replacing NULL values with understandable values can improve the Usable and Desirable user experience.
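As a sketch of such a conscious decision: the rule below separates "the value could exist but is missing" from "the value makes no sense for this row". The column (company_size), the account types, and the function name are hypothetical.

```python
from typing import Optional


def label_missing(value: Optional[str], applicable: bool) -> str:
    """Replace a NULL with a label that explains why the value is missing."""
    if value is not None:
        return value
    return "not provided" if applicable else "not applicable"


# Hypothetical rows: company_size only makes sense for business accounts
rows = [
    {"account_type": "business", "company_size": "11-50"},
    {"account_type": "business", "company_size": None},
    {"account_type": "private", "company_size": None},
]
for row in rows:
    is_business = row["account_type"] == "business"
    print(row["account_type"], "->", label_missing(row["company_size"], is_business))
```

The important part is not the code but the decision behind it: someone has to define per column what a missing value means.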
How to work on your data user experience
Reading all this sounds nice, but how do you incorporate this into your daily work?
I would start with creating checklists of things you want to cover; most of them would be regular checks (monthly or quarterly). And then, I would set up a standard survey and do regular interviews.
Here is my example checklist:
Improve Useful
I survey all data users, asking if they work with data, how disappointed they would be if the data were gone, and their blockers and use cases. This would be every quarter.
Improve Usable
Conduct user tests for the significant dashboards or tools to learn where users struggle.
Do this at least once a month.
Look at cycle times and time to recover for your analytics engineers. Are the times long? Do they improve?
Is a SQL style guide in place and used for queries added to the repo in the last four weeks?
Are naming conventions in place, and are they followed? Check the commits of the last four weeks, ideally automatically.
Improve Findable
Track the requests where people ask for things that already exist. Ask them why they could not find it and how they tried.
Improve Credible
Ask about the data trust in the survey; this can also include asking for specific metrics and if they are trusted. This would be in the quarterly survey.
For essential source tables, are the sources and the source owners documented?
Improve Desirable
This one takes a lot of work to test. You can develop a visual guideline, apply it to one or more dashboards, and ask for feedback by comparing before and after.
Improve Accessible
How many people have access to the data? What is the ratio of data users to all people in your company?
Generally, the best way to monitor the data UX is to work with a continuous survey and regular user tests and interviews. This is a must-have setup for any data team if you want to improve your impact and role in the company.