Tracking/Measurement/Collection/Creation - what was the question again?
Trying to define something that needs definition but has a history that can't be changed easily.
Roughly two years ago, I wanted to do something that any business mentor recommended. Specialize on something and create a category where people can recognize you easily since the generic space is too crowded.
I did anything across the data stack at that time: from the collection to pipelines, modeling, and monitoring. And with no surprise, everything was at a good enough quality level, which left me pretty unhappy.
To find the special niche, I checked on two sides: What do I really like to do, and what is in constant need in all projects I have done so far?
And it was easier than I thought: I enjoy working where we create the data - this can be in frontends, backends, or webhooks. Just the place where something happens in the application, and we say: "Hey, that is interesting. Would it be ok when we pipe this down to an analytical system".
And luckily, this was (and is) also one of the topics that come up in any project (in different severity levels).
I have my niche, but how do I name it?
That was a lot harder than finding the niche. And it still is. And this opens up the topic for this post.
Let's start with a technical definition:
"When a specific action happens in an application (platform-agnostic), it can trigger an event, and we can send this event data to an endpoint where we store it for the analytical purpose (the purposes could also be automation, ML models,...)".
Quite too mouthful for a landing page headline.
At that time, I tested with potential clients, which term let them immediately understand what I am offering. And it turns out there is one term that does precisely this:
TRACKING DATA
Well, it was not something to open up a bottle of champagne. It's great that people immediately understand it (Do we need to fix your tracking - hell, yes, when can we start). But not so great when the term is a bad reputation.
But let's start there with our journey of definitions:
"Web tracking is the practice by which operators of websites and third parties collect, store and share information about visitor's activities on the World Wide Web" - Wikipedia (https://en.wikipedia.org/wiki/Web_tracking)
And Matthew Brand brought up an etymologic reference when we discussed on LinkedIn:
Taking the initial "trac" and the Wikipedia definition - gives us an excellent start to why tracking has a hard edge. We follow an individual digitally and collect their foodprints (or better fingerprints) along the way and build (maybe even unintentionally) a profile. We intend to keep following and see where it goes, and we stop when a goal has been reached that satisfies us (the famous conversion) or when we lose track. Tracking on this level is not so far away from stalking, isn't it?
Why do people immediately understand when I mention tracking what service I offer? For historical reasons, I guess. At least as long as I can think, tracking and analytics always went hand in hand. With tracking pointing to the marketing part where you tracked the performance of ads. And here, the tracking part was essential because to measure a success of an ad, you need to follow someone from the click to the conversion.
What about Measurement?
Stephané Hamel used a definition quite like this:
Because he puts it in contrast to tracking, where the purpose is to observe users, and sees measurement as a more generic approach, with no intention of identifying individuals.
But this can also be true since the measurement is a much broader term. As Matthew put it here:
So tracking is just a special measurement, which I think is true. But I still like the picture of a more generic measurement that does not need any identifier to collect an individual activity over time.
Maybe measurement is the innocent version, where tracking already took the apple. Ok, perhaps too biblical.
What about data collection?
Interestingly, since I walk in different worlds on the data engineering island, I often come across data collection.
My not scientific explanation for this is that data collection is usually ignored by data engineers. The data drops in there, and then the real work starts. Where it comes from? We don't care since we have to clean up anyway.
This also describes the term collection in a better way. We only pick up what is already there and left by someone or something else. So if this data contains any tracking purpose, we might not care. Or maybe if we do, we ensure we tokenize it properly.
What if we could come up with something friendly and positive?
Maybe Data creation?
I don't know if Yali Sassoon was the original person to come up with the term in this context, I don't know. But it was the first time I saw it.
What I like about this view is the proactive part and the real purpose of the process. I create this data point because it helps me to understand if we have generated value for our customers. It's beyond the simple: "let's track our application," which usually ends up with 95% of data never used. It also goes deeper than measuring something, where you focus on putting the sensor somewhere without a clear intention of what comes next. And it is way more active than just collecting data.
So, this would be the winner for me.
Ok, I put "How to fix your data creation" on my landing page.
Not really.
The language used is lazy, solid, and static (ask women about "guys"), and it changes very slowly.
Sometimes you open a new area with your term because you describe a changed behavior much better than old terms. Analytics engineering is an excellent example of it.
But data creation is not there yet. But maybe we will get there in the future.
And you might have recognized that one term appeared throughout the post:
Purpose.
The purpose was something that floated in my head but was unintentional. Stephané used it with a lot of intention (same quote as above):
And Aurélie Pols brought it up just recently in a different discussion in the same context of data privacy:
The purpose is what makes data creation powerful.
I first define the purpose of the data, and then I take care of the creation. With a goal in mind, we will come up with less tracking because, to be honest, we don't need to know anything about an individual behavior for plenty of use cases.
I will extend this purpose more in a future post.
Tracking and Collection still make the most sense to me. Measurement has a different meaning and Creation, is just weird.