Why hasn’t MS Excel announced its CDP product yet?
It was quite a while ago that I first heard the term customer data platform (CDP). And I had one big question: what is the difference between a CDP and a data warehouse (DWH)?
And honestly, no one could give me a good answer then (and possibly not today either). Maybe this was a first glimpse of what we now call composable.
In some DWH projects, we had already done the same thing. We collected and unified different kinds of customer data: behavioral data and property data from sources like shop systems, ERPs, and analytics systems.
Based on what I know today, this was most likely a CDP. But this already points to a problem: how do we even define a CDP?
What is a CDP
Let's unpack the customer data platform package a bit to get a better idea of what makes a CDP and what doesn't.
I have followed Arpit's content for years; he is a great resource for making sense of the CDP space. And he has spent a good amount of time defining some of these things.
Check out Arpit’s series about composable and packaged CDPs: https://databeats.community/series/understanding-composable-and-packaged-cdps
The series includes a lot of foundational definitions.
He distinguishes between a CDP and customer data infrastructure (CDI), which makes much of what follows easier to discuss.
A CDI is not a marketing category; no vendor calls itself that. One reason might be that event collection alone doesn't seem valuable enough. Alex Dean wrote an excellent post about this phenomenon (and it is no surprise that he sees event collection as a robust stand-alone solution, which I agree with).
So, a CDI is an event pipeline. This can be an instrumented pipeline, like frontend or backend trackers, or endpoints that receive webhook data. A newer addition might be event streams that you collect into your data warehouse (like Kafka topics or CDC streams).
A CDP is a package that builds on top of the event data. Beyond event data pipelines, it helps with identity resolution and syncing the data back to marketing, sales, or other tools.
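To make identity resolution a bit more concrete, here is a minimal sketch of one common approach: deterministic identity stitching with a union-find structure. It assumes that each event or record carries the identifiers observed together (for example, an anonymous cookie ID and, after login, a user ID). The identifier format (`anon:…`, `user:…`, `email:…`) and the `IdentityGraph` class are hypothetical. Real CDPs layer merge rules, priorities, and limits on top of this.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over identifiers that were observed together (hypothetical sketch)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def _find(self, ident: str) -> str:
        # Unseen identifiers start as their own profile root.
        self.parent.setdefault(ident, ident)
        # Path halving keeps repeated lookups fast.
        while self.parent[ident] != ident:
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def link(self, *idents: str) -> None:
        # All identifiers seen in one event belong to the same person.
        roots = [self._find(i) for i in idents]
        for root in roots[1:]:
            self.parent[root] = roots[0]

    def profiles(self) -> list[set[str]]:
        # Group every known identifier by its root to get unified profiles.
        groups: dict[str, set[str]] = defaultdict(set)
        for ident in list(self.parent):
            groups[self._find(ident)].add(ident)
        return list(groups.values())


graph = IdentityGraph()
graph.link("anon:a1", "user:42")               # login on the website
graph.link("user:42", "email:jo@example.com")  # CRM record for the same user
graph.link("anon:b7")                          # visitor who never logged in
```

After these three calls, `graph.profiles()` holds one profile joining the cookie, user ID, and email, plus a separate profile for the anonymous visitor.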
The sync back is an important piece. It was the missing piece in the old CDP setup I described in the introduction: we usually had one or two syncs back to ad platforms like Google Ads or Meta Ads, and that was the whole sync setup. We will get to why syncs matter so much later in this post.
Composable or what?
Some people in the tech scene praise the marketing approach of creating a new category so you can own it as the category leader. This motion brought us things like analytics engineering and the composable CDP.
Honestly, I have lost track of who came up with it first. But it was likely a vendor offering one part of what could make a CDP, most likely from the reverse ETL category, looking to make their solution attractive to wealthy marketing teams.
Composable means what the modern data stack (MDS) meant: unbundling a package and embracing a best-of-breed approach. You put together event collection, batch loading, storage, transformation, and sync into one composable solution, and voilà, you have built yourself a CDP.
The promise of composability is the same as that of the modern data stack: you pick the tools that do the job best for your use case.
And it has the same issue as the MDS: stacking eight tools together never works seamlessly.
So maybe the composable CDP could just have been called the modern CDP to make it easier for people to understand.
Because the composable approach included many tools from different categories, it was attractive for all these vendors to push the concept of the composable CDP together. They could be part of a new category without building any new product features. Snowflake and Databricks happily promoted composable CDPs, since it doesn't matter to them which queries keep their CPUs warm.
But similar to the MDS, and maybe even more so, we see a step-by-step composting of the composable CDP. My educated guess is that marketing teams have no interest in waiting 12 months until the data engineering team frees up some resources and composes a CDP for them. They want to start now.
Therefore, they look for all-in-one CDPs. These CDPs come at a price: usually in event collection (you might need to add new instrumentation) and, of course, since they are a bundle, in higher initial pricing (but as we know from the MDS, a composable CDP can also have sneakily increasing costs over time).
Going forward, let's forget about composable versus all-in-one and focus on the value the solutions can bring a company. To do that, I want to examine the different CDPs based on their former functions, because there is one interesting thing: most of the CDPs I know were something different before.
Different CDP buckets based on formers
A lot of formers
We already referenced Alex's post above. One of its takeaways was that event collection alone is not enough for most companies to meet their growth goals, and we will see this pattern across several models in the following sections. Interestingly, CDP is a category that most of today's vendors moved into from another category. I have a final bucket for original CDPs; there are some, but they are rare.
The former Email tools
Oh, email tools: the first instance where people thought they were doing proper retention marketing by sending newsletters. This created a category of email-sending tools, and some of them were successful and worked to provide more value to their customers.
So, they started to collect some customer event data to enhance their reporting and show the success of their campaigns.
Based on that, they built better reporting and analysis, which made it possible to build audiences for your emails. Then they added integrations with other marketing platforms like Google Ads.
And voilà, they had a CDP.
A good example is Emarsys. We used it a lot in the old days. They did such a great job that they were acquired by SAP (not sure if this is the so-called heaven, but who knows).
Another example is customer.io, which started with email, added segmentation, then added communication flows that can also trigger actions on other platforms, and has now introduced event pipelines to become a CDP.
Former email tools usually bring an activation layer with them. Often it is just email, sometimes one or two more channels. Data integration came later, so they usually don't have many tools for controlling data quality or transforming data.
The former event pipelines
Event pipelines started by collecting events from frontends and backends and, as a first step, usually storing them in a database (or data warehouse). They were also often set up as a proxy to forward these events to marketing platforms.
Segment started as this kind of proxy. They began with an open-source tracker that made it possible to add tracking code once (instead of once for every tool that needs the event data) and then send the data to all tools. At some point, they also started sending the data to the data warehouse. With that, they became a kind of mini-CDP, or rather a CDI.
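The "add tracking code once, send it everywhere" idea can be sketched as a small fan-out router. This is not Segment's actual API. `Router`, `track`, and the destination callables are hypothetical stand-ins for the tracker and for the per-tool integrations, such as an ad platform, an email tool, or a warehouse loader.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Optional

Event = dict[str, Any]
Destination = Callable[[Event], None]


@dataclass
class Router:
    """Collects each event once and forwards a copy to every registered destination."""

    destinations: dict[str, Destination] = field(default_factory=dict)

    def add_destination(self, name: str, send: Destination) -> None:
        self.destinations[name] = send

    def track(self, user_id: str, event: str, properties: Optional[dict[str, Any]] = None) -> None:
        payload: Event = {
            "user_id": user_id,
            "event": event,
            "properties": properties or {},
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        for name, send in self.destinations.items():
            try:
                send(dict(payload))  # shallow copy per destination
            except Exception as exc:  # one failing tool should not block the others
                print(f"delivery to {name} failed: {exc}")


# Hypothetical destinations: plain lists standing in for a warehouse and an ad platform.
warehouse: list[Event] = []
ads: list[Event] = []
router = Router()
router.add_destination("warehouse", warehouse.append)
router.add_destination("ad_platform", ads.append)
router.track("user:42", "Order Completed", {"revenue": 49.0})
```

The instrumentation calls `track` once, and adding a new tool means registering another destination rather than touching the tracking code.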
At some point, Segment offered Personas, which was their segment builder (calling it Segment Segments was probably too confusing). They were already the core collection point for a lot of customer data, and with Personas, they introduced a way to handle identity resolution and segment building before this data was sent to the different marketing platforms.
Rudderstack, when they came out, followed the same path, only faster (they, of course, had to catch up). This could also make them a candidate for the "no-former" bucket, but they modeled a lot of their product on Segment's offering, so they can live here too.
Former event pipelines are stronger on the data quality side and often offer services for it (including for handling PII). Both are now investing heavily in segmentation, so that segments can also be built on data that does not flow through their pipelines.
The former reverse ETL tools
Reverse ETL as a category had a short lifetime. Some people might say it still exists; probably more would say it never really existed at all.
Again, the term Reverse ETL is mostly a marketing term. The vendors had to find something that set them apart but was still familiar enough for people to quickly get the idea of what they do. In the end, they offer a service that writes data from a data warehouse back into tools (like marketing platforms), hence the term reverse.
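In code, the core of a reverse ETL sync can be quite small. You query a model in the warehouse, compare the result with what was synced last time, and push only the differences to the destination. The sketch below is a minimal version under stated assumptions. It uses an in-memory SQLite database as a stand-in for the warehouse, the `customers` table is made up, and `push_to_destination` is a hypothetical placeholder for a marketing platform's API client. Real tools add batching, retries, rate limiting, and field mapping.

```python
import sqlite3
from typing import Callable, Iterable


def sync_audience(
    conn: sqlite3.Connection,
    query: str,
    previously_synced: set[str],
    push_to_destination: Callable[[str, Iterable[str]], None],
) -> set[str]:
    """Diff-based sync: send only added and removed members, return the new state."""
    current = {row[0] for row in conn.execute(query)}
    added = current - previously_synced
    removed = previously_synced - current
    if added:
        push_to_destination("add", sorted(added))
    if removed:
        push_to_destination("remove", sorted(removed))
    return current  # persist this as the sync state for the next run


# Stand-in warehouse with a hypothetical customer model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT, lifetime_value REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("a@example.com", 1200.0), ("b@example.com", 80.0), ("c@example.com", 640.0)],
)

calls: list[tuple[str, list[str]]] = []
state = sync_audience(
    conn,
    "SELECT email FROM customers WHERE lifetime_value > 500",
    previously_synced={"c@example.com", "old@example.com"},
    push_to_destination=lambda op, emails: calls.append((op, list(emails))),
)
```

Diffing against the previous state keeps each run proportional to what changed, not to the size of the audience. This matters once audiences reach millions of rows and destination APIs enforce rate limits.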
In a CDP, this is the feature that syncs the data to all the different platforms, where it is then used to trigger emails, push notifications, or ads.
Since they were just one feature in this setup, the natural step was to become a more integrated product, so they started moving into more CDP features. Segmentation came next. Because they are built on top of data warehouse data, they could offer the missing frontend for marketers to create segments based on DWH data and then sync these segments to the different marketing and sales platforms.
The big question now is where they move next. Hightouch has taken the next logical step (moving away from the composable concept) and added an event pipeline. Let's see what Census does here.
The former analytics tools
They are somewhat late guests to the party, which is surprising since they have had the data and tools for years.
You can build segments directly in Amplitude or Mixpanel (and now also Piwik PRO) and push them to marketing platforms.
When you are an analytics platform, the sync feels like a logical and natural extension of your business. You collect behavioral data; your platform allows the creation of segments anyway. So you "just" need to add the sync part.
Interestingly, Google Analytics was quite early here, but only for Google Ads. And the ability to create segments in GA and turn them into Google Ads audiences is still the most powerful argument for teams to stick with Google Analytics 4.
The big gap for these platforms was all customer data beyond behavioral data. But with the new data warehouse integrations, they are opening up to this kind of data. You can now enrich customer properties in your DWH and sync them to your event analytics tool. You can also integrate event data from marketing platforms, like email interactions, and combine it with your own data.
The analytics platforms are my dark horse in the CDP race.
The no-formers
Yes, there are native CDPs. There is also one with a different former history that doesn't fit into the other buckets: Tealium. Tealium was first an enterprise-grade tag manager, a bit like Segment, but pretty early on it launched AudienceStream, built on top of the tag manager.
mParticle may be the better example, since they don't have a visible event pipeline history like Segment's. But perhaps they were a former as well, and I just missed it.
From a feature-set perspective, I would put them close to Segment or Rudderstack. But please correct me here if I am wrong.
It's not about customer data. It is about activation
Whenever CDPs come up, the big talk is about collecting customer data from different sources, putting it in a central place, creating some valuable audiences, and then passing them on to any tool that might use them.
Ingredient: Customer data
Don't get me wrong: getting customer data into one place matters. It makes it possible to combine the data and, in the process, apply identity resolution. In the end, you have an extensive dataset about your customers. This is the foundation for what I describe next. But I don't think it is the essential part. These kinds of datasets have existed for a long time, sometimes in a data warehouse, most often in a CRM.
So, for me, this is not the part that makes the CDP special. As mentioned at the beginning, this was the part I did not understand when I first heard about CDPs, since I thought we already had this.
The pricey meal: Activation
The real deal for me is the activation part.
I spend most of my life with event analytics. A typical use case looks like this: I talk with the marketing team and pick up a challenge they have. They are not sure whether the onboarding emails make an impact. So I do a first analysis, maybe extend our event design slightly, and finally deliver an analysis that shows which steps in the onboarding flow need optimization.
Then the marketing department will hopefully pick this up and test some different emails. After that, I could do the same analysis again (or, if I am smart, create a funnel so they can monitor it).
Sounds good, but it isn't really. There are a lot of handovers and explanations involved, which is not efficient.
Here comes activation. Same example, different approach:
The marketing team is still struggling with their onboarding sequence, so I do the same analysis. But this time, I deliver audiences to the marketing team: all people in onboarding step 3 (where we see the most drop-off), all people who have not moved on to step 4, and all people who, based on the first information we have, look like our core ICPs (ideal customer profiles) but have not moved to step 4.
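As a rough illustration, the three audiences above can be derived from onboarding events with a few set operations. The sketch makes several assumptions. Onboarding events arrive as `(user_id, step)` pairs, "in step 3" means step 3 is the furthest step a user has reached, and "has not moved on to step 4" is read as every user who started onboarding but never reached step 4. The `is_icp` rule based on `company_size` is a hypothetical placeholder for whatever ICP definition a team actually uses.

```python
from typing import Any, Callable

# Hypothetical inputs: onboarding events as (user_id, step) pairs plus user properties.
Events = list[tuple[str, int]]
Properties = dict[str, dict[str, Any]]


def build_onboarding_audiences(
    events: Events,
    properties: Properties,
    is_icp: Callable[[dict[str, Any]], bool],
) -> dict[str, set[str]]:
    # Furthest onboarding step each user has reached.
    furthest: dict[str, int] = {}
    for user_id, step in events:
        furthest[user_id] = max(step, furthest.get(user_id, 0))

    not_at_step_4 = {u for u, s in furthest.items() if s < 4}
    return {
        "stuck_at_step_3": {u for u, s in furthest.items() if s == 3},
        "not_reached_step_4": not_at_step_4,
        "icp_not_reached_step_4": {
            u for u in not_at_step_4 if is_icp(properties.get(u, {}))
        },
    }


events = [("u1", 1), ("u1", 2), ("u1", 3), ("u2", 1),
          ("u3", 1), ("u3", 2), ("u3", 3), ("u3", 4)]
properties = {"u1": {"company_size": 250}, "u2": {"company_size": 12}, "u3": {"company_size": 400}}
audiences = build_onboarding_audiences(
    events, properties, is_icp=lambda p: p.get("company_size", 0) >= 100  # hypothetical ICP rule
)
```

Each resulting set is what would be handed over, via a sync, to the email or ad platform, in place of a dashboard or a slide deck.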
Marketing can now create new communications and test them with specific campaign parameters, so we can look at the performance and potentially run more tests.
That is activation.
Our handovers are no longer notebooks, dashboards, slides, or Loom videos, but audiences with the potential for better communication flows.
And I would usually go a step further: invest in plenty of training and Loom videos to show marketing how to find audiences themselves, so that from then on they are in constant experimentation and optimization mode.
This is real enablement.
For me, activation is the essential part of the CDP game. Not in the sense that you have to run activation inside the CDP, but in the sense that you have a closed loop: from customer events and properties (analysis) to segmentation, activation, and back to customer events.
Where things are moving, and some bold predictions
Of course, we are curious to see where this is all going. Maybe we should even create a feature-monitoring website to keep track of who launched which new feature at which timestamp.
Sure, AI will be part of it (mParticle has already made this move, at least in their website messaging).
But what else? We will see in the coming months.
So, to make it a bit more interesting, here are some predictions (aka acquisition ideas):
Rudderstack to acquire PostHog
The analytics part is missing from the Rudderstack setup. Yes, they have an audience builder, but PostHog would enable them to create audiences based on deeper analysis.
Census to acquire Jitsu
Hightouch added its own event tracking (accompanied by some weird public uproar). So, to take a different route, Census could acquire Jitsu. Jitsu has a really solid event collector and could be a suitable extension.
If Census's treasure chest is still well filled, they could also look at Snowplow. This would give them industry-best collectors, an event data pipeline, and data contracts in one go.
Amplitude & Mixpanel to integrate with Zapier or Make.com
Honestly, this is the easiest one, and Piwik PRO has already done it. It would unlock so many possibilities for tinkering, especially now with the new Tables feature in Zapier, and with them millions of activation scenarios. I also considered that they could acquire a marketing platform like customer.io, but I think they are better off integrating with as many marketing platforms as possible.
In general, let's revisit in six months which new CDPs have come around and where the current ones have moved. And by the way, I am a CDP, too (I have a perfect memory for names).
I hope I could make the different parts of the CDP world, and the differences between them, a bit clearer. What are your thoughts and experiences? Let me know in the comments.
If you like this content, I have also written a deep-dive workbook on event data design (part of the CDI). Have a look here: