Jun 17, 2023 · edited Jun 17, 2023 · Liked by timo dechau
Amplitude can also pull data from data warehouses like Snowflake. Is this integration not good enough? What is Kubit's USP?
Since Amplitude is also marketing itself as a CDP, my guess is that the Amplitude-Snowflake integration is going to get tighter over time. So it will be hard for newcomers like Kubit to challenge something like Amplitude. This integration will also help Amplitude counter the narrative of warehouse-first CDPs such as RudderStack.
I think it comes down to the details and what you need.
The major difference is that Amplitude loads your data into Amplitude, while Kubit runs a query in your warehouse and only receives the results.
This plays out in a few ways:
- Privacy-wise, keeping everything in your DWH is preferable.
- Performance-wise, on big data Amplitude's approach might be quicker, but that needs benchmarks.
- Running on top of the DWH gives you more flexibility for extending the data model (Amplitude needs a full sync).
- You can use different leading identifiers in Kubit, like user, account, product, or campaign, next to each other.
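The query-federation difference above can be sketched in a few lines. This is a minimal illustration, not either vendor's actual implementation, using Python's sqlite3 as a stand-in for Snowflake and made-up table and column names:

```python
import sqlite3

# Stand-in for the warehouse: sqlite3 instead of Snowflake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "signup", "2023-06-01"),
        ("u1", "purchase", "2023-06-02"),
        ("u2", "signup", "2023-06-01"),
        ("u3", "purchase", "2023-06-03"),
    ],
)

# Warehouse-native style (Kubit): the aggregation runs inside the
# warehouse, and only the small result set leaves it.
result = conn.execute(
    "SELECT event, COUNT(DISTINCT user_id) AS users "
    "FROM events GROUP BY event ORDER BY event"
).fetchall()
print(result)  # [('purchase', 2), ('signup', 2)]

# Load-based style (Amplitude): raw events are synced out first,
# then aggregated inside the tool's own store.
raw = conn.execute("SELECT * FROM events").fetchall()
print(len(raw))  # 4 raw rows transferred instead of 2 result rows
```

The privacy and flexibility points follow directly: in the first style the raw rows never leave your warehouse, and extending the data model is just another query rather than a full re-sync.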
Thanks for the detailed reply. I get the difference now. However, as with anything, there is a downside to the direct-query-in-DWH approach.
I think tools like Amplitude that pull data in create silos, and those silos are essential. They are an opinionated way of doing analytics, and they obviously don't fulfil all user needs. I always see the role of the DWH as filling the gaps those silos leave. All these DWH-first approaches have the downside that the data engineering team now has to support use cases that were otherwise out of the box in tools like Amplitude. That adds enormous complexity to the resources and effort required from the data engineering team. The ROI is not at all justifiable in these scenarios! There are exceptions, but 99% of companies in the world will struggle with a pure DWH-first approach. What do you think?
Good point. But I don't see the product-analytics silo as essential.
There is one paradigm problem with the DWH: it is usually built for BI use cases, which means events are second-class citizens.
But for event data or explorative analytics, you need events as the core element. Most commonly used data modeling approaches don't support that. And just giving access to the Segment or RudderStack event tables misses the magic I am looking for.
Using something like Activity Schema helps to change that. But it needs adoption.
But it can be built on top of the existing data models, and then the additional costs are manageable.
But as with anything, it really depends on the case.
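The Activity Schema idea mentioned above can be sketched roughly as follows: union the entity-shaped BI tables into one stream where the activity itself is the row. This is a simplified illustration of the pattern, not the full spec, with made-up table names and sqlite3 standing in for the warehouse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Typical BI-style tables: shaped around entities, not events.
conn.execute("CREATE TABLE orders (customer TEXT, ordered_at TEXT, total REAL)")
conn.execute("CREATE TABLE tickets (customer TEXT, opened_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [("a@x.com", "2023-06-01", 50.0), ("b@x.com", "2023-06-02", 20.0)])
conn.executemany("INSERT INTO tickets VALUES (?, ?)",
                 [("a@x.com", "2023-06-03")])

# Activity-Schema-style: union everything into a single activity
# stream, making the event the first-class citizen.
conn.execute("""
    CREATE TABLE activity_stream AS
    SELECT customer,
           ordered_at AS ts,
           'completed_order' AS activity,
           total AS revenue_impact
    FROM orders
    UNION ALL
    SELECT customer,
           opened_at AS ts,
           'opened_ticket' AS activity,
           NULL AS revenue_impact
    FROM tickets
""")
rows = conn.execute(
    "SELECT activity, COUNT(*) FROM activity_stream GROUP BY activity ORDER BY activity"
).fetchall()
print(rows)  # [('completed_order', 2), ('opened_ticket', 1)]
```

Because the stream is built as a view over the existing models, the additional cost is one transformation layer rather than a second tracking pipeline, which is the point about manageable costs above.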
For all kinds of product and marketing analytics that often require adding data from Google Ads, Facebook Ads, Salesforce, or Mailchimp, I think Amplitude offers a nice out-of-the-box solution. If we try to build the same thing with a DWH plus a BI tool, I think the added data engineering and data analyst costs will be difficult to justify, won't they?
When out of the box works for your setup, it is usually the better option cost-wise. With a high event volume, that calculation might already change.
If you use smaller or less common marketing platforms, you depend on an integration being available.
So, in the end, it is up to you to decide when a stack works and when the limitations become real blockers.
There are no best solutions in general, only in context.
I don't see how Kubit does away with something like RudderStack or Segment; isn't there still a need for a unified tracking layer of some kind?
You need event data. So you still need a way to get events into your DWH. But if you, for example, just collect events from your application databases, there is no need for Segment or Rudderstack.
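A minimal sketch of what "collecting events from your application databases" could look like: poll the operational tables with a watermark and turn changed rows into events, with no Segment or RudderStack in between. The table, column, and event names here are hypothetical, and sqlite3 stands in for the application database:

```python
import sqlite3

# Stand-in application database with an operational table.
app_db = sqlite3.connect(":memory:")
app_db.execute("CREATE TABLE orders (id INTEGER, status TEXT, updated_at TEXT)")
app_db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "paid", "2023-06-01T10:00"),
                    (2, "paid", "2023-06-02T09:00"),
                    (3, "shipped", "2023-06-03T12:00")])

def extract_events(conn, watermark):
    """Incremental pull: rows touched since the last run become events."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    return [{"event": f"order_{status}", "order_id": oid, "ts": ts}
            for oid, status, ts in rows]

# Everything after the last successful run becomes new event rows
# to load into the DWH.
events = extract_events(app_db, "2023-06-01T23:59")
print([e["event"] for e in events])  # ['order_paid', 'order_shipped']
```

In practice you would use change data capture rather than polling, but the point stands: if the application database already records the state changes you care about, the tracking SDK layer becomes optional.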
Sure, but that's always been the case. :) It's more a question of rolling your own or using an SDK. I do feel like a tracking layer helps with controlling different SDKs (attribution SDKs, etc.) and keeps things tidy, so I was unsure if Kubit did away with that selling point.
It really depends on how you use it, and especially where you need to send the data and how quickly. When it comes to a controlled environment, I would go with Snowplow.
This is very timely Timo, thanks for putting this down!
Thanks! I need to keep up with your CDP work.
Oh good luck with that. I'm trying to dial it down but it's not happening. Everyone wants to talk to me about CDP stuff so expect a whole lot more.
I would still say CDP is hotter in the market than product analytics.
Oh yeah for sure. It's too hot to handle!
Great read!
Thanks! Especially because you are an essential part of all these changes.
Wonderfully put, Timo! Look forward to your take on how BI compares in your next post.