Goldilocks Criteria: Customer Data Platforms

This is the second in a series of posts designed to help managers think about business requirements for selecting enterprise vendors and software.  Please also check out my first post on Business Intelligence platforms.

Customer Data Platforms (CDPs) inspire a lot of confusion.  Best to begin with what they are and what they are not.

CDPs are:

  • A centralized platform for storing all of the data about all of your users
  • A platform that can be used by non-technical employees to activate / action upon user data
  • A safe haven for secure user data management, compliant with the latest regulations and best practices
  • A bridge to combine your user data with external data sets
  • A rules engine for user segment management.  Want to build a cohort of users who opened an email and clicked on a Facebook ad?  No problem
  • A platform for collaboration, breaking down individual business unit data silos
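To make the "rules engine" bullet concrete, here is a minimal sketch of the kind of rule evaluation a CDP segment builder performs behind its UI. The event names and helper function are hypothetical, not any vendor's API.

```python
# Sketch of the rule evaluation behind a CDP segment builder.
# Event names ("email_opened", "fb_ad_clicked") are invented examples.

def in_segment(user_events, required_events):
    """A user qualifies if their event history contains every required event."""
    seen = {e["name"] for e in user_events}
    return seen.issuperset(required_events)

events = [
    {"name": "email_opened", "ts": "2024-01-02"},
    {"name": "fb_ad_clicked", "ts": "2024-01-03"},
]

# Cohort rule: opened an email AND clicked a Facebook ad.
print(in_segment(events, {"email_opened", "fb_ad_clicked"}))  # True
```

A real platform layers time windows, negations ("did NOT purchase"), and property filters on top of this same membership check.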

CDPs are not:

  • CRM solutions designed for sales or support teams to manage intricate customer interactions and workflows
  • DMP solutions focused only on anonymous cookied / IDed users (though they are coming close to covering this feature set)
  • Tag Management solutions designed to wire up various vendor libraries and SDKs.  Many CDPs began as Tag Managers, but I think that historic focus on tag management is a disadvantage when building a best-of-breed CDP.  Just because you were a horse, it doesn’t make you a better car

And why do people integrate Customer Data Platforms?  Centralizing user data, strengthening the intelligence around it, and democratizing access to it should impact business goals across the board, from decreased systems costs to improved conversion rates.

The basic ins and outs of a CDP.

Given all of this, let’s review my Goldilocks (“just right”) criteria for picking a Customer Data Platform:

Connectivity and I/O

Customer Data Platforms are only as good as the pipes that bring data in and out of them.  You want many different roads into the platform, from plug-and-play SDKs / libraries to full read / write APIs.  You also want pre-built connectors into the most popular data sources (CRM, event ticketing platforms, etc.) and data activation endpoints (ad networks, social media channels, email service providers, etc.).
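As a toy illustration of those pipes, the sketch below shows user data entering via an SDK-style `track()` call and fanning out to registered downstream connectors. All names are illustrative, not any vendor's actual API.

```python
# Toy sketch of CDP I/O: events come in via an SDK-style track() call and
# fan out to registered downstream connectors (ad networks, ESPs, etc.).
# Every name here is invented for illustration.

outbound = []  # stands in for calls to external activation endpoints

def make_connector(destination):
    def send(event):
        outbound.append((destination, event))
    return send

connectors = [make_connector("ad_network"), make_connector("email_provider")]

def track(user_id, event, properties=None):
    """Ingest one event and forward it to every connected destination."""
    payload = {"user_id": user_id, "event": event, "properties": properties or {}}
    for send in connectors:
        send(payload)
    return payload

track("u123", "ticket_purchased", {"value": 49.0})
```

The value of pre-built connectors is exactly this fan-out: one inbound event, many outbound destinations, with no per-destination engineering work.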

Security and Compliance

As we’ve learned over and over recently, user data security and governance is no easy task.  Outsourcing this to a vendor may be a hard decision to make, but it’s often much harder managing and maintaining secure and compliant user data solutions internally.  You want a partner with a track record of secure data management, comparable customers that you trust, and no fear of security audits from your team or others. You also want a partner that is quick to adapt to changing industry rules and regulations (ex. GDPR).  Internally, you want robust rules, roles and permission settings to partition off sensitive data for specific users and use cases.

Administrative Usability

CDPs are designed to democratize data-driven activities for non-technical users.  As such, you should require a modern, usable UX for non-engineers to get busy with the data.  Some providers require light scripting for segment creation or segment activation. No good. Best to trial the administrative user experience with some of your least technical colleagues before pulling the trigger on a vendor solution.

Identity Management and Identity Resolution

There are a number of features in this functionality bucket, but in short, you want your CDP to consolidate literally all of your available user data into a singular user profile.  This might mean partnering with a device or identity-graph provider to stitch emails to cookies.
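One way to picture identity resolution is as a union-find over identifiers: every email, cookie, and device ID that an identity graph ties together collapses into a single profile. A minimal sketch, with made-up identifier values:

```python
# Minimal identity-resolution sketch: stitch identifiers (emails, cookies,
# device IDs) into one profile using union-find. Identifier values are made up.

parent = {}

def find(x):
    """Return the canonical representative for identifier x."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def link(a, b):
    """Record that two identifiers belong to the same person."""
    parent[find(a)] = find(b)

# An identity-graph match ties a cookie to an email, and the email to a device.
link("cookie:abc", "email:jane@example.com")
link("email:jane@example.com", "device:ios-42")

same = find("cookie:abc") == find("device:ios-42")
print(same)  # True: all three IDs resolve to a single profile
```

Real platforms add probabilistic matching and conflict rules on top, but the core consolidation step looks a lot like this.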

This also means flexible data storage limits so that you don’t have to discard potentially valuable user data.  At Viacom, a certain % of the US population visits our sites or volunteers their email addresses. That said, our TV signals reach the homes and mobile devices of a much larger user base.  We need systems that allow us to pull all of our data together without worrying about a vendor’s storage costs or historic architectural limits.

Real Time Segmentation Updates

Your users’ profiles and segments should update in real time as they take actions on and offline.  Many CDPs update segments hourly – which is no bueno. If a user views / interacts with your website or an online ad, their profile should update immediately so they can advance to the next event in your funnel.  Many of the CDPs that came from legacy industries (again, Tag Management) are just not architected to support real-time updates. This is of growing importance.
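The difference between hourly batch and real-time updates is push-based evaluation: every incoming event immediately re-checks segment rules. A minimal sketch, with invented event and segment names:

```python
# Sketch of push-based (real-time) segment updates: each incoming event
# immediately re-evaluates segment membership instead of waiting for an
# hourly batch job. Event and segment names are invented.

profiles = {}  # user_id -> set of events seen
segments = {"viewed_ad": lambda evts: "ad_view" in evts}
membership = {name: set() for name in segments}

def on_event(user_id, event):
    evts = profiles.setdefault(user_id, set())
    evts.add(event)
    for name, rule in segments.items():  # re-check rules right away
        if rule(evts):
            membership[name].add(user_id)

on_event("u1", "ad_view")
print("u1" in membership["viewed_ad"])  # True, with no batch delay
```

With a batch architecture, the same user would sit outside the "viewed_ad" segment for up to an hour, missing the window to act on their next funnel step.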

Integrated and Automated Machine Learning

The next generation of CDPs goes further than data and segment storage.  The best support unstructured data and use machine learning to automatically create useful user segments.  Some even crawl and categorize your content (pages, emails, posts) to find interesting patterns and apply those as dynamic segments to your users.  This is the type of thinking you want to see from your Customer Data Platform partners.

The platform should also support custom data science models – whether run internally within the CDP or through easy and performant read / write APIs.

ML fanboy alert – this is one of my very top considerations when reviewing partners.

Smart Orchestration

Getting your users through a funnel from start to conversion is never easy.  Your CDP should monitor and track your progress and where possible add dynamic intelligence to usher users through funnel events and towards your target goal.  The alternative is intricate manual workflow creation and management, which is hard to set up and even harder to manage against other initiatives.

This dynamic orchestration allows for truly personalized, omni-channel user journeys – experiences and messages that change based on the individual user’s profile properties and the best likelihood of conversion.
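A toy sketch of what "dynamic intelligence" means in practice: pick the next action with the highest predicted conversion likelihood for this specific user. The scores and actions below are invented; a real CDP would source them from a model.

```python
# Toy next-best-action sketch: choose the channel / message with the highest
# predicted conversion likelihood for a given user profile.
# Actions and scoring functions are invented for illustration.

def next_best_action(user_profile, candidates):
    """Return the candidate action with the highest predicted score."""
    return max(candidates, key=lambda c: c["score"](user_profile))

user = {"opened_email": True, "clicked_ad": False}

candidates = [
    {"action": "send_email",
     "score": lambda u: 0.6 if u["opened_email"] else 0.2},
    {"action": "show_retargeting_ad",
     "score": lambda u: 0.4},
]

best = next_best_action(user, candidates)
print(best["action"])  # send_email for this engaged-with-email user
```

The manual-workflow alternative hard-codes one path for everyone; scoring per user is what makes the journey personalized.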

Industry Momentum

There is a ton of investment in the CDP space right now.  You’ll want to pick a horse with recent major funding from venture capital or a strategic investor.  Many of these companies will not be in business in two years’ time.

-David


Goldilocks Criteria: Selecting Business Intelligence (BI) Platforms

This is the first of a new series of posts dedicated to helping people select data tools and infrastructure. I’ve listed out the ‘perfect’ feature set for a dream product. Of course these features rarely exist in a single solution, but if they did, I’d use it! First up: business intelligence platforms.

There are many fiefdoms in the kingdom of data – from product analytics to predictive models to advanced user-facing data applications – and no single platform will address every need. So for the purposes of this discussion, let’s define Business Intelligence platforms as data platforms that allow non-technical business users to explore, prepare and present data germane to their work. These tools should support data-driven insights and decision making but should not require a STEM degree or General Assembly workshop to operate.

Turns out this is ancient stuff. Business Intelligence (BI) platforms date as far back as the ‘60s – the 1860s – when Richard Devens coined the term in his Cyclopædia of Commercial and Business Anecdotes, describing how Sir Henry Furnese, a banker, gained an advantage over his competitors by using and acting upon the information surrounding him. Over the past few decades BI tools have matured rapidly, shifting from beastly on-premise data warehouses with text-focused UIs to cloud-based, mobile-first, lithe data platforms designed for non-technical users.

https://www.sales-i.com/a-history-of-business-intelligence

BI is generally defined as ‘tools for data analysis and report generation on top of data aggregated from multiple disparate systems’. Some BI platforms sit on top of separate data warehouses, and some modern platforms serve as the data aggregator / data store as well. BI tools pack a ton of functionality but are typically narrowly scoped. You don’t “do” anything within your Business Intelligence platform; instead you investigate, learn and report on how other systems are “doing”. BI surfaces data to guide decisions made elsewhere.

You will also see BI in the form of Embedded Analytics within various tools – like your CRM system or your Web Analytics platform. Generally, Embedded Analytics helps steer micro tasks (which email subject line performed best?) rather than providing a holistic view of data across multiple sources. The best BI tools provide this holistic view – pulling in all of your data to support cross-functional views and insights.

So how does this work in practice? A great use case for BI platforms is to create easy-to-digest OKR dashboards for your company, teams and individuals. Your BI platform should allow teammates from different business units to pull up live views of their progress towards their outcomes / goals… anytime / anywhere… on their phones… without support from business analysts or IT.

OK, enough preamble. Here are the goldilocks (aka “just right”) criteria I look for in BI platforms:

Integrated data warehouse

Traditionally, BI tools sit on top of separate data platforms managed by engineering teams. More recently, a new class of products has emerged that allows you to upload / connect to your data without engineering support. I find this to be a huge advantage as it allows moderately-technical users to get up and running without distracting / relying on external resources. (Self service also leads to challenges with data governance, but that’s another story.)

As an example, imagine easily joining together all of the spreadsheets you store in Google Drive / Dropbox with live data connections to Google Analytics / Facebook Analytics / financial data / more, and then exploring and visualizing this data as you choose. That’s what these new platforms do, all without the help of data engineering resources.

Data engineering for dummies

Some of the best data scientists I’ve worked with estimate that they spend 80-90% of their time on data hygiene before they can begin analysis and exploration. The same goes for business analysts.

https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#183675d26f63

What does that mean for BI tools? Any functionality that supports easy data manipulation for the sake of improved clarity is awesome. That means joining data together via drag and drop, changing data types with a click, and deduplicating rows without writing SQL are all huge value adds, extending the range of users who can go deep with the data without external assistance.
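For readers who want to see what those clicks do under the hood, here is the dedupe-and-join equivalent in plain code. The column names and values are invented for the example.

```python
# What "dedupe rows" and "join by drag and drop" amount to under the hood,
# sketched in plain Python. Column names and values are invented.

orders = [
    {"email": "a@x.com", "total": 10},
    {"email": "a@x.com", "total": 10},   # exact duplicate row
    {"email": "b@x.com", "total": 25},
]
crm = {"a@x.com": "Ann", "b@x.com": "Ben"}  # a second "table" to join against

# Deduplicate: keep one copy of each identical row (order preserved).
deduped = list({tuple(sorted(r.items())): r for r in orders}.values())

# Join: enrich each order row with the CRM name on the shared email column.
joined = [dict(r, name=crm.get(r["email"])) for r in deduped]
print(joined)
```

The point of a good BI platform is that a non-technical user gets this result from two clicks, not from writing the code above or its SQL equivalent.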

Live data! From the cloud! On your phone!

Data that arrives attached to an email is DOA. This is one of my absolute pet peeves. Further, once people begin offline discussion and editing of the data, multiple inaccurate versions / views of the same data set become commonplace.

BI tools need to pull from a live backend at all times. When I pull up a link to view a dashboard, the data should be (pseudo) real-time, up to date, and clearly time stamped with the date it was last run.

This also means the platform should be mobile-centric. Old timers still want their landscape printouts, but there is nothing more powerful than conversing with colleagues and pulling up live data views on your phone à la minute. 

AI / ML aware

I don’t want to overstate this one as we’re in the very earliest of innings, but your platform should have the foundation to support automated, machine-learning-driven insights. You may not find these immediately valuable (they rarely are out of the box), but in a few years you’ll be getting voice alerts when your data spikes unpredictably in ways you may not have imagined. There is no sense in investing in a platform that is not actively working on automated data insights.

As a start, I’d like to see my platform present basic statistics around the data that I’ve onboarded. This means simple distribution and correlation reports. As you play with these statistics you’ll be able to more easily wrap your arms around the data at hand, steering deeper analysis and insights. Simple predictive analytics is another good baby step before full-blown AI.

This all said, you separately need to invest in training your teams to take advantage of these statistical insights. Leveling up the data fluency of your team is always more valuable than standing up a whiz-bang technology solution.

Narrative & collaboration focused

A perfect platform would support metrics-backed storytelling – and not just the sharing of pie charts. That means as a product owner, I can use a BI platform to explore a set of data and then build a coherent, sharable narrative around it. That could manifest itself as an online presentation with live data at different altitudes, supported by text, images, video and other added insights. It also means that I should be able to draw / pin annotations within the data itself.

Further, the presentation should support active conversation around what’s being presented. Unlimited named user accounts, threaded comments, open annotations, next-step action items, @ mentions and more are a natural fit here.

Governance gone wild

Sad to say, this is critical. Like supercritical. Like, as soon as you create your second dashboard you need extreme governance, otherwise you’ll never find it again or know whether the data set that powers it is up to date, approved and official.

I’ve seen smart approaches here, and they center around clear labeling of the data, its origins, similar / duplicative data and more. Having an easy way to validate data as “best” or “official” helps too. Ultimately, ML/AI will be a huge help in this arena.

An integrated, dynamic “data catalog” that shows you the breadth of your data, its lineage, stamps of approval, and error reporting is also a must-have.

User-level data FTW

BI tools typically play at the aggregated, anonymous altitude. You can see how all your site visitors behave, customer acquisition by location, sales by campaign, etc. Data is viewed at the content, page, campaign, or location level – but rarely at the user level. In a perfect world, a graph model would be deployed at the atomic event level, allowing pivots at the above altitudes but also down to the user level.
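To illustrate the "altitudes" idea: the same atomic events can be rolled up by campaign (the classic BI view) or by user (the view CDPs promise). The events and values below are invented.

```python
# Pivoting the same atomic events to different "altitudes": the usual BI
# rollup (by campaign) versus the user-level view. Values are invented.
from collections import defaultdict

events = [  # atomic event level
    {"user": "u1", "campaign": "spring_sale", "revenue": 20},
    {"user": "u2", "campaign": "spring_sale", "revenue": 15},
    {"user": "u1", "campaign": "retarget", "revenue": 5},
]

by_campaign = defaultdict(int)  # classic aggregated BI view
by_user = defaultdict(int)      # user-level pivot
for e in events:
    by_campaign[e["campaign"]] += e["revenue"]
    by_user[e["user"]] += e["revenue"]

print(dict(by_campaign))  # {'spring_sale': 35, 'retarget': 5}
print(dict(by_user))      # {'u1': 25, 'u2': 15}
```

Storing events at this atomic grain is what keeps both pivots available; aggregating too early throws the user-level view away for good.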

A new breed of system called Customer Data Platforms is jumping into the fray here, promising a single view of the user. These CDPs are being leveraged today by Marketing and Sales teams, but the potential application of this user-level view to more typical BI use cases is immense. Perhaps CDPs are the topic of the next post in this series…
