Welcome to Predictive Marketing University

Jacob Shama, PhD
Learn how to leverage data science to improve your marketing and boost revenue

Data Sources



Predictive algorithms are fed with data, so robust, accurate, and up-to-date data is key to achieving the best results. The best marketing data drives the best marketing decisions. The vast growth of data from the web and social media has created huge opportunities for predictive marketing.

In this section we will review some data sources.

The first data source is the company database. Companies have always collected data in their CRM. However, the main goal of a CRM is to collect data about customers, not prospects. To address the needs of marketers, a new system was created: the marketing automation platform. Marketing automation collects data about leads and uses it to nurture them and move them down the funnel. It is a process machine.

There are two types of data that marketing automation platforms collect and use:

  1. Firmographic data, such as company name, company size, and revenue, along with contact details like names and job titles.
  2. Behavioral data, which covers the emails and marketing assets the prospect engaged with, as well as website visits, marketing events, and more.

These data can then be used for basic marketing actions. However, data captured in organizations today has a few challenges.

The first challenge is that company data gives only a fractional view of a lead, as it mostly contains contact information such as names, job titles, and addresses. On average, the company database contains 10 data points on each prospect, compared to more than 1,700 that you can find on leading predictive marketing platforms.

The second challenge is that company data goes stale. After a lead or contact is created in the CRM, sales reps rarely update it with new information. Meanwhile, people change jobs or get promoted, companies grow or launch new products, and their needs change. None of this is reflected in your CRM or marketing automation data.

The third challenge is that CRM data is not clean. When multiple people enter data over a long period of time, it becomes less and less clean.

The fourth challenge is that data structure is not standard. The same variable can be entered in many different ways. For example, states can be entered as two letters or as the full name. Job titles that are the same can have different names. Nonstandard data makes it hard to segment and draw meaningful conclusions.
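One way to picture the fix is a small normalization step that maps each variant to a single canonical value. The sketch below is only an illustration: the lookup tables `STATE_ABBREVIATIONS` and `TITLE_SYNONYMS` and both function names are hypothetical, and a real database would need much larger tables or a fuzzy-matching approach.

```python
import re

# Hypothetical lookup tables for illustration; real ones would be far larger.
STATE_ABBREVIATIONS = {
    "california": "CA",
    "new york": "NY",
    "texas": "TX",
}

TITLE_SYNONYMS = {
    "vp marketing": "VP of Marketing",
    "vp of marketing": "VP of Marketing",
    "vice president marketing": "VP of Marketing",
    "vice president of marketing": "VP of Marketing",
}


def normalize_state(raw: str) -> str:
    """Return a two-letter code when the value is recognized, else the trimmed input."""
    cleaned = raw.strip()
    if len(cleaned) == 2 and cleaned.isalpha():
        return cleaned.upper()
    return STATE_ABBREVIATIONS.get(cleaned.lower(), cleaned)


def normalize_title(raw: str) -> str:
    """Strip punctuation and extra spaces, then map known variants to one label."""
    key = re.sub(r"[^a-z0-9 ]", " ", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return TITLE_SYNONYMS.get(key, raw.strip())
```

With this in place, "ca", "CA", and "California" all end up as one segment, and "VP, Marketing" and "Vice-President of Marketing" are counted as the same job title. Values the tables do not know are passed through unchanged rather than guessed.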

Last, as with any database, some of the variables are missing from the company database. Inbound leads may have left some of the fields in the form empty, and leads from different sources may have different fields associated with them.
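A quick way to see how severe the gaps are is to measure, for each field, the share of records that actually have a value. The following is a minimal sketch; the function name `field_fill_rates` and the sample leads are made up for illustration.

```python
from collections import Counter


def field_fill_rates(records, fields):
    """Return the share of records with a non-empty value for each field."""
    filled = Counter()
    for record in records:
        for field in fields:
            value = record.get(field)
            # Treat missing keys, None, and blank strings as "not filled".
            if value is not None and str(value).strip():
                filled[field] += 1
    total = len(records)
    return {field: (filled[field] / total if total else 0.0) for field in fields}


# Hypothetical leads from different sources, each with different fields.
leads = [
    {"email": "a@example.com", "job_title": "CMO", "state": "CA"},
    {"email": "b@example.com", "job_title": "", "state": None},
    {"email": "c@example.com", "state": "NY"},
    {"email": "d@example.com", "job_title": "Analyst"},
]
rates = field_fill_rates(leads, ["email", "job_title", "state"])
```

Here every lead has an email, but only half have a job title or a state, which tells you which fields cannot yet be relied on for segmentation.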

To drive predictive marketing, marketing data needs to be rich, up to date, and accurate.

The second data source is social data. Social is one of the fastest growing online domains.

While engaging in social sites, people leave a large online footprint that can provide high visibility to their profiles, interests, and activities. This can all be mined to provide a very rich and robust dataset.

Here are the types of data that can be mined from social networks and social media:

Social profiles: Social profiles contain a wealth of information including résumés, interests, training, education, job responsibilities and more. This can be one of the best and most robust sources of information about individuals that is available online.

Just as sales reps review LinkedIn profiles of individuals they talk to before engaging in a demo, the same can be done automatically and at scale. In addition, social profiles can continuously be updated by individuals to reflect promotions, job changes, and new skills. Social profiles can enrich any dataset with relevant data.

Social links: This is what you can learn from looking at people’s friends and colleagues. This is one of the hottest fields in math and computer science: network analysis. When you analyze the connections between individuals, you can form clusters that can teach you a lot about a person’s education, training, interests, goals, and needs.

A great way to visualize this was a LinkedIn tool called InMaps, which is no longer supported. InMaps analyzed an individual’s connections to form clusters. The algorithm knew nothing about the person except how their contacts were connected to one another.

When you analyze the results, though, you see clusters that make sense. Friends from college form one cluster, colleagues from a former job another, and so on. This is how algorithms that analyze big data can discover things that would take humans far more time and knowledge to find.
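To make the idea concrete, here is a minimal sketch of grouping people using nothing but a list of who is connected to whom. It finds connected components, which is the simplest possible form of clustering and is not necessarily what InMaps did. Real networks are rarely this cleanly separated, so production systems typically rely on community detection methods instead. The names and the `find_clusters` function are hypothetical.

```python
from collections import defaultdict


def find_clusters(connections):
    """Group people into clusters using only who-knows-whom pairs."""
    neighbors = defaultdict(set)
    for a, b in connections:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    clusters = []
    for person in sorted(neighbors):
        if person in seen:
            continue
        # Walk outward from this person to collect everyone reachable.
        stack = [person]
        seen.add(person)
        members = set()
        while stack:
            current = stack.pop()
            members.add(current)
            for other in neighbors[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(members)
    return clusters


# Hypothetical contacts of one individual (the individual is left out,
# since they are connected to everyone by definition).
connections = [
    ("Ana", "Ben"), ("Ben", "Cara"), ("Ana", "Cara"),
    ("Dev", "Eli"), ("Eli", "Fay"),
]
clusters = find_clusters(connections)
```

The code is never told that Ana, Ben, and Cara went to college together or that Dev, Eli, and Fay shared an employer, yet the two groups fall out of the connection data alone.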

Social activity: Social activity includes all of those likes, shares, tweets, and retweets that we do on social networks. These can actually contain valuable information regarding interests as well as wants and needs. Sharing lots of images of new cars? You may be in the market for a new car or just a car enthusiast.

Just liked or tweeted an article about marketing automation? You are more likely to be a marketing automation power user. This wealth of information is invaluable for marketers when analyzed at scale.

In sum, analyzing social profiles, social links, and social activity reveals the wealth of information that social networks contain.

Company websites are the third data source.

Websites may sound obvious, but the real challenge there is making sense of all of the unstructured data that includes text, images, and web technologies. We divide the data that can be obtained from websites into two main sections:

First, we have the public website information. This includes:

  • Products
  • Company descriptions
  • Keyword density
  • Management
  • And other information, like press releases or financial disclosure information.
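Of the items above, keyword density is the easiest to show in code. As a rough sketch, it can be computed as the share of words on a page that match a given keyword. The function name, the tokenization rule, and the sample text are assumptions for illustration; real crawlers would first strip HTML and usually handle multi-word phrases too.

```python
import re


def keyword_density(text, keyword):
    """Share of words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Hypothetical page copy.
page_text = "Predictive marketing software. Our marketing platform scores leads."
density = keyword_density(page_text, "marketing")
```

In this sample, "marketing" accounts for 2 of the 8 words, a density of 0.25. Comparing densities across many keywords gives a rough picture of what a company's website emphasizes.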

The section that is less obvious is the data that exists under the hood. This data includes:

  • Web technologies used on the website, such as chat, forums, and marketing automation platforms
  • Inbound links
  • Traffic and rankings such as Alexa
  • Online advertising tactics including PPC and retargeting
  • Online engagement

Websites do contain a wealth of information, and much of it is hidden from plain sight.

News outlets and blogs are the fourth data source for predictive marketing. Mining and analyzing news and articles at scale can provide insights that are hidden from plain sight. Imagine a dedicated person reading industry news for every one of your leads and constantly updating each lead's record with new information.

This is the power of data-driven predictive marketing. News sites and blogs contain a wealth of information.

The challenge is to crawl millions of blogs and news sources to get the information, and then use sophisticated text analysis in order to understand it and find the valuable data points.

The fifth data source is private and public databases. Databases contain a wealth of information that can become helpful data points for predictive models.

This data includes:

  • Financial statements
  • Government filings
  • Company financials
  • Patents

Putting these databases together with our data can teach us a lot about a company’s growth and reveal data that is not available anywhere else.

Two very important things that every marketer would like to know are a buyer’s fit to an offer and the buyer’s purchase intent.

A great example of buyer intent is search. If we search online for a vacation in Hawaii, this reveals not only that we may be shopping for a vacation, but also our preferred destination. While search data is private and is not available for marketing use, other types of data can indicate buyer intent.

For example, if you ask people on social networks for reviews or recommendations, you are very likely looking for that product or service.
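A very simple way to spot this kind of signal is to scan public posts for phrases that typically accompany a request for recommendations. The sketch below is a naive rule-based illustration, not how any particular platform works. The phrase list `INTENT_PHRASES` and the function `has_purchase_intent` are hypothetical, and a real system would more likely use a trained text classifier.

```python
# Hypothetical phrases that often signal someone is shopping around.
INTENT_PHRASES = (
    "any recommendations",
    "can anyone recommend",
    "looking for a",
    "reviews of",
)


def has_purchase_intent(post):
    """Return True if the post contains any of the intent phrases."""
    # Lowercase and collapse whitespace so matching is not thrown off by formatting.
    text = " ".join(post.lower().split())
    return any(phrase in text for phrase in INTENT_PHRASES)
```

A post such as "Can anyone recommend a marketing automation platform?" would be flagged, while "Great conference today!" would not. Simple phrase matching like this produces false positives and misses, so it is best treated as one weak signal among many.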

Data, Predictive Analytics & Marketing Clouds

To learn more about “Data, Predictive Analytics and Marketing Clouds: The Platform For The Modern Marketer”, view the slides and webinar recording presented by John Bara (Mintigo), Jay Famico (SiriusDecisions) and John Stetic (Oracle Marketing Cloud).