Preparing Data For Analysis
MODULE 2 LESSON 3 – PREPARING DATA FOR ANALYSIS
The online data that we reviewed in the last chapter contains huge amounts of text and some computer code. Now we need to take this data and create meaningful indicators that we can then use in our predictive models. This chapter will review the process of turning raw data into meaningful marketing indicators that are robust, accurate, updated, and ready for analysis.
First, to understand the process in which we create the marketing indicators, let’s review the type of data that we can get from the web. Online data can be divided into two main parts: text and code. There are, of course, other types of data such as images and video, but we will not touch on these today.
Text analysis covers web copy, news, key employees’ résumés, and marketing tactics. Analysis of the underlying code covers online technologies, online advertising, third-party code, and online tracking and measurement.
Online data is processed through the technologies of crawling and mining.
Crawling means building a robot that goes around the web and reviews the content of websites, social networks, and online databases. The robot then collects the type of data that you need. Data mining is making sense of all of this data.
A simple example: imagine that we just crawled the website of Eloqua and found a blog post about Eloqua’s “Modern Marketing Experience.” Data mining will inform us that Eloqua uses tradeshows as one of their marketing tactics.
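To make the mining step concrete, here is a minimal sketch in Python. It assumes the crawler has already returned the page text, and it uses a small, hypothetical keyword table (`TACTIC_KEYWORDS`) to map phrases to marketing tactics. The sample sentence is made up for illustration, and real mining systems are far more sophisticated than a keyword lookup.

```python
import re

# Hypothetical rules: each marketing tactic maps to phrases that hint at it.
TACTIC_KEYWORDS = {
    "tradeshows": ["conference", "tradeshow", "trade show", "booth"],
    "webinars": ["webinar", "live demo"],
    "email": ["newsletter", "subscribe"],
}


def mine_tactics(page_text):
    """Return the sorted list of tactics whose keywords appear in the text."""
    text = page_text.lower()
    found = set()
    for tactic, keywords in TACTIC_KEYWORDS.items():
        for keyword in keywords:
            # Whole-word match so "booth" does not fire on "toothbooths".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                found.add(tactic)
                break
    return sorted(found)


# Made-up sample text standing in for a crawled blog post.
crawled_post = "Join us at the Modern Marketing Experience, our annual conference."
print(mine_tactics(crawled_post))
```

The point of the sketch is the shape of the step: raw text goes in, a named indicator comes out.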
When analyzing online data, one of the most important yet challenging processes is turning unstructured data into structured data.
For example, look at this bio of Richard Branson. You can immediately understand that Richard Branson is the founder of Virgin Group. Here the data is relatively structured so that computers can understand this with relative ease.
However, imagine that this information is within a blog post or other less structured webpage. Then, the challenge of creating structured data becomes more interesting and complex.
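As a rough illustration of turning unstructured text into a structured record, the sketch below pulls a "founder of" relationship out of a free-text sentence with a regular expression. The pattern, the function name `extract_founders`, and the sample sentence are all assumptions made for this example. A pattern this simple misses many phrasings and can misread a capitalized word that precedes a name, which is exactly why the problem gets interesting at scale.

```python
import re

# Matches phrasings such as "Jane Doe, the founder of Acme Corp"
# or "Jane Doe is the founder of Acme Corp".
FOUNDER_PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+)"  # two or more capitalized words
    r"(?:,| is| was)? (?:the )?founder of "
    r"(?P<company>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)


def extract_founders(text):
    """Return one structured record per founder relationship found in the text."""
    return [
        {"person": m.group("person"), "role": "founder", "company": m.group("company")}
        for m in FOUNDER_PATTERN.finditer(text)
    ]


# Made-up sentence standing in for a less structured blog post.
blog_post = "Last week Richard Branson, the founder of Virgin Group, spoke about space travel."
for record in extract_founders(blog_post):
    print(record)
```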
Another challenge in creating an accurate set of indicators is the lack of a standard format. A single job title can have dozens of variations that all mean basically the same thing.
Have you ever looked at the company database and seen the number of ways you can write United States, not including typos?
Now, if we keep all of these variations in place, while we may know that we actually mean the same thing, a computer may treat them as completely different indicators. Therefore, before we start with predictive modeling, we need to make sure to collapse all variations of the same indicator into one. Otherwise, the predictive power of our model will be significantly diminished.
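Collapsing variations can be sketched with a simple lookup table. The alias table below (`COUNTRY_ALIASES`) is a small, hypothetical example. A real one would be much larger and would also need to handle typos, which this sketch does not attempt.

```python
import re

# Hypothetical alias table: every known variation maps to one canonical value.
COUNTRY_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "america": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
    "great britain": "United Kingdom",
}


def normalize_country(raw):
    """Collapse known variations of a country name into one canonical value."""
    key = re.sub(r"[^a-z ]", "", raw.lower())  # drop punctuation such as the dots in "U.S.A."
    key = " ".join(key.split())                # collapse repeated or stray spaces
    # Unknown values pass through unchanged so that nothing is silently lost.
    return COUNTRY_ALIASES.get(key, raw.strip())


variants = ["USA", "U.S.A.", "united states", " United States of America "]
print({normalize_country(v) for v in variants})
```

After this step, the four spellings count as a single indicator value instead of four.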
Why is matching important?
In the previous modules, we described a large number of sources of data. When data comes from different sources, you need to combine it all together. You need to establish that all of the information that you found online relates to the same company or individual. In our example, that means matching all of the data that we found online on Richard Branson with his record in the company database.
Matching is a critical process because, if we are unable to link online data with our company data, all of our crawling and mining were in vain. Therefore, you need to make sure that you can get a very high matching rate so that you’ll end up with a robust dataset that is ready to apply in predictive modeling.
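Here is a minimal sketch of matching, assuming that both the company database and the crawled data carry a company name. The record layouts, the suffix list, and the helper names (`company_key`, `match_records`) are hypothetical. Production matching usually combines several keys, such as name, web domain, and address, and tolerates misspellings.

```python
import re

# Legal suffixes that differ between sources but do not change the company.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co"}


def company_key(name):
    """Build a comparison key: lowercase, no punctuation, no legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def match_records(crm_records, web_records):
    """Attach each crawled record to its company database record, if one exists."""
    by_key = {company_key(r["company"]): r for r in crm_records}
    matched, unmatched = [], []
    for web in web_records:
        crm = by_key.get(company_key(web["company"]))
        if crm is None:
            unmatched.append(web)
        else:
            # Keep the database spelling of the company name in the merged record.
            matched.append({**crm, **web, "company": crm["company"]})
    return matched, unmatched


# Made-up records for illustration.
crm = [{"id": 1, "company": "Eloqua, Inc."}, {"id": 2, "company": "Virgin Group Ltd"}]
web = [
    {"company": "eloqua", "tactic": "tradeshows"},
    {"company": "Virgin Group", "founder": "Richard Branson"},
    {"company": "Unknown Startup", "tactic": "webinars"},
]
matched, unmatched = match_records(crm, web)
print(f"matching rate: {len(matched) / len(web):.0%}")
```

Tracking the matching rate this way shows how much of the crawled data actually reaches the dataset used for modeling.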
What is the actual process of creating a marketing indicator? Here is an example of identifying companies that use Microsoft SQL Server. Finding Microsoft SQL Server users is not an easy task, as there is no central database of users or any clear evidence or indication of its use.
Therefore, identifying those users requires complex detective work that is done at scale through the use of data mining and predictive modeling.
In this example, we start by gathering all of the evidence that we can about whether a company is using SQL Server.
- Hiring: Are there any SQL Server titles mentioned in any openings that the company advertised?
- Org Chart: Can we find any SQL Server manager or IT manager with similar qualifications?
- Website Content: Is the company a Microsoft partner?
- Third Party Sites: Are there any case studies, press releases, or blog posts that mention SQL Server?
After we gather all available evidence, we use predictive modeling to crunch the data and estimate the probability that the company uses Microsoft SQL Server. This technique requires data mining at a huge scale and massive computing power to crunch the numbers.
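To show how separate pieces of evidence can be combined into one probability, here is a sketch that uses a logistic function, which is one common modeling choice and not necessarily the one used in practice for this indicator. The evidence names, the weights, and the bias are invented for illustration. In a real model they would be learned from companies whose SQL Server usage is already known.

```python
import math

# Hypothetical weights: how strongly each piece of evidence points to SQL Server use.
WEIGHTS = {
    "hiring_mentions_sql_server": 2.0,
    "org_chart_has_sql_server_role": 1.5,
    "is_microsoft_partner": 1.0,
    "third_party_mentions_sql_server": 1.2,
}
BIAS = -3.0  # hypothetical baseline: with no evidence, the probability stays low


def sql_server_probability(evidence):
    """Combine yes/no evidence into a probability with a logistic function."""
    score = BIAS + sum(w for name, w in WEIGHTS.items() if evidence.get(name))
    return 1.0 / (1.0 + math.exp(-score))


no_evidence = {}
some_evidence = {"hiring_mentions_sql_server": True, "is_microsoft_partner": True}
all_evidence = {name: True for name in WEIGHTS}

for label, evidence in [("none", no_evidence), ("some", some_evidence), ("all", all_evidence)]:
    print(f"{label}: {sql_server_probability(evidence):.2f}")
```

Each additional piece of evidence pushes the probability up, and no single signal has to be conclusive on its own.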
However, this example covers only one indicator for one company. In real life, the same process is repeated for a few thousand indicators and for millions of companies!
As a rule of thumb, demographic marketing indicators such as company size are easier to obtain. However, the more granular you get, the more meaningful your marketing indicators become.
Typically, more specific marketing indicators have more business value and tend to be better predictors of fit. These include information about technology, business practices, hiring, and organizational structure.
On the other hand, the more specific a marketing indicator is, the harder it is to develop.
To learn more about “Data, Predictive Analytics and Marketing Clouds: The Platform For The Modern Marketer”, view the slides and webinar recording presented by John Bara (Mintigo), Jay Famico (SiriusDecisions) and John Stetic (Oracle Marketing Cloud).