Kevin Klein – Search Engine Land News On Search Engines, Search Engine Optimization (SEO) & Search Engine Marketing (SEM) Fri, 10 May 2019 19:50:18 +0000

Leveraging data science to illuminate the modern consumer decision journey /leveraging-data-science-to-illuminate-the-modern-consumer-decision-journey-316799 Mon, 13 May 2019 12:00:33 +0000 /?p=316799 When analysis of behavior focuses on query relationships, we find searches are seldom linear and occur in clusters that don’t necessarily align with funnel-like behavior.

The post Leveraging data science to illuminate the modern consumer decision journey appeared first on Search Engine Land.

Today’s consumer decision journey is rapidly expanding into something bigger and more intricate than anyone imagined when the internet first arrived on the scene. Searcher behavior continues to evolve at a breakneck pace, especially as new forms of mode and modality push marketers further into the realm of AI and machine learning. To fully understand the complexities of the new consumer decision journey, today’s advertising teams must accommodate a new role: data scientist.

The advertiser analytics group at Microsoft Advertising (my employer) is taking deeper dives into internal search query data to help marketers get the visibility they need. What exactly does today’s CDJ look like? Well, like this:

What we’re looking at here is an actual representation of recent search queries on Bing related to enterprise cloud software. It’s a network made up of search queries and the relationships between them, where a relationship is defined as two searches conducted by the same person within a close window of time. Let’s dive in and explore.
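To make that definition of a relationship concrete, here is a minimal sketch of how such edges could be counted from a search log. The log rows, the one-hour window and the function name are all assumptions for illustration; the article does not describe the actual pipeline or window length.

```python
from collections import Counter
from itertools import combinations

# Hypothetical log rows: (user_id, unix_timestamp_seconds, query). Not real Bing data.
SEARCH_LOG = [
    ("u1", 0, "azure pricing"),
    ("u1", 600, "aws pricing"),
    ("u1", 90_000, "vpn"),
    ("u2", 100, "azure pricing"),
    ("u2", 400, "aws pricing"),
    ("u3", 50, "vpn"),
    ("u3", 1_000, "best vpn"),
]

WINDOW_SECONDS = 3_600  # assumed meaning of "a close window of time"


def co_search_edges(log, window=WINDOW_SECONDS):
    """Count, per pair of distinct queries, the users who searched both within `window` seconds."""
    by_user = {}
    for user, ts, query in log:
        by_user.setdefault(user, []).append((ts, query))

    edges = Counter()
    for events in by_user.values():
        pairs = set()
        for (t1, q1), (t2, q2) in combinations(sorted(events), 2):
            if q1 != q2 and abs(t2 - t1) <= window:
                pairs.add(tuple(sorted((q1, q2))))
        edges.update(pairs)  # each user contributes at most once per pair
    return edges


edges = co_search_edges(SEARCH_LOG)
for (a, b), users in sorted(edges.items()):
    print(f"{a} <-> {b}: {users} user(s)")
```

Counting users rather than raw search pairs keeps one very active searcher from dominating an edge, which is one reasonable design choice among several.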

Messy, right? Well, understanding searcher behavior is a complicated problem. First off, let’s look at all the different communities within this network, which are visualized by color. It should quickly become apparent that these queries are clustered thematically; queries around VPN are given their own color, and big players in the space such as Azure and AWS have large communities. It is important to note that queries are not placed in communities based on the content of the query itself, but rather based on the regularity with which they are searched by the same user. This is an important distinction, and it gives us something that is hard to come by: a raw, unbiased look at where brands are positioned in their space.
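The article does not say which community detection algorithm produced the colors. As a crude stand-in, the sketch below drops weak edges and treats each remaining connected component as a community. The edge weights and the `MIN_USERS` cutoff are hypothetical, and note that the grouping never looks at the query text itself.

```python
from collections import defaultdict

# Hypothetical weighted edges: (query_a, query_b, users_who_searched_both).
EDGES = [
    ("azure", "azure pricing", 40),
    ("aws", "aws pricing", 35),
    ("azure", "aws", 3),
    ("vpn", "best vpn", 20),
    ("vpn", "azure", 1),
]

MIN_USERS = 5  # assumed cutoff for a "regular" co-search relationship


def communities(edges, min_users=MIN_USERS):
    """Group queries into connected components over edges with weight >= min_users."""
    neighbors = defaultdict(set)
    nodes = set()
    for a, b, weight in edges:
        nodes.update((a, b))
        if weight >= min_users:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


groups = communities(EDGES)
print(groups)
```

A production analysis would more likely use a modularity-based method from a graph library; thresholded components are only the simplest thing that shows the idea.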

Focus on query relationships

The size of a community is always an interesting factor, but it is the relationships between queries that can best unlock hidden insights. No matter what your product or brand space, there are queries that exist in one community, but have relationships with queries in other communities. For instance, we see below that the queries “cloud computing” and “IoT” have relationships with each other, and with the Azure and AWS communities. This is the connective tissue that drives deeper insights into your customers, your business and your competitive landscape.
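Once each query has a community label, finding this connective tissue is a matter of scanning for edges whose endpoints sit in different communities. The labels and edges below are made up for illustration and are not taken from the network in the article.

```python
# Hypothetical community labels and co-search edges.
COMMUNITY = {
    "azure": "Azure",
    "azure iot hub": "Azure",
    "aws": "AWS",
    "aws lambda": "AWS",
    "cloud computing": "Generic",
    "iot": "Generic",
}
EDGES = [
    ("cloud computing", "iot"),
    ("cloud computing", "azure"),
    ("cloud computing", "aws"),
    ("iot", "azure iot hub"),
    ("azure", "azure iot hub"),
    ("aws", "aws lambda"),
]


def bridge_queries(edges, community):
    """Map each query to the set of *other* communities it has relationships with."""
    reach = {}
    for a, b in edges:
        ca, cb = community[a], community[b]
        if ca != cb:
            reach.setdefault(a, set()).add(cb)
            reach.setdefault(b, set()).add(ca)
    return reach


reach = bridge_queries(EDGES, COMMUNITY)
for query, others in sorted(reach.items()):
    print(f"{query} -> {sorted(others)}")
```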

The key takeaway with relationships is that the vast levels of interconnectivity between queries illustrate the true sophistication of searcher behavior. Searches are seldom linear, occurring more in clusters that don’t necessarily align with funnel-like behavior. Enduring convictions about consumer intent, loyalty and the different types of contributions made by brand and non-brand queries are challenged by the data. To help drive the point home, let’s extract some insights that are only accessible with this acknowledgement as a prerequisite.

We’ll start by isolating the query “what is AI?” We can instantly see in our network that this query has been searched by users who have also searched “what is artificial intelligence” and “AI.”

In turn, we see associations between these terms and brand nodes, such as “IBM” and “AWS.” However, ultimately, we are able to see that “what is AI” is a part of the same community as “IBM,” telling us that many people are searching for both. IBM is doing a superior job at positioning their brand closer to these types of consumer questions.
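One rough way to quantify how close a brand sits to a question like this is to compare how the query's brand co-searches split across brands. The counts below are invented purely to show the arithmetic; they say nothing about the real volumes behind the article's network.

```python
# Hypothetical co-search counts: (query_a, query_b) -> users who searched both.
CO_SEARCH = {
    ("what is ai", "what is artificial intelligence"): 120,
    ("what is ai", "ai"): 95,
    ("what is ai", "ibm"): 30,
    ("what is ai", "aws"): 10,
}
BRANDS = {"ibm", "aws"}  # assumed list of brand nodes


def brand_share(query, co_search, brands):
    """Return each brand's share of the query's brand co-searches."""
    totals = {}
    for (a, b), users in co_search.items():
        if query not in (a, b):
            continue
        other = b if a == query else a
        if other in brands:
            totals[other] = totals.get(other, 0) + users
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {brand: users / grand_total for brand, users in totals.items()}


share = brand_share("what is ai", CO_SEARCH, BRANDS)
print(share)
```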

How about one more example of how embracing the intricacy of searcher behavior can open pathways to a more comprehensive understanding of industry dynamics? Like other big players in this space, Google has its own community within the network. The query “Google Cloud” is the central node in this community, and based on what we’ve seen in other communities throughout the network, we would likely presume that other queries within the community would also be related to Google’s cloud product. However, this community defies our expectations; it contains a mix of competitor and non-brand terms, many of which contain the term “cloud.” From this, we can deduce that Google has positioned their brand close to the term “cloud” – a nice mindshare win for them, and an opportunity for their competition.

AI in the CDJ

How can today’s marketers manage such complex customer journeys? Firstly, it’s important to have a strong partnership with your publisher. Your account teams are champions for your business, and part of that is stewardship for relevant data. The second thing is to invest in data science within your advertising program. Unraveling intricate problems will often call for technical expertise, and consumer behavior grows more sophisticated with every technological advance. And finally, AI and machine learning are already being infused throughout the space to help marketers collect, analyze and leverage massive amounts of data to reach future customers in better ways.


Analyze data distribution more accurately with time series /analyze-data-distribution-more-accurately-with-time-series-308381 Mon, 26 Nov 2018 12:30:04 +0000 /?p=308381 Adding a time series layer to histograms and box and whisker plot visualizations can add true diagnostic value. This is part three of a three-part series about Bing's data distribution tools.

The post Analyze data distribution more accurately with time series appeared first on Search Engine Land.

Welcome back to this series on data distributions, as part of the larger initiative to evolve your analyses with the Bing Analytics Group. Before we move onto using distributions to understand changes in your data over time, let’s recap what we learned in part one and in part two.

In part one of this series, we explained why taking averages at face value can be misleading and can leave us with an incomplete understanding of what’s going on in an account. We established that using data distributions is an effective way to control for that possibility, and then we covered how to analyze a data distribution using a histogram as a visual aid.

In part two of this series we examined the same set of data using a box and whisker plot.

And we left off with the declaration that a graduate of the first two parts of this series should be able to identify the relationship between these two visualizations of the same data.

With this baseline knowledge firmly under our belts, we move into the realm of using data distributions as time series. While there are some excellent ways to combine histograms and time series, none are immediately available in Microsoft Excel.

First things first: to get the most granular understanding of our distribution possible, we’ve been segmenting our performance reports by keyword and by day, but now we’re going to add another layer to the time grain: month.

Before we get into the distribution views again, let’s visit an example of some conventional business intelligence about CPCs over a period of six months.

A likely analysis of a view like this would be something like, “There were relatively stable CPCs between November and February, before encountering pricing volatility in March and April.” That’s all fine and well, but we’re leaving a lot of information on the table by using averages instead of distributions.

So let’s turn these summaries into distributions.

At a glance, one thing jumps out immediately, and that’s the behavior of the outlier CPCs in April ’18. In the five months before that, outlier behavior was pretty consistent, with an upper threshold of around $50. In April this advertiser suddenly saw several instances of a keyword with CPCs over $60, and ranging up to $100, which is certainly an item of interest for optimization.

However, the presence of the outliers is skewing the y-axis and making trends within the quartiles difficult to ascertain. To show the quartiles a little more clearly, remove the visualization of the outliers. This is made easy in Excel. Right click your plot, select “format data series,” and then uncheck the “Show outlier points” box.

This is the same data, outliers removed. Note the top of the y-axis now caps out at 20, where before it ranged to 120.

We can immediately see that the fourth quartile range is the most sporadic from month to month, and the third quartile range is also more volatile than the first or second quartile ranges. Importantly, the median CPC is consistently lower than the mean CPC, which is owed to the influence of the fourth quartile range and the outliers. Furthermore, remembering that the “x” represents average CPC, the top threshold of the fourth quartile range appears to have a distinct relationship with average CPC.
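For readers working outside Excel, the same monthly comparison can be sketched in a few lines. The rows below are hypothetical, and `statistics.quantiles` is used with Python's default "exclusive" method, which may differ slightly from the quartile convention your charting tool applies.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

# Hypothetical (month, CPC) rows from a keyword report segmented by day.
ROWS = [
    ("2018-03", 2.0), ("2018-03", 3.0), ("2018-03", 4.0), ("2018-03", 5.0), ("2018-03", 16.0),
    ("2018-04", 2.0), ("2018-04", 3.0), ("2018-04", 4.0), ("2018-04", 5.0), ("2018-04", 46.0),
]


def monthly_summary(rows):
    """Summarize each month's CPC distribution: mean, median, quartiles and maximum."""
    by_month = defaultdict(list)
    for month, cpc in rows:
        by_month[month].append(cpc)

    summary = {}
    for month, cpcs in sorted(by_month.items()):
        q1, _, q3 = quantiles(cpcs, n=4)  # three cut points: Q1, median, Q3
        summary[month] = {
            "mean": mean(cpcs),
            "median": median(cpcs),
            "q1": q1,
            "q3": q3,
            "max": max(cpcs),
        }
    return summary


summary = monthly_summary(ROWS)
for month, stats in summary.items():
    print(month, stats)
```

In this toy data the median is identical in both months while the mean doubles, which is the same pattern described above: the mean follows the top of the distribution.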

This is a good example of how looking at distributions provides the advertiser with more information that has true diagnostic value than the summary mean.

On behalf of the Bing Analytics Group, we hope you feel you’ve evolved your analyses with this series. Look how far you’ve come!



A closer look at Bing’s box and whisker plots to analyze CPC data /a-closer-look-at-bings-box-and-whisker-plots-to-analyze-cpc-data-308362 Tue, 20 Nov 2018 12:30:56 +0000 /?p=308362 The box and whisker visualization offers a view of both the mean and median along with four quartiles to identify statistical outliers. This is part two of a three-part series about Bing's data distribution tools.

The post A closer look at Bing’s box and whisker plots to analyze CPC data appeared first on Search Engine Land.

Today, to build upon our working knowledge of data distributions, we’re going to be analyzing CPC data using box and whisker plots. If you missed the first installment, get caught up on histograms and meet us back here.

data distribution chart

If you’ve finished part one of this series, then the histogram on the left should look familiar. The plot on the right is a box and whisker plot, created from the very same set of CPCs that we used in part one. Hooray for continuity!

First, let’s ground ourselves in some basics. Because we are not segmenting our data in any way, and therefore using only one distribution, the CPC value will be expressed on the y-axis, and the x-axis will be null.

Now, let’s go through the components of the box and whisker plot. First off, the x.

This x represents the mean value of the distribution, which you’ll recognize as the simple average often associated with your search data. For the purposes of this exercise, the x is your average CPC. The line in the middle of the box, by contrast, represents the median.

While getting both the mean and median of the distribution in the visualization is a wonderful feature of the box and whisker plot, the four quartiles can help divine a lot of information that we can’t get at through a histogram.

The bottom threshold of the box (or left-most threshold for a horizontally justified plot) is the lower quartile, or first quartile, or Q1, and it represents the number such that 25 percent of observations are less than it and 75 percent are larger. In this context, think of an “observation” as a single data point.

The top threshold of the box (or right-most threshold for a horizontally justified plot) is the upper quartile, or third quartile, or Q3, and it represents the number such that 75 percent of observations are less than it, and 25 percent are larger.

Following this same notation, you can also infer that the median serves as the second quartile, given that 50 percent of observations are greater, and 50 percent are lesser.
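The three cut points can be computed directly, which may help the definitions stick. This is a small sketch on made-up CPCs using Python's `statistics.quantiles` with its default "exclusive" method; other tools may interpolate slightly differently, so treat the exact values as convention-dependent.

```python
from statistics import median, quantiles

# Hypothetical CPC observations, one per keyword-day.
CPCS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 30.0]

q1, q2, q3 = quantiles(CPCS, n=4)  # Q1, Q2 (the median), Q3

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print("Q2 equals the median:", q2 == median(CPCS))
```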

This can admittedly become a little confusing to keep track of. We’ve found that something that helps with intuition is to think of the quartiles as possessing ranges, and remembering that each range contains roughly a quarter of the total data points in the data set. Perhaps this pursuit would be frowned upon by the statistician purists of the world, but we take a bright view of whatever helps you learn. Hopefully the visual below helps conceptualize.

Now we’re getting somewhere, right? We can observe that the first three quartile ranges of this distribution have a pretty comparable range of values. But the fourth quartile range is a much broader stroke. For this advertiser to lower their CPCs, a focused and precise tactic would be to isolate keywords that fall within that fourth quartile range, and modify the attendant bids.
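Isolating the fourth quartile range amounts to keeping the keywords whose CPC sits above Q3. Here is a minimal sketch, assuming one CPC per keyword and hypothetical keyword names; the quartile uses Python's default "exclusive" method.

```python
from statistics import quantiles

# Hypothetical keyword -> CPC mapping.
KEYWORD_CPC = {
    "kw1": 2.0, "kw2": 3.0, "kw3": 4.0, "kw4": 5.0,
    "kw5": 6.0, "kw6": 7.0, "kw7": 9.0, "kw8": 20.0,
}


def fourth_quartile_keywords(keyword_cpc):
    """Return keywords with CPC above Q3, most expensive first."""
    _, _, q3 = quantiles(keyword_cpc.values(), n=4)
    above = [kw for kw, cpc in keyword_cpc.items() if cpc > q3]
    return sorted(above, key=keyword_cpc.get, reverse=True)


bid_review_list = fourth_quartile_keywords(KEYWORD_CPC)
print(bid_review_list)
```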

Alright, but what about those dots?

Data points that render as individual dots can be considered statistical outliers in the context of a data distribution. In our hypothetical scenario, the advertiser is looking for tactics to mitigate CPC cost. In addition to the fourth quartile range, this advertiser should investigate the keywords responsible for these outlier values, and act accordingly.
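The article does not state which rule the chart uses to decide that a point becomes a dot. A common convention, assumed here, is Tukey's fences: anything more than 1.5 times the interquartile range beyond Q1 or Q3 counts as an outlier. The CPC values are hypothetical.

```python
from statistics import quantiles

# Hypothetical CPC observations.
CPCS = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0, 25.0]

FENCE_MULTIPLIER = 1.5  # Tukey's conventional value; an assumption, not from the article


def outliers(values, k=FENCE_MULTIPLIER):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


flagged = outliers(CPCS)
print(flagged)
```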

Hearken back to part one of this series for a moment, and recall that our distribution is right tailed, meaning that the skew is towards values that are greater than the median. Knowing what you know now about both histograms and box and whisker plots, you should be able to intuit the relationship between these two visualizations of the same data.

In the final part of this series, we’ll explore using distributions to identify changes in your data over time.




Bing histograms reveal better business intelligence metrics with data distribution /bing-histograms-reveal-better-business-intelligence-metrics-with-data-distributions-308343 Mon, 19 Nov 2018 17:19:19 +0000 /?p=308343 A simple average is OK, but histograms of your CPC data can help you better understand outlier data points. This is part one of a three-part series about Bing's data distribution tools.

The post Bing histograms reveal better business intelligence metrics with data distribution appeared first on Search Engine Land.


In the field of business intelligence, and specifically, the BI extracted from search performance, averages are ubiquitous. Cost-per-click, cost-per-acquisition, and average position are metrics that should immediately come to mind, but others such as average order value lie in the weeds as well.

There’s nothing inherently wrong with a simple average, but in many cases it can be useless or misleading because of its susceptibility to extreme influence by outlier data points. To briefly illustrate the point, consider a portfolio of ten keywords. Nine of those keywords have one click each, all at the cost of $1. The tenth keyword also has one click, but this one came at a price of $6. This brings the average CPC of the portfolio to $1.50, which is an obfuscation of a lot of important information.
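The ten-keyword portfolio is small enough to check by hand, and a couple of lines of Python confirm the arithmetic while showing what the average hides.

```python
from statistics import mean, median

# The portfolio from the example: nine clicks at $1 and one click at $6.
cpcs = [1.0] * 9 + [6.0]

average_cpc = mean(cpcs)   # (9 * 1 + 6) / 10
median_cpc = median(cpcs)  # the middle of the sorted values

print(f"average CPC: ${average_cpc:.2f}")  # average CPC: $1.50
print(f"median CPC:  ${median_cpc:.2f}")   # median CPC:  $1.00
```

No keyword in the portfolio actually cost $1.50 per click; the median of $1.00 describes nine of the ten keywords exactly.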

Of course, portfolios are generally much larger than ten keywords, and with scale the opportunity for averages to muddy the waters of your analyses also grows. As such, the aim of this three-part series is to help you become comfortable thinking about your data in terms of distributions, which will help bring more information and context to your business intelligence metrics, and help you depend less on averages.

Let’s start by highlighting the difference between a summary view and a distribution view, moving forward with CPC as an example. Below is a standard method for visualizing CPC performance for a single month.

bing analytics chart

But we can immediately unlock a lot of information about this month by segmenting the keyword report we pull down from the Bing UI by day. Since we’re working with CPC data, we’ll want to remove any line items from the Excel file that have 0 clicks. Once we do that, select all your CPC data for the month, and create a histogram.
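If you prefer to prepare the data in code rather than in Excel, the same cleanup looks roughly like this. The rows and the column names (`keyword`, `date`, `clicks`, `spend`) are assumptions for illustration and will not match a real report export exactly.

```python
# Hypothetical rows from a keyword report segmented by day; column names are assumed.
REPORT = [
    {"keyword": "cloud backup", "date": "2018-04-01", "clicks": 4, "spend": 20.0},
    {"keyword": "cloud backup", "date": "2018-04-02", "clicks": 0, "spend": 0.0},
    {"keyword": "cloud storage", "date": "2018-04-01", "clicks": 2, "spend": 13.0},
]


def daily_cpcs(report):
    """Return one CPC (spend / clicks) per keyword-day, skipping rows with zero clicks."""
    return [row["spend"] / row["clicks"] for row in report if row["clicks"] > 0]


cpcs = daily_cpcs(REPORT)
print(cpcs)  # [5.0, 6.5]
```

Rows with zero clicks have no defined CPC, so dropping them also avoids a division by zero.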

Our resulting plot is below:

bing analytics chart

The histogram is a common visualization for data distributions. It features a binned x-axis, which means that each tick on the axis represents a range of values. Each time a value is represented in the dataset, it is binned accordingly. The count of values within a given range is called the frequency and is represented on the y-axis.
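Binning is simple enough to sketch by hand. The example below uses fixed-width, half-open bins (each bin includes its lower edge but not its upper edge); the CPC values and the $2.00 bin width are assumptions, and charting tools pick their own bin edges.

```python
import math
from collections import Counter

# Hypothetical CPC observations and an assumed bin width in dollars.
CPCS = [1.0, 1.5, 2.0, 3.9, 4.2, 6.1, 6.3, 11.0]
BIN_WIDTH = 2.0


def histogram(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    bins = Counter()
    for value in values:
        bin_start = math.floor(value / bin_width) * bin_width
        bins[bin_start] += 1
    return bins


freq = histogram(CPCS, BIN_WIDTH)
for start in sorted(freq):
    print(f"${start:.2f} to ${start + BIN_WIDTH:.2f}: {'#' * freq[start]}")
```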

bing analytics chart

Next, calculate the mean and median of your CPC data. In Excel, achieve this using the =AVERAGE() function for mean and the =MEDIAN() function for the median.

Remember that our Average CPC for the month was $6.82. Our median CPC comes in at $6.01. That’s a whopping $0.81 difference and an absolutely valuable piece of information for this advertiser.

bing analytics chart

The gap between the mean and median CPC is caused by the right-skew of the distribution. The farther along the tail a value is, the more that value is capable of influencing the mean. All data points have an equal influence on the median.
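A quick experiment shows the effect of the tail. In this sketch (hypothetical values), the same four CPCs are paired with a progressively more extreme fifth value: the mean climbs with the distance of that value, while the median does not move.

```python
from statistics import mean, median

# Hypothetical base CPCs; one extra value is pushed farther out along the tail.
BASE = [4.0, 5.0, 6.0, 7.0]


def summarize(tail_value):
    """Return (mean, median) of the base CPCs plus one tail value."""
    cpcs = BASE + [tail_value]
    return mean(cpcs), median(cpcs)


results = {tail: summarize(tail) for tail in (8.0, 18.0, 48.0)}
for tail, (avg, mid) in results.items():
    print(f"tail value ${tail:.2f}: mean ${avg:.2f}, median ${mid:.2f}")
```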

Before we looked at this distribution of CPCs throughout one month, all we knew was that the average click cost was $6.82. Now we understand that the advertiser had a much higher probability of receiving a click in the $4.20 to $6.30 range than they did in the $6.40 to $6.90 range.

Histograms are just the tip of the iceberg when it comes to understanding data distributions. In the next part of this series, we’ll explore this same dataset using a box and whisker plot.

