Mistakes to avoid when blending data with Google Data Studio

The goal of this article is to highlight and explain some subtleties of data blending in Data Studio. I hope it helps readers understand and avoid the most common pitfalls.

Rows without a matching key in the primary data source are dropped

The first thing to know when blending data is that a row from the second data source is passed into the blended data only if its key matches a key in the primary data source. In the example below, you can see that Superman doesn’t appear in the blended data because he is not listed in the first (primary) data source. The direct consequence is that the order of the data sources in a blend is crucial.
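To make the rule concrete, here is a minimal Python sketch that mimics this behaviour as a left outer join. The tables, the column names, and the `blend` helper are all hypothetical stand-ins, not Data Studio's implementation or the article's original data.

```python
# Hypothetical tables; the article's actual example data is not reproduced here.
heroes = [  # primary (first) data source
    {"name": "Batman", "city": "Gotham"},
    {"name": "Wonder Woman", "city": "Themyscira"},
]
powers = [  # second data source
    {"name": "Batman", "power": "Gadgets"},
    {"name": "Wonder Woman", "power": "Strength"},
    {"name": "Superman", "power": "Flight"},
]

def blend(primary, second, key):
    """Sketch of a left outer join, assuming keys are unique in `second`."""
    lookup = {row[key]: row for row in second}
    second_cols = [c for c in second[0] if c != key]
    blended = []
    for row in primary:
        match = lookup.get(row[key], {})  # empty dict when there is no match
        blended.append({**row, **{c: match.get(c) for c in second_cols}})
    return blended

blended = blend(heroes, powers, "name")
blended_names = [row["name"] for row in blended]
print(blended_names)  # ['Batman', 'Wonder Woman']
```

Superman exists only in the second table, so nothing in the primary table pulls his row into the result.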

Data source blending order

In the example below, we have the same data sources as before, but blended in the opposite order. Superman now appears in the blended data because the data source that lists him is now the primary one.
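The effect of the order can be sketched in Python by calling the same join with its arguments swapped. As before, the tables and the `blend` helper are hypothetical and only approximate a left outer join.

```python
# Hypothetical tables standing in for the article's example data.
heroes = [
    {"name": "Batman", "city": "Gotham"},
    {"name": "Wonder Woman", "city": "Themyscira"},
]
powers = [
    {"name": "Batman", "power": "Gadgets"},
    {"name": "Wonder Woman", "power": "Strength"},
    {"name": "Superman", "power": "Flight"},
]

def blend(primary, second, key):
    """Sketch of a left outer join, assuming keys are unique in `second`."""
    lookup = {row[key]: row for row in second}
    second_cols = [c for c in second[0] if c != key]
    return [
        {**row, **{c: lookup.get(row[key], {}).get(c) for c in second_cols}}
        for row in primary
    ]

heroes_first = blend(heroes, powers, "name")  # heroes is the primary source
powers_first = blend(powers, heroes, "name")  # powers is the primary source

print([r["name"] for r in heroes_first])  # ['Batman', 'Wonder Woman']
print([r["name"] for r in powers_first])  # ['Batman', 'Wonder Woman', 'Superman']
```

With `powers` as the primary source, Superman is kept, and his `city` column is empty because the other table has no row for him.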

Missing values

Every row from the primary data source is passed to the blended data, whether or not its key has a match in the second data source. Looking at the example below, you can see that the first data source contains client ID 4813, which is not in the second data source. Consequently, the rows for client ID 4813 are passed into the blended data with _null_ values for the columns of the second data source (“Source / Medium” and “Revenue”).
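A small Python sketch of the same situation, under the assumption that blending behaves like a left outer join. Only client ID 4813 and the “Source / Medium” and “Revenue” column names come from the text; the `Country` column, the other values, and the `blend` helper are hypothetical.

```python
# Hypothetical rows; only client ID 4813 and the second source's
# column names are taken from the article.
clients = [  # primary data source
    {"Client ID": 2510, "Country": "France"},
    {"Client ID": 4813, "Country": "Spain"},
]
transactions = [  # second data source, with no row for client 4813
    {"Client ID": 2510, "Source / Medium": "google / organic", "Revenue": 20},
]

def blend(primary, second, key):
    """Sketch of a left outer join, assuming keys are unique in `second`."""
    lookup = {row[key]: row for row in second}
    second_cols = [c for c in second[0] if c != key]
    return [
        {**row, **{c: lookup.get(row[key], {}).get(c) for c in second_cols}}
        for row in primary
    ]

blended = blend(clients, transactions, "Client ID")
unmatched = [r for r in blended if r["Revenue"] is None]
print(unmatched)
# [{'Client ID': 4813, 'Country': 'Spain', 'Source / Medium': None, 'Revenue': None}]

# Nulls have to be handled explicitly when aggregating.
total_revenue = sum(r["Revenue"] or 0 for r in blended)
print(total_revenue)  # 20
```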

Duplicated rows

Blending data can generate duplicated rows and bias your calculations. Looking at the data set below, we can see that client ID 2510 has two sessions (mobile and desktop) for only one purchase. When the data sources are blended, the single row for client ID 2510 is duplicated into two rows (one for each session in the second data source). The one-time purchase of client ID 2510 now appears in two different rows. The blended data's total revenue is 50 when it should be 35!
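The arithmetic can be reproduced with a short Python sketch. Client ID 2510, its two sessions, and the totals of 35 and 50 come from the text; the individual revenue values, the second client (3201), and the `blend` helper are assumptions chosen only to be consistent with those totals.

```python
# Hypothetical rows chosen to match the totals quoted in the article.
purchases = [  # primary data source: one row per purchase
    {"Client ID": 2510, "Revenue": 15},
    {"Client ID": 3201, "Revenue": 20},
]
sessions = [  # second data source: one row per session
    {"Client ID": 2510, "Device": "mobile"},
    {"Client ID": 2510, "Device": "desktop"},
    {"Client ID": 3201, "Device": "desktop"},
]

def blend(primary, second, key):
    """Sketch of a left outer join that emits one row per matching pair."""
    second_cols = [c for c in second[0] if c != key]
    blended = []
    for row in primary:
        # An unmatched primary row still yields one output row (with nulls).
        matches = [s for s in second if s[key] == row[key]] or [{}]
        for match in matches:
            blended.append({**row, **{c: match.get(c) for c in second_cols}})
    return blended

blended = blend(purchases, sessions, "Client ID")
true_total = sum(r["Revenue"] for r in purchases)
blended_total = sum(r["Revenue"] for r in blended)
print(true_total, blended_total)  # 35 50
```

Comparing a total computed on the blended data with the same total computed on the source is a quick way to spot this kind of duplication.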

Final thought