May 2, 2022   Dashboards / Marketing /

Power BI: Basket Analysis

In this tutorial, I will teach you how to run a Basket Analysis in Power BI. 

The Basket Analysis uncovers the products bought together by consumers. In the context of a grocery store, for example, this analysis might show that whipped cream and berries are frequently bought together. 

Based on this result, you can:
– place the products close to each other (if in a physical store) to encourage cross-selling
– suggest the second product when the consumer places the first product in the cart (if in an online store)
– sell the products together
– offer discounts for buying the products together

This analysis is not limited to products. The basket analysis is also useful to explore the visited pages of a website, the questions raised in support tickets, and even the dishes ordered in a restaurant.

Below is what we will build together: a Basket Analysis Network and a Basket Analysis Chart.

Three key concepts are necessary to understand for basket analysis in Power BI: Support, Confidence, and Lift.

Support

This is the easiest of the three concepts. It is the percentage of transactions that include two specific products.

Using the example above, whipped cream and berries appear together in 0.9% of transactions therefore their support is 0.9%. In other words, on 1000 transactions there are on average 9 transactions with whipped cream and berries together.

Support is important for two reasons:

– support should be high enough to make the cross-selling opportunity financially interesting. Cross-selling products with very small revenue is probably not worth the effort

– product combinations that happen rarely (low support) should not be used because statistical analysis is not reliable on rare combinations.

Confidence

Confidence is the percentage of transactions that contain the two products together, out of the transactions containing one of the two products.
As a consequence, there are always two confidence numbers for each couple of products.

For the relationship between whipped cream and berries:
– 12.6% of consumers bought whipped cream and berries, out of all consumers that bought whipped cream
– 27.2% of consumers bought whipped cream and berries, out of all consumers that bought berries

Confidence is useful because it gives the direction of the cross-selling. We can make the hypothesis that it's easier to sell whipped cream to someone buying berries (whipped cream is a nice addition to berries), compared to the opposite. 

Confidence, however, does not give us the strength of the relationship between the products. One product might be a very common purchase so that it generates high numbers for confidence, independently from the second product.

Lift

Lift gives the strength of the relationship between a basket of two products. 

  • A lift close to 1 indicates that there is no important relationship between the products.
  • A number higher than 1 (positive relationship) indicates that the consumers buy the basket more frequently than it would happen by chance
  • A number lower than 1 (negative relationship) indicates that the consumers buy the basket less frequently than it would happen by chance

Whipped cream and berries have a Lift of 3.8, meaning that consumers buy them together about 4 more times than it would happen by chance. This is a quite strong relationship.

To recap:

  • Support indicates the frequency of the products being sold together. A high support means that the basket analysis statistics are more reliable. Additionally, higher support makes the basket of products more interesting financially.
  • Confidence indicates the direction for cross-selling
  • Lift indicates the strength of the relationship between the products

For a more detailed explanation of Support, Confidence and Lift you can check out the blog post here.

Let's now move to Power BI for the analysis.

Load the basket analysis data to Power BI

For this tutorial, I use the sample dataset available on the Salem Marafi blog. You can download the sample grocery dataset from here, to follow along with the tutorial.

Load and add an Index column

First, load the CSV file in Power BI by going to Get Data and then Text/CSV.

Get data CSV Power BI

If your Power BI looks different, it's because I switched to the Power BI Updated Ribbon. 

You can activate the same ribbon by updating to the latest Power BI version, then going to File -> Options and Settings -> Options -> Preview Features and choosing Preview Ribbon. More info on that here.

Below is what the file looks like after you load it.

Groceries dataset for Power BI Basket Analysis

There is one row per transaction, and the columns have the products bought in the same transaction. For example, the transactions at row 10 contain whole milk and cereals.

The first step is to add an index column; the index column will allow us to analyze the transactions separately.

Go to the Add column and click on Index Column.

Add Index column

The result is the same table with an added index column. It doesn't matter if the index column is at the beginning or at the end.

Groceries dataset with Index column

Unpivot the table and remove unnecessary columns

The next step is to unpivot the columns with the products. Right-click on Index and choose Unpivot Other Columns.

Unpivot Other columns

The result is a table with three columns: the first column has the transaction index, the second one has the product number and the third one the name of the product.

Groceries table unpivoted

The second column (Attribute) is useless so you can delete it. Right-click on Attribute and choose Remove. 

The Value column also has empty rows to delete. Click on the arrow next to it, then deselect (blank) and choose ok. This removes empty rows from the table.

Remove empty rows

The final step for the data loading is the rename the columns. Rename the first column to 'Transaction ID' and the second column to 'Item'. You can rename the columns by double-clicking on the column header.

Groceries dataset with renamed columns

We are done with the data preparation! Click on Close & Apply to go back to Power BI.

You should see the 'groceries' table loaded on the right side of the screen.

Groceries table

Create the basket analysis table in Power BI

The Basket Analysis table has one row for each combination of two products. Furthermore, the table has one column for each one of the indicators described above: Support, Confidence, and Lift.

The Basket Analysis table is created with a DAX formula. Click on Home, New Table.

New calculated table Power BI

In the textbox that appears, enter the following formula:

Basket analysis = 

FILTER(
CROSSJOIN(VALUES('groceries'[Item]),SELECTCOLUMNS(VALUES('groceries'[Item]),"Item2", [Item])),
[Item]>[Item2])

There are two parts to the formula: FILTER and CROSSJOIN. CROSSJOIN generates the table by creating one row for each couple of items. Specifically, it creates one row for whole milk plus citrus, one row for whole milk plus coffee, one row for citrus plus coffee, etc.

FILTER removes the duplicate rows. 

We now have a table with all possible combinations of products.

Basket Analysis table

The following step is adding a calculated column with the concatenation of the two items. Click on New column (next to New table) and enter the DAX formula:

Basket = [Item]&" – "&[Item2]

This is the result.

Basket Analysis table with added column: concatenation of items

Now let's move to the interesting part: the indicators formulas.

Add the Support column

The Support of the basket tells us the percentage of the transactions containing the two specific items. 

Formula for Support - Basket Analysis

Here is DAX the formula:

Support basket = 

var item1=[Item]
var item2=[Item2]

var transactionsWithItem1 = SELECTCOLUMNS(FILTER('groceries','groceries'[Item]=item1),"transactionID",[Transaction ID])

var transactionsWithItem2 = SELECTCOLUMNS(FILTER('groceries','groceries'[Item]=item2),"transactionID",[Transaction ID])

var transactionsWithBothItems = INTERSECT(transactionsWithItem1,transactionsWithItem2)

RETURN COUNTROWS(transactionsWithBothItems)/ DISTINCTCOUNT(groceries[Transaction ID])

There are several variables in this formula: transactionsWithItem1, transactionWithItem2, and transactionsWithBothItems.

transactionsWithItem1 returns a table of all transactions including the first product. transactionsWithItem2 does the same with the second product. transactionsWithBothItems returns a table of the transactions including both products. 

Finally, COUNTROWS counts the number of transactions with both products and DISTINCTCOUNT counts the total number of transactions.

Now we move to the Confidence columns.

Add the Confidence columns

Confidence is equal to the percentage of transactions with both products (called Support of the basket) divided by the percentage of transactions with only one product (called Support of the product).

Formula for Confidence - Basket Analysis

Below is the DAX formula for the Confidence of product one for the basket. The logic is exactly the same for product two.

Confidence Item 1 -> Basket = 

var item1 = [Item]
var numberOfTransactions = DISTINCTCOUNT(groceries[Transaction ID])

var supportItem1 = COUNTROWS(FILTER('groceries','groceries'[Item]=item1))/ numberOfTransactions

RETURN [Support basket]/supportItem1

Add the Lift column 

The formula for the Lift is the following.

Formula for Lift - Basket Analysis

Let's take the example of unrelated products: for those products, the percentage of times the products are bought together (support of basket) should be similar to the percentage of times product 1 is bought, multiplied by the percentage of times product 2 is bought.

In this case, the numerator and denominator in the formula have similar values and therefore the value of Lift is close to one. A value of Lift close to one indicates no relationships.

When Lift is higher than one, the two products are bought together more times than would happen by chance. A lift of 4, for example, indicates a strong relationship: the products are bought together four times more often than would happen by chance.

Below is the DAX formula for Lift.

Lift = 

var item1 = [Item]
var item2 = [Item2]

var numberOfTransactions = DISTINCTCOUNT(groceries[Transaction ID])

var supportItem1 = COUNTROWS(FILTER('groceries','groceries'[Item]=item1))/numberOfTransactions

var supportItem2 = COUNTROWS(FILTER('groceries','groceries'[Item]=item2))/numberOfTransactions

RETURN [Support basket]/(supportItem1*supportItem2) 

Now we have all three indicators, and we can move on to visualizations.

Basket Analysis table with Support, Confidence and Lift

Add the Basket Analysis Network visualization to Power BI

The network chart is not a standard visualization so it needs to be imported from the marketplace.

Power BI Import from marketplace

Search for Network and add the Advanced Network Visual (Light Edition).

Power BI - Add the network visual for basket analysis

Add the Network Chart to your report and configure it this way: drag and drop Item and Item2 in Nodes, then Lift in Measure.

In the filters, add a filter on both Lift and Support basket. In this tutorial, filters of 2.2 for Lift and 0.6% for Support basket work well. The correct filters, however, depend on the data analyzed.

Configure the Network visual

Below is the result!

You can use the bar on the left to zoom in and out. The size of the bubbles is equal to the Lift.

There is one thing to be careful about: when a product has connections with multiple products, the number shown is the sum of the Lift for the various connections. This number is meaningless.

For example, the chart shows a Lift for the whipped cream of 16: this is the sum of the Lift between whipped cream and butter (2.6), between whipped cream and curd (2.7), etc.  

Even if the number itself is meaningless, a bigger number means that the product has important Lift (relationships) with other products therefore the product deserves further analysis.

Now on to the final visualization.

Add the Basket Analysis Chart visualization to Power BI

The Basket Analysis Chart is a traditional Scatter chart, with the filters on Lift and Support Basket as before.

Configure the Basket Analysis visual

You can add the Confidence columns in the Tooltip section so you are able to see Confidence information when you hold the mouse on a chart point.

Add the confidence columns to the Tooltip

And here is the result!

You can now analyze your product sales and find cross-selling opportunities! I hope this article was helpful to you.

Enjoyed the article? Subscribe to the newsletter and download the Power BI of the article for free.

    Share this post

    AUTHORLuca
    I help SMEs who want to get a clear view on what drives their profit. After becoming a CPA and 10 years in Corporate Finance between the US, Europe, and Asia I discovered my real passion. I love helping Small and Medium Companies figure out how to improve their profit and be sustainable.

    Join the discussion

    34 Comments

    • Felix says:

      there is a bracket missing in line 6 of the Lift Dax formula
      var numberOfTransactions = DISTINCTCOUNT(groceries[Transaction ID])

    • Stephan says:

      The use of intersect to calculate the support seems to provide the possibility of the "confidence of product 2" levels larger than 100%.
      Since INTERSECT(transactionsWithItem1, transactionsWithItem2) provides different results than INTERSECT(transactionsWithItem2, transactionsWithItem1).

      • Luca says:

        Thank you Stephan for your comment.

        To ensure I reply correctly, I went to review the specifications for the INTERSECT formula in DAX (https://docs.microsoft.com/en-us/dax/intersect-function-dax).

        To my understanding, INTERSECT(transactionsWithItem1, transactionsWithItem2) provides different results than INTERSECT(transactionsWithItem2, transactionsWithItem1) only if there are duplicate rows in either transactionsWithItem1 or transactionsWithItem2.

        transactionsWithItem1 should be a list of all transactions which include item 1, with no duplicates. And the same for transactionsWithItem2.

        I imagine that you have this problem because, on your 'groceries' table, items appear multiple times for the same transaction. Is that the case?
        For example:
        Transaction 1 | Bread
        Transaction 1 | Water
        Transaction 1 | Bread

        If that is the problem, you can remove those duplicates by using DISTINCT in the formulas for transactionsWithItem1 and transactionsWithItem2.
        Does this help you?

        Luca

        • Luca says:

          I add one comment to your question because you made a very useful point.

          The tutorial above does not take into account the number of products bought in each transaction, it only looks at which products are bought.
          So it will consider these 2 baskets the same:
          basket A with 1 bread and 3 waters
          basket B with 3 breads and 1 water

          I made this choice to keep the article simple, and to avoid having a negative impact on the performance of Power BI.

          Hope this is helpful!

          Luca

        • Stephan says:

          Hi Luca,

          Thank you for this extensive answer.
          Indeed, although I did not expect any duplicates in my file there were in fact.
          So adding DISTINCT really solved this issue.
          Now, the next question I am going to analyse, why would some of these clients buy the same product? Or is this a data error?

          Thanks again for the answer.
          Stephan

          • Sosa says:

            Hi I have two question,
            1. is it normal not having any kind of percentage for some of the concatenated products (Basket column) in my Support basket column?

            2. The total sum of the Support basket column should be 100%?

            thanks a lot.

            • Luca says:

              Hi Sosa,

              thank you for commenting.

              1. It is not normal. The basket column is already filtered to show only the couples of products appearing in the dataset.
              So all concatenated products should have some values.

              2. The sum of the support does not need to be 100%.
              I'll make an example:
              Transaction 1 has products A and B
              Transaction 2 has products A, B, and C
              Transaction 3 has products B and C

              Support for the basket A|B is 2/3 (2 transactions with A and B divided by 3 transactions total) = 67%
              Support for the basket A|C is 1/3 = 33%
              Support for the basket B|C is 2/3 = 67%

              In this case, for example, the sum of the support is 167%.
              The sum of support is in general not meaningful, so no need to look at it or calculate it.

              Does this help you?

          • Luca says:

            Hi Stephan,

            I imagine that in your dataset, items are duplicated when a customer buys several units of one item in the same transaction.

            For example, your dataset looks like this for a basket with 2 loaves of bread and 1 water:
            Transaction 1 | Bread
            Transaction 1 | Water
            Transaction 1 | Bread

            This is not wrong, but there is additional information here (how many breads were bought) that is not needed by the Basket Analysis algorithm.
            That is why you can use the DISTINCT formula to remove this unnecessary information.

            Does this help you?

            Luca

    • Jeff says:

      Hi there,

      Great article, very helpfull for a newbie in PowerBI like me. 🙂

      But, I think you made a little mistake in the crossjoin function for the Basket Analysis function. If not, at least, I need a slightly different output than yours.

      Transaction 0 contains 4 products and no whole milk. But in this output table, the products are combined with whole milk which has an another transaction ID.

      In my analysis I need an output like this.

      Input:
      1 Product A
      1 Product B
      2 Product X
      2 Product Y
      2 Product Z

      Output
      Product A – Product B
      Product X – Product Y
      Product Y – Product Z
      Product X – Product Z

      So, I need basket combinations whitin a certain transactionID. And in the example above, Product X may not be combined with Product A.

      • Luca says:

        Hi Jeff,

        Thank you for checking out the article and for your comment.

        If I understand your issue correctly, you don’t want to have the couple ‘Product A – Product X’ in the table because these two products are never purchased in the same transaction. Is that correct?

        If yes, what about filtering the table based on the support of the basket?
        If the 2 products are never bought together, the support of the basket will be zero.

        Does this help you?

        If your point of removing empty baskets is about reducing column cardinality and improving performance, then I would recommend moving the algorithm from DAX to Power Query.
        Depending on your needs, you could also call Python or R code from Power Query to do the ‘heavy lifting’ of the calculation.

        I use DAX in the article because I think it’s simpler and even with its simplicity, it allows to create very insightful analysis.

        I hope the above is helpful. Feel free to comment again your thoughts

        Luca

        • Jeff says:

          Hi Luca,

          Thanks for your reply! Very helpfull. 🙂
          You are right, filtering on support of the basket = 0 will work for me.

          Have a nice weekend!

    • Pete says:

      This is very helpful – thank you. One question – could you conceive of a way to split the basket analysis into categories?

      I'll try to explain – say we have 6,000 products across 10 equally populated categories. The table created at a full product level contains (6000 x 5,999) / 2 = 18m rows. Instead, I only really care about basket analysis within categories. And so rather than 18m my required table might contain 10 x (600 x 599) / 2 = 1.8m rows.

      I realise this is pretty specific but would love any DAX pointers please!

      • Luca says:

        Dear Pete,

        thank you for checking out the tutorial.
        Regarding your question: yes, it's possible to only look at categories instead of products.

        For example, you can change your basket analysis table to include categories instead of products. The DAX new formula for the basket analysis table would look something like this:

        Basket analysis =

        FILTER(
        CROSSJOIN( VALUES('groceries'[Category]),
        SELECTCOLUMNS(VALUES('groceries'[Category]),"Category2", [Category])),
        [Category]>[Category2])

        An additional step is needed in your case: to remove the categories that appear more than once in the same transaction. This happens when a transaction has two products belonging to the same category.

        You can fix that by using DISTINCT in the formulas for transactionsWithItem1 and transactionsWithItem2 (feel free to check out the question above by Stephan on using DISTINCT).

        Does this help you?

        Another approach to reducing the size of the final table is filtering: for example, show only product baskets with a Support above 1%. Or show only baskets with a Lift above 1.2.
        This will allow you to keep the product detail, and at the same time only look at relevant baskets.

    • Pete says:

      Luca, thank you so much. My apologies – I should have made clear that I'm looking for basket analysis of products within the categories. So ideally the table would read;

      Category Item 1 Item 2
      A a b
      A a c
      A b c
      B x y
      B x z
      B y z

      So in this example there are 6 rows 2 x ( 3 x 2 ) / 2 rather than 15 rows (6 x 5) / 2.

      This may be one way but it feels a bit bulky!

      FILTER(
      CROSSJOIN(
      SELECTCOLUMNS(
      Sales,
      "CategoryA",
      Sales[Category],
      "ProductA",
      Sales[Product]
      ),
      SELECTCOLUMNS(
      Sales,
      "CategoryB",
      Sales[Category],
      "ProductB",
      Sales[Product]
      )
      ),
      [CategoryA] = [CategoryB]
      && [ProductA] [ProductB]
      )

      • Pete says:

        Plus this does seem to throw up lots of duplicates

        • Luca says:

          Hi Pete,

          You can modify the formula for the Basket Analysis table as follows:

          Basket analysis =

          FILTER(
          CROSSJOIN(
          DISTINCT( SELECTCOLUMNS('groceries',"item",[item],"category",[category]) ),
          DISTINCT( SELECTCOLUMNS('groceries',"item2",[item],"category2",[category]) ))
          , AND([item]>[item2],[category]=[category2]))

          The additional filter [category]=[category2] will ensure that you only the see basket of products within each category.

          Hope this helps!

          Luca

    • Jordan C says:

      Hi, this helped me so much! Is there a way to perform this same analysis to analyze the relationship between products in a combination of 3 or more products? It may seem redundant or not feasible using this process however the goal with my analysis is to see what could potentially be the next best product to offer to customers who purchase a unique combination of products. For example, this process helped me analyze the unique relationship between chicken and frozen vegetables. I would like to evaluate the specific relationship between chicken, frozen vegetables AND yogurt. Potentially more than a basket of 3 products like chicken, frozen vegetables, yogurt AND root vegetables. Is this possible?

      • Luca says:

        This is a very good question, thank you Jordan for it.

        Yes, it is possible to look at relationships between 3 or more products. You can even push it a step forward and include customers in your analysis (for example: customers under 30 years old that buy frozen vegetables, what other products are they most likely to buy?).

        Regarding the implementation of the analysis, I recommend to do it in Power Query with tools such as Python or R.
        Looking at relationships between 2 products is already computationally expensive (slow), and moving to 3 or more products requires even more time or computing power.

        Both Python and R have optimized libraries for the Basket Analysis, and these libraries will run a lot faster than what could be done with DAX.
        DAX is an amazing language that is easy(er) to learn and works great for developing dynamic measures. However, for more advanced numbers that can be calculated in advance, then Power Query (along with its supported languages Python and R) is a better choice.

        I use DAX in the article because I think it’s simpler and even with its simplicity, it allows to create insightful analysis.

        If you need more specific help on this, feel free to contact me at lc@finance-bi.com. I can try to give you some more advice or help.

        Whatever you decide, I wish you best of luck and an enjoyable end of the year,

        Luca

    • Kostas says:

      Hi, Luca and guys!
      I came up with a similar problem with a bit different orientation.
      I want to cluster my customers, based on the type of product that they buy.
      For example, let's say we have 5 categories of products: Papers, Foods, Drinks, Baby Products, Skincare.
      The final contribution of the above categories on the total business is 45%-20%-15%-10%-10%. My customers (hundreds) have differentiated shares, and I want each of them to be given a "name – cluster" which will indicate his strong category,and which will lead the marketing efforts.("Drinks" customer, or "food and Skincare" customer).
      Ex of customer shares:
      A: 50-30-10-5-5
      B: 25-25-10-30-20
      C: 35-15-20-15-15
      ….
      Is the share of each category on every customer Vs the total shares the right way, and how can I handle this?
      Any other proposal or guidance would be appreciated.
      Thanx
      Kostas

      • Luca says:

        Hi Kostas,

        Thank you for the exciting question.
        I have been wanting to write on this topic, and I am glad your question allows me to do it.

        Here are a few comments to answer your question:

        1- Calculating the share of each category by customer (ex: customer A with shares 50-30-10-5-5), is usually a good pre-processing for this type of data.
        This way, customers will be comparable because the shares will add up to 100 for every customer.

        2- Regarding the clusterization part, assigning to each customer a label with his strong category might over-simplify the problem.
        For example, you might notice that the strong category is Paper for 80%-90% of your customers.
        So you might end up with one cluster 'Paper' which has the majority of your customers, and then several other smaller ones. This is not too helpful, because the majority of the customers are grouped together.

        A more detailed approach would be to also look at the second category. You will then have Paper-Drinks customers, Paper-Food customers, etc.
        This would allow for better segmentation and a more powerful marketing strategy. However, on the negative part, this would generate too many segments. And too many segments will create confusion.

        To remove the disadvantages of the 2 methods above, a good solution is to use a clustering algorithm for example K-Means.
        A clustering algorithm, such as the K-Means algorithm allows you to:
        – choose how many clusters you have (you can choose a small number of clusters, for example 5, to keep the result simple)
        – create the clustering taking all of the categories into account (not just the one or two biggest categories)

        K-Means accomplishes the above by grouping together customers that have similar shares of categories.

        Here is an example of what the result of K-Means would be:
        K-Means clustering example

        This example only shows 2 dimensions (2 axis).
        You can imagine that the horizontal axis corresponds for example to the share of Paper, and the vertical axis to the share of Foods.

        The great part is that this algorithm can work with more than 2 dimensions.
        You can imagine the same chart in 3d with the 'depth' axis corresponding to the share of Drinks. And it is not limited to 3 dimensions either! It can be more than 3.

        Below is a link to a great article on K-Means :
        K-Means Clustering
        The image above comes from this article.

        3- Now for the implementation.
        I would recommend doing the clusterization in Python or R.
        If you wish, you can add the Python or R code to Power Query so that whenever you refresh Power BI, the clustering algorithm will also refresh.
        (Or you might wish to not change the clustering when refreshing the data, in that case you can run Python or R separately from Power BI).

        If you do not have experience in Python and R and you want my support, you can contact me via e-mail lc@finance-bi.com.
        Another option is to look at tutorials online on this topic: there are a lot of very good tutorials on this.

        4- Finally, another option for clusterization is to look at RFM (Recency, Frequency, Monetary).
        This can complement the clusterization by category.

        Recency tells you how long ago was the last purchase.
        Frequency tells you how often the customer purchases on average
        Monetary tells you the average amount the customers spends

        After you calculate Recency, Frequency and Monetary for each customer, you can cluster them using for example the K-Means.

        I hope this helps you!

        Luca

    • Kostas says:

      Luca thank you very much for your time and the extensive and detailed reply.!

      Actually what I had in mind wasn't to give the label to every customer based on his strong category, but on the category which has the biggest variance on the total category's contribution on the total sales. Let's say if the contributions on the total business are 45-20-25-10 and a customer has 40-35-10-15,the variances are -5/+15/-15/+5, so I would name him with the second category. I know that maybe it seems simplified, but to tell the true, I 'm trying to avoid Python (full beginner 😒) and solve the problem with DAX in PBI.
      Of course I understand that trying to solve clustering using just averages is an issue.
      I will give it a try with Python. Thank you very much again for the ability you give to contact with you..!

      • Luca says:

        Hi Kostas,

        it's my pleasure! And I hope that my comments were helpful.

        I did not fully understand about comparing the variances between the total and each customer. Now it's clearer and I think that goes in the right direction.

        The negative part of the variances approach is that it might group together customers that are different in the other categories.
        But the positive part is that the approach and its resulting categories are very easy to understand.

        Even a segmentation algorithm such as K-Means (and more advanced ones) is a simplification versus the real-life.
        So if your approach gives you meaningful clusters, I would say go ahead with it.

        Python is a tool to accomplish something. If the same thing can be accomplished in an easier way, then I'd recommend the easier way.

        Best of luck for this!
        And feel to share your progress, I would be curious to know what you choose to do.

        Luca

    • Rebecca says:

      what else can be used in place of transaction ID in the formular
      or must it be always Transaction ID?

      • Luca says:

        Thank you Rebecca for your question!

        Transaction ID is used by Power BI to understand which items were bought together.
        So feel free to rename it to anything that makes more sense for you: Invoice number, Receipt number, Receipt ID, etc.

        Does this answer your questions?
        I apologize for the late reply

        Luca

    • Daniel says:

      Hi Luca,

      Thanks for this easy-to-use guide for association learning / basket analysis.

      I was wondering if I can filter based on a column in another table? Let's say I implement this into a cube which has another table named Campaign, where I use the column Campaign ID as a filter on 1 campaign ID.

      The question derives from me wanting to monitor campaigns better and associate products bought in that campaign ID.

      (Never mind the fact that data might not be sufficient to derive any information from it.)

      • Luca says:

        Hi Daniel,

        thank you for the interesting question.
        One thing to clarify: do you want to calculate a specific market basket analysis for each Campaign ID? So the numbers will be different for each campaign.
        Or you want to calculate the market basket analysis at the total level and then only filter it by campaign ID, based on the products of each campaign?

        In the first case, you can add a column 'Campaign ID' to the Basket Analysis table. The Basket Analysis will therefore have 3 initial columns: Campaign ID, Item, Item2.
        In the second case, you can create a relationship between the Basket Analysis table and a second table with the columns Campaign ID and Item.

        The complexity of the second case is how to map the Item column of your table with either the Item or Item2 columns of the Basket Analysis table.
        I cannot think of an easy way to map one column to either Item or Item2.
        So if you are interested in the second case, it's possible to do but we might need to rethink the data model.

        Does the above help you?
        If you want to discuss more, you can also write to me by email at lc@finance-bi.com

        Cheers

        Luca

        • Daniel says:

          That certainly did help, I did as you stated in the first scenario and can now filter based on campaign. Thank you!

          • Daniel says:

            Hi Luca,

            After going through the results, I see that the wrong campaign id is put on the baskets.

            What I want is to be able to filter on lets say a campaign named 3 for 2, and get all baskets bought with that campaign ID.

            My code for the table is as followed with a csv file named data and parameters Product, CampaignID:

            Basket analysis =
            FILTER(
            CROSSJOIN(VALUES('data'[Product]),VALUES('data'[CampaignID]),SELECTCOLUMNS(VALUES('data'[Product]),"Product2", [Product])),
            [Product]>[Product2])

            • Luca says:

              Hi Daniel,

              sorry for the late reply. I would need to see your data in order to better understand it.
              In case you still have the problem and need help, we can continue the discussion via email at lc@finance-bi.com

              Regards

              Luca

    • Sophie says:

      I'm using this as an example to do a basket analysis for some other data, with the support data, it's coming up with an error:

      Cannot convert value 'MCED602-21X1 (E)' of type Text to type Number.

      This isn't in the index field, so I'm confused as to why the support basket code would want to convert an item to a number.

      Can anyone shed any light on it?

      • Luca says:

        Hi Sophie,

        can you please share when you received this message?
        After entering which measure into Power BI?

        Regards,

        Luca

    • Emiliano Falabrini says:

      Thanks a lot, this tutorial was very useful for me.

      I found a way to reduce the size of the pbix file and lower the processing time, obtaining the same result.
      For a 30.000 rows Transaction table, with 3620 different Items, I got a 200 MB pbix file.
      Making the following change, I got the exact same result, but the size of the pbix file is now 2 MB and it takes 10 time less to process.

      In a nutshell, what I did was obtain the Basket[Item, Item2, Basket] table, with Power query, containing only the pairs Item-Item2 that appear at least once together in a transaction (The other pairs are irrelevant in the Basket analysis table).

      This is how I did it after getting the Transaction table the same way as you explained.
      Continuing with Power query:
      – Create the table TransactionAux, duplicating the Transaction table.
      = Table.SelectColumns(Transaction,{"TransactionID","Item"})
      – Create the "Basket analysis" table starting with Merging Transaction and TransactionAux (inner join by TransactionID)
      = Table.NestedJoin(Transaction, {"TransactionID"}, TransactionAux, {"TransactionID"}, "TransactionAux", JoinKind.Inner)
      – Expand the column TransactionAux in the "Basket analysis" table.
      = Table.ExpandTableColumn(Source, "TransactionAux", {"Item"}, {"TransactionAux.Item"})
      – Rename the column TransactionAux.Item to Item2 in the "Basket analysis" table.
      = Table.RenameColumns(#"Expanded TransactionAux",{{"TransactionAux.Item", "Item2"}})
      – Remove the column TransactionID from "Basket analysis" table.
      = Table.RemoveColumns(#"Renamed Columns",{"TransactionID"})
      – Filter the rows from the "Basket analysis" table with the same logic that you explained.
      = Table.SelectRows(#"Removed Columns", each [Item] > [Item2])
      – Add Basket column to "Basket analysis" table
      = Table.AddColumn(#"Filtered Rows", "Basket", each [Item] & " – " & [Item2])
      – Remove duplicates from "Basket anaysis" table on the Basket column.
      = Table.Distinct(#"Added Custom", {"Basket"})

      That is all the work I did in Power query. Then, I followed the tutorial to create the other columns in the Basket analysis table with DAX.

      Hope this example can help someone else.

      • Luca says:

        Thank you Emiliano for your sharing your solution with everybody!
        Power Query is definitely the way to go as your table gets bigger.

        If Power Query is still not fast enough, then Python (Pandas) might be a good candidate.
        Finally, for massive datasets, I'd probably look into running this in parallel clusters.

        Luca

    1 Trackback or Pingback

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    LN Consulting
    Tired Of Building A Dashboard By Yourself?

    Continue Reading...

    May 2, 2022
    In this post, I will teach you how to build a Pareto analysis dashboard in Power BI. Specifically, the Pareto principle states that 20% of the causes generate 80% of the effects, and vice-versa. We can apply this empirical principle to different topics, from work (20% of the efforts cause 80% of the results) to ...
    Read More
    November 6, 2019
    In this tutorial, I will walk you through creating your own Power BI News Dashboard, which visualizes news about any topic that might interest you. While most of us use Power BI to analyze our own data, the ‘Get Data from Web’ functionality makes it just as easy to analyze external data. The possibilities are ...
    Read More
    November 1, 2019
    In this article, I will guide you through creating your own Power BI dashboard for tracking tickets. While seemingly mundane, tracking tickets is key to the success of the customer service and IT department and impacts the success of a company. If you are looking for an in-depth review of the KPIs for your customer ...
    Read More