Data mining is the process of extracting useful information from a large set of data. Market Basket Analysis is a data mining technique that discovers associations between items. It identifies a customer's buying behavior or purchasing pattern, i.e. the items that a customer buys together in a single shopping cart. Market Basket Analysis is also known as association rule learning or affinity analysis. Its main purpose is to extract the purchasing patterns of customers in order to increase business efficiency and help retailers make profitable business decisions, increase sales, and design marketing strategies to compete with other retailers. The main challenge for leading supermarkets is to attract a good number of customers, which can be addressed with the data mining technique of association rule mining. Frequent itemsets are first mined from the market basket data, and strongly associated itemsets are then generated from them with the help of support and confidence. This paper presents a recent survey of a supermarket for generating association rules to examine the customers’ buying or purchasing behavior.
Association rule mining is used to identify the association or relation between two or more itemsets in a large set of data items. In the context of Market Basket Analysis, association rule mining takes a set of customer transactions over a set of items and generates the frequent itemsets, i.e. the items that appear together. Mining association rules for a retail shop or supermarket is also called Market Basket Analysis. Market Basket Analysis focuses on extracting the buying behavior of customers. For example, if a customer buys a packet of tea, there is a high probability that the same customer will buy a packet of sugar in the same transaction. Here tea is the antecedent and sugar is the consequent, so there is an association between tea and sugar. The association rule is written tea ⇒ sugar and holds with some probability c%, where c denotes the confidence. This association helps in evaluating the dependency between two or more items in a customer's transaction.
Market Basket Analysis is a technique of data mining. It consists of collecting and analyzing the transactional database of a supermarket or any retail shop. A major task for any company that has invested heavily in collecting and organizing customer data is to mine that data to obtain analyses that support business strategy and economic profit. The results of the analysis help the retailer or storekeeper learn which items a customer buys together, so that those items can be kept in one location where customers can find them easily. This can increase the sales of the store, as customers will be satisfied with how the store is managed.
Earlier there were no supermarkets; only small Rashaan shops were available to customers. Now people prefer to buy their daily goods from supermarkets, which are well organized and sell products at a fixed price with no need for negotiation. The main problem most storekeepers and retailers have faced to date is how to place products according to customers’ needs or demand. They do not know how to place products, or which products should be placed together, so that customers can find them with ease. This problem can be addressed by applying Market Basket Analysis, which assists in product placement (i.e. placing related items together or close to each other), pricing strategies, and offers on associated products.
In today's competitive world, it is important for industries to know customer purchasing behavior. Market Basket Analysis serves this purpose by revealing the associations between products.
Mueller (2005) stated that frequent itemsets can be identified on the basis of support and confidence and then used to generate association rules. He used Java for the analysis and the Apriori algorithm to generate the frequent itemsets and store them in a file.
Brin, Motwani, Ullman, and Tsur (1997) show a new way to find association rules. Ngai, Xiu, and Chau (2009) focus on techniques that help with customer retention. Shaw, Subramaniam, Tan, and Welge (2001) emphasize customer relationship management.
Raeder and Chawla (2011) explain the properties of a network of products and show the communities that are detected in it. These networks uncover expressive relationships between products that are difficult to find with association rules. They then study the application of association rule network and center-piece subgraph techniques to the market basket problem. After this study, they propose a general framework for mining unseen market basket data in the absence of background knowledge.
Gupta and Mamtora (2014) provide a survey of existing data mining algorithms for Market Basket Analysis. They apply association rule mining to generate the frequent itemsets and use the Apriori algorithm to generate the rules.
Dhanabhakyam and Punithavalli (2011) present a survey of existing data mining algorithms for Market Basket Analysis.
Kaur and Kang (2016) explain that data mining is important in today's competitive market. They describe mining techniques and propose an algorithm using association rule mining that works on the concept of model changing, in which the dynamic nature of the data is examined each time the data is mined.
Kurniawan, Umayah, Hammad, Nugroho, and Hariadi (2017) have created an application to explore Market Basket Analysis.
Association rules are a basic technique of data mining and are mostly used for finding the frequent itemsets in Market Basket Analysis. When association rules are applied to products, they extract relationships between the different products that appear together frequently in a user's purchases. Association rules take the form milk → bread or milk, bread → butter. The rule milk → bread states that if a customer purchases milk, there is a high probability that he/she will also purchase bread. The rule milk, bread → butter states that if a customer has purchased milk and bread, there is a probability that the customer will also buy butter. To discover the association rules, the Apriori algorithm is used.
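Association rule mining is divided into two problems, or tasks: first, finding all the frequent itemsets in the dataset, and second, generating association rules from those frequent itemsets.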
The first problem is more complex than the second, because identifying all the frequent patterns in a large dataset is expensive: for data that contains N items, there are potentially 2^N itemsets. In practice, the itemsets that actually occur in a database are far fewer than 2^N. A brute-force technique therefore wastes effort and time, as it requires exponential time to enumerate the itemsets.
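The size of the brute-force search space can be illustrated with a short Python sketch. The item names are hypothetical and chosen only for illustration; the function simply enumerates every non-empty subset of the catalogue, which is what a brute-force approach would have to consider.

```python
from itertools import combinations

def all_itemsets(items):
    """Yield every non-empty subset of `items` (the brute-force candidate space)."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

# Hypothetical item catalogue, not taken from the paper's dataset.
items = ["bread", "butter", "milk", "sugar", "tea"]
candidates = list(all_itemsets(items))

# N items give 2**N subsets; leaving out the empty set gives 2**N - 1 candidates.
print(len(candidates), 2 ** len(items) - 1)
```

Each additional item doubles the number of subsets, which is why even a modest catalogue makes exhaustive enumeration impractical.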
The second problem is easier to handle, and rules can be generated efficiently in a reasonable time.
The two key measures for generating association rules are support count and confidence. Rules are selected on the basis of these measures, i.e. association rules are generated by evaluating the support and confidence of each candidate against a given minimum support count and minimum confidence.
Support: Support is used to determine how frequently an itemset appears in a large dataset. For a dataset of n transactions, the support of x and y appearing together is:
$$\text{support}(x \Rightarrow y) = \frac{\text{number of transactions containing both } x \text{ and } y}{n}$$
Confidence: Confidence is used to weed out the itemsets that do not meet the specified threshold value and so to generate strong rules. It measures how often the if/then statement is true.
For a rule x ⇒ y, confidence is the probability that y appears given that x appears. It is calculated as:
$$\text{confidence}(x \Rightarrow y) = \frac{\text{number of transactions containing both } x \text{ and } y}{\text{number of transactions containing } x}$$
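As a minimal sketch of the two measures, the following Python code computes support and confidence over a small set of transactions. The transactions and the helper names (`count`, `support`, `confidence`) are assumptions made for illustration and are not the paper's dataset or tooling.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / n_antecedent

# Hypothetical toy transactions, for illustration only.
transactions = [
    {"tea", "sugar", "milk"},
    {"tea", "sugar"},
    {"tea", "bread"},
    {"milk", "bread"},
    {"tea", "sugar", "bread"},
]

print(support({"tea", "sugar"}, transactions))       # 3 of 5 transactions
print(confidence({"tea"}, {"sugar"}, transactions))  # 3 of the 4 tea transactions
```

In this toy data, tea and sugar appear together in three of five transactions, and three of the four transactions that contain tea also contain sugar.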
Orange is an open-source tool for analytical study and for generating association rules. It can be used by novices and experts alike. It was developed at the Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of Ljubljana, Slovenia. It provides an interactive workflow environment in which data mining can be performed through visual programming as well as through Python scripting. Orange provides multiple widgets, presented through a graphical user interface, for data entry and preprocessing, classification, association, and data visualization.
The Apriori algorithm was a major development in the history of association rule mining. It was proposed by Rakesh Agrawal and Ramakrishnan Srikant in 1994 and overcomes the limitations of the AIS algorithm, of which it is a modified and renamed version. It is used to generate the frequent itemsets from a transactional database and, from them, the corresponding association rules. The name Apriori comes from the fact that the algorithm uses prior knowledge to generate the next stage of frequent patterns. Its iterative approach is also known as a level-wise search. The AIS algorithm was a simple algorithm that needed many passes over the database, producing many candidate itemsets and keeping a counter for every candidate, most of which turned out to be infrequent. In Apriori, the frequent 1-itemsets, denoted L1, are found first. L1 is used to find L2, the frequent 2-itemsets; L2 is then used to find L3, and so on, until no more frequent k-itemsets can be found. Apriori is much more efficient than AIS at candidate generation for two reasons: a different way of generating candidates, and a new pruning technique.
The Apriori algorithm searches for frequently purchased products level by level over a database that holds a large amount of data. It can also be used to identify the web pages uniquely searched by each user.
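The level-wise search described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation used by Orange or the original authors: the function name `apriori`, the join-by-union candidate generation, and the toy transactions are all hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting `min_support`."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # One pass over the data per level: count each candidate, keep the frequent ones.
        kept = {}
        for c in candidates:
            hits = sum(1 for t in transactions if c <= t)
            if hits / n >= min_support:
                kept[c] = hits / n
        return kept

    # L1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    level = keep_frequent(singles)
    all_frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: a candidate with any infrequent (k-1)-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent

# Hypothetical toy transactions, for illustration only.
transactions = [
    {"tea", "sugar", "milk"},
    {"tea", "sugar"},
    {"tea", "bread"},
    {"milk", "bread"},
    {"tea", "sugar", "bread"},
]
frequent_sets = apriori(transactions, min_support=0.5)
for itemset, sup in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is what separates this from brute force: a candidate is counted against the database only if all of its smaller subsets were already found to be frequent.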
Apriori contains four steps for generating the candidate itemsets:
An example of Market Basket Analysis using the Apriori algorithm is given below. Figure 1 shows the process steps for finding rules.
Figure 1. Flowchart of Generating Rules
Data were collected for the experiment. Figure 2 displays the workflow environment in Orange.
Figure 2. Workflow of Association Mining
Frequent itemsets with a 50% minimum support count:
Figure 3 shows the complete dataset on which the task is performed.
Figure 4 shows the list of all itemsets. Figure 5 shows the association rules generated with 50% support count and 40% confidence.
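How a confidence threshold turns frequent itemsets into rules can be sketched as follows. The supports below are hypothetical values assumed to have already passed a 50% minimum support; they are not the results of the paper's experiment, and `generate_rules` is an illustrative helper rather than an Orange function.

```python
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence) from {itemset: support}."""
    rules = []
    for itemset, sup in frequent.items():
        # Every non-empty proper subset of the itemset is a possible antecedent.
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                # Subsets of a frequent itemset are frequent, so their support is available.
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules

# Hypothetical supports of itemsets that met a 50% minimum support.
frequent = {
    frozenset({"tea"}): 0.8,
    frozenset({"sugar"}): 0.6,
    frozenset({"bread"}): 0.6,
    frozenset({"tea", "sugar"}): 0.6,
}

rules = generate_rules(frequent, min_confidence=0.4)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Raising `min_confidence` keeps only the stronger rules, which is the role the 40% threshold plays in the experiment.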
In this paper, Market Basket Analysis with the Apriori algorithm is carried out experimentally. Market data was taken for analysis and for generating frequent itemsets and association rules. The experiment used 22 grocery products and 40 different sets of these products. The data mining tool Orange was used to perform the experiment with the Apriori algorithm. Such an experiment is useful for increasing the sales of products by analyzing the buying behavior of different groups of customers. Association rules are generated from the frequent itemsets identified by calculating support and confidence, i.e. from the products that customers frequently purchase together. For example, with 50% support and 40% confidence for generating association rules, we identified that Masur dal is frequently purchased with Garam Masala.
This research paper primarily deals with finding the frequent items that customers purchase together in the market, using Market Basket Analysis to uncover customer purchasing behavior. With the Apriori algorithm, we compute support and confidence and, on that basis, identify the frequent itemsets and their associated rules. The Orange data mining tool is used for this calculation. Market Basket Analysis is useful for markets because, by analyzing customers’ buying behavior, companies can more easily increase the sales of the most popular and frequently purchased products. In the future, a more efficient approach can be applied to larger amounts of data.