Introduction

Calculations using R and Excel.

OPIM 390 Introduction to Business Analytics

Term Project – Spring 2018

Opimbank, Inc. is a commercial bank in Turkey that offers many banking products and services, such as time-deposit accounts, various credit cards, personal loans, and mortgages. The bank has recently become interested in analyzing its customers’ spending patterns based on its credit card transaction database. From this database, the bank knows not only when, where, and how much a customer spends, but also certain demographics of the customer, which may help explain the customer’s spending patterns and preferences. The bank wants to analyze this dataset and use the results to estimate where and how customers are likely to spend money, which can in turn be used to predict customers’ responses to campaigns and promotions.

As an analyst employed at Opimbank, your main job is to analyze the database, which consists of customer information and the matching transaction records, and come up with descriptive statistics as well as predictive results. Because you have taken an introductory course on Business Analytics, you have knowledge of tools and techniques that you can use for this purpose. You have already acquired a dataset from the bank’s datamart and you are now ready to use your analytics skills.

Generally speaking, there are two main tasks you would like to perform on the available dataset. First, you would like to understand the nature of this dataset by creating a variety of descriptive statistics and visualizations. You would like to find answers to questions such as:

– What is the distribution of customers according to the given demographics, such as age, education, income?

– What is the distribution of spending by merchant category, customer demographics, etc.?

Secondly, you would like to run predictive models on this dataset to make estimations such as:

– To which customer characteristics are certain spending behaviors (such as weekend shopping, evening shopping, luxury shopping) linked?

– What are some common groups of customers who exhibit similar demographics and/or shopping behavior?

– If a marketing promotion in a certain product/service category is to be offered, which customers are more likely to respond to it?

Your goal is to answer these and similar questions to the maximum possible extent using descriptive analytics tools as well as predictive analytics concepts and tools such as regression, clustering and/or classification models.

The Deliverables

You are given a detailed dataset that includes the following layers, which will help you seek answers to the questions above:

– credit card transactions

– demographic profiles of customers who make these transactions

Using the information above, perform the necessary analysis to answer the following questions:

Provide descriptive statistics on customer demographics and transaction activity. For instance, for each demographic feature you can generate histograms, boxplots, or any other appropriate visual representation that provides insight into the distribution of the demographic data. Similarly, provide distributional statistics and visualizations for the transaction data. For instance, you may analyze the number of transactions and/or transaction amounts, broken down by day of week, hour of day, category, etc., and further broken down by customer demographics such as age, gender, marital status, etc. Your response should be comprehensive enough to give the reader sufficient insight into the nature of the dataset. You should also create at least one visualization using R. (15 points)

Design and calculate at least two quantitative measures of customers’ “shopping behavior”. For instance, you may consider shopping with respect to the product/service categories of merchants (e.g., % of amount spent on groceries), the day of week or hour of day (e.g., % of amount spent on weekends), or the geographical spread of shopping (using merchant XY coordinates). Your measures should produce normalized continuous values within the range 0-100. (15 points)
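As an illustration of one such measure, the sketch below computes the percentage of each customer’s total spending that falls on weekends, which lies in the range 0-100 by construction. It is written in Python purely for illustration (the project itself calls for R and Excel), and the toy rows, their layout, and the function name weekend_spend_share are assumptions rather than the actual worksheet structure.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy transactions: (customer_id, purchase_date, amount_in_TL).
# The real worksheet's column names and formats are not specified here.
transactions = [
    ("C1", date(2018, 3, 3), 120.0),   # Saturday
    ("C1", date(2018, 3, 5), 80.0),    # Monday
    ("C2", date(2018, 3, 4), 50.0),    # Sunday
    ("C2", date(2018, 3, 10), 150.0),  # Saturday
    ("C3", date(2018, 3, 7), 40.0),    # Wednesday
]


def weekend_spend_share(rows):
    """Return {customer_id: percent of total amount spent on Saturday/Sunday}."""
    total = defaultdict(float)
    weekend = defaultdict(float)
    for customer_id, day, amount in rows:
        total[customer_id] += amount
        # date.weekday() numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
        if day.weekday() >= 5:
            weekend[customer_id] += amount
    # Customers with zero total spending are skipped to avoid division by zero.
    return {c: 100.0 * weekend[c] / total[c] for c in total if total[c] > 0}


shares = weekend_spend_share(transactions)
print(shares)
```

A category-based measure such as the percentage spent on groceries would follow the same pattern, with the weekend test replaced by a test on the merchant category.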

Develop multiple linear regression models that link the two measures you developed in Part 2 (the shopping-behavior measures above) with the demographic attributes of customers. Your goal should be to look for significant relationships between customer demographics and shopping behavior. For instance, a valid model might reveal that a certain group of customers shops more on weekends than on weekdays. Use dummy variables in your models as necessary. (20 points)
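As a sketch of what a dummy-variable specification might look like, assume (hypothetically, since the exact attribute names are not listed here) that the customers worksheet provides age, gender, and marital status, and let y_i denote customer i’s value on one of the 0-100 measures:

```latex
\[
  y_i = \beta_0
      + \beta_1\,\mathrm{age}_i
      + \beta_2\,\mathrm{female}_i
      + \beta_3\,\mathrm{married}_i
      + \varepsilon_i
\]
```

Here female_i and married_i are 0/1 dummy variables, so β_2 is the expected difference in the measure between female and male customers with the other attributes held fixed. A categorical attribute with k levels is represented by k − 1 dummies, with the omitted level serving as the baseline.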

Generate at least three different clustering models using R for the customer data to explore the similarity of customers. The clustering models may consider various sets of attributes, such as customer demographics and/or your shopping behavior measures. Interpret the resulting clusters in terms of the customer profile within each cluster and the level of similarity among its members. Speculate on what kinds of products, services, or campaigns could be targeted at each cluster. (20 points)

Using appropriate classification models in R, find out how likely a customer is to respond to a promotional campaign to shop for:

– groceries (MARKET) on a weekday (i.e., Monday through Friday). (10 points)

– clothing (GIYIM) of at least 100 TL on a weekend. (10 points)
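A classification model needs a binary target for each campaign. The brief does not prescribe how “responding” should be defined, so the sketch below makes one possible assumption: a customer is labeled 1 if their history contains at least one qualifying transaction. It is written in Python purely for illustration (the project itself calls for R), and the row layout and label names are hypothetical.

```python
from datetime import date

# Hypothetical rows: (customer_id, purchase_date, merchant_category, amount_in_TL).
transactions = [
    ("C1", date(2018, 3, 5), "MARKET", 45.0),    # Monday
    ("C1", date(2018, 3, 10), "GIYIM", 250.0),   # Saturday
    ("C2", date(2018, 3, 4), "MARKET", 30.0),    # Sunday
    ("C2", date(2018, 3, 11), "GIYIM", 60.0),    # Sunday, below 100 TL
    ("C3", date(2018, 3, 7), "GIYIM", 300.0),    # Wednesday
]


def is_weekend(day):
    """True for Saturday or Sunday (date.weekday() numbers Monday as 0)."""
    return day.weekday() >= 5


def build_labels(rows):
    """Assumed definition: label is 1 if any qualifying transaction exists."""
    customers = sorted({row[0] for row in rows})
    labels = {c: {"market_weekday": 0, "giyim_weekend_100": 0} for c in customers}
    for customer_id, day, category, amount in rows:
        if category == "MARKET" and not is_weekend(day):
            labels[customer_id]["market_weekday"] = 1
        if category == "GIYIM" and amount >= 100 and is_weekend(day):
            labels[customer_id]["giyim_weekend_100"] = 1
    return labels


labels = build_labels(transactions)
print(labels)
```

Other definitions are equally defensible, for example requiring a minimum number of qualifying transactions. Whichever definition is used should be stated in the report.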

The Data

The two tables of business data are provided to you in a single Excel file, which contains the worksheets listed below.

– customers: all customers in the analysis and their demographic attributes.

– transactions: details of every purchase transaction made by the customers in the customers table, including the merchant XY coordinates.

Note that the dataset contains a large number of records. You may sample the data and work with appropriate subsets, provided this does not compromise the quality of the analysis. If you do so, you must also explain your sampling method in your project report.
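One sampling approach, sketched below under assumed data layouts, is to draw a simple random sample of customers and keep every transaction belonging to the sampled customers, so that per-customer measures are still computed from complete histories. The code is Python for illustration only, and the function name sample_customers, the 20% fraction, and the seed are hypothetical choices.

```python
import random


def sample_customers(customer_ids, transactions, fraction, seed=42):
    """Sample a fraction of customers; keep all transactions of those customers.

    Assumes each transaction is a tuple whose first element is the customer id.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    unique_ids = sorted(set(customer_ids))
    k = max(1, round(fraction * len(unique_ids)))
    chosen = set(rng.sample(unique_ids, k))
    kept = [t for t in transactions if t[0] in chosen]
    return chosen, kept


# Hypothetical toy data: 100 customers with 3 transactions each.
customer_ids = [f"C{i}" for i in range(1, 101)]
transactions = [(cid, 10.0 * n) for cid in customer_ids for n in range(1, 4)]

chosen, kept = sample_customers(customer_ids, transactions, fraction=0.2)
print(len(chosen), len(kept))
```

Sampling at the customer level rather than the transaction level avoids distorting measures such as spending shares, which depend on seeing all of a customer’s purchases.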