# Computer Science

The file P03_22.xlsx lists financial data on movies released since 1980 with budgets of at least $20 million.

Reduce the size of this data set by deleting all movies with a budget of more than $100 million. Also, delete all movies where US Gross and/or Worldwide Gross is listed as Unknown.

For the remaining movies, create a table of correlations between the variables Budget, US Gross, and Worldwide Gross. Comment on the results. Are there any surprises?
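To check the filtering and correlation logic outside Excel, a minimal Python sketch might look like the following. The rows are made-up stand-ins for P03_22.xlsx (the column order is an assumption), and `pearson` is a hypothetical helper written for this illustration.

```python
from math import sqrt

# Hypothetical rows standing in for P03_22.xlsx: (Budget, US Gross, Worldwide Gross).
movies = [
    (20_000_000, 35_000_000, 60_000_000),
    (50_000_000, 40_000_000, 95_000_000),
    (80_000_000, 120_000_000, 300_000_000),
    (150_000_000, 200_000_000, 600_000_000),  # budget above $100 million: dropped
    (30_000_000, "Unknown", 45_000_000),      # unknown gross: dropped
    (100_000_000, 90_000_000, 210_000_000),
]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Keep budgets of at most $100 million with both gross values known.
kept = [m for m in movies
        if m[0] <= 100_000_000 and "Unknown" not in m[1:]]

names = ["Budget", "US Gross", "Worldwide Gross"]
columns = {name: [m[i] for m in kept] for i, name in enumerate(names)}

# Correlation table as a nested dict: corr[row][col].
corr = {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}

for r in names:
    print(r.ljust(16), "  ".join(f"{corr[r][c]:6.3f}" for c in names))
```

The printed table is symmetric with 1s on the diagonal, which is a quick sanity check for the table built in Excel as well.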

For the movies remaining after part a, create a scatterplot of Worldwide Gross (Y axis) versus Budget. Briefly explain any patterns you see in the scatterplots. Do they seem to be consistent with the corresponding correlations?

Problem 30 p. 107

The file P03_30.xlsx contains monthly data on exchange rates of various currencies versus the US dollar. It is of interest to financial analysts and economists to see whether exchange rates move together through time. You can find correlations between the exchange rates themselves, but with time series data it is more useful to check the correlations between month-to-month differences.

Create a column of differences for each currency. For example, the difference corresponding to Jan-06 will be blank for each currency because the Dec-05 value isn’t listed, but the difference for euros in Feb-06 will be 0.8375-0.8247.
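The difference column can be sketched in a few lines of Python. Only the Jan-06 and Feb-06 euro values (0.8247 and 0.8375) come from the text above; the Mar-06 value is made up for illustration.

```python
# Hypothetical excerpt of the euro column. Only the first two values are
# taken from the problem statement; the Mar-06 value is invented.
months = ["Jan-06", "Feb-06", "Mar-06"]
euro = [0.8247, 0.8375, 0.8320]

# The first entry is None (blank) because the Dec-05 value is not listed.
euro_diff = [None] + [curr - prev for prev, curr in zip(euro, euro[1:])]

for month, diff in zip(months, euro_diff):
    print(month, "" if diff is None else round(diff, 4))
```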

Create a table of correlations between all of the original variables. Then, on the same sheet, create a second table of correlations between the difference variables. On the same sheet, enter 2 cutoff values, one positive such as 0.6 and one negative such as -0.5, and use conditional formatting to color all correlations (in both tables) above the positive cutoff green and all correlations below the negative cutoff red. Set up the formatting so that the 1s on the diagonal are not colored.
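The coloring rule itself is simple to state in code. The sketch below uses a made-up correlation table (not the P03_30 results) and a hypothetical `color` helper; it only illustrates the logic that the conditional formatting needs to express, including the exclusion of the diagonal.

```python
# Made-up correlation table for three currencies (not the P03_30 results).
names = ["Euro", "Pound", "Yuan"]
corr = [
    [1.00, 0.85, -0.62],
    [0.85, 1.00, -0.10],
    [-0.62, -0.10, 1.00],
]

pos_cutoff = 0.6    # correlations above this are colored green
neg_cutoff = -0.5   # correlations below this are colored red

def color(i, j):
    """Return the fill color for cell (i, j), or '' for no fill."""
    if i == j:                      # leave the 1s on the diagonal uncolored
        return ""
    if corr[i][j] > pos_cutoff:
        return "green"
    if corr[i][j] < neg_cutoff:
        return "red"
    return ""

colors = [[color(i, j) for j in range(len(names))] for i in range(len(names))]
for name, row in zip(names, colors):
    print(name.ljust(6), row)
```

Keeping the cutoffs in their own variables mirrors the instruction to enter them in separate cells, so changing a cutoff recolors both tables.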

Based on the second table and your coloring, can you conclude that these currencies tend to move together in the same direction? If not, what can you conclude? Can you explain how 2 currencies like the Chinese yuan and British pound can be fairly highly negatively correlated, whereas the correlation between their differences is essentially 0? Would you conclude that these 2 currencies “move together”? (Hint: There is no easy answer, but scatterplots and time series graphs for these 2 currencies and their differences are revealing.)
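One possible way to build intuition for this question is a toy example. The two series below are entirely made up (they are not the yuan or pound data): one drifts steadily up, the other steadily down, and each has its own unrelated month-to-month wiggle. The `pearson` and `diffs` helpers are hypothetical names for this sketch.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def diffs(series):
    """Period-to-period differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]

# Made-up series: opposite trends plus wiggles with different rhythms.
t = range(41)
up = [k + (1 if k % 2 == 0 else -1) for k in t]
down = [-k + (1 if k % 4 < 2 else -1) for k in t]

level_corr = pearson(up, down)
diff_corr = pearson(diffs(up), diffs(down))
print(f"levels: {level_corr:.3f}, differences: {diff_corr:.3f}")
```

In this toy case the opposite long-run trends dominate the correlation between levels, while the month-to-month changes are unrelated. Whether the same explanation applies to the actual currencies is something the scatterplots and time series graphs should help you judge.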

Problem 40 p. 129

The file P03_40.xlsx contains monthly data on the number of border crossings from Mexico into 4 southwestern states.

Restructure this data set on a new sheet so that there are 3 long columns: Month, State, and Crossings. Essentially, you should stack the original columns B through E on top of one another to get the Crossings column, and you should also indicate which state each row corresponds to in the State column. The Month column should have 4 replicas of the original Month column.
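The stacking step can be sketched as follows. The state names, month labels, and crossing counts are all assumptions for illustration; the real workbook layout may differ.

```python
# Made-up wide table standing in for P03_40.xlsx; the state names and all
# values are assumptions for illustration.
states = ["Arizona", "California", "New Mexico", "Texas"]
wide = [
    # (Month, then one crossings value per state, in the order above)
    ("Jan-2010", 100, 400, 20, 600),
    ("Feb-2010", 110, 390, 25, 580),
    ("Jan-2011", 120, 410, 30, 620),
]

# Stack one state's column at a time, so the Month column repeats once per state.
long_rows = [
    (row[0], state, row[1 + i])
    for i, state in enumerate(states)
    for row in wide
]

for month, state, crossings in long_rows:
    print(month, state, crossings)
```

With 3 months and 4 states, the long table has 12 rows: the 3 months for the first state, then the same 3 months for the second state, and so on.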

Create a pivot table and corresponding pivot chart based on the restructured data. It should break down the average of Crossings by Year and State. Comment on any patterns you see in the chart.
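What the pivot table computes here is a grouped average. A minimal sketch with made-up long-format rows (the values are not from P03_40.xlsx, and the year is assumed to have been extracted from the Month column already):

```python
from collections import defaultdict

# Made-up long-format rows: (Year, State, Crossings).
rows = [
    (2010, "Arizona", 100), (2010, "Arizona", 110),
    (2010, "Texas", 600), (2010, "Texas", 580),
    (2011, "Arizona", 120), (2011, "Texas", 620),
]

# Collect the crossings that fall into each (Year, State) cell.
groups = defaultdict(list)
for year, state, crossings in rows:
    groups[(year, state)].append(crossings)

# Average within each cell, as the pivot table's "Average of Crossings" does.
avg = {key: sum(vals) / len(vals) for key, vals in groups.items()}

for (year, state), value in sorted(avg.items()):
    print(year, state, value)
```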

Problems 42 & 43 p. 129

42- One pivot table element we didn’t explain is a calculated item. This is usually a new category for some categorical variable that is created from existing categories. It is easiest to learn from an example. Open the file Elecmart Sales.xlsx from this section, create a pivot table, and put Day in the Rows area. Proceed as follows to create two new categories, Weekday and Weekend.

Select any day, then select Calculated Item from the Formulas dropdown list on the PivotTable Tools Options ribbon. This will open a dialog box. Enter Weekend in the Name box and enter the formula =Sat+Sun in the Formula box. (You can double-click the items in the Items list to help build this formula.) When you click OK, you will see Weekend in the pivot table.

Do it yourself: create another calculated item, Weekday, for Mon through Fri.

Filter out all of the individual days from the Rows area, so that only Weekday and Weekend remain, and then find the sum of Total Cost for these 2 new categories. How can you check whether these sums are what you think they should be? (Notes about calculated items: First, if you have Weekend, Weekday, and some individual days showing in the Rows area, the sum of Total Cost will double-count these individual days, so be careful about this. Second, beware that if you create a calculated item from some variable such as Day, you are no longer allowed to drag that variable to the Filters area.)
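One way to think about the check and the double-counting warning is sketched below, using made-up daily totals rather than the Elecmart figures.

```python
# Made-up total cost by day (not the Elecmart figures).
total_cost = {"Mon": 120.0, "Tue": 90.0, "Wed": 100.0, "Thu": 80.0,
              "Fri": 150.0, "Sat": 200.0, "Sun": 160.0}

# Calculated items: new categories built from existing ones.
weekend = total_cost["Sat"] + total_cost["Sun"]
weekday = sum(total_cost[d] for d in ["Mon", "Tue", "Wed", "Thu", "Fri"])

# Check: the two new categories should add up to the total over all days.
grand_total = sum(total_cost.values())
assert weekday + weekend == grand_total

# Leaving every individual day visible next to the calculated items
# counts each day twice in the grand total.
double_counted = grand_total + weekday + weekend
print(weekday, weekend, grand_total, double_counted)
```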

43- Building on the previous problem, another pivot table element we didn’t explain is the calculated field. This is usually a new numerical variable built from numerical variables that can be summarized in the Values area. It acts somewhat like a new column in the spreadsheet data, but there is an important difference. Again, it is easiest to learn from an example. Open the file Elecmart Sales.xlsx and follow the instructions below.

Create a new column in the data, CostPerItem, which is Total Cost divided by Items Ordered. Then create a pivot table and find the average of CostPerItem, broken down by Region. You should find averages such as $50.41 for the MidWest. Explain exactly how this value is calculated. Would such an average be of much interest to a manager at Elecmart? Why or why not?

Select any average in the pivot table, then select Calculated Field from the Formulas dropdown list on the Analyze/Options ribbon. This will open a dialog box. Enter CF_CostPerItem in the Name box (we added CF, for calculated field, because we are not allowed to use the CostPerItem name that already exists), enter the formula =TotalCost/ItemsOrdered, and click OK. You should now see a new column in the pivot table, Sum of CF_CostPerItem, with different values than in the Average of CostPerItem column. For example, the new value for the MidWest should be $46.47. Do some investigation to understand how this sum was calculated. From a manager’s point of view, does it make any sense? (Note on calculated fields: When you summarize a calculated field, it doesn’t matter whether you express it as sum, average, max, or any other summary measure. It is calculated in exactly the same way in each case.)
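A small sketch may help with the investigation. It contrasts two different calculations on made-up orders for a single region (not the Elecmart data): averaging the row-level ratios, and dividing the summed cost by the summed items. The second is how a calculated field is generally understood to be evaluated; treat that as something to confirm in your own pivot table.

```python
# Made-up orders for one region: (TotalCost, ItemsOrdered).
orders = [(100.0, 1), (100.0, 4), (300.0, 5)]

# Average of the row-level CostPerItem column: mean of the per-order ratios.
avg_of_ratios = sum(cost / items for cost, items in orders) / len(orders)

# Calculated-field style: sum each field first, then divide the sums.
ratio_of_sums = sum(c for c, _ in orders) / sum(i for _, i in orders)

print(avg_of_ratios, ratio_of_sums)
```

The first number weights every order equally regardless of size, while the second weights every item equally, so the two generally differ whenever order sizes vary.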

Problem 49 p. 130

The file P03_22.xlsx lists financial data on movies released since 1980 with budgets of at least $20 million.

Create 3 new variables, Ratio1, Ratio2, and Decade. Ratio1 should be US Gross divided by Budget, Ratio2 should be Worldwide Gross divided by Budget, and Decade should list 1980s, 1990s, or 2000s, depending on the year of the release date. If either US Gross or Worldwide Gross is listed as “Unknown”, the corresponding ratio should be blank. (Hint: For Decade, use the YEAR function to fill in a new Year column. Then use a lookup table to populate the Decade column.)
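The logic of the three new columns can be sketched as follows. The rows are made up, the column order is an assumption, and `ratio` and `decade` are hypothetical helpers; the loop in `decade` imitates an approximate-match lookup against a table of decade starting years.

```python
from datetime import date

# Made-up rows: (Release Date, Budget, US Gross, Worldwide Gross).
movies = [
    (date(1985, 7, 3), 20_000_000, 50_000_000, 80_000_000),
    (date(1994, 6, 15), 40_000_000, "Unknown", 100_000_000),
    (date(2003, 5, 30), 80_000_000, 120_000_000, "Unknown"),
]

# Lookup table: first year of each decade and its label, in ascending order.
decade_lookup = [(1980, "1980s"), (1990, "1990s"), (2000, "2000s")]

def ratio(gross, budget):
    """Gross divided by budget, or None (blank) when gross is 'Unknown'."""
    return None if gross == "Unknown" else gross / budget

def decade(year):
    """Label of the last lookup row whose starting year is <= year."""
    label = None
    for start, name in decade_lookup:
        if year >= start:
            label = name
    return label

# Columns: Year, Ratio1, Ratio2, Decade.
table = [
    (d.year, ratio(us, budget), ratio(ww, budget), decade(d.year))
    for d, budget, us, ww in movies
]
for row in table:
    print(row)
```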

Use a pivot table to find counts of movies for the various distributors. Then go back to the data and create one more column, Distributor New, which lists the distributor for distributors with at least 30 movies and lists Other for the rest. (Hint: Use a lookup table to populate Distributor New, but also use an IF to fill in Other where the distributor is missing.)
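The count-then-relabel step can be sketched like this. The distributor names and counts are made up, and `None` stands in for a missing distributor.

```python
from collections import Counter

# Made-up distributor column; None marks a missing distributor.
distributors = ["Alpha"] * 35 + ["Beta"] * 30 + ["Gamma"] * 5 + [None] * 2

# Counts per distributor, as the first pivot table would show.
counts = Counter(d for d in distributors if d is not None)

# Lookup table: distributors with at least 30 movies keep their own name.
min_movies = 30
lookup = {name: name for name, n in counts.items() if n >= min_movies}

# Missing or small distributors fall through to "Other".
distributor_new = [lookup.get(d, "Other") for d in distributors]

print(Counter(distributor_new))
```

Here the dictionary default plays the role of the IF in the hint: anything not found in the lookup table, including a missing distributor, becomes Other.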

Create a pivot table and corresponding pivot chart that shows the average and standard deviation of Ratio1, broken down by Distributor New, with Decade in the Filters area. Comment on any striking results.

Repeat part c for Ratio2.