% Class 5 - Statistics
% Coded by Nigel Reuel on 9.10.19
%
% Load the data
D = readmatrix('Bulls1997.xlsx')
Smean = mean(D(:,3));
fprintf('Mean salary is $%.2f\n',Smean)
Smedian = median(D(:,3));
fprintf('Median salary is $%.2f\n',Smedian)
% Also measure the spread
% Using the standard deviation
S_std = std(D(:,3))
% You can measure the interquartile range
S_iqr = iqr(D(:,3))
% Example of box and whisker plot
boxplot(D(:,3))
% Histogram
%histogram(D(2:end,3))

% With the class 2017 data,
% report the average commute,
% also report the approximate 95% range (mean +/- 2*std,
% which assumes roughly normally distributed data)
% Column 2 has the commute time
CT = readmatrix('ClassData.xlsx');
Avg_CT = mean(CT(:,2))
CT_std = std(CT(:,2));
fprintf('About 95 percent of commute times are expected to fall between %.2f and %.2f min\n',Avg_CT-2*CT_std,Avg_CT+2*CT_std);
% This data is a large sample, let's look
% at a histogram
histogram(CT(:,2))
xlabel('Commute Time (min)')
ylabel('Frequency')
% Example of histogram w/ more input
histogram(CT(:,2),10) % number of bins
xlabel('Commute Time (min)')
ylabel('Frequency')
% If you need to extract information,
% you can use an output variable
h1 = histogram(CT(:,2),10) % number of bins
xlabel('Commute Time (min)')
ylabel('Frequency')
% To pull off the count information
Counts = h1.Values

% Example of plotting histogram
% Determining type of distribution
% Plotting histogram w/ distribution
% You measure the lifetime of a product (months)
life = [ 6.2 16.1 16.3 19.0 12.2  8.1  8.8  5.9  7.3  8.2 ...
        16.1 12.8  9.8 11.3  5.1 10.8  6.7  1.2  8.3  2.3 ...
         4.3  2.9 14.8  4.6  3.1 13.6 14.5  5.2  5.7  6.5 ...
         5.3  6.4  3.5 11.4  9.3 12.4 18.3 15.9  4.0 10.4 ...
         8.7  3.0 12.1  3.9  6.5  3.4  8.5  0.9  9.9  7.9];
histogram(life,15)
xlabel('Product Lifetime (months)')
ylabel('Frequency')
% From inspection, you see skew,
% it is not a normal distribution
% Poisson distribution can be used for failure analysis
% (note: Poisson is a discrete count distribution, so it is
% only a rough model for continuous lifetimes like these)
histfit(life,15,'Poisson')
Param = fitdist(life','Poisson') % this function needs a column vector input
% Post example of use of icdf
%
% AFTER CLASS EXAMPLE
% I forgot to show two important things,
% generating random data for a distribution
% and using icdf
%
% 1) Assume a distribution of athlete heights
mu = 1.75;    % m
sigma = 0.13; % m
% Generate 100 athlete heights from this normal distribution
H_vec = random('Normal',mu,sigma,100,1);
% Make a histogram from this data, 20 bins
histogram(H_vec,20)
% Make a distribution from this data
pd1 = fitdist(H_vec,'Normal');
% Make a distribution from the pdf parameters
pdf2 = makedist('Normal','mu',mu,'sigma',sigma);
% Use the pdf to make predictions
% Q1) How many athletes would you need to screen
% to find one that is > 2 meters tall?
% (NOTE: base it off of pdf2, it is more exact. WHY?)
prob = cdf(pdf2,2)
n_screen = 1/(1-prob);
fprintf('You need to screen %d athletes to find one > 2m tall!\n',ceil(n_screen))
%
% NOW, inverse cdf is useful if you want
% to ask the opposite type of question, where
% we know the probability but want to determine
% the x-value
%
% Q2) How tall are the top 0.01% of athletes
% according to this distribution?
x_top = icdf(pdf2,1-0.0001);
fprintf('The top 0.01 percent athletes are greater than %.2f m tall!\n',x_top)
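%
% FOLLOW-UP SKETCH (an added illustration, not part of the class script)
% One possible way to explore the "WHY?" in Q1: a distribution fitted
% to only 100 random heights has estimated parameters that shift from
% run to run, while a distribution built with makedist uses the stated
% mu and sigma exactly. All variable names below (mu_true, sigma_true,
% pd_exact, H_sample, pd_fit, n_exact, n_fit) are illustrative
% assumptions chosen so this section runs on its own.
mu_true = 1.75;    % m, same value assumed in the athlete example
sigma_true = 0.13; % m
pd_exact = makedist('Normal','mu',mu_true,'sigma',sigma_true);
H_sample = random(pd_exact,100,1);   % 100 simulated heights
pd_fit = fitdist(H_sample,'Normal'); % parameters estimated from the sample
fprintf('Exact:  mu = %.3f, sigma = %.3f\n',pd_exact.mu,pd_exact.sigma)
fprintf('Fitted: mu = %.3f, sigma = %.3f\n',pd_fit.mu,pd_fit.sigma)
% Repeat the Q1 calculation with each distribution and compare
n_exact = ceil(1/(1-cdf(pd_exact,2)));
n_fit = ceil(1/(1-cdf(pd_fit,2)));
fprintf('Screen %d athletes (exact) vs %d (fitted) to find one > 2 m\n',n_exact,n_fit)
% Rerunning this section draws a new sample, so n_fit will likely
% change while n_exact stays the same.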