The variable X6 (diagonal) already worked well in the boxplot in Figure 1.4 for distinguishing between the counterfeit and genuine notes. Here, this variable is assigned to the face line and to the darkness of the hair. That is why we clearly see a good separation within these 20 observations.

What happens if we include all 100 genuine and all 100 counterfeit bank notes in the Chernoff-Flury face technique? Figures 1.16 and 1.17 show the faces of the genuine bank notes with the same assignments as used before, and Figures 1.18 and 1.19 show the faces of the counterfeit bank notes. Comparing Figure 1.16 and Figure 1.18, one clearly sees that the diagonal (face line) is longer for genuine bank notes. The diagonal is also coded as hair darkness: the hair is lighter for the counterfeit bank notes, whose diagonals are shorter. One sees that the faces of the genuine bank notes have a much darker appearance and broader face lines. The faces in Figures 1.16 and 1.17 are obviously different from those in Figures 1.18 and 1.19.

Figure 1.17. Chernoff-Flury faces for observations 51 to 100 of the bank notes (panel title: Observations 51 to 100). MVAfacebank50.xpl

Summary
→ Faces can be used to detect subgroups in multivariate data.
→ Subgroups are characterized by similar-looking faces.
→ Outliers are identified by extreme faces, e.g., dark hair, a smile or a happy face.
→ If one element of X is unusual, the corresponding face element changes significantly in shape.

Figure 1.18. Chernoff-Flury faces for observations 101 to 150 of the bank notes (panel title: Observations 101 to 150). MVAfacebank50.xpl

Figure 1.19. Chernoff-Flury faces for observations 151 to 200 of the bank notes (panel title: Observations 151 to 200). MVAfacebank50.xpl

1.6 Andrews Curves

The basic problem of graphical displays of multivariate data is the dimensionality. Scatterplots work well up to three dimensions (if we use interactive displays). More than three dimensions have to be coded into displayable 2D or 3D structures (e.g., faces). The idea of coding and representing multivariate data by curves was suggested by Andrews (1972). Each multivariate observation $X_i = (X_{i,1}, \ldots, X_{i,p})$ is transformed into a curve as follows:

$$
f_i(t) =
\begin{cases}
\dfrac{X_{i,1}}{\sqrt{2}} + X_{i,2}\sin(t) + X_{i,3}\cos(t) + \cdots + X_{i,p-1}\sin\left(\dfrac{p-1}{2}\,t\right) + X_{i,p}\cos\left(\dfrac{p-1}{2}\,t\right) & \text{for } p \text{ odd},\\[2ex]
\dfrac{X_{i,1}}{\sqrt{2}} + X_{i,2}\sin(t) + X_{i,3}\cos(t) + \cdots + X_{i,p}\sin\left(\dfrac{p}{2}\,t\right) & \text{for } p \text{ even},
\end{cases}
\tag{1.13}
$$

such that the observation represents the coefficients of a so-called Fourier series ($t \in [-\pi, \pi]$).

Suppose that we have three-dimensional observations: $X_1 = (0, 0, 1)$, $X_2 = (1, 0, 0)$ and $X_3 = (0, 1, 0)$. Here $p = 3$ and the following representations correspond to the Andrews curves:

$$
f_1(t) = \cos(t), \qquad f_2(t) = \frac{1}{\sqrt{2}}, \qquad f_3(t) = \sin(t).
$$

These curves are indeed quite distinct, since the observations $X_1$, $X_2$ and $X_3$ are the 3D unit vectors: each observation has mass in only one of the three dimensions. The order of the variables plays an important role.

EXAMPLE 1.2 Let us take the 96th observation of the Swiss bank note data set, $X_{96} = (215.6, 129.9, 129.9, 9.0, 9.5, 141.7)$. By (1.13), its Andrews curve is

$$
f_{96}(t) = \frac{215.6}{\sqrt{2}} + 129.9\sin(t) + 129.9\cos(t) + 9.0\sin(2t) + 9.5\cos(2t) + 141.7\sin(3t).
$$

Figure 1.20 shows the Andrews curves for observations 96 to 105 of the Swiss bank note data set. We already know that observations 96 to 100 represent genuine bank notes and that observations 101 to 105 represent counterfeit bank notes. We see that at least four curves differ from the others, but it is hard to tell which curve belongs to which group.

We know from Figure 1.4 that the sixth variable is an important one. Therefore, the Andrews curves are calculated again using the reversed order of the variables.

Figure 1.20. Andrews curves of the observations 96 to 105 from the Swiss bank note data (panel title: Andrews curves (Bank data); horizontal axis: t). The order of the variables is 1,2,3,4,5,6. MVAandcur.xpl

EXAMPLE 1.3 Let us consider again the 96th observation of the Swiss bank note data set, $X_{96} = (215.6, 129.9, 129.9, 9.0, 9.5, 141.7)$. The Andrews curve computed using the reversed order of variables is

$$
f_{96}(t) = \frac{141.7}{\sqrt{2}} + 9.5\sin(t) + 9.0\cos(t) + 129.9\sin(2t) + 129.9\cos(2t) + 215.6\sin(3t).
$$

In Figure 1.21 the curves $f_{96}, \ldots, f_{105}$ for observations 96 to 105 are plotted.
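The Andrews curve (1.13) is straightforward to evaluate numerically. Below is a minimal Python sketch that uses only the standard `math` module. The function names `andrews_curve` and `curve_on_grid` and the 201-point grid on $[-\pi, \pi]$ are illustrative assumptions, not the MVAandcur.xpl quantlet. The sketch reproduces the three unit-vector curves and the two orderings of the 96th observation from Examples 1.2 and 1.3.

```python
import math
from typing import List, Sequence


def andrews_curve(x: Sequence[float], t: float) -> float:
    """Evaluate the Andrews curve (1.13) of one observation x at the point t.

    x[0] is weighted by 1/sqrt(2); the remaining coordinates alternate between
    sin(k*t) and cos(k*t) with k = 1, 1, 2, 2, 3, ...  For even p the last
    coordinate therefore multiplies sin(p/2 * t), for odd p cos((p-1)/2 * t).
    """
    value = x[0] / math.sqrt(2)
    for j in range(1, len(x)):
        k = (j + 1) // 2              # frequency of the j-th (0-based) coordinate
        if j % 2 == 1:                # 2nd, 4th, 6th, ... variable: sine term
            value += x[j] * math.sin(k * t)
        else:                         # 3rd, 5th, 7th, ... variable: cosine term
            value += x[j] * math.cos(k * t)
    return value


def curve_on_grid(x: Sequence[float], n: int = 201) -> List[float]:
    """Evaluate the curve at n equally spaced points of [-pi, pi] (n >= 2).

    The grid size n is a hypothetical choice for illustration.
    """
    step = 2 * math.pi / (n - 1)
    return [andrews_curve(x, -math.pi + i * step) for i in range(n)]


# Unit vectors from the three-dimensional example: cos(t), 1/sqrt(2), sin(t).
unit_curves = [curve_on_grid(v) for v in [(0, 0, 1), (1, 0, 0), (0, 1, 0)]]

# Observation 96 of the Swiss bank note data, in the original order
# (Example 1.2) and in the reversed order of variables (Example 1.3).
x96 = (215.6, 129.9, 129.9, 9.0, 9.5, 141.7)
f96 = curve_on_grid(x96)
f96_reversed = curve_on_grid(x96[::-1])
```

Reversing the tuple is all that the change of variable order amounts to: the same six numbers are simply attached to different Fourier terms, so the first variable moves from the constant term to the highest frequency. The resulting lists can be passed to any plotting routine together with the grid.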