A box plot gives us a basic idea of the distribution of the data. If the box plot is relatively short, the data is more compact; if it is relatively tall, the data is spread out. The same interpretation of compactness or spread also applies to each of the four sections of the box plot: the two whiskers and the two halves of the box.

With a loose definition of outliers, you could use the chart to identify possible outliers. With a large number of data points, some outliers are usually expected; for example, 100 or more data points with a normal distribution commonly include a few outliers.

The box plot is also useful for evaluating the relationship between numeric (continuous) data and categorical data, which takes values from a finite set of categories. The following code draws one box plot for category 1 and another for category 2:

```r
# label each box with its category name and sample size
names(C) = c(paste("Category 1\n n=", length(cat1), sep=""), paste("Category 2\n n=", length(cat2), sep=""))
boxplot(C, col=c(primaryColor, secondaryColor), ylab="value")
```

The plot shows two box plots, one for category 1 and the other for category 2. Having the two plots side by side makes it quick to see whether the numeric data in one category looks noticeably different from the data in the other category.

The following plot shows a box plot of data with a normal distribution next to a box plot of data with a log-normal distribution:

```r
# put both samples in one named list so boxplot() draws them side by side
combinedData = list(dataNorm, dataLogNorm)
names(combinedData) = c("Normal Dist", "Log Normal Dist")
boxplot(combinedData, col = c(primaryColor, secondaryColor))
```

The plots show that the two data sets are distributed differently. In the log-normal box plot, the first and second quartiles are very short. They are short compared with the first and second quartiles of the normal example, and also compared with the third and fourth quartiles of the log-normal data itself.

The following plot shows a histogram and a box plot of the same data. It helps explain the box plot and how the data is divided into quartiles:

```r
# histogram with a horizontal box plot drawn over it (add = TRUE)
hist(dataLogNorm, col = primaryColor, breaks = 50)
boxplot(dataLogNorm, horizontal=TRUE, col = secondaryColor, outline=TRUE, add = TRUE)
```

Box plots do not display all the statistics needed to determine the distribution. For example, suppose we looked only at the box plot of the following data set. We couldn't tell whether the data is centered around two points or spread fairly evenly across the data range:

```r
# two stacked panels: a short one for the box plot, a tall one for the histogram
layout(mat = matrix(c(1, 2), 2, 1, byrow = TRUE), heights = c(1, 8))
boxplot(distExampleData, horizontal=TRUE, ylim=c(-10,20), xaxt="n", col=primaryColor, frame=FALSE)
hist(distExampleData, breaks=40, col=secondaryColor, border=FALSE, main="", xlab="value of the variable", xlim=c(-10,20))
```

The following plot shows a very similar box plot but with an entirely different distribution.

Represent the following data using a box-and-whiskers plot. They gave us a bunch of data points, and it says that, if it helps, you might drag the numbers around, which I will do. They also say the order isn't checked. Where you can't see it, there's actually a check answer, so you can try the exercises yourself, but let's just use this as an example.

Whenever I'm going to do a box-and-whiskers plot, I'm going to order these numbers. So let me order these numbers from least to greatest. We've got some twos here and some threes, some threes, some fours. I have one four, and fives. I have a couple of eights, and I have a 10. I have ordered these numbers from least to greatest. Now, well, just like that, I can plot the whiskers, because I see the range.

I have one, two, three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14 numbers. Since I have an even number of numbers, the middle two numbers are going to help define my median, because there's no one middle number. Take this four right over here. Notice there's one, two, three, four, five, six, seven numbers above it, and there are only six below it. The same thing would have been true for this five. So the middle is actually between this four and this five. When you have an even number of numbers like this, you take the middle two numbers, this four and this five, and you take the mean of the two. That gives the median of our entire data set: four-and-a-half.

And now I want to figure out the median of the bottom half of numbers and of the top half of numbers. Of course, I'm going to exclude the median, although here it isn't even one of our data points, because our median is 4.5. So now let's take this bottom half of numbers right over here and find the middle. The median of those is going to be the one that has three on either side, so it's going to be this one. That, right over there, is the left boundary of our box. For the right boundary, we need to figure out the middle of the top half. Remember, our median is right in between, at four-and-a-half. Within the top half, the seven has three numbers to its left and three to its right. So the seven is, you could say, the middle of the top half, which marks the right boundary of our box.

We've constructed our box-and-whiskers plot. It helps us visualize the entire range and also, you could say, roughly the middle half of our numbers.
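The worked example finds each quartile as the median of one half of the ordered data, leaving out the overall median. The R sketch below shows one way to do that calculation, written in R to match this page's plotting code. The helper name five_number_summary and both sample vectors are hypothetical. They were made up for illustration and are not the exercise's data.

One caveat: R's built-in fivenum(), which boxplot() uses to draw its box, follows Tukey's hinges. For an odd number of values it includes the median in both halves, so its quartiles can differ slightly from this method.

```r
# Five-number summary with "exclude the median" quartiles, as in the
# worked example. five_number_summary is a hypothetical helper name.
five_number_summary <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2)
  x <- sort(x)                        # sort() also drops NA values
  n <- length(x)
  half <- n %/% 2                     # number of values in each half
  lower <- x[seq_len(half)]           # bottom half
  upper <- x[(n - half + 1):n]        # top half; skips the middle value when n is odd
  c(min    = x[1],                    # left end of the whisker range
    q1     = median(lower),           # left edge of the box
    median = median(x),               # line inside the box
    q3     = median(upper),           # right edge of the box
    max    = x[n])                    # right end of the whisker range (no outlier rule here)
}

# Made-up data, even n (8 values): the median is the mean of the middle two
five_number_summary(c(1, 3, 4, 6, 7, 9, 12, 15))
# min 1, q1 3.5, median 6.5, q3 10.5, max 15

# Made-up data, odd n (7 values): the median 8 is left out of both halves
five_number_summary(c(2, 5, 7, 8, 11, 13, 20))
# min 2, q1 5, median 8, q3 13, max 20

# Tukey's hinges include the median in both halves when n is odd
fivenum(c(2, 5, 7, 8, 11, 13, 20))
# 2 6 8 12 20
```

For an even number of values the two methods agree. For example, fivenum(c(1, 3, 4, 6, 7, 9, 12, 15)) also gives 1, 3.5, 6.5, 10.5, 15.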
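One common way to make a "loose definition of outliers" concrete is Tukey's rule, which R's boxplot() applies by default through its range argument (default 1.5). Under this rule, a point is drawn individually if it lies more than 1.5 box lengths beyond either edge of the box.

The sketch below assumes that rule. flag_outliers is a hypothetical helper, and the vector is made-up data. It computes the box edges with fivenum(), as boxplot.stats() does, so on data without missing values it should agree with boxplot.stats(x)$out.

```r
# Tukey's rule: flag values more than coef box lengths beyond the box edges.
# flag_outliers is a hypothetical helper name.
flag_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  h <- fivenum(x)                 # min, lower hinge, median, upper hinge, max
  box_length <- h[4] - h[2]       # width of the box (hinge spread)
  lower_fence <- h[2] - coef * box_length
  upper_fence <- h[4] + coef * box_length
  x[x < lower_fence | x > upper_fence]
}

x <- c(1, 2, 3, 4, 5, 6, 40)      # made-up data with one extreme value
flag_outliers(x)                  # 40
boxplot.stats(x)$out              # 40, the same rule boxplot() uses
```

For normally distributed data, the quartiles sit about 0.67 standard deviations from the mean, so these fences fall near plus or minus 2.7 standard deviations. Roughly 0.7% of points land outside them. With 100 or more points, a few flagged values are therefore unremarkable.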