This is my page on "what the heck is going on with STATA histograms?"
(View this page in a decent browser like Opera, Firefox, or SeaMonkey -- or you're asking for a mess!)

Check out the image below:
[Image: the first example of a STATA histogram]
Here you see 3 windows overlaid:
  1. A Graph window of the histogram
  2. The Data Browser showing my very simple dataset (1 variable, 6 observations)
  3. The Results window (mostly covered, but you can see the "hist" command I just gave to produce this graph)
Notice that I asked STATA to give the bins a width of 1, so why is the last bin so clearly a width of 2? If you look at all the other bins, it's obvious that membership in a bin is determined only by the number on the LEFT edge of the bin. This graph would lead one to believe that the same number of people had a score of 2 as had a score of 4, and that there were no scores of 5!

(Notice that STATA admits that it is only going to create 4 bins, but doesn't that contradict the reported width of 1?)
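Here is a small Python sketch of one rule that would produce a picture like this. It is only my guess, not anything taken from STATA's documentation: I'm assuming the bins start at the smallest value, that each bin includes its left edge but not its right edge, and that the very last bin also includes its right edge so the maximum has somewhere to go. The function name `continuous_bins` and the `scores` list are made up for illustration (the real dataset is only visible in the screenshot).

```python
import math

def continuous_bins(values, width):
    """Count values in bins [lo, lo + width); the last bin also keeps its right edge."""
    start = min(values)
    # Assumed rule: just enough bins of the requested width to span min..max.
    n_bins = max(1, math.ceil((max(values) - start) / width))
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        # The maximum would fall one bin past the end, so fold it into the last bin.
        idx = min(idx, n_bins - 1)
        counts[idx] += 1
    edges = [start + i * width for i in range(n_bins + 1)]
    return edges, counts

scores = [1, 2, 2, 3, 4, 5]  # hypothetical data: 6 observations, scores from 1 to 5
edges, counts = continuous_bins(scores, 1)
print(edges)   # [1, 2, 3, 4, 5]
print(counts)  # [1, 2, 1, 2]
```

Under these assumptions a width of 1 over the range 1 to 5 gives exactly 4 bins, and the last bar counts both the 4s and the 5s, so it looks like as many people scored 4 as scored 2 and nobody scored 5.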

So, I tried this next:

I thought that perhaps the reason why the upper bar in the graph above had a width of 2 was that it was constrained by the range on the X axis.

However, as you can see, even when I opened up the scale, STATA refused to use the extra space.

It turns out that what you need to do apparently is this:

By adding the discrete option, it actually does what it says it's going to do: keep the width of the bins at 1.
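For comparison, here is how I picture the discrete case, again as a rough Python sketch and not STATA's actual algorithm. The assumption is that every distinct value gets its own bar of width 1, centered on the value, so the bar for a score v runs from v - 0.5 to v + 0.5. The name `discrete_bins` and the `scores` list are hypothetical.

```python
from collections import Counter

def discrete_bins(values):
    """Return (left, right, count) for one width-1 bar centered on each distinct value."""
    counts = Counter(values)
    return [(v - 0.5, v + 0.5, counts[v]) for v in sorted(counts)]

scores = [1, 2, 2, 3, 4, 5]  # hypothetical data: 6 observations, scores from 1 to 5
for left, right, n in discrete_bins(scores):
    print(f"{left} to {right}: {n}")
```

With this rule the 4s and the 5s each get their own bar, which matches what the graph with the discrete option appears to show.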

STATA help says that without the discrete option, it assumes the data is continuous, but does that justify making bins that are obviously not the width that it claims they are?

Maybe a more experienced STATA user can help me understand whatever rule the graph above is using (I tried assuming that the bins above are actually tied to the midpoints, like the graph obviously is on the right, but still couldn't come up with a reliable rule to predict the bars).

Anyway, what if it turns out that you want to go discrete, but you don't want a width of 1?

Turns out that even though STATA help says that 'discrete' assumes a width of 1 (hence the term, I guess), it does allow you to specify a width, as shown below:

Here, it appears that membership in the bin is determined by being equal to either the middle number or the number to the left (which is why the 0-2 bin reflects 1 score and the 4-5 bin reflects 2).

So, if you have any ideas of how this thing works, leave me a comment below (I would list my email except for all the spam scanners out there, but I'll be sure to read it and be very grateful!)

{By the way, not that it's any great accomplishment, but I built this page with Composer in SeaMonkey (the new Mozilla - and yes, they've finally added some new features to it (CSS, layers) - check it out, it's very cool.)}
