The Iris data set is small and well understood. It consists of measurements of four attributes for each of 150 iris flowers drawn from three species of iris. The typical task is to classify the species of an iris from its measurements. It is one of the most analyzed data sets in statistics, data mining, and multivariate visualization. It was first published by R. A. Fisher in 1936 [1] and is widely available (our copy came from StatLib at CMU). The file is in CSV format, which can be imported into Excel and other programs.

The data dimensions are as follows:

  1. sepal length in cm;
  2. sepal width in cm;
  3. petal length in cm;
  4. petal width in cm;
  5. class:
    • Iris Setosa
    • Iris Versicolor
    • Iris Virginica
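
As a rough illustration of the layout above and of the classification task, the following Python sketch parses rows in that five-column order and labels a new flower by its nearest neighbor. Several things here are assumptions rather than facts about our copy of the file: that it has no header row, that the class labels are spelled as in `SAMPLE_CSV`, and that a one-nearest-neighbor rule is an acceptable stand-in for a real classifier. The names `IrisRecord`, `read_iris`, and `nearest_neighbor` are hypothetical and chosen only for this example.

```python
import csv
import io
import math
from typing import NamedTuple


class IrisRecord(NamedTuple):
    sepal_length: float  # cm
    sepal_width: float   # cm
    petal_length: float  # cm
    petal_width: float   # cm
    species: str         # class label


# Three illustrative rows in the ASSUMED layout: no header, class label last.
SAMPLE_CSV = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def read_iris(stream):
    """Parse rows of four measurements plus a class label; skip blank lines."""
    records = []
    for row in csv.reader(stream):
        if not row:
            continue
        *measurements, species = row
        records.append(IrisRecord(*map(float, measurements), species.strip()))
    return records


def nearest_neighbor(records, measurements):
    """Return the class of the record closest in Euclidean distance."""
    closest = min(records, key=lambda r: math.dist(r[:4], measurements))
    return closest.species


records = read_iris(io.StringIO(SAMPLE_CSV))
print(nearest_neighbor(records, (5.0, 3.4, 1.5, 0.2)))
```

To run this against the actual file, an open file handle can be passed to `read_iris` in place of the `io.StringIO` wrapper; if the file turns out to have a header row, that row would need to be skipped first.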


[1] R. A. Fisher. “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics 7:Part II (1936), 179–188.