

Simpson’s paradox is a phenomenon in probability and statistics, in which a trend appears in several different groups of data but disappears or reverses when these groups are combined (Wikipedia).
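To make the definition concrete, here is a small Python sketch on made-up numbers. The two groups and the `pearson` helper are hypothetical and chosen only for illustration. Within each group, y rises perfectly with x (correlation +1). When the groups are pooled, the correlation turns negative.

```python
# Minimal illustration of Simpson's paradox on hypothetical, made-up data.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical groups: inside each one, y increases with x,
# but group B lies to the right of and below group A.
group_a = ([1, 2, 3, 4], [6, 7, 8, 9])
group_b = ([6, 7, 8, 9], [1, 2, 3, 4])

for name, (xs, ys) in {"A": group_a, "B": group_b}.items():
    print(name, round(pearson(xs, ys), 3))  # A 1.0, B 1.0

# Combining the groups reverses the trend.
pooled_x = group_a[0] + group_b[0]
pooled_y = group_a[1] + group_b[1]
print("pooled", round(pearson(pooled_x, pooled_y), 3))  # pooled -0.667
```

The reversal happens because the gap between the groups (B has larger x but smaller y) outweighs the positive trend inside each group once the group label is ignored.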


Let’s explore it using the iris data. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Each sample has four features: the length and the width of the sepals and petals, in centimeters. You can get the data, including details, from the UCI Machine Learning Repository.
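As a starting point, here is a minimal Python sketch for reading the data and splitting it by species, which is the grouping the paradox depends on. It assumes the repository's `iris.data` layout: comma-separated rows with no header, four measurements in cm (sepal length, sepal width, petal length, petal width) followed by the class label such as `Iris-setosa`. The function name `read_iris_rows`, the `FEATURES` list, and the file path are illustrative assumptions, not the author's code.

```python
import csv
import io
from collections import defaultdict

# Assumed column order of the UCI iris.data file (no header row).
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

def read_iris_rows(text):
    """Group iris rows by species label; each row becomes a dict of floats."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 5:  # skip blank lines, e.g. at the end of the file
            continue
        *values, species = row
        groups[species].append(dict(zip(FEATURES, map(float, values))))
    return dict(groups)

# A few rows in the same format, for a quick check without the full file.
sample = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

groups = read_iris_rows(sample)
for species, rows in groups.items():
    print(species, len(rows))
```

With a local copy of the full file, `read_iris_rows(open("iris.data").read())` should yield three groups of 50 rows each, which can then be compared feature by feature within each species and across the pooled data.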