One-Factor Plots

One-factor EDA assumes the model y = f(x) + e, where x is a single factor, usually categorical (batch, machine, operator, treatment group). The plots answer one question: do the factor levels differ in location or spread?

Pass x to EDAData as a 1-D array of labels or numbers, matching len(y).
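Conceptually, a one-factor data set is just y split into one subsample per level of x. The sketch below shows that split with plain NumPy. `split_by_level` is a hypothetical helper for illustration, not part of drippy. Its length check mirrors the requirement that x match len(y).

```python
import numpy as np

def split_by_level(y, x):
    """Hypothetical helper (not drippy API): return {level: y values at that level}."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    if x.ndim != 1 or x.shape[0] != y.shape[0]:
        raise ValueError("x must be 1-D with len(x) == len(y)")
    # np.unique returns the sorted distinct levels and, for each observation,
    # the index of its level within that sorted array.
    levels, codes = np.unique(x, return_inverse=True)
    return {lvl: y[codes == i] for i, lvl in enumerate(levels)}

y = np.array([5.1, 4.9, 5.6, 5.4, 4.7])
x = np.array(["B1", "B1", "B2", "B2", "B3"])
groups = split_by_level(y, x)
for level, values in groups.items():
    print(level, values)
```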

Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from drippy import (
    EDAData,
    scatter_plot,
    box_plot,
    bihistogram,
    qq_plot,
    mean_plot,
    sd_plot,
)

rng = np.random.default_rng(42)

# Five production batches — 20 observations each
batches = ["B1", "B2", "B3", "B4", "B5"]
x_cat = np.repeat(batches, 20)
locs = [5.0, 5.5, 4.8, 5.2, 5.1]
y_cat = np.concatenate(
    [rng.normal(loc=mu, scale=1.0, size=20) for mu in locs]
)
data_cat = EDAData(y=y_cat, x=x_cat)

# Two-group comparison — 30 observations each
x_two = np.repeat(["Method_A", "Method_B"], 30)
y_two = np.concatenate(
    [rng.normal(10.0, 1.0, 30), rng.normal(10.5, 1.2, 30)]
)
data_two = EDAData(y=y_two, x=x_two)

Scatter Plot (NIST 1.3.3.26)

Raw scatter of y against x. It works with both continuous and categorical x and is the simplest way to spot location differences, outliers, or non-linearity.
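When x is continuous and the scatter looks linear, a least-squares line puts a number on the trend the eye picks up. The sketch below uses NumPy's `np.polyfit` on data generated the same way as `data_cont` but with a different seed, so the values differ. It is a companion calculation, not something `scatter_plot` is claimed to do.

```python
import numpy as np

# Same recipe as data_cont, but seed 0 (an illustrative choice).
rng = np.random.default_rng(0)
x = np.linspace(1, 5, 100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

# Degree-1 least-squares fit; np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# ddof=2 because two parameters (slope, intercept) were estimated.
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"residual sd={residuals.std(ddof=2):.2f}")
```

If the residuals still show curvature when plotted against x, the linear model is missing structure that the raw scatter should also reveal.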

x_cont = np.linspace(1, 5, 100)
y_cont = 2.0 * x_cont + rng.normal(scale=0.5, size=100)
data_cont = EDAData(y=y_cont, x=x_cont)

fig, ax = scatter_plot(data_cont)
plt.show()

Box Plot (NIST 1.3.3.7)

Displays the median, interquartile range, whiskers, and outliers for each factor level, which makes it well suited to comparing location and spread across many groups at once.
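The quantities behind each box can be computed directly. This sketch assumes the common Tukey convention: whiskers reach the most extreme points within 1.5 × IQR of the quartiles, and anything beyond is flagged as an outlier. That is also Matplotlib's default. Whether drippy's `box_plot` uses exactly this rule and NumPy's default percentile interpolation is an assumption. `box_stats` is a hypothetical helper.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Hypothetical helper: quartiles, whisker ends, and outliers for one group."""
    v = np.asarray(values, dtype=float)
    # NumPy's default "linear" interpolation; other quartile rules give
    # slightly different values for small samples.
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = v[(v >= lo_fence) & (v <= hi_fence)]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        # Whiskers stop at the most extreme observations still inside the fences.
        "whisker_lo": inside.min(), "whisker_hi": inside.max(),
        "outliers": v[(v < lo_fence) | (v > hi_fence)],
    }

group = np.array([4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.9, 9.5])
stats = box_stats(group)
print(stats)
```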

fig, ax = box_plot(data_cat)
plt.show()

Bihistogram (NIST 1.3.3.2)

Back-to-back histograms for exactly two factor levels, drawn on a common horizontal axis with one group above and the other mirrored below. This layout makes it easy to compare distribution shape, location, and spread between, for example, two measurement methods or two labs.
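A bihistogram is only readable if both halves share the same bins. The sketch below computes common bin edges from the pooled data with NumPy and negates the second group's counts to mimic the mirrored layout. The data mirror `data_two` with a different seed. The choice of 10 bins and the text output are illustrative, not drippy's defaults.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(10.0, 1.0, 30)   # condition 1 (e.g. Method_A)
b = rng.normal(10.5, 1.2, 30)   # condition 2 (e.g. Method_B)

# One set of bin edges from the pooled data, so both histograms
# line up bin-for-bin on the same horizontal axis.
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=10)
counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)

# Mirrored layout: condition 1 counts point up, condition 2 counts point down.
heights = np.vstack([counts_a, -counts_b])
for left, up, down in zip(edges[:-1], heights[0], heights[1]):
    print(f"{left:6.2f} {int(up):+3d} {int(down):+3d}")
```

Computing the edges separately for each group would make the two halves incomparable, because a shift in bin positions can look like a shift in location.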

fig, axes = bihistogram(data_two)
plt.show()

Q-Q Plot (NIST 1.3.3.23)

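Quantile-quantile plot comparing the distributions of the two groups. Points along the diagonal (y = x) suggest the groups share a common distribution. Points on a line parallel to the diagonal but offset from it indicate a location shift. A line whose slope differs from 1 indicates a scale difference.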
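For two samples the Q-Q plot pairs quantiles rather than raw observations. Below is a minimal NumPy sketch. `qq_points` is a hypothetical helper. Interpolating the larger sample at the smaller sample's cumulative fractions is one common convention; this is an assumption, not necessarily what drippy's `qq_plot` does. Fitting a line through the pairs turns the visual reading into numbers. A slope near 1 with a nonzero intercept means a location shift, and a slope away from 1 means a scale difference.

```python
import numpy as np

def qq_points(a, b):
    """Hypothetical helper: matched quantiles (qa, qb) for a two-sample Q-Q plot.

    The smaller sample contributes its sorted values; the larger one is
    linearly interpolated at the same cumulative fractions.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    n = min(len(a), len(b))
    # Under NumPy's default "linear" quantile method, the k-th order statistic
    # of an n-point sample sits at fraction k / (n - 1), k = 0..n-1.
    probs = np.linspace(0.0, 1.0, n)
    qa = a if len(a) == n else np.quantile(a, probs)
    qb = b if len(b) == n else np.quantile(b, probs)
    return qa, qb

base = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
qa, qb = qq_points(base, base + 3.0)        # pure location shift
print(qb - qa)

qa2, qb2 = qq_points([1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0, 4.0])  # unequal sizes
print(np.polyfit(qa2, qb2, 1))              # [slope, intercept]
```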

fig, ax = qq_plot(data_two)
plt.show()

Mean Plot (NIST 1.3.3.17)

Group means connected by a line, with a grand-mean reference line. Means that all sit close to the reference indicate no factor effect. Steps, or trends when the levels have a natural order, suggest systematic differences between levels.
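The numbers a mean plot displays are the per-level means and their deviations from the grand mean. In the sketch below, `group_means` is a hypothetical helper built on `np.unique` and `np.bincount`.

```python
import numpy as np

def group_means(y, x):
    """Hypothetical helper: per-level means and the grand mean."""
    y, x = np.asarray(y, dtype=float), np.asarray(x)
    levels, codes = np.unique(x, return_inverse=True)
    # bincount with weights sums y within each level; dividing by the
    # per-level counts gives the level means.
    means = np.bincount(codes, weights=y) / np.bincount(codes)
    return levels, means, y.mean()

y = np.array([5.0, 5.2, 5.6, 5.8, 4.7, 4.9])
x = np.array(["B1", "B1", "B2", "B2", "B3", "B3"])
levels, means, grand = group_means(y, x)
for lvl, m in zip(levels, means):
    print(f"{lvl}: mean={m:.2f}, deviation from grand mean={m - grand:+.2f}")
```

Here the grand mean is the mean of all observations. It equals the mean of the group means only when every level has the same number of observations. Which definition `mean_plot` uses is an assumption.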

fig, ax = mean_plot(data_cat)
plt.show()

SD Plot (NIST 1.3.3.28)

Group standard deviations connected by a line, with a pooled-SD reference line. The plot helps detect heteroscedasticity: factor levels with unusually high or low variability.
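The reference line is a pooled standard deviation. This sketch uses the usual definition: the square root of the degrees-of-freedom-weighted average of the group variances. Whether `sd_plot` computes exactly this quantity is an assumption, and `group_sds` is a hypothetical helper.

```python
import numpy as np

def group_sds(y, x):
    """Hypothetical helper: per-level sample SDs (ddof=1) and the pooled SD."""
    y, x = np.asarray(y, dtype=float), np.asarray(x)
    levels = np.unique(x)
    groups = [y[x == lvl] for lvl in levels]
    sds = np.array([g.std(ddof=1) for g in groups])
    dof = np.array([len(g) - 1 for g in groups])
    # Pooled SD: sqrt( sum((n_i - 1) * s_i**2) / sum(n_i - 1) )
    pooled = np.sqrt(np.sum(dof * sds**2) / np.sum(dof))
    return levels, sds, pooled

y = np.array([1.0, 3.0, 2.0, 4.0, 6.0])
x = np.array(["A", "A", "B", "B", "B"])
levels, sds, pooled = group_sds(y, x)
print(levels, sds, pooled)
```

Because each variance is weighted by its degrees of freedom, larger groups pull the pooled SD toward their own spread.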

fig, ax = sd_plot(data_cat)
plt.show()