Univariate Plots

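Univariate EDA assumes the model y = c + e: a fixed location c plus a random error e. The goal is to characterise the distribution of e and to check the underlying assumptions, namely that the measurements are random draws from a single fixed distribution with constant location and constant variation.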

Reference: NIST Handbook Chapter 1.3.3

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from drippy import EDAData
from drippy import (
    run_sequence_plot,
    lag_plot,
    histogram,
    normal_probability_plot,
    four_plot,
    ppcc_plot,
    weibull_plot,
    probability_plot,
    box_cox_normality_plot,
    bootstrap_plot,
    box_cox_linearity_plot,
)

rng = np.random.default_rng(42)

# Normal data — 200 ceramic-strength-like measurements
y = rng.normal(loc=688.0, scale=65.0, size=200)
data = EDAData(y=y)

# Positive-only data — component lifetimes following a Weibull distribution
y_pos = rng.weibull(2.0, size=100) * 500
data_pos = EDAData(y=y_pos)

Run Sequence Plot (NIST 1.3.3.25)

Plots y in the order the measurements were taken. Drifts, shifts, or periodic patterns visible here indicate the data are not stationary.

fig, ax = run_sequence_plot(data)
plt.show()

Supply a physical time axis via t:

t = np.linspace(0, 10, 200)
data_t = EDAData(y=y, t=t)
fig, ax = run_sequence_plot(data_t)
plt.show()
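
The visual check for drift can be backed by a quick numeric one. The sketch below uses plain NumPy rather than drippy, and the drifting series is a made-up illustration. It fits a straight line to the values against their run order; a slope close to zero is consistent with a fixed location.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
order = np.arange(n)

# Stationary series: fixed location plus random error (y = c + e).
y_flat = rng.normal(loc=688.0, scale=65.0, size=n)
# Hypothetical drifting series: the location rises by 1 unit per measurement.
y_drift = y_flat + 1.0 * order

# np.polyfit returns coefficients from highest degree down: [slope, intercept].
slope_flat, _ = np.polyfit(order, y_flat, deg=1)
slope_drift, _ = np.polyfit(order, y_drift, deg=1)

print(f"slope without drift: {slope_flat:+.3f} per run")
print(f"slope with drift:    {slope_drift:+.3f} per run")
```

A slope test only catches linear drift in location. Step shifts, changes in spread, and periodic patterns still need the plot itself.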

Lag Plot (NIST 1.3.3.15)

Scatters y[i] against y[i - lag]. A structureless cloud is consistent with randomness; patterns (lines, ellipses) reveal autocorrelation.

fig, ax = lag_plot(data)
plt.show()
fig, ax = lag_plot(data, lag=4)
plt.show()
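
The amount of structure in a lag plot can be summarised by the correlation between the two plotted coordinates. This is a minimal NumPy sketch, independent of drippy; `lag_correlation` is a hypothetical helper name, and the random walk is only there to show what an autocorrelated series looks like numerically.

```python
import numpy as np

def lag_correlation(y, lag=1):
    """Pearson correlation between y[i] and y[i - lag], for lag >= 1."""
    y = np.asarray(y, dtype=float)
    return np.corrcoef(y[lag:], y[:-lag])[0, 1]

rng = np.random.default_rng(0)
noise = rng.normal(size=200)   # independent draws: no structure expected
walk = np.cumsum(noise)        # random walk: strongly autocorrelated

print(f"lag-1 correlation, random data: {lag_correlation(noise):+.3f}")
print(f"lag-1 correlation, random walk: {lag_correlation(walk):+.3f}")
```

A correlation near zero matches the structureless cloud; a value near one matches points hugging the diagonal.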

Histogram (NIST 1.3.3.14)

Shows the frequency distribution of y. Look for symmetry, outliers, and whether the shape matches a known family.

fig, ax = histogram(data)
plt.show()
fig, ax = histogram(data, bins=20)
plt.show()
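
The numbers behind a histogram are easy to inspect directly. The sketch below uses `np.histogram` with 20 bins and adds a sample skewness as a numeric symmetry check; it is an illustration in plain NumPy and makes no claim about how drippy bins the data.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=688.0, scale=65.0, size=200)

# Counts per bin and the 21 edges that delimit 20 equal-width bins.
counts, edges = np.histogram(y, bins=20)

# Sample skewness: third central moment divided by the cubed standard deviation.
# Values near 0 suggest a symmetric distribution.
centred = y - y.mean()
skewness = np.mean(centred**3) / np.mean(centred**2) ** 1.5

print(f"bin width = {edges[1] - edges[0]:.1f}, fullest bin holds {counts.max()} points")
print(f"sample skewness = {skewness:+.3f}")
```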

Normal Probability Plot (NIST 1.3.3.21)

Plots the ordered data against the corresponding normal quantiles. Points close to a straight line indicate normality; systematic curvature suggests skew or non-normal tails.

fig, ax, _ = normal_probability_plot(data)
plt.show()
fig, ax, rsq = normal_probability_plot(data, return_rsquared=True)
print(f"R² = {rsq:.4f}")
plt.show()
R² = 0.9932
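
To see where a straightness measure of this kind can come from, the sketch below builds the plot coordinates by hand and correlates them. It uses Filliben's approximation for the plotting positions and the standard library's `NormalDist` for the quantiles. Whether drippy uses the same plotting positions, or defines its R² exactly this way, is an assumption; `normal_ppcc` is a hypothetical helper.

```python
import numpy as np
from statistics import NormalDist

def normal_ppcc(y):
    """Correlation between the sorted data and normal order-statistic medians."""
    y_sorted = np.sort(np.asarray(y, dtype=float))
    n = y_sorted.size
    # Filliben's approximation to the uniform order-statistic medians.
    i = np.arange(1, n + 1)
    p = (i - 0.3175) / (n + 0.365)
    p[-1] = 0.5 ** (1.0 / n)
    p[0] = 1.0 - p[-1]
    # Map the probabilities to standard normal quantiles.
    std_normal = NormalDist()
    q = np.array([std_normal.inv_cdf(float(pi)) for pi in p])
    return np.corrcoef(q, y_sorted)[0, 1]

rng = np.random.default_rng(0)
r_normal = normal_ppcc(rng.normal(loc=688.0, scale=65.0, size=200))
r_skewed = normal_ppcc(rng.exponential(scale=65.0, size=200))

print(f"normal data:      r^2 = {r_normal**2:.4f}")
print(f"exponential data: r^2 = {r_skewed**2:.4f}")
```

The skewed sample bends away from the line, which shows up as a visibly lower correlation.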

4-Plot (NIST 1.3.3.32)

The 4-plot combines the run sequence plot, lag plot, histogram, and normal probability plot in a single 2×2 figure. It is the recommended first step for any univariate EDA.

fig, axes = four_plot(data)
plt.show()
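
For readers who want to see what goes into each panel, here is a rough equivalent built from Matplotlib primitives. It is a sketch under simple assumptions (lag 1, 20 bins, basic plotting positions) and is not drippy's implementation; the panel order follows the description above.

```python
import numpy as np
import matplotlib.pyplot as plt
from statistics import NormalDist

rng = np.random.default_rng(0)
y = rng.normal(loc=688.0, scale=65.0, size=200)
n = y.size

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Top left: run sequence plot (value against run order).
axes[0, 0].plot(np.arange(n), y, linewidth=0.8)
axes[0, 0].set_title("Run sequence")

# Top right: lag plot (y[i] on the vertical axis against y[i - 1]).
axes[0, 1].scatter(y[:-1], y[1:], s=8)
axes[0, 1].set_title("Lag 1")

# Bottom left: histogram.
axes[1, 0].hist(y, bins=20)
axes[1, 0].set_title("Histogram")

# Bottom right: normal probability plot (sorted data against normal quantiles).
p = (np.arange(1, n + 1) - 0.5) / n   # simple plotting positions
q = [NormalDist().inv_cdf(float(pi)) for pi in p]
axes[1, 1].scatter(q, np.sort(y), s=8)
axes[1, 1].set_title("Normal probability")

fig.tight_layout()
```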

PPCC Plot (NIST 1.3.3.23)

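The Probability Plot Correlation Coefficient (PPCC) plot graphs the probability-plot correlation against the shape parameter of a distribution family; the shape value with the highest correlation fits the data best. The rough panel locates the maximum over a wide range; the fine panel zooms in around it.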

fig, axes = ppcc_plot(data_pos)
plt.show()
fig, axes = ppcc_plot(data_pos, rough_range=(0.5, 3.0))
plt.show()
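
The rough-then-fine search can be sketched in a few lines of NumPy. The example assumes the Weibull family, whose quantile function has a closed form; which family drippy scans by default is not stated here, so treat that choice, the grid ranges, and the helper name `weibull_ppcc` as assumptions.

```python
import numpy as np

def weibull_ppcc(y, shape):
    """Probability-plot correlation of y against a Weibull with the given shape."""
    y_sorted = np.sort(np.asarray(y, dtype=float))
    n = y_sorted.size
    p = (np.arange(1, n + 1) - 0.3175) / (n + 0.365)   # plotting positions
    q = (-np.log1p(-p)) ** (1.0 / shape)               # Weibull quantile function
    return np.corrcoef(q, y_sorted)[0, 1]

rng = np.random.default_rng(0)
y_pos = rng.weibull(2.0, size=100) * 500

# Rough pass over a wide range, then a fine pass around the rough maximum.
rough = np.linspace(0.5, 5.0, 10)
rough_best = rough[np.argmax([weibull_ppcc(y_pos, s) for s in rough])]
fine = np.linspace(max(rough_best - 0.5, 0.1), rough_best + 0.5, 101)
fine_best = fine[np.argmax([weibull_ppcc(y_pos, s) for s in fine])]

print(f"rough maximum near shape = {rough_best:.2f}")
print(f"fine maximum at shape   = {fine_best:.2f}")
```

Because the correlation is invariant to location and scale, only the shape parameter needs to be scanned.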

Weibull Plot (NIST 1.3.3.30)

Linearised Weibull probability plot for positive failure-time or strength data. The fitted slope is the Weibull shape parameter β; the vertical dashed line marks the characteristic life η.

fig, ax = weibull_plot(data_pos)
plt.show()
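
The linearisation behind the Weibull plot can be reproduced with a straight-line fit. The sketch below uses Benard's median-rank approximation for the cumulative probabilities and ordinary least squares; both are common choices, assumed here for illustration, and drippy may estimate β and η differently.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.weibull(2.0, size=100) * 500)   # ordered failure times
n = t.size

# Benard's median-rank approximation to the cumulative probability F(t_i).
F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)

# Weibull CDF: F = 1 - exp(-(t / eta)**beta), which linearises to
#   ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
x = np.log(t)
z = np.log(-np.log1p(-F))

beta, intercept = np.polyfit(x, z, deg=1)   # slope is the shape parameter
eta = np.exp(-intercept / beta)             # where the line crosses z = 0

print(f"shape beta = {beta:.2f}, characteristic life eta = {eta:.0f}")
```

The line crosses z = 0 where F = 1 - 1/e, so η is the time by which about 63.2% of units have failed.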

Probability Plot (NIST 1.3.3.22)

Generalisation of the normal probability plot: compare ordered data against the quantiles of any scipy.stats distribution.

fig, ax = probability_plot(data)
plt.show()
fig, ax = probability_plot(data, distribution="expon")
plt.show()
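
Candidate distributions can also be compared numerically. As a sketch that uses SciPy directly rather than drippy, `scipy.stats.probplot` returns the plotted points together with a least-squares fit whose correlation r measures how straight the plot is.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=688.0, scale=65.0, size=200)

# probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r)).
# No figure is drawn unless a plot target is passed.
r_by_dist = {}
for name in ("norm", "expon"):
    _, (slope, intercept, r) = stats.probplot(y, dist=name)
    r_by_dist[name] = r
    print(f"{name:>5}: r = {r:.4f}")
```

For normally distributed data the normal fit should score higher than the exponential one, matching the straighter plot.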

Box-Cox Normality Plot (NIST 1.3.3.6)

Finds the Box-Cox power λ for which the transformed data are closest to normal, as measured by the normal PPCC. The data must be positive. The 2×2 grid shows the original histogram, the PPCC-vs-λ curve, the transformed histogram, and the normal probability plot of the transformed data.

fig, axes = box_cox_normality_plot(data_pos)
plt.show()
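
SciPy exposes the same criterion, which gives a way to cross-check the λ read off the plot. The sketch below is independent of drippy: `scipy.stats.boxcox_normmax` with `method="pearsonr"` maximises the normal probability plot correlation, and `scipy.stats.boxcox` applies the transform.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y_pos = rng.weibull(2.0, size=100) * 500

# lambda that maximises the normal probability plot correlation coefficient.
lam = stats.boxcox_normmax(y_pos, method="pearsonr")

# Apply the transform with that lambda and compare the PPCC before and after.
y_bc = stats.boxcox(y_pos, lmbda=lam)
r_before = stats.probplot(y_pos, dist="norm")[1][2]
r_after = stats.probplot(y_bc, dist="norm")[1][2]

print(f"lambda = {lam:.2f}")
print(f"normal PPCC before = {r_before:.4f}, after = {r_after:.4f}")
```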

Bootstrap Plot (NIST 1.3.3.4)

Resamples the data with replacement to approximate the sampling distribution of any statistic. Percentiles of the resulting bootstrap distribution give a non-parametric confidence interval for the statistic.

fig, ax = bootstrap_plot(data, n_bootstrap=500)
plt.show()
fig, ax = bootstrap_plot(data, statistic=np.median, n_bootstrap=500)
plt.show()
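
The resampling step and the interval can be written out in NumPy. This is a minimal sketch of the percentile bootstrap; `bootstrap_distribution` is a hypothetical helper, not part of drippy, and the 95% level is just an example.

```python
import numpy as np

def bootstrap_distribution(y, statistic=np.mean, n_bootstrap=500, seed=0):
    """Return n_bootstrap values of `statistic`, one per resample of y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    # Each row is one resample drawn with replacement, the same size as y.
    resamples = rng.choice(y, size=(n_bootstrap, y.size), replace=True)
    return np.array([statistic(row) for row in resamples])

rng = np.random.default_rng(0)
y = rng.normal(loc=688.0, scale=65.0, size=200)

boot_means = bootstrap_distribution(y, statistic=np.mean)
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean = {y.mean():.1f}, 95% percentile interval = ({low:.1f}, {high:.1f})")
```

Passing `statistic=np.median` gives the corresponding interval for the median. With only 500 resamples the interval endpoints are themselves somewhat noisy.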

Box-Cox Linearity Plot (NIST 1.3.3.5)

Plots |corr(y, x^λ)| across a range of λ values to identify the power transformation of x that best linearises the relationship with y. Requires x > 0.

x_pos = np.linspace(0.1, 10.0, 50)
y_lin = 3.0 * x_pos**0.5 + rng.normal(scale=0.5, size=50)
data_lin = EDAData(y=y_lin, x=x_pos)

fig, ax = box_cox_linearity_plot(data_lin)
plt.show()
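
The curve in this plot can be recomputed by scanning λ on a grid. The sketch below is plain NumPy with an assumed grid from -2 to 2; the `box_cox` helper is hypothetical and uses the logarithm for λ = 0. Since the data were generated from the square root of x, the maximum is expected near λ = 0.5.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform of positive x; the log is the limit as lam -> 0."""
    return np.log(x) if abs(lam) < 1e-12 else (x**lam - 1.0) / lam

rng = np.random.default_rng(0)
x_pos = np.linspace(0.1, 10.0, 50)
y_lin = 3.0 * x_pos**0.5 + rng.normal(scale=0.5, size=50)

lambdas = np.linspace(-2.0, 2.0, 41)   # grid with step 0.1
corr = np.array([abs(np.corrcoef(box_cox(x_pos, lam), y_lin)[0, 1]) for lam in lambdas])
best_lam = lambdas[np.argmax(corr)]

print(f"best lambda = {best_lam:.1f}, |corr| = {corr.max():.3f}")
```

The correlation curve is usually flat near its peak, so nearby values of λ are often equally defensible; a round value such as 0.5 or 0 (log) is typically preferred for interpretability.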