{ "cells": [ { "cell_type": "markdown", "id": "5e1851be", "metadata": {}, "source": [ "# Univariate Plots\n", "\n", "Univariate EDA assumes the model **y = c + e** — a fixed location `c` plus\n", "random error `e`. The goal is to characterise the distribution of `e`\n", "and check whether the data are truly random and stationary.\n", "\n", "Reference: [NIST Handbook Chapter 1.3.3](https://www.itl.nist.gov/div898/handbook/eda/section3/eda33.htm)" ] }, { "cell_type": "code", "execution_count": null, "id": "893271e6", "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from drippy import EDAData\n", "from drippy import (\n", " run_sequence_plot,\n", " lag_plot,\n", " histogram,\n", " normal_probability_plot,\n", " four_plot,\n", " ppcc_plot,\n", " weibull_plot,\n", " probability_plot,\n", " box_cox_normality_plot,\n", " bootstrap_plot,\n", " box_cox_linearity_plot,\n", ")\n", "\n", "rng = np.random.default_rng(42)\n", "\n", "# Normal data — 200 ceramic-strength-like measurements\n", "y = rng.normal(loc=688.0, scale=65.0, size=200)\n", "data = EDAData(y=y)\n", "\n", "# Positive-only data — component lifetimes following a Weibull distribution\n", "y_pos = rng.weibull(2.0, size=100) * 500\n", "data_pos = EDAData(y=y_pos)" ] }, { "cell_type": "markdown", "id": "fc54e19e", "metadata": {}, "source": [ "## Run Sequence Plot (NIST 1.3.3.25)\n", "\n", "Plots `y` in the order the measurements were taken. Drifts, shifts, or\n", "periodic patterns visible here indicate the data are **not** stationary." 
] }, { "cell_type": "code", "execution_count": null, "id": "8e080d58", "metadata": {}, "outputs": [], "source": [ "fig, ax = run_sequence_plot(data)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "0743aedc", "metadata": {}, "source": [ "Supply a physical time axis via `t`:" ] }, { "cell_type": "code", "execution_count": null, "id": "243283c4", "metadata": {}, "outputs": [], "source": [ "t = np.linspace(0, 10, 200)\n", "data_t = EDAData(y=y, t=t)\n", "fig, ax = run_sequence_plot(data_t)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "f9150248", "metadata": {}, "source": [ "## Lag Plot (NIST 1.3.3.15)\n", "\n", "Scatters `y[i]` against `y[i − lag]`. A structureless cloud is consistent\n", "with randomness; patterns (lines, ellipses) reveal autocorrelation." ] }, { "cell_type": "code", "execution_count": null, "id": "aa18cfe5", "metadata": {}, "outputs": [], "source": [ "fig, ax = lag_plot(data)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "31f78c2e", "metadata": {}, "outputs": [], "source": [ "fig, ax = lag_plot(data, lag=4)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "9ad18d25", "metadata": {}, "source": [ "## Histogram (NIST 1.3.3.14)\n", "\n", "Shows the frequency distribution of `y`. Look for symmetry, outliers,\n", "and whether the shape matches a known family." ] }, { "cell_type": "code", "execution_count": null, "id": "578f3a06", "metadata": {}, "outputs": [], "source": [ "fig, ax = histogram(data)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "ece61d4f", "metadata": {}, "outputs": [], "source": [ "fig, ax = histogram(data, bins=20)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "b27ad559", "metadata": {}, "source": [ "## Normal Probability Plot (NIST 1.3.3.21)\n", "\n", "Plots the ordered data against normal quantiles. A straight line indicates\n", "normality; curvature suggests skew or heavy tails."
] }, { "cell_type": "code", "execution_count": null, "id": "506cfd29", "metadata": {}, "outputs": [], "source": [ "fig, ax, _ = normal_probability_plot(data)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "889ed424", "metadata": {}, "outputs": [], "source": [ "fig, ax, rsq = normal_probability_plot(data, return_rsquared=True)\n", "print(f\"R² = {rsq:.4f}\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "073fec4a", "metadata": {}, "source": [ "## 4-Plot (NIST 1.3.3.32)\n", "\n", "The 4-plot combines the run-sequence plot, lag plot, histogram, and normal\n", "probability plot in a single 2×2 figure and is the recommended first step\n", "for any univariate EDA." ] }, { "cell_type": "code", "execution_count": null, "id": "504e9265", "metadata": {}, "outputs": [], "source": [ "fig, axes = four_plot(data)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "92a11d98", "metadata": {}, "source": [ "## PPCC Plot (NIST 1.3.3.23)\n", "\n", "The Probability Plot Correlation Coefficient (PPCC) plot finds the shape\n", "parameter of a distribution family that best fits the data. The rough\n", "panel scans a wide range of shape values to locate the PPCC maximum;\n", "the fine panel zooms in around it." ] }, { "cell_type": "code", "execution_count": null, "id": "790a2b33", "metadata": {}, "outputs": [], "source": [ "fig, axes = ppcc_plot(data_pos)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "cc29ce09", "metadata": {}, "outputs": [], "source": [ "fig, axes = ppcc_plot(data_pos, rough_range=(0.5, 3.0))\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "5e4ac7dd", "metadata": {}, "source": [ "## Weibull Plot (NIST 1.3.3.30)\n", "\n", "Linearised Weibull probability plot for positive failure-time or\n", "strength data. The fitted slope is the Weibull shape parameter β;\n", "the vertical dashed line marks the characteristic life η."
] }, { "cell_type": "code", "execution_count": null, "id": "c9640830", "metadata": {}, "outputs": [], "source": [ "fig, ax = weibull_plot(data_pos)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "b71d60e4", "metadata": {}, "source": [ "## Probability Plot (NIST 1.3.3.22)\n", "\n", "Generalisation of the normal probability plot: compare ordered data\n", "against the quantiles of any `scipy.stats` distribution." ] }, { "cell_type": "code", "execution_count": null, "id": "bfbe691c", "metadata": {}, "outputs": [], "source": [ "fig, ax = probability_plot(data)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "068a3e86", "metadata": {}, "outputs": [], "source": [ "fig, ax = probability_plot(data, distribution=\"expon\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "05cf1605", "metadata": {}, "source": [ "## Box-Cox Normality Plot (NIST 1.3.3.6)\n", "\n", "Finds the Box-Cox power λ that makes the transformed data most nearly\n", "normal; the data must be positive.\n", "The 2×2 grid shows the original histogram, the PPCC-vs-λ curve,\n", "the transformed histogram, and the normal probability plot of the\n", "transformed data." ] }, { "cell_type": "code", "execution_count": null, "id": "97fcb669", "metadata": {}, "outputs": [], "source": [ "fig, axes = box_cox_normality_plot(data_pos)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "597888b7", "metadata": {}, "source": [ "## Bootstrap Plot (NIST 1.3.3.4)\n", "\n", "Resamples the data with replacement to approximate the sampling\n", "distribution of any statistic. The spread of the bootstrap distribution,\n", "for example its 2.5th and 97.5th percentiles, gives a non-parametric\n", "confidence interval for the statistic."
] }, { "cell_type": "code", "execution_count": null, "id": "7b5c483e", "metadata": {}, "outputs": [], "source": [ "fig, ax = bootstrap_plot(data, n_bootstrap=500)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "31338610", "metadata": {}, "outputs": [], "source": [ "fig, ax = bootstrap_plot(data, statistic=np.median, n_bootstrap=500)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "1b67fe0a", "metadata": {}, "source": [ "## Box-Cox Linearity Plot (NIST 1.3.3.5)\n", "\n", "Plots |corr(y, x^λ)| across a range of λ values to identify the power\n", "transformation of `x` that best linearises the relationship with `y`.\n", "Requires `x > 0`." ] }, { "cell_type": "code", "execution_count": null, "id": "07bf1a07", "metadata": {}, "outputs": [], "source": [ "x_pos = np.linspace(0.1, 10.0, 50)\n", "y_lin = 3.0 * x_pos**0.5 + rng.normal(scale=0.5, size=50)\n", "data_lin = EDAData(y=y_lin, x=x_pos)\n", "\n", "fig, ax = box_cox_linearity_plot(data_lin)\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 5 }