{ "cells": [ { "cell_type": "markdown", "id": "c462b967", "metadata": {}, "source": [ "# One-Factor Plots\n", "\n", "One-factor EDA assumes the model **y = f(x) + e**, where `e` is random error.\n", "The factor `x` is typically categorical (batch, machine, operator, treatment\n", "group), and the plots answer one question: *do the factor levels differ in\n", "location or spread?*\n", "\n", "Pass `x` to `EDAData` as a 1-D array of labels or numbers with the same\n", "length as `y`.\n", "\n", "Reference: [NIST/SEMATECH e-Handbook, Section 1.3.3](https://www.itl.nist.gov/div898/handbook/eda/section3/eda33.htm)" ] }, { "cell_type": "code", "execution_count": null, "id": "b5221754", "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from drippy import EDAData\n", "from drippy import (\n", " scatter_plot,\n", " box_plot,\n", " bihistogram,\n", " qq_plot,\n", " mean_plot,\n", " sd_plot,\n", ")\n", "\n", "rng = np.random.default_rng(42)\n", "\n", "# Five production batches, 20 observations each\n", "batches = [\"B1\", \"B2\", \"B3\", \"B4\", \"B5\"]\n", "x_cat = np.repeat(batches, 20)\n", "# True mean of each batch; all batches share SD = 1.0\n", "locs = [5.0, 5.5, 4.8, 5.2, 5.1]\n", "y_cat = np.concatenate(\n", " [rng.normal(loc=loc, scale=1.0, size=20) for loc in locs]\n", ")\n", "data_cat = EDAData(y=y_cat, x=x_cat)\n", "\n", "# Two-group comparison, 30 observations each;\n", "# Method_B has a higher mean and a larger SD than Method_A\n", "x_two = np.repeat([\"Method_A\", \"Method_B\"], 30)\n", "y_two = np.concatenate(\n", " [rng.normal(10.0, 1.0, 30), rng.normal(10.5, 1.2, 30)]\n", ")\n", "data_two = EDAData(y=y_two, x=x_two)" ] }, { "cell_type": "markdown", "id": "8aa8929f", "metadata": {}, "source": [ "## Scatter Plot (NIST 1.3.3.26)\n", "\n", "Plots the raw `y` values against `x`. It works with both continuous and\n", "categorical `x` and is the simplest way to spot location differences,\n", "outliers, or non-linearity."
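 ] }, { "cell_type": "markdown", "id": "e1a7c3d0", "metadata": {}, "source": [ "For a categorical `x`, a scatter plot needs each label mapped to a position on\n", "the horizontal axis. The cell below is a minimal plain-Matplotlib sketch of\n", "that idea using the five-batch data defined above; it is an illustration, not\n", "a description of how `scatter_plot` handles labels internally. Here\n", "`np.unique(..., return_inverse=True)` assigns each batch an integer position\n", "in sorted label order, and the names `levels` and `codes` are illustrative." ] }, { "cell_type": "code", "execution_count": null, "id": "4b2f9e61", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "# Hypothetical sketch: map each batch label to an integer position 0..k-1\n", "# (np.unique returns the sorted distinct labels and each label's index)\n", "levels, codes = np.unique(x_cat, return_inverse=True)\n", "\n", "fig, ax = plt.subplots()\n", "ax.scatter(codes, y_cat, alpha=0.6)\n", "ax.set_xticks(range(len(levels)))\n", "ax.set_xticklabels(levels)\n", "ax.set_xlabel(\"batch\")\n", "ax.set_ylabel(\"y\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "9c0d5a7b", "metadata": {}, "source": [ "With a continuous `x`, as in the next cell, the values can be plotted\n", "directly and no label-to-position mapping is needed."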
] }, { "cell_type": "code", "execution_count": null, "id": "33dbfe59", "metadata": {}, "outputs": [], "source": [ "# Continuous x: y is linear in x (slope 2) plus Gaussian noise (SD 0.5)\n", "x_cont = np.linspace(1, 5, 100)\n", "y_cont = 2.0 * x_cont + rng.normal(scale=0.5, size=100)\n", "data_cont = EDAData(y=y_cont, x=x_cont)\n", "\n", "fig, ax = scatter_plot(data_cont)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "f0573ac5", "metadata": {}, "source": [ "## Box Plot (NIST 1.3.3.7)\n", "\n", "Displays the median, interquartile range, and outliers for each factor\n", "level, which makes it well suited to comparing location and spread across\n", "many groups at once." ] }, { "cell_type": "code", "execution_count": null, "id": "d32ebf90", "metadata": {}, "outputs": [], "source": [ "fig, ax = box_plot(data_cat)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "1b80fa36", "metadata": {}, "source": [ "## Bihistogram (NIST 1.3.3.2)\n", "\n", "Two histograms, one per factor level, drawn back to back so that the second\n", "is mirrored against the first. It requires exactly two factor levels. The\n", "mirrored layout makes it easy to compare distribution shape, location, and\n", "spread between, for example, two measurement methods or two labs." ] }, { "cell_type": "code", "execution_count": null, "id": "be275bfe", "metadata": {}, "outputs": [], "source": [ "fig, axes = bihistogram(data_two)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "093f16de", "metadata": {}, "source": [ "## Q-Q Plot (NIST 1.3.3.24)\n", "\n", "Quantile-quantile plot comparing two groups: the quantiles of one group are\n", "plotted against the corresponding quantiles of the other. Points on the\n", "diagonal suggest the two groups share a common distribution; points along a\n", "line parallel to the diagonal but offset from it indicate a location\n", "difference; a slope other than that of the diagonal indicates a scale\n", "difference."
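 ] }, { "cell_type": "markdown", "id": "7d3e8b25", "metadata": {}, "source": [ "The pairing behind a two-group Q-Q plot can be sketched with plain NumPy.\n", "This is a minimal illustration, not `drippy`'s implementation. It assumes\n", "both groups have the same size (true for `data_two`, with 30 observations\n", "each), so the sorted values of one group can be paired directly with the\n", "sorted values of the other. The names `qa` and `qb` are illustrative." ] }, { "cell_type": "code", "execution_count": null, "id": "a5f06c19", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "# Hypothetical sketch: empirical Q-Q pairing for two equal-size groups\n", "labels = np.asarray(x_two)\n", "values = np.asarray(y_two)\n", "qa = np.sort(values[labels == \"Method_A\"])  # sorted Method_A values\n", "qb = np.sort(values[labels == \"Method_B\"])  # sorted Method_B values\n", "assert qa.size == qb.size, \"direct pairing needs equal group sizes\"\n", "\n", "fig, ax = plt.subplots()\n", "ax.scatter(qa, qb)\n", "# Diagonal y = x: identical distributions would fall along this line\n", "lo = min(qa.min(), qb.min())\n", "hi = max(qa.max(), qb.max())\n", "ax.plot([lo, hi], [lo, hi], linestyle=\"--\", color=\"gray\")\n", "ax.set_xlabel(\"Method_A quantiles\")\n", "ax.set_ylabel(\"Method_B quantiles\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "2e6b1f48", "metadata": {}, "source": [ "Since `Method_B` was simulated with a higher mean and a larger SD, its points\n", "would be expected to sit mostly above the diagonal with a somewhat steeper\n", "slope, although 30 points per group leave room for sampling noise. With\n", "unequal group sizes, the quantiles of the larger group would first need to be\n", "interpolated (for example with `np.quantile`). The `qq_plot` call below draws\n", "the comparison with `drippy`."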
] }, { "cell_type": "code", "execution_count": null, "id": "a7ccfc69", "metadata": {}, "outputs": [], "source": [ "fig, ax = qq_plot(data_two)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "9b1ded59", "metadata": {}, "source": [ "## Mean Plot (NIST 1.3.3.20)\n", "\n", "The mean of each factor level, connected by a line, with a reference line at\n", "the grand mean. A roughly flat line suggests no factor effect; steps or\n", "trends suggest systematic differences between levels." ] }, { "cell_type": "code", "execution_count": null, "id": "5e27dbd4", "metadata": {}, "outputs": [], "source": [ "fig, ax = mean_plot(data_cat)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "5bf35564", "metadata": {}, "source": [ "## SD Plot (NIST 1.3.3.28)\n", "\n", "The standard deviation of each factor level, connected by a line, with a\n", "pooled-SD reference line. It helps detect heteroscedasticity, that is,\n", "factor levels with unusually high or low variability." ] }, { "cell_type": "code", "execution_count": null, "id": "30068a0b", "metadata": {}, "outputs": [], "source": [ "fig, ax = sd_plot(data_cat)\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 5 }