{ "cells": [ { "cell_type": "markdown", "id": "213b4bc8", "metadata": {}, "source": [ "# Quickstart\n", "\n", "drippy is a Python library for exploratory data analysis (EDA) that follows the\n", "principles of the\n", "[NIST/SEMATECH e-Handbook of Statistical Methods](https://www.itl.nist.gov/div898/handbook/).\n", "The entry point is `EDAData`, a validated container that holds your response\n", "variable `y` and optional auxiliary arrays. Every plot function accepts an\n", "`EDAData` object as its first argument.\n", "\n", "## The four data models\n", "\n", "In each model, `e` denotes random error, `c` a constant, and `f` an unknown function.\n", "\n", "| Model | Constructor | Use case |\n", "|-------|-------------|----------|\n", "| Univariate `y = c + e` | `EDAData(y)` | One response, no predictors |\n", "| Time series `y = f(t) + e` | `EDAData(y, t=t)` | Response indexed by a continuous variable |\n", "| One-factor `y = f(x) + e` | `EDAData(y, x=x)` | Continuous or categorical single predictor |\n", "| Multi-factor / DOE | `EDAData(y, factors={…})` | Named factor arrays for designed experiments |" ] }, { "cell_type": "code", "execution_count": null, "id": "8783008d", "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from drippy import EDAData" ] }, { "cell_type": "markdown", "id": "5078f260", "metadata": {}, "source": [ "## Univariate data\n", "\n", "The simplest case: 200 simulated ceramic-strength measurements drawn from a\n", "normal distribution. Only `y` is required."
] }, { "cell_type": "code", "execution_count": null, "id": "f8ba2936", "metadata": {}, "outputs": [], "source": [ "# Seeded generator so the simulated data are reproducible\n", "rng = np.random.default_rng(42)\n", "y = rng.normal(loc=688.0, scale=65.0, size=200)\n", "data = EDAData(y=y)\n", "print(f\"n={len(data.y)}, mean={data.y.mean():.1f}, std={data.y.std():.1f}\")" ] }, { "cell_type": "markdown", "id": "3d1884bb", "metadata": {}, "source": [ "## The 4-plot: first stop in any EDA\n", "\n", "The NIST handbook recommends starting every univariate EDA with the **4-plot**\n", "(NIST 1.3.3.32): a 2×2 composite of the run-sequence plot, lag plot,\n", "histogram, and normal probability plot. Together, these four panels address\n", "four basic questions:\n", "\n", "1. Is the process fixed (constant location and variation)?\n", "2. Is the process random (no autocorrelation)?\n", "3. Is the distribution unimodal?\n", "4. Is the distribution approximately normal?" ] }, { "cell_type": "code", "execution_count": null, "id": "a7b5eb14", "metadata": {}, "outputs": [], "source": [ "from drippy import four_plot\n", "\n", "fig, axes = four_plot(data)\n", "fig.suptitle(\"Ceramic Strength: 4-Plot\", y=1.02)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "13d45bec", "metadata": {}, "source": [ "## Fluent API\n", "\n", "Every plot function is also available as a method on `EDAData`:" ] }, { "cell_type": "code", "execution_count": null, "id": "7dc119e2", "metadata": {}, "outputs": [], "source": [ "fig, ax = data.histogram(bins=20)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "95478ed4", "metadata": {}, "source": [ "## Time-series data" ] }, { "cell_type": "code", "execution_count": null, "id": "8fef61f5", "metadata": {}, "outputs": [], "source": [ "# Sine wave at 0.5 cycles per unit of t (5 cycles over [0, 10]) plus Gaussian noise\n", "t = np.linspace(0, 10, 200)\n", "noise = rng.normal(scale=0.5, size=200)\n", "y_ts = 2.0 * np.sin(2 * np.pi * 0.5 * t) + noise\n", "data_ts = EDAData(y=y_ts, t=t)\n", "\n", "fig, ax = data_ts.run_sequence_plot()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "eec382bd", "metadata": 
{}, "source": [ "## One-factor (categorical) data" ] }, { "cell_type": "code", "execution_count": null, "id": "712e1377", "metadata": {}, "outputs": [], "source": [ "# Five batches of 20 observations, each batch drawn around its own mean\n", "batches = [\"B1\", \"B2\", \"B3\", \"B4\", \"B5\"]\n", "x_cat = np.repeat(batches, 20)\n", "locs = [5.0, 5.5, 4.8, 5.2, 5.1]\n", "y_cat = np.concatenate([rng.normal(loc=loc, scale=1.0, size=20) for loc in locs])\n", "data_cat = EDAData(y=y_cat, x=x_cat)\n", "\n", "fig, ax = data_cat.box_plot()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "3fc14ec4", "metadata": {}, "source": [ "## Multi-factor / DOE data" ] }, { "cell_type": "code", "execution_count": null, "id": "1584c5fa", "metadata": {}, "outputs": [], "source": [ "# Replicated two-level factorial: 16 runs, each (A, B) combination appears 4 times\n", "A = np.tile([-1, 1], 8)\n", "B = np.repeat([-1, 1], 8)\n", "# Simulated response with main effects only (no interaction) plus noise\n", "y_doe = 2.0 * A + 1.5 * B + rng.normal(scale=0.3, size=16)\n", "data_doe = EDAData(y=y_doe, factors={\"A\": A, \"B\": B})\n", "\n", "fig, axes = data_doe.doe_mean_plot()\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 5 }