
dfstore

License: MIT

dfstore is a lightweight, serverless DataFrame storage library. Save, version, and retrieve pandas and polars DataFrames from a local store — via Python, CLI, or a web UI.


Why dfstore?

Working with DataFrames across notebooks, scripts, and experiments often means wrestling with ad-hoc file names (data_v2_final_REAL.csv) and no record of what changed between them. dfstore addresses this:

  • Every save() creates a new version automatically — no naming required.
  • Metadata (description, tags, shape, dtypes) is stored alongside the data.
  • Load by name; optionally pin to a specific version.
  • No database, no server — just plain Parquet files on disk.
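
To make the "new version on every save" idea concrete, here is a minimal sketch of a file-based versioned store. It is not dfstore's implementation: the class name TinyVersionedStore, the v<N>.json file layout, and the use of JSON rows instead of Parquet are all assumptions chosen so the example runs with the standard library only.

```python
import json
import tempfile
from pathlib import Path


class TinyVersionedStore:
    """Toy store (hypothetical): each save() writes <root>/<name>/v<N>.json."""

    def __init__(self, root):
        self.root = Path(root)

    def _versions(self, name):
        # Existing version numbers for `name`, sorted ascending.
        folder = self.root / name
        if not folder.is_dir():
            return []
        return sorted(int(p.stem[1:]) for p in folder.glob("v*.json"))

    def save(self, rows, name, description="", tags=()):
        # The next version is always one past the highest existing one,
        # so callers never have to invent a file name.
        versions = self._versions(name)
        version = versions[-1] + 1 if versions else 1
        folder = self.root / name
        folder.mkdir(parents=True, exist_ok=True)
        record = {
            # Metadata is kept next to the data, as in the list above.
            "meta": {
                "description": description,
                "tags": list(tags),
                "shape": [len(rows), len(rows[0]) if rows else 0],
            },
            "rows": rows,
        }
        (folder / f"v{version}.json").write_text(json.dumps(record))
        return version

    def get(self, name, version=None):
        # Load the latest version unless a specific one is pinned.
        versions = self._versions(name)
        if not versions:
            raise KeyError(name)
        chosen = versions[-1] if version is None else version
        if chosen not in versions:
            raise KeyError(f"{name} has no version {chosen}")
        record = json.loads((self.root / name / f"v{chosen}.json").read_text())
        return record["rows"]


# Usage: two saves under the same name produce versions 1 and 2.
with tempfile.TemporaryDirectory() as tmp:
    store = TinyVersionedStore(tmp)
    v1 = store.save([{"region": "EU", "total": 10}], name="sales_2024")
    v2 = store.save(
        [{"region": "EU", "total": 10}, {"region": "US", "total": 7}],
        name="sales_2024",
    )
    latest = store.get("sales_2024")
    first = store.get("sales_2024", version=1)
```

Because the version number is derived from the files already on disk, no database or server is needed to keep track of history; the directory listing is the index.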

Quick Example

import dfstore
import pandas as pd

df = pd.read_csv("sales.csv")

# Save
dfstore.save(df, name="sales_2024", description="Annual sales", tags=["finance"])

# Load latest
df = dfstore.get("sales_2024")

# Load a specific version
df_v1 = dfstore.get("sales_2024", version=1)

At a Glance

Feature           Description
Versioning        Every save is a new version; old versions are always accessible
Diff tracking     Row count delta, added/removed columns recorded per version
pandas & polars   Save either, load in whichever library you prefer
No server needed  Plain files on disk at ~/.dfstore
Soft delete       Hide a DataFrame without destroying data; restore any time
Web UI            Built-in browser interface for browsing, uploading, and managing
CLI               Full command-line interface for shell scripting and pipelines
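
The diff tracking row above can be illustrated with a small sketch. The function name version_diff and the shape of its result are hypothetical; how dfstore actually records diffs is not shown here. The sketch only demonstrates the two quantities named in the table: the row count delta and the added/removed columns between two versions.

```python
def version_diff(old_columns, old_rows, new_columns, new_rows):
    """Summarize what changed between two versions (illustrative only)."""
    old_set, new_set = set(old_columns), set(new_columns)
    return {
        # Positive when the new version has more rows than the old one.
        "row_delta": new_rows - old_rows,
        # Preserve the column order of the version each column comes from.
        "added_columns": [c for c in new_columns if c not in old_set],
        "removed_columns": [c for c in old_columns if c not in new_set],
    }


# Version 2 gained 20 rows and a "currency" column.
diff = version_diff(
    old_columns=["region", "total"],
    old_rows=100,
    new_columns=["region", "total", "currency"],
    new_rows=120,
)
print(diff)
# {'row_delta': 20, 'added_columns': ['currency'], 'removed_columns': []}
```

With pandas, the inputs would typically come from list(df.columns) and len(df) for each version.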

Installation

pip install dfstore          # core library + CLI
pip install 'dfstore[gui]'   # + web UI (Flask)