杰瑞科技汇

How to Implement Pagination Efficiently in Python

Pagination is a crucial technique for handling large datasets in web applications and APIs. It keeps both the server and the client from being overwhelmed by breaking a large result set into smaller, manageable chunks.

Here's a comprehensive guide to pagination in Python, covering the most common methods, best practices, and complete code examples.


Why is Pagination Important?

Imagine you have a database with a million user records. If you tried to fetch all of them at once:

  • Server: It would consume a massive amount of memory and CPU to process the query.
  • Network: It would send a huge amount of data, causing slow response times and potential timeouts.
  • Client: The browser or application would freeze trying to render a million rows.

Pagination solves this by letting the client request data in "pages" (e.g., 20 items at a time).


The Two Main Pagination Strategies

There are two primary methods for implementing pagination. Each has its pros and cons.

Offset-based Pagination (SQL LIMIT and OFFSET)

This is the most common and intuitive method, especially for web pages.

How it works: You tell the database: "Skip the first N records and then give me the next M records."

  • LIMIT: The number of items to return per page (e.g., 10).
  • OFFSET: The number of items to skip before starting to collect the result set (e.g., for page 2, you skip 10 items).

SQL Example:

-- Page 1 (items 1-10)
SELECT * FROM products ORDER BY created_at DESC LIMIT 10 OFFSET 0;
-- Page 2 (items 11-20)
SELECT * FROM products ORDER BY created_at DESC LIMIT 10 OFFSET 10;
-- Page 3 (items 21-30)
SELECT * FROM products ORDER BY created_at DESC LIMIT 10 OFFSET 20;
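
The same page-to-offset arithmetic can be sketched in Python. This is a minimal illustration using the standard-library sqlite3 module and an in-memory table; the fetch_page helper and the two-column products table are assumptions made for the example, and it orders by id rather than created_at to keep the data simple.

```python
import sqlite3

def fetch_page(conn, page, size=10):
    """Return one page of product rows using LIMIT/OFFSET (pages start at 1)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    offset = (page - 1) * size  # page 1 -> 0, page 2 -> 10, page 3 -> 20
    cur = conn.execute(
        "SELECT id, name FROM products ORDER BY id LIMIT ? OFFSET ?",
        (size, offset),
    )
    return cur.fetchall()

# Hypothetical sample data: 35 products in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [(f"Product {i}",) for i in range(1, 36)],
)

page_2 = fetch_page(conn, page=2)
print([row[0] for row in page_2])  # ids 11 through 20
```

Requesting a page past the end of the data (page 5 here) simply returns an empty list, which a caller can treat as "no more results".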

Pros:

  • Simple: Easy to understand and implement.
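  • Random Access: Any page can be requested directly by its number, which suits numbered page links. If you insert or delete an item in the middle of the list, only the pages after that point shift; the pages before it remain unchanged.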

Cons:

  • Performance: As you go to deeper pages (higher OFFSET), the query becomes slower. The database still has to count and skip all the previous rows, even if you don't need them.
  • Inconsistent Data: If new data is added while a user is paginating, they might see the same item on two consecutive pages or miss an item entirely. This is known as the "shifting window" problem.
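
The "shifting window" problem is easy to reproduce without a database. The sketch below emulates LIMIT/OFFSET with list slicing; the get_page helper and the tiny feed list are made up for the illustration.

```python
def get_page(rows, page, size):
    """Emulate LIMIT/OFFSET over an in-memory list that is already sorted."""
    offset = (page - 1) * size
    return rows[offset:offset + size]

# Newest first, like ORDER BY created_at DESC. A higher number is a newer item.
feed = [5, 4, 3, 2, 1]

first_page = get_page(feed, page=1, size=2)   # [5, 4]

# A new item arrives before the user requests page 2.
feed.insert(0, 6)                             # feed is now [6, 5, 4, 3, 2, 1]

second_page = get_page(feed, page=2, size=2)  # [4, 3]

duplicates = set(first_page) & set(second_page)
print(duplicates)  # {4}
```

Item 4 appears on both pages because the insert pushed every row down by one position. A deletion has the opposite effect: rows move up by one, and the row that slides from page 2 into page 1 is never shown.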

Keyset Pagination (Cursor-based Pagination)

This is a more performant method, ideal for APIs and infinite scrolling.

How it works: Instead of using an arbitrary row number, you use a unique, ordered value from the table (like an auto-incrementing id, or a timestamp combined with the id as a tie-breaker) to determine where to start fetching. The client passes the "last seen" key to the server to get the next page.

SQL Example (assuming an id column):

-- Page 1 (get first 10 items)
SELECT * FROM products ORDER BY id ASC LIMIT 10;
-- Let's say the last item returned had an id of 10.
-- Page 2 (get items after id 10)
SELECT * FROM products WHERE id > 10 ORDER BY id ASC LIMIT 10;
-- Let's say the last item returned had an id of 20.
-- Page 3 (get items after id 20)
SELECT * FROM products WHERE id > 20 ORDER BY id ASC LIMIT 10;
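
As a rough sketch of how a caller would drive these queries, the loop below walks a whole table page by page with the standard-library sqlite3 module. The fetch_after helper and the sample data are assumptions for the example; a few rows are deleted on purpose to show that the approach does not depend on ids being contiguous.

```python
import sqlite3

def fetch_after(conn, last_id=None, limit=10):
    """Return up to `limit` rows whose id is greater than `last_id`."""
    if last_id is None:
        cur = conn.execute(
            "SELECT id, name FROM products ORDER BY id ASC LIMIT ?", (limit,)
        )
    else:
        cur = conn.execute(
            "SELECT id, name FROM products WHERE id > ? ORDER BY id ASC LIMIT ?",
            (last_id, limit),
        )
    return cur.fetchall()

# Hypothetical sample data: 25 products, then three deleted to leave gaps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [(f"Product {i}",) for i in range(1, 26)],
)
conn.execute("DELETE FROM products WHERE id IN (3, 4, 12)")

last_id = None
pages = []
while True:
    rows = fetch_after(conn, last_id)
    if not rows:
        break
    pages.append([row[0] for row in rows])
    last_id = rows[-1][0]  # the key of the last row becomes the next cursor

print(pages[0])  # [1, 2, 5, 6, 7, 8, 9, 10, 11, 13]
```

Each page still holds ten rows despite the gaps, because the cursor is a key value rather than a row position.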

Pros:

  • Extremely Fast: Performance is consistent regardless of how deep you paginate. The database can use an index to jump directly to the starting point.
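  • Consistent Data: It's resilient to rows being inserted or deleted while a client is paginating. Provided the key is unique and never changes, the client does not see the same item twice and does not skip an item that sorts after its cursor. Rows inserted before the cursor are simply not returned.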

Cons:

  • More Complex: Requires a stable, unique, and sequential key.
  • Not for Random Access: You can't easily jump to "page 5" because you need the key from the end of page 4. This makes it less suitable for traditional numbered page navigation.
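
When the sort column is not unique (for example, several rows sharing the same created_at), one common workaround is to use a composite key: sort by (created_at, id) and compare against both values. The sketch below is one way to write that with sqlite3; the fetch_after helper, the table layout, and the sample dates are all assumptions for the illustration.

```python
import sqlite3

def fetch_after(conn, cursor=None, limit=2):
    """Page through rows ordered by (created_at, id), using both columns as the key."""
    if cursor is None:
        sql = "SELECT id, created_at FROM products ORDER BY created_at, id LIMIT ?"
        params = (limit,)
    else:
        last_created_at, last_id = cursor
        # Rows strictly later in time, or at the same time with a larger id.
        sql = (
            "SELECT id, created_at FROM products "
            "WHERE created_at > ? OR (created_at = ? AND id > ?) "
            "ORDER BY created_at, id LIMIT ?"
        )
        params = (last_created_at, last_created_at, last_id, limit)
    return conn.execute(sql, params).fetchall()

# Hypothetical sample data: three rows share one timestamp, two share another.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO products (id, created_at) VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-01"),
     (4, "2024-01-02"), (5, "2024-01-02")],
)

seen, cursor = [], None
while True:
    rows = fetch_after(conn, cursor)
    if not rows:
        break
    seen.extend(row[0] for row in rows)
    last_id, last_created_at = rows[-1]
    cursor = (last_created_at, last_id)

print(seen)  # [1, 2, 3, 4, 5]
```

With created_at alone as the key and a page size of 2, the second query would ask for rows after "2024-01-01" and skip row 3, which shares that timestamp with the last row of the first page. The id tie-breaker is what prevents this.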

Implementation Examples

Let's look at how to implement both methods in Python using FastAPI (for the API) and SQLAlchemy (for the database ORM).

Setup

First, let's install the necessary libraries:

pip install "fastapi[all]" sqlalchemy

Here's a basic FastAPI + SQLAlchemy model and setup:

# main.py
from fastapi import Depends, FastAPI, Query  # Depends is required by the endpoints below
from sqlalchemy import create_engine, Column, Integer, String, DateTime
# declarative_base lives in sqlalchemy.orm since SQLAlchemy 1.4
from sqlalchemy.orm import declarative_base, sessionmaker, Session
from datetime import datetime
# --- Database Setup ---
DATABASE_URL = "sqlite:///./test.db" # File-based SQLite DB, created in the current directory
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()
# --- Model ---
class Product(Base):
    __tablename__ = "products"
    id = Column(Integer, primary_key=True, index=True)
    name = Column(String, index=True)
    description = Column(String)
    created_at = Column(DateTime, default=datetime.utcnow)
# --- Create DB and some dummy data ---
Base.metadata.create_all(bind=engine)
db = SessionLocal()
if db.query(Product).count() == 0:
    for i in range(1, 101):
        db.add(Product(
            name=f"Product {i}",
            description=f"This is a detailed description for product {i}."
        ))
    db.commit()
db.close()
# --- FastAPI App ---
app = FastAPI()
# Dependency to get DB session
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

Offset-based Pagination in FastAPI

This is great for classic "Page 1, 2, 3..." navigation.

# Add this to your main.py
@app.get("/products-offset/")
def get_products_offset(
    page: int = Query(1, ge=1, description="The page number"),
    size: int = Query(10, ge=1, le=100, description="The number of items per page"),
    db: Session = Depends(get_db)
):
    """
    Get a page of products using offset-based pagination.
    """
    # Calculate the offset
    offset = (page - 1) * size
    # Get the total count for metadata
    total = db.query(Product).count()
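    # Get the items for the current page.
    # An explicit ORDER BY is needed: without it the row order is not guaranteed,
    # so the same row could land on different pages between requests.
    items = db.query(Product).order_by(Product.id).offset(offset).limit(size).all()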
    return {
        "page": page,
        "size": size,
        "total_items": total,
        "total_pages": (total + size - 1) // size, # Ceiling division
        "data": items
    }
# To run and test:
# uvicorn main:app --reload
# Go to http://127.0.0.1:8000/docs to see the interactive docs.
# Example calls:
# /products-offset/?page=1&size=10
# /products-offset/?page=2&size=5
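
The metadata arithmetic in the endpoint above can be pulled out into a small pure function, which makes it easy to test on its own. This is only a sketch: page_meta and the has_prev / has_next fields are hypothetical additions, not part of the endpoint as written.

```python
def page_meta(total_items, page, size):
    """Compute navigation metadata for offset-based pagination."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    total_pages = (total_items + size - 1) // size  # ceiling division
    return {
        "page": page,
        "size": size,
        "total_items": total_items,
        "total_pages": total_pages,
        "has_prev": page > 1,
        "has_next": page < total_pages,
    }

meta = page_meta(total_items=101, page=10, size=10)
print(meta["total_pages"], meta["has_next"])  # 11 True
```

The single leftover item in this example still needs its own page, which is why the division rounds up instead of down.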

Keyset Pagination in FastAPI

This is perfect for APIs and infinite scroll. The client needs to pass the cursor (the last ID it saw).

# Add this to your main.py
@app.get("/products-keyset/")
def get_products_keyset(
    limit: int = Query(10, ge=1, le=100, description="The number of items to return"),
    # "int | None" needs Python 3.10+; on older versions use typing.Optional[int]
    cursor: int | None = Query(None, ge=1, description="The ID of the last item seen (for pagination)"),
    db: Session = Depends(get_db)
):
    """
    Get a page of products using keyset (cursor-based) pagination.
    """
    query = db.query(Product)
    # If a cursor is provided, filter for items after it
    if cursor is not None:
        query = query.filter(Product.id > cursor)
    # Order by the key and apply the limit
    items = query.order_by(Product.id).limit(limit).all()
    # Prepare the response
    response_data = {
        "limit": limit,
        "data": items,
    }
    # If we got a full page of items, provide a cursor for the next page
    if len(items) == limit:
        # The cursor for the next page is the ID of the last item
        response_data["next_cursor"] = items[-1].id
    else:
        # No more items
        response_data["next_cursor"] = None
    return response_data
# To test this API:
# /products-keyset/  -> Gets first 10 items. The response will contain "next_cursor": 10
# /products-keyset/?cursor=10 -> Gets next 10 items after ID 10. Response will have "next_cursor": 20
# /products-keyset/?cursor=95 -> Gets last 5 items. Response will have "next_cursor": null
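
One detail of the endpoint above: it returns a cursor whenever the page is full, so when the number of remaining rows is an exact multiple of the limit, the client makes one extra request that comes back empty. A common refinement is to read limit + 1 rows and use the extra row only as a look-ahead. The sketch below shows the idea on a plain list, together with a client loop that follows next_cursor; keyset_page and the sample rows are hypothetical. With SQLAlchemy the equivalent change would be to call .limit(limit + 1) and slice the result.

```python
def keyset_page(rows, cursor=None, limit=10):
    """Return (items, next_cursor) from `rows`, a list of dicts sorted by "id"."""
    candidates = [r for r in rows if cursor is None or r["id"] > cursor]
    window = candidates[:limit + 1]        # read one extra row as a look-ahead
    items = window[:limit]                 # the extra row is never returned
    has_more = len(window) > limit
    next_cursor = items[-1]["id"] if has_more else None
    return items, next_cursor

rows = [{"id": i} for i in range(1, 21)]   # exactly two pages of 10

# Client side: keep requesting until the server stops handing out a cursor.
collected, cursor, requests = [], None, 0
while True:
    items, cursor = keyset_page(rows, cursor)
    collected.extend(item["id"] for item in items)
    requests += 1
    if cursor is None:
        break

print(requests)  # 2
```

Without the look-ahead row, the same client would need a third request to discover that no data is left.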

Which One Should You Choose?

| Feature | Offset-based | Keyset-based |
| --- | --- | --- |
| Use Case | Web dashboards, admin panels, traditional "Next/Prev" buttons. | APIs, infinite scroll feeds, mobile apps, performance-critical systems. |
| Performance | Degrades on deep pages. | Consistently fast, regardless of page depth. |
| Stability | Can show duplicates/miss items if data changes. | Highly stable; no duplicates or missed items. |
| Implementation | Very simple. | Slightly more complex. |
| Random Access | Easy (jump to any page number). | Not possible without knowing the previous cursor. |

Recommendation:

  • For most web applications where users navigate through numbered pages, offset-based pagination is perfectly fine and easier to implement.
  • For APIs and any application where performance and data consistency are paramount, especially with large datasets, keyset pagination is the superior choice.