杰瑞科技汇

How Does Python dateparser Parse Dates?

dateparser is a Python library designed to solve a common problem: parsing date strings written in many formats and natural languages into standard Python datetime objects.


It is useful because it handles a wide variety of formats and languages, as well as relative expressions and ambiguous dates that datetime.strptime cannot parse without an exact format string.


Installation

First, you need to install the library. It's available on PyPI.

pip install dateparser

Basic Usage

The core function is dateparser.parse(). You give it a string, and it returns a datetime object, or None if it cannot find a date in the string.

import dateparser
# --- Simple English Dates ---
date_string_1 = "December 25, 2025"
dt1 = dateparser.parse(date_string_1)
print(f"String: '{date_string_1}' -> Datetime: {dt1}")
# Output: String: 'December 25, 2025' -> Datetime: 2025-12-25 00:00:00
date_string_2 = "in 2 hours"
dt2 = dateparser.parse(date_string_2)
print(f"String: '{date_string_2}' -> Datetime: {dt2}")
# Output: String: 'in 2 hours' -> Datetime: 2025-10-27 14:30:00 (assuming current time is 12:30)
date_string_3 = "yesterday"
dt3 = dateparser.parse(date_string_3)
print(f"String: '{date_string_3}' -> Datetime: {dt3}")
# Output: String: 'yesterday' -> Datetime: 2025-10-26 12:30:00 (assuming the current time is 2025-10-27 12:30; the time of day is kept)

Key Features and Strengths

a) Multiple Languages

dateparser supports more than 200 language locales out of the box and tries to detect the language of each string automatically.

# --- Spanish, French, German ---
spanish_date = "23 de enero de 2025"
french_date = "dimanche 12 octobre 2025"
german_date = "Dienstag, 15. Juli 2025"
print(dateparser.parse(spanish_date)) # Output: 2025-01-23 00:00:00
print(dateparser.parse(french_date))  # Output: 2025-10-12 00:00:00
print(dateparser.parse(german_date))  # Output: 2025-07-15 00:00:00

b) Handling Relative Dates

It understands common relative time expressions.

relative_dates = [
    "3 days ago",
    "last week",
    "next month",
    "in a year",
    "2 hours from now"
]
for date_str in relative_dates:
    print(f"'{date_str}' -> {dateparser.parse(date_str)}")

c) Fuzzy Dates and Ambiguity

It can make intelligent guesses for ambiguous formats. For example, 01/02/2025 could be Jan 2nd or Feb 1st, depending on locale.

# In the US (MM/DD/YYYY)
ambiguous_date_us = "03/04/2025"
dt_us = dateparser.parse(ambiguous_date_us, settings={'DATE_ORDER': 'MDY'})
print(f"US Interpretation (MDY): {dt_us}") # Output: 2025-03-04 00:00:00
# In many European countries (DD/MM/YYYY)
ambiguous_date_eu = "03/04/2025"
dt_eu = dateparser.parse(ambiguous_date_eu, settings={'DATE_ORDER': 'DMY'})
print(f"EU Interpretation (DMY): {dt_eu}") # Output: 2025-04-03 00:00:00

d) Timezones

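It can parse dates that carry timezone information. The result is a timezone-aware datetime that keeps the offset from the string; it is not converted to UTC unless you ask for that (for example with the TO_TIMEZONE setting or datetime.astimezone).

date_with_tz = "2025-10-27 10:00:00 -0500" # US Central Daylight Time
dt_with_tz = dateparser.parse(date_with_tz)
print(f"With Timezone: {dt_with_tz}") # Output: 2025-10-27 10:00:00-05:00 (offset preserved)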

Advanced Usage with settings

The dateparser.parse() function accepts a settings dictionary, which is crucial for controlling its behavior.

from datetime import datetime
from dateparser import parse
# --- Custom Date Order ---
# Force a specific date order to avoid ambiguity
date_str = "01/02/2025"
dt_dmy = parse(date_str, settings={'DATE_ORDER': 'DMY'}) # Day-Month-Year
dt_mdy = parse(date_str, settings={'DATE_ORDER': 'MDY'}) # Month-Day-Year
print(f"DMY order: {dt_dmy}") # Output: 2025-02-01 00:00:00
print(f"MDY order: {dt_mdy}") # Output: 2025-01-02 00:00:00
# --- Setting a Default Timezone ---
# If a date string has no timezone, interpret it in a given zone.
# TIMEZONE takes a timezone name; RETURN_AS_TIMEZONE_AWARE makes the result aware instead of naive.
date_str_no_tz = "2025-10-27 15:00:00"
default_tz = "Europe/Berlin"
dt_with_default_tz = parse(
    date_str_no_tz,
    settings={'TIMEZONE': default_tz, 'RETURN_AS_TIMEZONE_AWARE': True}
)
print(f"With default timezone: {dt_with_default_tz}")
# Output: 2025-10-27 15:00:00+01:00 (CET; daylight saving time ended on 2025-10-26)
# --- Handling Relative Dates ---
# Relative expressions are resolved against the current time by default.
# RELATIVE_BASE pins them to a fixed reference instead.
base = datetime(2025, 10, 27, 14, 30)
dt_relative = parse("today", settings={'RELATIVE_BASE': base})
print(f"Relative to base: {dt_relative}")
# Output: 2025-10-27 14:30:00
# --- Strict Mode ---
# Return a date only if day, month, and year are all present in the string.
dt_strict = parse("March", settings={'STRICT_PARSING': True})
print(f"Strict parsing result: {dt_strict}")
# Output: None (the day and year are missing)

When NOT to Use dateparser

While powerful, dateparser is not always the right tool.

Use dateparser when... | Consider a different tool when...
You are dealing with user-generated text (e.g., forms, comments, logs). | You have a known, fixed format. datetime.strptime is much faster and more precise.
The date string is in a natural language or an unknown format. | Performance is absolutely critical (e.g., parsing millions of dates in a loop).
The date string might be in multiple languages. | You need to parse dates in a very specific, non-standard format that dateparser might guess wrong.
You need to handle relative dates like "tomorrow" or "3 days ago". | You are working in a performance-critical environment and can afford to pre-validate formats.
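
The two columns can be combined: try the known, fixed formats with datetime.strptime first, and fall back to dateparser only for strings that do not match. The sketch below is one possible arrangement, not a prescribed pattern; KNOWN_FORMATS and parse_fast_then_flexible are hypothetical names, and the format list is an assumption about the data.

```python
from datetime import datetime

# Formats we expect most inputs to follow (an assumption for this sketch).
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y %H:%M")


def parse_fast_then_flexible(text):
    """Try strict, fast formats first; fall back to dateparser otherwise."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    # Imported lazily so the fast path never pays for loading dateparser.
    import dateparser
    return dateparser.parse(text)


print(parse_fast_then_flexible("2025-12-25"))
print(parse_fast_then_flexible("25/12/2025 08:30"))
```

Strings such as "3 days ago" would skip both fixed formats and reach the dateparser fallback, so the slower path is only taken when it is actually needed.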

Alternatives

  1. dateutil.parser: Part of python-dateutil, which dateparser itself depends on. It handles many common formats but is less focused on natural language (e.g., it won't parse "yesterday"). It is a lightweight alternative when the input is mostly absolute dates in English.

    from dateutil import parser
    dt = parser.parse("25-Dec-2025")
    print(dt) # Output: 2025-12-25 00:00:00
  2. datetime.strptime: The standard library's method. It's fast but very strict. You must know the exact format string in advance.

    from datetime import datetime
    dt = datetime.strptime("2025-12-25", "%Y-%m-%d")
    print(dt) # Output: 2025-12-25 00:00:00
  3. maya: Another library that focuses heavily on human-readable, timezone-aware datetimes. It offers a slightly different API.

Summary

dateparser is your go-to tool for messy, real-world date data. Its ability to handle different languages, relative times, and ambiguous formats makes it incredibly valuable for data cleaning, web scraping, and processing user input. Just be mindful of its performance overhead and use stricter, faster methods when your data format is guaranteed.
