杰瑞科技汇

Python Curve Fitting: How Do You Choose the Best Model?

Curve fitting is a fundamental task in data analysis and scientific computing with Python. It is the process of finding a mathematical function (a "curve") that best represents a set of data points.

This guide will cover:

  1. The Core Idea: What is curve fitting?
  2. The Main Tools: The essential Python libraries (NumPy, SciPy, Matplotlib).
  3. Linear Curve Fitting: The simplest case, using numpy.polyfit.
  4. Non-Linear Curve Fitting: A more powerful and common approach, using scipy.optimize.curve_fit.
  5. Advanced Fitting: Using lmfit for better control and statistics.
  6. Practical Example: Fitting a real-world dataset.
  7. Choosing the Right Method: A summary of when to use what.

The Core Idea: What is Curve Fitting?

Imagine you have some experimental data. You suspect there's an underlying relationship, but the data is noisy. Curve fitting helps you:

  • Identify the relationship: Find the mathematical formula that connects your variables.
  • Simplify data: Replace a noisy dataset with a smooth, predictive function.
  • Make predictions: Use the fitted function to estimate values at points you haven't measured.

The "best fit" is typically defined as the curve that minimizes the sum of the squares of the residuals (the differences between the observed data points and the values predicted by the curve). This is the least-squares criterion; when the model is linear in its parameters, it is known as Ordinary Least Squares (OLS).
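
To make the criterion concrete, here is a minimal sketch that scores two candidate lines against the same points. The data values and the helper name sum_squared_residuals are made up for illustration; they are not part of any library.

```python
import numpy as np

def sum_squared_residuals(y_observed, y_predicted):
    """Return the sum of squared residuals between data and model predictions."""
    residuals = np.asarray(y_observed) - np.asarray(y_predicted)
    return float(np.sum(residuals ** 2))

# Hypothetical data: five points lying close to y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Score two candidate lines; the smaller score is the better fit
ssr_good = sum_squared_residuals(y, 2.0 * x + 1.0)
ssr_poor = sum_squared_residuals(y, 1.0 * x + 3.0)

print(f"SSR for y = 2x + 1: {ssr_good:.2f}")
print(f"SSR for y = x + 3:  {ssr_poor:.2f}")
```

A fitting routine automates this comparison: it searches the parameter space for the values that make this score as small as possible.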


The Main Tools

You'll primarily use three libraries:

  • NumPy: The fundamental package for numerical computation in Python. It's used to handle your data as arrays.
  • Matplotlib: The go-to library for plotting. It's essential for visualizing your data and the fitted curve.
  • SciPy: A library for scientific and technical computing. Its scipy.optimize module contains the powerful curve_fit function, which is the workhorse for non-linear curve fitting.

Installation: If you don't have them installed, open your terminal or command prompt and run:

pip install numpy matplotlib scipy

Linear Curve Fitting

This is the special case where the model is linear in its parameters (e.g., y = mx + c). While curve_fit can handle this, numpy.polyfit is a simpler and more direct method for polynomial models such as a straight line.

Example: Fitting a straight line

Let's fit a line y = mx + c to some data.

import numpy as np
import matplotlib.pyplot as plt
# 1. Generate some sample data with noise
np.random.seed(0)
x_data = np.linspace(0, 10, 50)
y_data = 2.5 * x_data + 1.0 + np.random.normal(0, 2.0, len(x_data))
# 2. Fit a 1st-degree polynomial (a straight line: y = mx + c)
# polyfit returns the coefficients [m, c]
coefficients = np.polyfit(x_data, y_data, 1)
m, c = coefficients
print(f"Fitted equation: y = {m:.2f}x + {c:.2f}")
# 3. Create the fitted line for plotting
y_fit = np.polyval(coefficients, x_data)
# 4. Plot the results
plt.figure(figsize=(8, 6))
plt.scatter(x_data, y_data, label='Original Data', color='blue')
plt.plot(x_data, y_fit, 'r-', label=f'Fitted Line: y = {m:.2f}x + {c:.2f}')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Linear Curve Fitting with NumPy')
plt.legend()
plt.grid(True)
plt.show()
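
As a quick sanity check on the claim that curve_fit can also handle the linear case, the sketch below fits the same straight line both ways and compares the results. The synthetic data and the function name line are illustrative assumptions. The two methods should agree to within the optimizer's tolerance.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, m, c):
    """Straight-line model: y = m*x + c."""
    return m * x + c

# Hypothetical noisy data around y = 2.5x + 1
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(0, 2.0, x.size)

# Direct linear least squares with polyfit
m_poly, c_poly = np.polyfit(x, y, 1)

# Iterative least squares with curve_fit (default starting guess of 1 per parameter)
popt_line, _ = curve_fit(line, x, y)
m_cf, c_cf = popt_line

print(f"polyfit:   m = {m_poly:.4f}, c = {c_poly:.4f}")
print(f"curve_fit: m = {m_cf:.4f}, c = {c_cf:.4f}")
```

For a plain polynomial, polyfit is usually the more direct choice because it solves the problem without needing a starting guess.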

Non-Linear Curve Fitting with scipy.optimize.curve_fit

This is the most common and powerful method. You define the function you want to fit, and curve_fit adjusts its parameters to best match your data.

Example: Fitting an exponential curve y = a * e^(b*x)

The key is to define your model function before calling curve_fit.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# 1. Define the model function
# It must take the independent variable as the first argument and the parameters to fit as subsequent arguments.
def exponential_model(x, a, b):
    """An exponential function: y = a * e^(b*x)"""
    return a * np.exp(b * x)
# 2. Generate some sample data with noise
np.random.seed(42)
x_data = np.linspace(0, 4, 50)
y_data = 2.5 * np.exp(1.5 * x_data) + np.random.normal(0, 8.0, len(x_data))
# 3. Perform the curve fit
# p0 is an optional initial guess for the parameters [a, b]. It helps the algorithm converge.
popt, pcov = curve_fit(exponential_model, x_data, y_data, p0=[1, 1])
# popt contains the optimal parameters (a, b)
# pcov is the estimated covariance of popt
a_opt, b_opt = popt
print(f"Optimal parameters: a = {a_opt:.2f}, b = {b_opt:.2f}")
# 4. Create the fitted curve for plotting
y_fit = exponential_model(x_data, a_opt, b_opt)
# 5. Plot the results
plt.figure(figsize=(8, 6))
plt.scatter(x_data, y_data, label='Original Data', color='blue')
plt.plot(x_data, y_fit, 'r-', label=f'Fitted Curve: y = {a_opt:.2f} * e^({b_opt:.2f}x)')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Non-Linear Curve Fitting with SciPy')
plt.legend()
plt.grid(True)
plt.show()
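
The example above passes a hand-picked p0. When no obvious guess is available, one common trick for this particular model is to take logarithms: ln(y) = ln(a) + b*x is a straight line in x, so a quick polyfit on the positive data points yields a rough starting guess. The sketch below assumes synthetic data similar to the example. The log-linear estimate is biased by the noise and is meant only as a starting point, not as the final answer.

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_model(x, a, b):
    """An exponential function: y = a * e^(b*x)"""
    return a * np.exp(b * x)

# Hypothetical noisy data around y = 2.5 * e^(1.5x)
rng = np.random.default_rng(42)
x = np.linspace(0, 4, 50)
y = 2.5 * np.exp(1.5 * x) + rng.normal(0, 8.0, x.size)

# The logarithm is only defined for positive values, so mask the rest
positive = y > 0
b0, ln_a0 = np.polyfit(x[positive], np.log(y[positive]), 1)
p0 = [np.exp(ln_a0), b0]

# Refine the rough guess with a proper non-linear least-squares fit
popt, pcov = curve_fit(exponential_model, x, y, p0=p0)

print(f"Initial guess: a = {p0[0]:.2f}, b = {p0[1]:.2f}")
print(f"Refined fit:   a = {popt[0]:.2f}, b = {popt[1]:.2f}")
```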

Understanding the Output: popt and pcov

  • popt (Optimal Parameters): A NumPy array containing the best-fit values for the parameters of your function.
  • pcov (Parameter Covariance): A 2D array that describes the estimated uncertainty in the parameters. The diagonal elements pcov[i, i] are the variances of the parameter popt[i]. The standard error can be approximated by taking the square root of the diagonal elements:
    perr = np.sqrt(np.diag(pcov))
    print(f"Standard errors: a = {perr[0]:.2f}, b = {perr[1]:.2f}")
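
Beyond the diagonal, the off-diagonal elements of pcov show how strongly the parameter estimates are coupled. The sketch below, which uses synthetic data as an assumption, normalizes pcov into a correlation matrix and forms rough 95% intervals. The factor 1.96 assumes the parameter estimates are approximately normally distributed, which is only an approximation for non-linear models.

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_model(x, a, b):
    """An exponential function: y = a * e^(b*x)"""
    return a * np.exp(b * x)

# Hypothetical noisy data around y = 2.5 * e^(1.5x)
rng = np.random.default_rng(42)
x = np.linspace(0, 4, 50)
y = 2.5 * np.exp(1.5 * x) + rng.normal(0, 8.0, x.size)

popt, pcov = curve_fit(exponential_model, x, y, p0=[1, 1])

# Standard errors from the diagonal of the covariance matrix
perr = np.sqrt(np.diag(pcov))

# Correlation matrix: covariance scaled by the product of standard errors
corr = pcov / np.outer(perr, perr)

# Approximate 95% intervals (normal approximation)
lower = popt - 1.96 * perr
upper = popt + 1.96 * perr

for name, lo, hi in zip(("a", "b"), lower, upper):
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
print(f"Correlation between a and b: {corr[0, 1]:.3f}")
```

A correlation close to +1 or -1 means the data cannot separate the two parameters well: a change in one can be largely compensated by a change in the other.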

Advanced Fitting with lmfit

While scipy.optimize.curve_fit is powerful, the lmfit library provides a more user-friendly, high-level interface. It's especially useful for complex models because it:

  • Provides better statistics on the fit (reduced chi-square, AIC, BIC).
  • Handles parameter bounds and constraints easily.
  • Offers a wide range of built-in models (Gaussian, Lorentzian, etc.).

First, install lmfit:

pip install lmfit

Example: Fitting the same exponential curve with lmfit

import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
# 1. Generate the same sample data
np.random.seed(42)
x_data = np.linspace(0, 4, 50)
y_data = 2.5 * np.exp(1.5 * x_data) + np.random.normal(0, 8.0, len(x_data))
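# 2. Define the model function (repeated here so this example runs on its own)
def exponential_model(x, a, b):
    """An exponential function: y = a * e^(b*x)"""
    return a * np.exp(b * x)
# Create a model from the function.
# lmfit infers the independent variable (x) and the parameters (a, b) from the function signature.
exponential_model_lmfit = Model(exponential_model)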
# 3. Set initial guesses and parameter constraints (optional)
# This is much cleaner than p0.
params = exponential_model_lmfit.make_params(a=1, b=1)
# You can also set bounds
params['a'].min = 0 # 'a' cannot be negative
# params['b'].max = 0 # 'b' cannot be positive
# 4. Perform the fit
# Model.fit returns a ModelResult object that holds the best-fit values, their uncertainties, and the fit statistics.
result = exponential_model_lmfit.fit(y_data, params, x=x_data)
# 5. Print a comprehensive report
print(result.fit_report())
# 6. Plot the results
plt.figure(figsize=(8, 6))
# plot_fit draws both the data points and the best-fit curve,
# so a separate scatter call is not needed
result.plot_fit(ax=plt.gca(), show_init=False)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Non-Linear Curve Fitting with lmfit')
plt.legend()
plt.grid(True)
plt.show()

The report returned by result.fit_report() is extremely useful: it gives you the optimal parameters, their standard errors, and various goodness-of-fit statistics (such as reduced chi-square, AIC, and BIC) in one place.
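
Statistics such as AIC and BIC are what let you compare candidate models on the same data: lower values indicate a better trade-off between goodness of fit and number of parameters. As a sketch of the idea without requiring lmfit, the code below computes the common least-squares forms AIC = N ln(RSS/N) + 2k and BIC = N ln(RSS/N) + k ln(N) by hand (to my understanding, the same forms lmfit reports). The quadratic competitor, the synthetic data, and the helper name information_criteria are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_model(x, a, b):
    """An exponential function: y = a * e^(b*x)"""
    return a * np.exp(b * x)

def quadratic_model(x, c0, c1, c2):
    """A competing candidate: y = c0 + c1*x + c2*x^2"""
    return c0 + c1 * x + c2 * x ** 2

def information_criteria(y, y_fit, n_params):
    """Least-squares AIC and BIC from the residual sum of squares."""
    n = y.size
    rss = np.sum((y - y_fit) ** 2)
    aic = n * np.log(rss / n) + 2 * n_params
    bic = n * np.log(rss / n) + n_params * np.log(n)
    return aic, bic

# Hypothetical noisy data around y = 2.5 * e^(1.5x)
rng = np.random.default_rng(42)
x = np.linspace(0, 4, 50)
y = 2.5 * np.exp(1.5 * x) + rng.normal(0, 8.0, x.size)

candidates = {
    "exponential": (exponential_model, [1, 1]),
    "quadratic": (quadratic_model, [1, 1, 1]),
}

scores = {}
for name, (model, p0) in candidates.items():
    popt, _ = curve_fit(model, x, y, p0=p0)
    scores[name] = information_criteria(y, model(x, *popt), len(popt))
    print(f"{name:12s} AIC = {scores[name][0]:8.2f}  BIC = {scores[name][1]:8.2f}")

# Pick the model with the lowest AIC
best = min(scores, key=lambda name: scores[name][0])
print(f"Preferred model by AIC: {best}")
```

Only the differences between models matter here, not the absolute values, and the comparison is only meaningful when every candidate is fitted to the same data points.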


Practical Example: Fitting a Real-World Dataset (Gaussian Peak)

Let's say we
