How do you use urlparse correctly in Python 3.6? - 杰瑞科技汇

urlparse is a function in Python's standard urllib.parse module for breaking down (parsing) Uniform Resource Locators (URLs) into their constituent parts.

Since you specified Python 3.6, note that in Python 3 urlparse lives in the urllib.parse module; the standalone urlparse module existed only in Python 2. The function has been stable for a very long time, and its behavior in Python 3.6 is essentially the same as in current versions. The same module also provides urljoin for resolving relative URLs and urlsplit, which works like urlparse but returns five components because it leaves params inside the path.

Here's a comprehensive guide to using urlparse in Python 3.6.

The Basics: `urlparse()`

The main function is urlparse(). It takes a URL string and returns a special object called a ParseResult.

How to Import

from urllib.parse import urlparse

What it Returns

A ParseResult object is a named tuple with six attributes:

scheme: The protocol (e.g., http, https, ftp).
netloc: The network location (host name, plus optional user info and port).
path: The path to the resource on the server.
params: Parameters for the last path element (rarely used).
query: The query string, which contains key-value pairs.
fragment: The fragment identifier, used to link to a specific part of a page.
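
The params attribute is the one most readers never see populated, so a small sketch may help. The URL below is a made-up example with a ";type=a" parameter on its last path segment; it also shows how urlsplit differs by leaving that parameter inside the path.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL: ";type=a" is a parameter attached to the last path segment
url = "http://example.com/path;type=a?x=1"

parsed = urlparse(url)   # 6 components, params split out of the path
split = urlsplit(url)    # 5 components, params left inside the path

print(parsed.path)               # /path
print(parsed.params)             # type=a
print(split.path)                # /path;type=a
print(len(parsed), len(split))   # 6 5
```

If you never deal with path parameters, either function should serve equally well.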

Example: Parsing a Simple URL

Let's break down a common URL.

from urllib.parse import urlparse
# A typical URL
url = "https://www.example.com:8080/path/to/page?query=search#section1"
# Parse the URL
parsed_url = urlparse(url)
# The result is a ParseResult object
print(f"Full ParseResult object: {parsed_url}\n")
# You can access each part by its attribute name
print(f"Scheme:      {parsed_url.scheme}")
print(f"Netloc:      {parsed_url.netloc}")
print(f"Path:        {parsed_url.path}")
print(f"Params:      {parsed_url.params}") # Often empty
print(f"Query:       {parsed_url.query}")
print(f"Fragment:    {parsed_url.fragment}")

Output:

Full ParseResult object: ParseResult(scheme='https', netloc='www.example.com:8080', path='/path/to/page', params='', query='query=search', fragment='section1')

Scheme:      https
Netloc:      www.example.com:8080
Path:        /path/to/page
Params:
Query:       query=search
Fragment:    section1

The `netloc` Attribute: Deeper Dive

The netloc (network location) part often contains more than just the domain name. You might need to separate the domain, port, and user info.

from urllib.parse import urlparse
url_with_auth = "ftp://user:pass@sub.domain.com:21/path/to/file"
parsed = urlparse(url_with_auth)
netloc = parsed.netloc
print(f"Full netloc: '{netloc}'")
print(f"Username:    '{parsed.username}'") # A convenient attribute
print(f"Password:    '{parsed.password}'") # A convenient attribute
print(f"Hostname:    '{parsed.hostname}'") # A convenient attribute
print(f"Port:        {parsed.port}")        # Returns an integer or None

Output:

Full netloc: 'user:pass@sub.domain.com:21'
Username:    'user'
Password:    'pass'
Hostname:    'sub.domain.com'
Port:        21

Note: parsed.port is very useful because it automatically converts the port number from a string to an integer. If no port is specified, it returns None.
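
As a rough illustration of how these convenience attributes differ from the raw netloc, the sketch below uses two made-up URLs: one with mixed-case letters and no port, and one with a bracketed IPv6 address. It suggests that hostname is lowercased and stripped of brackets, while netloc is returned as written.

```python
from urllib.parse import urlparse

# Hypothetical URL with mixed case and no explicit port
mixed = urlparse("HTTP://WWW.Example.COM/Path")
print(mixed.scheme)    # http  (scheme is lowercased)
print(mixed.netloc)    # WWW.Example.COM  (kept as written)
print(mixed.hostname)  # www.example.com  (lowercased)
print(mixed.port)      # None  (no port in the URL)

# Hypothetical URL with a bracketed IPv6 host and a port
ipv6 = urlparse("http://[::1]:8080/status")
print(ipv6.netloc)     # [::1]:8080
print(ipv6.hostname)   # ::1  (brackets removed)
print(ipv6.port)       # 8080
```

For comparisons or lookups, hostname is usually the safer attribute to use than a hand-split netloc.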

Working with the Query String (`query`)

The query string is a list of key=value pairs separated by &. The urllib.parse module provides parse_qs and parse_qsl to handle this.

`parse_qs()`: Parses into a Dictionary

It returns a dictionary where keys are the parameter names and values are lists of all values for that key (since a key can appear multiple times).

from urllib.parse import urlparse, parse_qs
url = "https://example.com/search?q=python&sort=desc&q=urlparse"
parsed = urlparse(url)
query_string = parsed.query
print(f"Original query string: '{query_string}'")
# Parse the query string into a dictionary
query_params = parse_qs(query_string)
print(f"Parsed query params:  {query_params}")
# Accessing a specific value
# Note that the value is always a list!
search_terms = query_params['q']
print(f"\nThe value for 'q' is a list: {search_terms}")
print(f"The first search term is: {search_terms[0]}")

Output:

Original query string: 'q=python&sort=desc&q=urlparse'
Parsed query params:  {'q': ['python', 'urlparse'], 'sort': ['desc']}

The value for 'q' is a list: ['python', 'urlparse']
The first search term is: python
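
Two details of parse_qs are easy to miss: keys with empty values are dropped by default, and percent-escapes and plus signs are decoded. The following is a minimal sketch using a made-up query string; keep_blank_values is a real parameter of parse_qs, everything else is illustrative.

```python
from urllib.parse import parse_qs

# Hypothetical query string: 'b' has an empty value, 'c' is URL-encoded
query = "a=1&b=&c=hello%20world+again"

default = parse_qs(query)
with_blanks = parse_qs(query, keep_blank_values=True)

print(default)      # {'a': ['1'], 'c': ['hello world again']}
print(with_blanks)  # {'a': ['1'], 'b': [''], 'c': ['hello world again']}

# Reading a key that may be absent, without raising KeyError
first_b = default.get("b", [None])[0]
print(first_b)      # None
```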

`parse_qsl()`: Parses into a List of Tuples

This is useful if you need to preserve the order of the parameters or if you prefer working with a simple list of (key, value) tuples.

from urllib.parse import urlparse, parse_qsl
url = "https://example.com/search?q=python&sort=desc&q=urlparse"
query_string = urlparse(url).query
# Parse the query string into a list of tuples
query_list = parse_qsl(query_string)
print(f"Parsed query list: {query_list}")

Output:

Parsed query list: [('q', 'python'), ('sort', 'desc'), ('q', 'urlparse')]
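
Because parse_qsl keeps every pair in order, its result can be passed back to urlencode (also in urllib.parse) to rebuild the query string. The sketch below reuses the query from the example above and also shows what is lost if the pairs are collapsed into a plain dict.

```python
from urllib.parse import parse_qsl, urlencode

pairs = parse_qsl("q=python&sort=desc&q=urlparse")

# Round trip: the list of tuples encodes back to the same query string
rebuilt = urlencode(pairs)
print(rebuilt)   # q=python&sort=desc&q=urlparse

# Collapsing to a dict keeps only the last value for a repeated key
as_dict = dict(pairs)
print(as_dict)   # {'q': 'urlparse', 'sort': 'desc'}
```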

Reconstructing a URL (`urlunparse`)

The urlunparse function does the reverse of urlparse. It takes a ParseResult (or a 6-tuple) and reconstructs a URL string.

from urllib.parse import urlunparse
# Build a plain 6-tuple of URL components (a ParseResult would work too)
# Note: The 'params' part is included but is often an empty string.
new_url_parts = (
    'https',      # scheme
    'new.site.com', # netloc
    '/api/v1/data', # path
    '',           # params
    'id=123&format=json', # query
    'results'     # fragment
)
# Reconstruct the URL
reconstructed_url = urlunparse(new_url_parts)
print(reconstructed_url)

Output:

https://new.site.com/api/v1/data?id=123&format=json#results
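
A common variation is changing one component of an existing URL instead of building all six by hand. Since ParseResult is a named tuple, one possible approach is its _replace method combined with urlencode; the URL and the new parameters below are made up for illustration.

```python
from urllib.parse import urlparse, urlencode

# Hypothetical starting URL
parsed = urlparse("https://example.com/search?q=python#top")

# Build a new query string; a list of tuples keeps the order explicit
new_query = urlencode([("q", "urlparse"), ("page", 2)])

# _replace returns a new ParseResult; the original is left unchanged
updated = parsed._replace(query=new_query)

print(updated.geturl())  # https://example.com/search?q=urlparse&page=2#top
print(parsed.geturl())   # https://example.com/search?q=python#top
```

geturl() gives the same string as passing the result to urlunparse.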

Common Pitfalls and Best Practices

a. Relative URLs

urlparse accepts relative URLs as well as absolute ones, but it cannot fill in what is missing. If you pass a relative URL like /path/to/page, the scheme and netloc will be empty strings.

from urllib.parse import urlparse
relative_url = "/path/to/page?query=1"
parsed = urlparse(relative_url)
print(parsed.scheme) # Output: ''
print(parsed.netloc) # Output: ''
print(parsed.path)   # Output: '/path/to/page'

To resolve relative URLs against a base URL, use urllib.parse.urljoin().
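
A short sketch of urljoin in action, assuming a made-up base URL. It shows a sibling reference, a parent-directory reference, a root-relative path, and an absolute URL that overrides the base entirely.

```python
from urllib.parse import urljoin

# Hypothetical base URL of the page that contains the links
base = "https://example.com/docs/guide/index.html"

sibling = urljoin(base, "page2.html")
parent = urljoin(base, "../api/ref.html")
rooted = urljoin(base, "/about")
absolute = urljoin(base, "https://other.example.org/x")

print(sibling)   # https://example.com/docs/guide/page2.html
print(parent)    # https://example.com/docs/api/ref.html
print(rooted)    # https://example.com/about
print(absolute)  # https://other.example.org/x
```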

b. Malformed URLs

urlparse is very forgiving. It does not validate its input, and it will try its best to split almost any string into components, usually without raising an error.

from urllib.parse import urlparse
# A URL with no '//' after the scheme, so there is no netloc
bad_url = "mailto:someone@example.com?subject=Hello"
parsed = urlparse(bad_url)
print(parsed)

Output:

ParseResult(scheme='mailto', netloc='', path='someone@example.com', params='', query='subject=Hello', fragment='')

This is expected behavior. The scheme is mailto, and the rest of the string is treated as the path.
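
The same leniency means a URL typed without a scheme is not rejected; the host simply ends up in path. The sketch below shows this with a made-up address, along with looks_like_http_url, a hypothetical helper (not part of urllib.parse) that checks the parsed parts before trusting them.

```python
from urllib.parse import urlparse

# No scheme and no '//': everything is treated as the path
no_scheme = urlparse("www.example.com/path")
print(no_scheme.netloc)  # (empty string)
print(no_scheme.path)    # www.example.com/path

# A leading '//' is enough for the host to be recognized as netloc
with_slashes = urlparse("//www.example.com/path")
print(with_slashes.netloc)  # www.example.com
print(with_slashes.path)    # /path


def looks_like_http_url(url):
    """Hypothetical helper: True if url has an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_http_url("https://www.example.com/path"))  # True
print(looks_like_http_url("www.example.com/path"))          # False
```

A check like this is only a rough filter; it does not confirm that the host exists or that the URL is otherwise well formed.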

Summary Table

Function	Purpose	Example
`urlparse(url)`	Breaks a URL string into 6 components.	`p = urlparse("https://example.com/path?q=val")`
`p.scheme`	Gets the protocol (e.g., `https`).	`p.scheme` -> `'https'`
`p.netloc`	Gets the domain and port.	`p.netloc` -> `'example.com'`
`p.path`	Gets the resource path.	`p.path` -> `'/path'`
`p.query`	Gets the query string.	`p.query` -> `'q=val'`
`p.fragment`	Gets the fragment identifier.	`p.fragment` -> `''`
`parse_qs(p.query)`	Parses query string into a dictionary.	`parse_qs('q=val&q=val2')` -> `{'q': ['val', 'val2']}`
`parse_qsl(p.query)`	Parses query string into a list of tuples.	`parse_qsl('q=val&q=val2')` -> `[('q', 'val'), ('q', 'val2')]`
`urlunparse(parts)`	Reconstructs a URL from a 6-tuple.	`urlunparse(('https', 'ex.com', '/', '', '', ''))`
