Notes on Python's urljoin function

[python] 2025-07-06


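Keep the following points in mind:

1.

In Scrapy versions before 1.0 (for example 0.24.2), the HtmlResponse class does not yet have a urljoin() method.

To join URLs in those versions, use the standalone urljoin function from the standard library instead, as follows: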

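# Python 2 import; in Python 3 use: from urllib.parse import urljoin
from urlparse import urljoin

# urljoin was imported by name, so call it directly (not urlparse.urljoin)
full_url = urljoin(response.url, href.extract())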
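
The one-liner above resolves a single href. As a minimal sketch of the same idea applied to several links at once, the snippet below uses an import that works on both Python 2 and Python 3. The helper name absolutize and the sample hrefs are assumptions made for illustration; they are not part of Scrapy or of the standard library.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2


def absolutize(page_url, hrefs):
    """Resolve each href against page_url (hypothetical helper)."""
    # strip() guards against stray whitespace around extracted attribute values
    return [urljoin(page_url, href.strip()) for href in hrefs]


links = absolutize(
    "http://www.xoxxoo.com/a/b/c.html",
    ["d.html", " /d.html ", "../d.html"],
)
for link in links:
    print(link)
```

In a spider, page_url would typically be response.url and hrefs the extracted href attribute values.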

2.

Scrapy 1.0 added this method to the response class (the change is titled "Add Response.urljoin() helper"). The pull request is here:

https://github.com/scrapy/scrapy/pull/1086

3. Using the urljoin function (the standalone function, not the method)

# Import the function

from urlparse import urljoin

# Run the following tests

urljoin("http://www.xoxxoo.com/a/b/c.html", "d.html")

# Result: 'http://www.xoxxoo.com/a/b/d.html'

urljoin("http://www.xoxxoo.com/a/b/c.html", "/d.html")

# Result: 'http://www.xoxxoo.com/d.html'

urljoin("http://www.xoxxoo.com/a/b/c.html", "../d.html")

# Result: 'http://www.xoxxoo.com/a/d.html'
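
The tests above all use a base URL that ends in a file name. A few more cases are worth checking, sketched here with the Python 3 import. The variable names and the example.com URL are assumptions added for illustration.

```python
from urllib.parse import urljoin  # Python 3 location of urljoin

# Base ends with a slash: the relative name is appended to that directory
with_slash = urljoin("http://www.xoxxoo.com/a/b/", "d.html")

# Base has no trailing slash: the last segment "b" is treated as a file and replaced
without_slash = urljoin("http://www.xoxxoo.com/a/b", "d.html")

# The second argument is already absolute: the base is ignored
absolute = urljoin("http://www.xoxxoo.com/a/b/c.html", "http://example.com/x.html")

print(with_slash)     # http://www.xoxxoo.com/a/b/d.html
print(without_slash)  # http://www.xoxxoo.com/a/d.html
print(absolute)       # http://example.com/x.html
```

The trailing-slash difference is a common source of wrong links when the base URL is a directory typed without its final slash.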
