Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding parse and DecodedURL docs #126

Merged
merged 9 commits into from
Apr 16, 2020
36 changes: 34 additions & 2 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,43 @@ Hyperlink API

.. automodule:: hyperlink._url

.. contents::
:local:

Creation
--------

Before you can work with URLs, you must create URLs. There are two
ways to create URLs, from parts and from text.
Before you can work with URLs, you must create URLs.

Parsing Text
^^^^^^^^^^^^

If you already have a textual URL, the easiest way to get URL objects
is with the :func:`parse()` function:

.. autofunction:: hyperlink.parse

By default, :func:`~hyperlink.parse()` returns an instance of
:class:`DecodedURL`, a URL type that handles all encoding for you, by
wrapping the lower-level :class:`URL`.

DecodedURL
^^^^^^^^^^

.. autoclass:: hyperlink.DecodedURL
.. automethod:: hyperlink.DecodedURL.from_text

The Encoded URL
^^^^^^^^^^^^^^^

The lower-level :class:`URL` looks very similar to the
:class:`DecodedURL`, but does not handle all encoding cases for
you. Use with caution.

.. note::

:class:`URL` is also available as an alias,
``hyperlink.EncodedURL`` for more explicit usage.

.. autoclass:: hyperlink.URL
.. automethod:: hyperlink.URL.from_text
mahmoud marked this conversation as resolved.
Show resolved Hide resolved
Expand Down
6 changes: 3 additions & 3 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -39,17 +39,17 @@ library. The easiest way to install is with pip::

Then, URLs are just an import away::

from hyperlink import URL
import hyperlink

url = URL.from_text(u'http://github.com/python-hyper/hyperlink?utm_source=readthedocs')
url = hyperlink.parse(u'http://github.com/python-hyper/hyperlink?utm_source=readthedocs')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I'm not a fan of parse, which can return two types, and would recommend that one use URL.from_text or DecodedURL.from_text directly instead, so this change seems like a minus to me, but I might be alone there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think parse() was added to shift recommended practice toward DecodedURLs, because they're safer. parse() returns a DecodedURL by default. I think it also looks pretty nice.


better_url = url.replace(scheme=u'https', port=443)
org_url = better_url.click(u'.')

print(org_url.to_text())
# prints: https://github.com/python-hyper/

print(better_url.get(u'utm_source'))
print(better_url.get(u'utm_source')[0])
# prints: readthedocs

See :ref:`the API docs <hyperlink_api>` for more usage examples.
Expand Down
44 changes: 33 additions & 11 deletions src/hyperlink/_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,16 +3,18 @@

Usage is straightforward::

>>> from hyperlink import URL
>>> url = URL.from_text(u'http://github.com/mahmoud/hyperlink?utm_source=docs')
>>> import hyperlink
>>> url = hyperlink.parse(u'http://github.com/mahmoud/hyperlink?utm_source=docs')
>>> url.host
u'github.com'
>>> secure_url = url.replace(scheme=u'https')
>>> secure_url.get('utm_source')[0]
u'docs'

As seen here, the API revolves around the lightweight and immutable
:class:`URL` type, documented below.
Hyperlink's API centers on the :class:`DecodedURL` type, which wraps
the lower-level :class:`URL`, both of which can be returned by the
:func:`parse()` convenience function.

""" # noqa: E501

import re
Expand Down Expand Up @@ -1743,13 +1745,23 @@ def remove(

EncodedURL = URL # An alias better describing what the URL really is

_EMPTY_URL = URL()

class DecodedURL(object):
"""DecodedURL is a type meant to act as a higher-level interface to
the URL. It is the `unicode` to URL's `bytes`. `DecodedURL` has
almost exactly the same API as `URL`, but everything going in and
out is in its maximally decoded state. All percent decoding is
handled automatically.
""":class:`DecodedURL` is a type designed to act as a higher-level
interface to :class:`URL` and the recommended type for most
operations. By analogy, :class:`DecodedURL` is the
:class:`unicode` to URL's :class:`bytes`.

:class:`DecodedURL` automatically handles encoding and decoding
all its components, such that all inputs and outputs are in a
maximally-decoded state. Note that this means, for some special
cases, a URL may not "roundtrip" character-for-character, but this
is considered a good tradeoff for the safety of automatic
encoding.

Otherwise, :class:`DecodedURL` has almost exactly the same API as
:class:`URL`.

Where applicable, a UTF-8 encoding is presumed. Be advised that
some interactions can raise :exc:`UnicodeEncodeErrors` and
Expand All @@ -1763,8 +1775,18 @@ class DecodedURL(object):
lazy (bool): Set to True to avoid pre-decode all parts of the URL to
check for validity. Defaults to False.

.. note::

The :class:`DecodedURL` initializer takes a :class:`URL` object,
not URL components, like :class:`URL`. To programmatically
construct a :class:`DecodedURL`, you can use this pattern:

>>> DecodedURL().replace(host='pypi.org', path=('projects', 'hyperlink').to_text()
"http://pypi.org/projects/hyperlink"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any opinion on adding a DecodedURL.build(...) or is this a fine pattern?



"""
def __init__(self, url, lazy=False):
def __init__(self, url=_EMPTY_URL, lazy=False):
mahmoud marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the default? Perhaps merits discussion outside of doc changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, a couple reasons:

  1. It's more parallel with URL(), which can be instantiated without arguments to get an empty URL.
  2. This way to programmatically construct a DecodedURL, you don't need to also import URL. You can do DecodedURL().replace(...).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, and the reason for including it in these changes was because I really wanted to avoid people programmatically constructing the newly-exposed DecodedURL by doing

from hyperlink import URL, DecodedURL
DecodedURL(URL(...))

Without realizing that URL's initializer arguments aren't as rigorously decoded/encoded as the rest of DecodedURL's API.

I proposed adding a .build() or .from_parts() if we want to shift to a better pattern. For now it's either that or parse('').replace(...). I figured a default arg at least lets you explicitly use the type without an extra parse()/from_text() step.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_parts() sounds like a good addition. I have written DecodedURL(URL()) — it's a bit awkward. Perhaps I am worrying too much (this is Python, everything is slow...) but I tend to look for APIs that avoid temporaries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK makes sense, and yes to from_parts.

# type: (URL, bool) -> None
self._url = url
if not lazy:
Expand Down Expand Up @@ -2098,7 +2120,7 @@ def parse(url, decoded=True, lazy=False):
decoded (bool): Whether or not to return a :class:`DecodedURL`,
which automatically handles all
encoding/decoding/quoting/unquoting for all the various
accessors of parts of the URL, or an :class:`EncodedURL`,
accessors of parts of the URL, or a :class:`URL`,
which has the same API, but requires handling of special
characters for different parts of the URL.

Expand Down