Merge pull request #47 from remix/v1.0.0
v1.0
invisiblefunnel authored Dec 18, 2018
2 parents 8b67dae + f94595c commit 8813824
Showing 26 changed files with 1,178 additions and 1,170 deletions.
12 changes: 12 additions & 0 deletions .flake8
@@ -0,0 +1,12 @@
[flake8]
exclude =
    .eggs
    .git
    __pycache__
    build
    dist
    docs
    scratch
    venv

max-line-length = 100
4 changes: 4 additions & 0 deletions .gitignore
@@ -67,3 +67,7 @@ venv/

scratch/
.DS_Store
.pytest_cache
.ipynb_checkpoints/*
*.ipynb
.mypy_cache
9 changes: 2 additions & 7 deletions .travis.yml
@@ -4,14 +4,9 @@
language: python
python:
- 3.6
- 3.5
- 2.7
# TODO(DW) fix
# - 3.4
# - 3.3

# command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
install: pip install -U tox-travis flake8
install: python setup.py install && pip install -U black flake8 mypy

# command to run tests, e.g. python setup.py test
script: tox && make lint
script: make test
8 changes: 4 additions & 4 deletions CONTRIBUTING.rst
@@ -76,13 +76,13 @@ Ready to contribute? Here's how to set up `partridge` for local development.

Now you can make your changes locally.

5. When you're done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox::
5. When you're done making changes, check that your changes pass flake8 and the tests::

    $ flake8 partridge tests
    $ python setup.py test or py.test
    $ tox

To get flake8 and tox, just pip install them into your virtualenv.
To get flake8, just pip install it into your virtualenv.

6. Commit your changes and push your branch to GitHub::

@@ -101,7 +101,7 @@ Before you submit a pull request, check that it meets these guidelines:
2. If the pull request adds functionality, the docs should be updated. Put
your new functionality into a function with a docstring, and add the
feature to the list in README.rst.
3. The pull request should work for Python 2.6, 2.7, 3.3, 3.4 and 3.5, and for PyPy. Check
3. The pull request should work for Python 3.6+. Check
https://travis-ci.org/remix/partridge/pull_requests
and make sure that the tests pass for all supported Python versions.

@@ -110,5 +110,5 @@ Tips

To run a subset of tests::

    $ py.test tests.test_partridge
    $ py.test tests/test_feed.py

27 changes: 24 additions & 3 deletions HISTORY.rst
@@ -1,8 +1,31 @@
History
=======

1.0.0 (2018-12-18)
------------------

This release is a combination of major internal refactorings and some minor interface changes. Overall, you should expect your upgrade from pre-1.0 versions to be relatively painless. A big thank you to @genhernandez and @csb19815 for their valuable design feedback.

Here is a list of interface changes:

* The class ``partridge.gtfs.feed`` has been renamed to ``partridge.gtfs.Feed``.
* The public interface for instantiating feeds is ``partridge.load_feed``. This function replaces the previously undocumented function ``partridge.get_filtered_feed``.
* A new function has been added for identifying the busiest week in a feed: ``partridge.read_busiest_week``.
* The public function ``partridge.get_representative_feed`` has been removed in favor of using ``partridge.read_busiest_date`` directly.
* The public function ``partridge.writers.extract_feed`` is now available via the top level module: ``partridge.extract_feed``.

Miscellaneous minor changes:

* Character encoding detection is now done by the ``cchardet`` package instead of ``chardet``. ``cchardet`` is faster, but may not always return the same result as ``chardet``.
* Zip files are unpacked into a temporary directory instead of reading directly from the zip. These temporary directories are cleaned up when the feed is garbage collected or when the process exits.
* The code base is now annotated with type hints and the build runs ``mypy`` to verify the types.
* DataFrames are cached in a dictionary instead of the ``functools.lru_cache`` decorator.
* The ``partridge.extract_feed`` function now writes files concurrently to improve performance.
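
The temporary-directory and dictionary-cache changes above can be illustrated with a minimal sketch. ``LazyFeed`` is a hypothetical class written for this example and is not Partridge's implementation; it parses files with the standard ``csv`` module instead of pandas to stay dependency-free. It assumes cleanup is handled by ``weakref.finalize``, which runs its callback when the object is garbage collected or, if the object is still alive, when the interpreter exits.

```python
import csv
import os
import shutil
import tempfile
import weakref
import zipfile


class LazyFeed:
    """Hypothetical stand-in for a feed object (not Partridge's class)."""

    def __init__(self, zip_path):
        # Unpack the whole archive into a fresh temporary directory.
        self._tmpdir = tempfile.mkdtemp()
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(self._tmpdir)
        # Remove the directory when this object is garbage collected,
        # or at interpreter exit if it is still alive then.
        self._finalizer = weakref.finalize(
            self, shutil.rmtree, self._tmpdir, ignore_errors=True
        )
        # Plain dictionary cache keyed by filename.
        self._cache = {}

    def get(self, filename):
        # Parse each file at most once; later calls return the cached rows.
        if filename not in self._cache:
            path = os.path.join(self._tmpdir, filename)
            with open(path, newline="", encoding="utf-8") as f:
                self._cache[filename] = list(csv.DictReader(f))
        return self._cache[filename]
```

A plain dictionary keeps the cache attached to the instance, so cached tables are released together with the feed object.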


0.11.0 (2018-08-01)
-------------------

* Fix major performance issue related to encoding detection. Thank you to @cjer for reporting the issue and advising on a solution.


@@ -23,9 +46,7 @@ History
0.8.0 (2018-03-14)
------------------

* Gracefully handle completely empty files. This change unifies the behavior of reading from a CSV
with a header only (no data rows) and a completely empty (zero bytes)
file in the zip.
* Gracefully handle completely empty files. This change unifies the behavior of reading from a CSV with a header only (no data rows) and a completely empty (zero bytes) file in the zip.


0.7.0 (2018-03-09)
16 changes: 10 additions & 6 deletions Makefile
@@ -53,16 +53,20 @@ dependency-graph.png:

dot: dependency-graph.png

lint: ## check style with flake8
	flake8 partridge tests
black:
	black partridge tests

lint: ## check style with black
	black --check --diff partridge tests
	flake8

type-check:
	mypy partridge --ignore-missing-imports

## run tests quickly with the default Python
test: lint
test: lint type-check
	py.test

test-all: ## run tests on every Python version with tox
	tox

coverage: ## check code coverage quickly with the default Python
	coverage run --source partridge -m pytest
	coverage report -m
119 changes: 85 additions & 34 deletions README.rst
@@ -1,3 +1,4 @@
=========
Partridge
=========

@@ -11,9 +12,11 @@ Partridge

Partridge is a Python library for working with `GTFS <https://developers.google.com/transit/gtfs/>`__ feeds using `pandas <https://pandas.pydata.org/>`__ DataFrames.

The implementation of Partridge is heavily influenced by our experience at `Remix <https://www.remix.com/>`__ ingesting, analyzing, and debugging thousands of GTFS feeds from hundreds of agencies.
Partridge is heavily influenced by our experience at `Remix <https://www.remix.com/>`__ analyzing and debugging every GTFS feed we could find.

At the core of Partridge is a dependency graph rooted at ``trips.txt``. Disconnected data is pruned away according to this graph when reading the contents of a feed.

At the core of Partridge is a dependency graph rooted at ``trips.txt``. Disconnected data is pruned away according to this graph when reading the contents of a feed. The root node can optionally be filtered to create a view of the feed specific to your needs. It's most common to filter a feed down to specific dates (``service_id``), routes (``route_id``), or both.
Feeds can also be filtered to create a view specific to your needs. It's most common to filter a feed down to specific dates (``service_id``) or routes (``route_id``), but any field can be filtered.
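
As a rough illustration of pruning along the dependency graph, here is a simplified sketch that uses plain lists of dictionaries instead of DataFrames. The helpers ``apply_view`` and ``prune`` are hypothetical names invented for this example, and the sketch only follows a few edges outward from ``trips.txt``; the real graph covers more files, and how Partridge propagates filters placed on other files is not shown here.

```python
def apply_view(rows, filters):
    """Keep rows whose values match every filter (a scalar or a collection)."""
    def matches(row):
        for column, wanted in filters.items():
            if isinstance(wanted, (set, frozenset, list, tuple)):
                allowed = wanted
            else:
                allowed = {wanted}
            if row[column] not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


def prune(tables, view):
    """Filter trips, then drop rows in other tables that no kept trip uses."""
    trips = apply_view(tables["trips.txt"], view.get("trips.txt", {}))
    trip_ids = {t["trip_id"] for t in trips}
    route_ids = {t["route_id"] for t in trips}

    stop_times = [st for st in tables["stop_times.txt"] if st["trip_id"] in trip_ids]
    stop_ids = {st["stop_id"] for st in stop_times}

    return {
        "trips.txt": trips,
        "routes.txt": [r for r in tables["routes.txt"] if r["route_id"] in route_ids],
        "stop_times.txt": stop_times,
        "stops.txt": [s for s in tables["stops.txt"] if s["stop_id"] in stop_ids],
    }
```

Filtering ``trips.txt`` by ``service_id`` in this sketch removes every route, stop time, and stop that is no longer reachable from a remaining trip.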

.. figure:: dependency-graph.png
   :alt: dependency graph
@@ -36,57 +39,112 @@ The design of Partridge is guided by the following principles:
- Do anything other than efficiently read GTFS files into DataFrames
- Take an opinion on the GTFS spec


Installation
------------

.. code:: console

    pip install partridge

Usage
-----

**Reading a feed**
**Setup**

.. code:: python

    import datetime
    import partridge as ptg
    path = 'path/to/sfmta-2017-08-22.zip'
    inpath = 'path/to/caltrain-2017-07-24/'

Inspecting the calendar
~~~~~~~~~~~~~~~~~~~~~~~


**The date with the most trips**

.. code:: python

    date, service_ids = ptg.read_busiest_date(inpath)
    # datetime.date(2017, 7, 17), frozenset({'CT-17JUL-Combo-Weekday-01'})

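
One way to think about "the date with the most trips" is sketched below. This is an assumption about the general idea, not Partridge's code: the function ``busiest_date`` and its tie-breaking rule (earliest date wins) are invented for this example.

```python
import datetime
from collections import Counter


def busiest_date(service_ids_by_date, trips):
    """Return (date, service_ids) for the date with the most scheduled trips."""
    # Count trips per service_id once, then sum the counts for each date.
    trips_per_service = Counter(trip["service_id"] for trip in trips)

    def trip_count(date):
        return sum(trips_per_service[sid] for sid in service_ids_by_date[date])

    # Ties go to the earliest date (an arbitrary choice made for this sketch).
    best = min(service_ids_by_date, key=lambda d: (-trip_count(d), d))
    return best, service_ids_by_date[best]


service_ids_by_date = {
    datetime.date(2017, 7, 15): frozenset({"saturday"}),
    datetime.date(2017, 7, 17): frozenset({"weekday"}),
    datetime.date(2017, 7, 18): frozenset({"weekday"}),
}
trips = [
    {"service_id": "weekday"},
    {"service_id": "weekday"},
    {"service_id": "saturday"},
]
date, service_ids = busiest_date(service_ids_by_date, trips)
```

With this toy data the weekday service has two trips and the Saturday service has one, so the first weekday date is selected.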
**The week with the most trips**


.. code:: python

    service_ids_by_date = ptg.read_busiest_week(inpath)
    # {datetime.date(2017, 7, 17): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #  datetime.date(2017, 7, 18): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #  datetime.date(2017, 7, 19): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #  datetime.date(2017, 7, 20): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #  datetime.date(2017, 7, 21): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #  datetime.date(2017, 7, 22): frozenset({'CT-17JUL-Caltrain-Saturday-03'}),
    #  datetime.date(2017, 7, 23): frozenset({'CT-17JUL-Caltrain-Sunday-01'})}

**Dates with active service**

.. code:: python

    service_ids_by_date = ptg.read_service_ids_by_date(path)
    date = datetime.date(2017, 9, 25)
    service_ids = service_ids_by_date[date]
    date, service_ids = min(service_ids_by_date.items())
    # (datetime.date(2017, 7, 15), frozenset({'CT-17JUL-Caltrain-Saturday-03'}))
    feed = ptg.feed(path, view={
        'trips.txt': {
            'service_id': service_ids,
            'route_id': '12300',
        },
    })
    date, service_ids = max(service_ids_by_date.items())
    # (datetime.date(2019, 7, 20), frozenset({'CT-17JUL-Caltrain-Saturday-03'}))
    assert service_ids == set(feed.trips.service_id)
    len(feed.stops)
    # 88

**Dates with identical service**


.. code:: python

    feed.routes.head()
    # route_id agency_id route_short_name route_long_name route_desc route_type \
    # 12300 SFMTA 18 46TH AVENUE NaN 3
    #
    # route_url route_color route_text_color
    # NaN NaN NaN
    dates_by_service_ids = ptg.read_dates_by_service_ids(inpath)
    busiest_date, busiest_service = ptg.read_busiest_date(inpath)
    dates = dates_by_service_ids[busiest_service]
    min(dates), max(dates)
    # datetime.date(2017, 7, 17), datetime.date(2019, 7, 19)

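
The relationship between the two calendar mappings can be sketched as a simple inversion: dates that share the same set of service IDs are grouped together. The function ``group_dates_by_service_ids`` below is a hypothetical helper for illustration; the exact container types Partridge returns are an assumption here, apart from the ``frozenset`` keys shown in the README output.

```python
import datetime
from collections import defaultdict


def group_dates_by_service_ids(service_ids_by_date):
    """Invert {date: frozenset(service_ids)} into {frozenset(service_ids): {dates}}."""
    grouped = defaultdict(set)
    for date, service_ids in service_ids_by_date.items():
        # frozensets are hashable, so they can be used directly as keys.
        grouped[service_ids].add(date)
    return dict(grouped)


service_ids_by_date = {
    datetime.date(2017, 7, 15): frozenset({"saturday"}),
    datetime.date(2017, 7, 17): frozenset({"weekday"}),
    datetime.date(2017, 7, 18): frozenset({"weekday"}),
}
dates_by_service_ids = group_dates_by_service_ids(service_ids_by_date)
weekday_dates = dates_by_service_ids[frozenset({"weekday"})]
```

Taking ``min`` and ``max`` of one group then gives the first and last date on which that exact combination of services runs.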
Reading a feed
~~~~~~~~~~~~~~


**Extracting a new feed**

.. code:: python

    import partridge as ptg
    _date, service_ids = ptg.read_busiest_date(inpath)
    view = {
        'trips.txt': {'service_id': service_ids},
        'stops.txt': {'stop_name': 'Gilroy Caltrain'},
    }
    feed = ptg.load_feed(inpath, view)

Extracting a new feed
~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    inpath = 'gtfs.zip'
    outpath = 'gtfs-slim.zip'
    date, service_ids = ptg.read_busiest_date(inpath)
    view = {'trips.txt': {'service_id': service_ids}}
    ptg.writers.extract_feed(inpath, outpath, {'trips.txt': {'service_id': service_ids}})
    ptg.extract_feed(inpath, outpath, view)
    feed = ptg.load_feed(outpath)
    assert service_ids == set(ptg.feed(outpath).trips.service_id)
    assert service_ids == set(feed.trips.service_id)

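
The changelog for this release notes that ``extract_feed`` writes files concurrently. A minimal sketch of that general technique, using a thread pool from the standard library, might look like the following. The helpers ``write_table`` and ``write_tables`` are hypothetical and do not reflect Partridge's internals; tables are plain lists of dictionaries rather than DataFrames.

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor


def write_table(outdir, filename, rows):
    """Write one table as CSV and return the path that was written."""
    path = os.path.join(outdir, filename)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def write_tables(outdir, tables):
    """Write every non-empty table in its own task; return sorted paths."""
    non_empty = {name: rows for name, rows in tables.items() if rows}
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(write_table, outdir, name, rows)
            for name, rows in non_empty.items()
        ]
        # result() re-raises any exception from the worker thread.
        return sorted(future.result() for future in futures)
```

Threads suit this workload because each task spends most of its time waiting on file I/O, and every task writes to a different file, so no locking is needed.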
Features
@@ -100,13 +158,6 @@ Features
- Handle nested folders and bad data in zips
- Predictable type conversions

Installation
------------

.. code:: console

    pip install partridge

Thank You
---------
