Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite this project in Python #119

Closed
Mr0grog opened this issue Aug 10, 2017 · 19 comments
Closed

Rewrite this project in Python #119

Mr0grog opened this issue Aug 10, 2017 · 19 comments

Comments

@Mr0grog
Copy link
Member

Mr0grog commented Aug 10, 2017

This is not necessarily a high priority at current, but is here to make sure we budget time to do the work.

At this point, it is fairly clear that the decision to write this project in Ruby was a bad idea based on a incorrect understanding of EDGI’s situation (that legacy code written in Ruby needed to be supported, that those authors would stick around, that other contributors were comfortable with Ruby). There are approximately zero contributors (active or otherwise) to the Web Monitoring project as a whole (besides myself) who are comfortable working with Ruby, let alone experienced in Rails. Further, all other code in the associated projects (and in new experimental work submitted by others, e.g. edgi-govdata-archiving/web-monitoring#36) has been written in either Python or JavaScript, making this the odd technology out that makes understanding and working across the four software projects in web monitoring problematic.

This should probably be Python (3), both because the tools for a database-backed CRUD API like this one are generally more mature and because most of the other back-end stuff we have is also Python.

@Mr0grog
Copy link
Member Author

Mr0grog commented Aug 21, 2017

In the realm of Python, I have a strong preference for pretty standard, well-known tools, e.g. Django or Flask. That said API Star is looking kinda neat, especially for our use case (with a similarly nice documentation front page that gets auto-generated).

@danielballan danielballan removed this from the v1 milestone Oct 9, 2017
@Mr0grog Mr0grog added this to the Backlog milestone Oct 9, 2017
@Mr0grog
Copy link
Member Author

Mr0grog commented Oct 9, 2017

As a side note, we have some serious performance/memory problems here:

screen shot 2017-10-09 at 3 11 19 pm

Doubtless there are things we could be doing slightly better in ActiveRecord’s ORM, but I have spent a fair amount of time on this over the last few months. I think Django’s ORM is a little lighter, but fundamentally, we probably need to avoid the ORM entirely for most of our list-based queries—particularly for pages.

@patcon
Copy link
Member

patcon commented Oct 9, 2017

Are we logging slow queries, or if not, would that be worth doing? If we could drop some logs in an issue, might that be a domain where we could solicit external contributors without too much onboarding.

Happy to reticket if it sounds like a decent approach

@Mr0grog
Copy link
Member Author

Mr0grog commented Oct 9, 2017

We are, but it’s not a slow query—that's the point of the chart. All the time is being spent in Ruby.

@patcon
Copy link
Member

patcon commented Oct 10, 2017

Oh! (Aw man, I am bad at chart-reading.) Wow, my brain just wasn't expecting to read that as ruby

Hm. A little more niche, but maybe someone gets their kicks out of ruby/rails optimization? Or do you think it's marginal benefits, and the better convo is over changing languages?

@Mr0grog
Copy link
Member Author

Mr0grog commented Oct 10, 2017

Heh, you can be forgiven; it’s not what you would expect to see (especially for how little work the Ruby code should be doing on those requests).

Would love to have help/advice from someone who is an expert on that. I don’t know when we are going to get around to this rewrite, so an interim improvement is still worthwhile. And it’s always possible insights could carry over.

@Mr0grog
Copy link
Member Author

Mr0grog commented Jul 27, 2018

Note: this performance stuff has largely been handled.

What we need here is for someone to take a hard look at various Python tools that would be good to rewrite this in and do a clear writeup of the pros & cons (and ideally make a recommendation 😁).

Some obvious ones to look at:

  • Django (the clearest analog to Rails)
  • Flask + SQLAlchemy (or some other ORM/DB layer)
  • Tornado + SQLAlchemy (or some other ORM/DB layer) (we use Tornado for the diff server)
  • API Star + SQLAlchemy (or some other ORM/DB layer)
  • Sanic + SQLAlchemy (or some other ORM/DB layer)

Some important considerations:

  • Having tools that are either commonly used or not too complex is a big bonus — it shouldn’t be too hard for someone new to be able to contribute.
  • Having some tooling for migrations built-in or available with another package is super valuable.
  • We probably want to keep most stuff going through the REST API, but it might be a nice convenience if the DB layer could be shared with other Python code we write elsewhere to enable more complex queries.
  • Our needs are fairly narrow — there is little to no web UI happening here; the only thing we need outside of making a JSON API easy is users & authentication/authorization.

@danielballan
Copy link
Contributor

danielballan commented Jul 31, 2018

At this point, it is fairly clear that the decision to write this project in Ruby was a bad idea based on a incorrect understanding of EDGI’s situation ([...] that other contributors were comfortable with Ruby).

Reflecting on the first contact conversation between @Mr0grog and myself that led to the Rails/Python split, as I walked @jsnshrmn through the history that got us to this point, I think I overestimated my capacity to catch up on Rails, having last touched it in 2013 and in a volunteer capacity. Looking forward, I'm not sure how to correctly prioritize this -- surely at some point after our v0.0.2 Milestone -- but I'm certainly happy to help, and I think it is probably a worthwhile rewrite for the long-term health and maintainability of the project.

@Mr0grog
Copy link
Member Author

Mr0grog commented Jul 31, 2018

I think it’s reasonably important and there are good ways to migrate (e.g. first rewrite some of the tests in Python, so we can use them against both implementations, then start writing parts in Python and proxy them through the Rails app (or vice-versa)).

Either way, what we need first is for someone to take the time to do the research from the above comment. We should have a writeup here of each option and their pros, cons, and impact on transitioning. There’s no reason for someone not to take this and get started on it right away. 0.0.2 is pretty much entirely blocking on me.

I have some basic analysis thoughts, but this is not good enough:

  • Django:

    • 👍 The API server has gotten the most outside contributions, and I’d argue that it’s mostly because it’s written in Rails and is a widely understood and known quantity. The only Python framework that confers similar benefits is Django. That’s a big deal.
    • 👍 Has Django Rest Framework, which could potentially shortcut a lot of things for us (I’ve never used it, but have heard good things from former colleagues).
    • 👎 On the flip side, it’s big, provides a lot of tools we don’t need, and
    • 👎 Has an ORM that only works within Django.
    • 👎 The ORM makes particular demands on table schemas, which may make a piecemeal migration tough.
  • Flask:

    • 👍 I love how light and simple Flask is.
    • 👍 It is also very well known, so it is not too hard to contribute to a mid-sized project that uses it.
    • 👎 But we have to build everything extra we want on top of it.
  • Tornado:

    • 🤷‍♀️ It’s a known quantity, but not as well known as Django or Flask.
    • 👍 We are also already using it in the diffing server, so we have some existing stuff to build from.
    • 👍 Async (but not quite native, see below)
    • 👎 On the other hand, it’s pretty complicated — lots of folks who know Python won’t be familiar at all with the Tornado’s futures and the way it uses generators (you can use the new async/await syntax with it now, but there are still some oddities and limitations due to it’s generator-based legacy).
    • 👎 It is even more limited in what it gives you than Flask.
  • API Star

    • 🤷‍♀️ I totally don’t know much about it other than a few big name folks have said it’s neat
    • 🤷‍♀️ Follows Flask’s API, so will be familiar-ish to many
    • 👍 Native async
    • 👍 Leverages type annotations and introspections to auto-generate API docs
    • 👎 Leverages type annotations, which will be weird and different for a lot of Python devs
    • 👎 We have to build everything extra we want on top of it.
  • Sanic

    • 🤷‍♀️ I totally don’t know much about it other than a few big name folks have said it’s neat
    • 👍 Native async
    • 👍 Supposedly super duper fast
    • 👍 Datasette uses it, and I like Datasette as a good example we can follow of an API-focused data service that is mostly just wrapping a database.
    • 👎 We have to build everything extra we want on top of it.
    • 👎 Not commonly used at all, so far as I know.
  • SQLAlchemy (for use with any of the above except Django)

    • 👍 The main Python SQL ORM out there, so it’s well known
    • 🤷‍♀️ Doesn’t have migrations, but there is a library for migrations on SQLAlchemy (Alembic)
    • 👍 Because it’s not tied to the web framework, we could use our models as a library in other tools could connect directly to the underlying DB and use instead of the HTTP API (buuuuuut we have to be careful, as there is always a danger another library could fail to follow all the right business rules, etc.)
    • 👎 No async support last time I looked. (But entirely possible that’s changed. Please correct me if so.)

I don’t know if there are any good competitors out there to SQLAlchemy that are worth looking at.

@danielballan
Copy link
Contributor

Unofficial sqlalchemy async support (pre-1.0, probably not a good idea, but seems to be the state of the art right now): https://github.com/CanopyTax/asyncpgsa

@Mr0grog
Copy link
Member Author

Mr0grog commented Aug 1, 2018

Added to Django: 👍 Has Django Rest Framework which could potentially shortcut a lot of things for us (I’ve never used it, but have heard good things from former colleagues).

@Mr0grog Mr0grog changed the title Rewrite this project in Python or JavaScript Rewrite this project in Python Aug 2, 2018
@jsnshrmn
Copy link
Contributor

jsnshrmn commented Aug 9, 2018

My thoughts on django based on experience:
👍 It's relatively easy to deal with things outside the ORM via serializers
👍 It's relatively easy to delegate async/batchy things to celery or some other task doer
👍 In practice, its flexibility and breadth of tooling doesn't lead it to be a performance pig
👍 It lets you take advantage of many of the nice feature of postgres without much fuss.
👍 It has lots of contrib tools
👎 which, just like any other contrib ecosystem, is supported very unevenly
👎 When the framework gets updated, be ready to (sometimes substantially) update your code to support the new version or to stop receiving security updates relatively quickly.
👎 There are LTS releases with three years of support, but the current LTS release is the final python 2 release. The best bet would be to go with the latest release so that there's not an impending py2-> py3 migration, but that means you'll be updating/migrating code to support the framework updates more often until the next LTS.

@stale
Copy link

stale bot commented Feb 6, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

@stale stale bot added the stale label Feb 6, 2019
@Mr0grog
Copy link
Member Author

Mr0grog commented Feb 6, 2019

Still gonna have to happen. Still looking at this issue with dread.

@stale stale bot removed the stale label Feb 6, 2019
@Mr0grog
Copy link
Member Author

Mr0grog commented Feb 21, 2019

Ran across the databases package today: it’s an async adapter for postgresql/mysql/sqlite that supports SQLAlchemy. https://github.com/encode/databases

@stale stale bot added the stale label Aug 20, 2019
@Mr0grog Mr0grog removed the stale label Aug 26, 2019
@edgi-govdata-archiving edgi-govdata-archiving deleted a comment from stale bot Aug 26, 2019
@stale
Copy link

stale bot commented Feb 22, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

@stale stale bot added the stale label Feb 22, 2020
@stale stale bot removed the stale label Feb 28, 2020
@stale
Copy link

stale bot commented Aug 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

@stale stale bot added the stale label Aug 27, 2020
@stale stale bot closed this as completed Sep 11, 2020
@Mr0grog
Copy link
Member Author

Mr0grog commented Nov 11, 2020

Bringing this project back to life, but in a slightly different form: rather than porting this project as-is to Python, we should stop focusing on building an API.

Instead, this should focus on:

  • Serve as a back-end to web-monitoring-ui rather than as a generalized API.
    • This might ultimately mean consolidating the two repos, but that’s not a major requirement here.
    • This probably also means integrating the importing, healthcheck, etc. scripts from web-monitoring-processing into this repo.
  • Ensure the database has a good permissions regime so that other data needs can be served by simply giving access directly to the DB.
  • Write some basic SQLAlchemy classes that other Python code can import and use if it really needs to. (Note this implies NOT using Django for this rewrite.)

Nobody is actually using the API externally, but maintaining this project as an API rather than a back-end keeps us constantly bending over backwards and doing a lot of extra work to maintain that possibility.

Re-imagining the goals here makes me slightly less scared of this task, although I am definitely still worried about it.

@stale
Copy link

stale bot commented Jun 3, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

@stale stale bot added the stale label Jun 3, 2021
@stale stale bot closed this as completed Jun 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants