In this file: Decisions made when building Talkyard, about how to do things, and how Not do things.
Usage:
1) Before adding any 3rd party lib, e.g. react-transition-group
, search this file
for the name of that lib, and related words, e.g. "animations".
Because maybe a decision was made to Not use that lib
and do things in other ways instead.
And it’d be annoying if these decisions got forgotten so that years later,
things got done in the bad ways.
2) Before making any non-trivial decision, search this file for related words, and see if there’re related thoughts and previous decisions here, already.
Use Webdriver.io for end-to-end tests because:
-
It supports Multiremote, i.e. running many browsers at the same time in the same test, useful for testing e.g. the chat system.
-
It has synchronous commands: console.log(browser.getTitle()) instead of browser.getTitle().then(title ⇒ console.log(title)) (or
async await
everywhere?). -
It is under active development (even more nowadays 2020-11).
2020-05-16: Use Asciidoc for docs, not Markdown or CommonMark.
Why? See e.g.: https://news.ycombinator.com/item?id=18848278: "I’ve entirely replaced markdown and been using asciidoctor for both documentation and general writing. Asciidoctor is far superior to Markdown [ …]". You can also websearch: "site:news.ycombinator.com markdown vs asciidoc".
As of 2020-05 most Ty docs are in Markdown. But don’t add any more Markdown docs.
And some day, can run sth like:
find . -type f -name '*.md' -exec pandoc -f markdown -t asciidoc {} \;
to convert from .md to .adoc. https://news.ycombinator.com/item?id=22875242
2020-05-16: Don’t add react-transition-group
or any other animations lib. [REACTANIMS]
Only use CSS anims in Ty for now.
CSS animations are simple and fast and keep the bundles small. Others agree: "use it instead of importing javascript libraries, your bundle remains small. And browser spends fewer resources" https://medium.com/@dmitrynozhenko/5-ways-to-animate-a-reactjs-app-in-2019-56eb9af6e3bf
"ReactTransitionGroup has small size" they write, but:
react-transition-group.min.js
is 5,6K min.js.gz (v4.4.1, May 2020)
— that’s too much for Talkyard! (at least for now, for slim-bundle).
Don’t use Express (expressjs.com) or sth like that. Hard to reason about security, e.g.: https://hackerone.com/reports/712065, who could have guessed:
const _ = require('lodash'); .zipObjectDeep(['proto_.z'],[123]) _.zipObjectDeep(['a.b.__proto__.c'],[456]) console.log(z) // 123 console.log(c) // 456
Plus other vulns all the time in js libs it seems to me. And 9999 microdependencies —> supply chain attacks made simple. "npm has more than 1.2M of public packages […] perfect target for cybercriminals" writes https://snyk.io/blog/what-is-a-backdoor. Look: is-odd and is-even: https://news.ycombinator.com/item?id=16901188.
Using Javascript in the browser, though, is different — as long as the browser talks only with the Ty server.
Also better not use Python or Ruby for server code.
SECURITY [to_ty_risky] to-talkyard is a bit risky: nodejs code that parses user defined data. Use a different lang? Run in sandbox?
Edit, half a year (?) later: Turns out there’s a "Nodejs done right", namely Deno, by he who created Nodejs actually. Ty might use Deno for scripting (instead of js or bash).
Vendor all dependencies, i.e. bundle their source code (or at least JARs) together with Talkyard.
Four reasons:
-
Security. Can diff changes in vendored deps.
-
Developer friendliness. Seems some people are behind company firewalls and proxy servers that break downloading some dependencies. But accessing Git seems to have worked for everyone.
-
Continuous Integration. Then good if can build offline?
-
Reproducible builds (also requires using a specific OS? Which should be mostly fine, since things get built in Docker containers with a specific OS and version — but will need to specify the exact image hash?) [repr_builds]
-
Working offline. If the wifi stops working and your phone too (no tethering).
Place the vendored stuff in Git submodules. Then, if the vendored stuff’s Git repos eventually grows large because of old historical commit objects, those Git submodules can be squashed or sth like that, making them small again — without affecting the history in the main repo (it’d just be updated to point to the squashed stuff).
What should the join community button title be? [join_or_signup]
"Join", "Sign Up", "Create Account", "Register"? _ - Not "Register" — sounds like the tax agency. - Not "Create Account" — that’s too long, causes UX trickiness on 400 px wide phones. - Not "Sign up" because non-native English speakers sometimes don’t know the difference between "Sign Up" and "Sign In", or "Sign Up" and "Log In". (See e.g.: https://ell.stackexchange.com/questions/24384/what-is-the-difference-among-sign-up-sign-in-and-log-in )
But "Join" is simple to understand? And short and friendly? So "Join" it’ll be.
Don’t use "Sign in" for the login button title either, because can get confused with "Sign up", and also, is 1 letter longer than "Log in". Facebook uses "Log in" so "everyone" should be used to this.
So: "Join" and "Log in" and "Log out", in Talkyard.
Talkyard used to (before 2020-11-25) sort replies by most-popular-first. However I think this was confusing, in that most of the time, people don’t upvote others' replies — so most of the time, replies are in effect sorted by time only, …
-
Except for sometimes, someone clicks Like, causing the thereafter upvoted post, to berak the seemingly by-time sort order. Making things look a bit random?
Instead, there can be a one-click button "Show Popular Replies" or "Show best answers", to sort best-first.
There’s a per site config value to change the sort order — maybe per category or topic one day? Topic type? Q&A = best-first, others by time?
It’s pointless: Chrome >= v86 partitions the cache by URL, and Safari too, so there’re no performance benefits — instead, slightly slower, because of a TLS handshake. The previous browser behaviour enabled cross-site tracking, checking if the user had visited certain sites etc.
There’s also lua-resty-auto-ssl
but it has a dependency on Dehydrated, a Bash
script — whilst lua-resty-acme
is Lua only: simpler to install and
to understand the code. It’s actively developed by Kong which Ty a bit depends on
indirectly anyway because they sometimes contribute to OpenResty.
See: fffonion/lua-resty-acme#5
Procrastinating isn’t too bad — now, thanks to having done nothing for years (just Typescript and concatenated files with Gulpjs), suddenly Snowpack has appeared, and looks better than Webpack, Parcel and Browserify. (Rollup? Snowpack uses Rollup.)
Svelte chooses Snowpack: https://news.ycombinator.com/item?id=24911742 https://news.ycombinator.com/item?id=24909118
Elastic recently (2021-01) announced they’ll change from Apache2 to Server Side Public License (SSPL):
However they dual-license under both SSPL and the Elastic License — the latter allows using ElasticSearch the way Talkyard does it: all Basic features are available at no cost, when using ES via object code (not source code) and when the primary purpose is not ElasticSearch itself. In Ty’s case, the primary purpose with Ty is not ES, but discussions between humans. From the license:
You agree not to: … (iv) use Elastic Software Object Code for providing … software-as-a-service, … where obtaining access to the Elastic Software or the features and functions of the Elastic Software is a primary reason or substantial motivation for users of the SaaS Offering to access and/or use the SaaS Offering …
Also, it seems only Elastic may distribute the ES object code:
You agree not to: (iii) … distribute, sublicense, … Elastic Software Object Code,
That’s fine — with Talkyard, people download the ElasticSearch Docker image themselves (via Docker-Compose) from Elastic’s own servers — see images/search/Dockerfile.
The Elastic License: https://github.com/elastic/elasticsearch/blob/master/licenses/ELASTIC-LICENSE.txt
(FYI: Talkyards' source code integrating with ElasticSearch (via its REST API) is
about 750 lines? See app/talkyard/server/search/
. Whilst Ty’s whole code base
is about 160 000 lines? (as of 2021-01), I don’t remember exactly
— anyway, the directly related to ElasticSearch code in Talkyard
should be < 0.5%? of everything. This can make it easy to understand,
also for outsiders, that the primary benefit with Talkyard really is not
ElasticSearch. Also, that should be about how much code would need to change, if
replacing ElasticSearch with, say PostgreSQL’s search or some of the upcoming
alternatives in Rust or Golang.)
Bash scripts should be forbidden. I’m getting more and more upset at various scripts, e.g. this PostgreSQL related snippet from here:
if [[ ${1:0:1} = '-' ]]; then ... set --
What does ${1:0:1}
do, what can I search for, to find out?
I search for "bash bracket colon one zero"? … 10 frustrating minutes later,
I find something at AskUbuntu:
${var:pos} means that the variable var is expanded, starting from offset pos. $ {var:pos:len} means that the variable var is expanded, starting from offset pos with length len.
And what does set --
do? Fortunately help set
shows this — but still,
the above Bash code is utterly unreadable without jumping to a web browser
and start searching & browsing the Internet, even if you’ve been coding
for 10+ years and written somewhat much Bash code yourself already.
Use Deno instead of Bash. Why Deno:
-
Can code in Typescript, which Ty uses already (no new language to learn).
-
More secure: Doesn’t allow file system or network access, by default.
-
Easy to install (just a single binary, can incl in the vendors Git submodule).
-
Popular, large community, many libs if needed.
-
Couldn’t find anything else anyway. Ruby? No. Python or Nim? That’d be new languages to learn. (And lacks Deno’s security features?)
-
Maybe Lua? Ty uses Lua (Nginx plugins). But tiny community and few libs, lacks the Deno security features — could maybe implement oneself, but seems a bit complicated?: how-to-wrap-the-io-functions-in-lua-to-prevent-the-user-from-leaving-x-directory. So, Deno then.
GraphQL looks nice, I start thinking, now when working with Ty’s Search, List and Get APIs. However GraphQL doesn’t seem to bring any particular benefits to Ty, as of now, 2021-02.
GraphQL pros:
-
Well thought through query language, see e.g.: https://dgraph.io/docs/graphql/schema/search/ — but Ty already has its own ok REST API with get/list/search and filters: [api_json_query]. Supporting via GraphQL too would be dupl work, right.
-
Maybe simpler to use for people who know GraphQL already. But not many are familiar with GraphQL? — No one has ever asked about GraphQL.
-
Subscriptions seems nice?
(A supposed GraphQL benefit / REST + JSON drawback from howtographql.com: "With REST, you have to make three requests to different endpoints to fetch the required data. You’re also overfetching since the endpoints return additional information that’s not needed" — But one can create a REST api that looks at some request params and then includes all that’s needed, and only what’s needed / not-much-more, in a single response.)
GraphQL cons:
-
DoS attack risk? Clients could construct GraphQL queries that ask for lots of things and lists of nested things, missing the caches, causing lots of database queries. Slightly complicated code would be needed to ensure the clients are well behaved — and lots of test code, for that code. — In the same way, custom ElasticSearch queries or custom regular expressions, would also be risky.
-
Runtime bug risk: Wiring the schema definition to Java/Scala code and SQL queries is not type safe — look at e.g.: https://www.graphql-java.com/tutorials/getting-started-with-spring-boot/#datafetchers
dataFetchingEnvironment.getArgument("id")
— the compiler won’t check"id"
. Or is this Scala lib type safe, compile time code gen? "automatically derives GraphQL schemas from your data types", and: https://ghostdogpr.github.io/caliban/docs/#a-simple-example. -
More things to learn — there’s a lot to read over at https://graphql.org, plus learning the particilar Java/Scala lib to implement GraphQL server side.
-
What Java/Scala lib to use? There’re 5 to choose among: https://graphql.org/code/ Takes a while to evaluate all of them, pick the one that’s best for Ty.
-
Duplicated work: Would need to create a GraphQL schema, in addition to having Ty’s domain model in Typescript already.
-
Even more libs and complicated tooling: Would probably eventually want to auto generate Typescript interfaces, from the GraphQL schema?
Is it even worse that what one could have thought? https://www.prisma.io/blog/the-problems-of-schema-first-graphql-development-x1mn4cb0tyl3
a myriad of [GraphQL] tools have been released that are trying to improve the workflows around SDL-first development
[challenging ot keep] schema definition is in sync with the resolvers at all times
split into files (instead of one big schema file)
reuse files
People switching to code-first? https://github.com/graphql-rust/juniper, https://blog.logrocket.com/code-first-vs-schema-first-development-graphql/
Also, now there’s another similar graph query language, DQL: https://dgraph.io/docs//query-language/graphql-fundamentals/ > Dgraph Query Language, DQL, (previously named GraphQL+-) is based on GraphQL
So, for now, use Talkyard’s Get, List and Search APIs, but not GraphQL. At least wait until Scala 3 and the Scala 3 macro system — probably the existing / any-future Scala GraphQL libs will make use of them in some nice more-type-safe ways. Or maybe shouldn’t even be in the Scala app, maybe instead a separate server in Rust, optionally reading from a read-only replica rdb.
WebAssembly is "future compatible": If (parts of) the Talkyard app server ever gets rewritten in another language, say, Rust, plugins written in WebAssembly can be called from that new language too. But if the plugins were instead in Java or Scala, they couldn’t easily be used from the new code.
Browsers can run WebAssembly. Deno can run WebAssembly natively: https://deno.land/manual/getting_started/webassembly. GraalVM (JVM) can also run WebAssembly. And Rust and Bash: https://docs.wasmtime.dev/lang.html. There’s a Scala interpreter: https://github.com/satabin/swam.
Also: No GC, small code size.
Rust and Typescript, well, AssemblyScript, can be compiled to WebAssembly: https://rustwasm.github.io and https://github.com/AssemblyScript/assemblyscript. (But not Scala: scala-js/scala-js#1747)
In Talkyard, one can tag not only pages, but also users. A user tag functions as a user badge: the badge (tag) title can be configured to appear next to the names of people with that badge, for example in the about-user dialog, or in posts by them, e.g.: "By Some Name @some_name support-team" so everyone notices that Some Name is an official member of the support team.
Why use the tag system, for user titles, instead of adding titles directly to users and groups?
-
Because if there’s both user badges, and also user titles / group-member titles, then there’d be two systems for making a text appear next to someone’s name. And two systems, is more complicated, than just one system? E.g. more code for deciding which titles to show/not-show, if someone has, say, two titles from two groups hen is in, and also has two badge titles. And in which order to show the titles. Sometimes the group/user title(s) would be more interesting; in other cases the badge title(s)? And db columns determining what to show/hide & show in which order, would need to be duplicated between tables (both pats_t and tags_t)? And there’d be source code that understands and considers both group titles and badge titles, and maybe keeps badge + group titles unique accorss two tables? More complicated.
-
Maybe maybe a security issue, if using groups for titles? A moderator might want to give someone a title, and so adds hen to a group with that user title. But groups can have security settings, and maybe the mod didn’t remember all security settings associated with that group — and by adding the user, accidentally granted unintended permissions. However, giving someone a badge, won’t have any such security implications.
("Badge permissions" gives 0.8k Google search hits, whilst "group permissions" gives 300k hits — so, probably people expect groups, but not badges, to be associated with permissions? Groups = security, badges = for display.)
Summary: Group member titles, and user titles, would make sense, if there wasn’t also group member badges and user badges. But one solution is enough.
Java 17: Nashorn gone, was deprecated in Java 11.
Talkyard uses Nashorn for server side rendering. So, right now, cannot upgrade past Java 11.
Later, one alternative would be to switch to GraalVM and try to call the render-page script bundles from the Scala code, using Graal’s polygot features.
However, when rewriting / redoing this anyway, it’d be better to move the server side rendering, to a separate process — and I suppose it’ll run in a separate Docker container then, with Deno. Then, the script bundles + Deno can be compiled into an executable — making Ty quicker to restart. And if there’s really lots of things to server-side-(re)render, then, only the Deno container gets affected — other parts of Ty (e.g. already rendered and cached HTML) can continue working unaffected.
(The app server would send a HTTP? WebSocket? message to the Deno server, with the CommonMark source to render. Would Deno need to ask the app server for page names and excerpts, to show in link previews? Maybe two phases: 1) app server sends CommonMark to Deno, which looks at it and replies with what link previews, user names etc it needs. Then the app server fetchest this from the database (doing access permission control), or even external websites, e.g. a Twitter tweet link preview, and does a 2nd request to Deno, this time including all that’s needed for Deno to render the whole thing. [ext_markup_processor])
Don’t try to compile the script bundles to WASM and run inside a Rust server — it’s a bit complicated, different memory management, see e.g.: https://paulbutler.org/2021/calling-webassembly-from-rust/
Move most state fields from pages, to posts. Then, things one can do with pages, e.g. marking as a task to be done, becomes possible also with comments on a page. So, can use comments & sub threads, as lightweight tasks, instead of having to create new pages for everything (and remembering which pages belong together).
(In Ty, users and groups are already "unified" — they use the same Pat (short for
Participant) Scala base type, and users3
table (to be renamed to pats_t
).
This has worked out fine — e.g. can toggle settings on both a user and group level.)
Change-rename tables post_actions3
to pat_rels_t
and create a new post_rels_t
table ("rel" for relationship, graph db terminology: nodes and relationships).
See [graph_database]
in <./docs/tyworld.adoc>,
and, edit, 2023: [add_nodes_t]
and <./docs/everything-is-a-node.txt>.
Edit: Not pat_rels_t
, instead: pat_node_rels_t
?
And rename group_participants3
to pat_pat_rels_t
— relationships between pats and other pats.
E.g. member of group. Or follows, or friends, or … what more? PseudonymOf
?
And add PatPatRelType
with value e.g. MemberOf
, or ManagerOf
etc,
of relevance for groups. And SubscribedTo
, if one wants to know about
new posts by someone or by anyone in a group. — But most of this will
be in pat_node_rels_t
so can be per category (e.g. following someone but
only in a specific category).
Using these links tables will make it possible for e.g. a single flag to flag many things, e.g. all astroturfing accounts the flagger thinks belong to the same real world person. Or a post can be PostRelType.AssignedTo more than one person. Or a private comments sub thread can be made visible to more than one other group or person. Or having two people listed as co-authors of an article. — Instead of, as in most other software, always just one of whatever it is.
Use Debian as Docker base image, instead of Alpine. Debian is more widely used, that’s good for security, e.g. Debian at Google: gLinux, and libc means no exotic musl-not-libc bug risk. And most official images are based on Debian — we’d be downloading a Debian base image anyway. [alp_2_deb]
Stick to Docker Compose as the main way of deploying Ty. At the same time, K8s looks nice, they’ve thought about everything (!). In Ty v1, maybe start adding experimental support for running Ty on localhost in 1) Minikube, using 2) Skaffold and 3) Kpt. (Stay away from Helm — look, how error prone, a program written in Yaml + Go templates: (where? I lost the link) And this: Bash scripts, in Go templates, in Yaml: https://github.com/kudobuilder/operators/blob/master/repository/kafka/operator/templates/kafka-connect-setup.yaml )
Minikube on localhost can be a way to enable everyone to contribute to Ty, also if they use Windows or MacOS? Currently, though, some development scripts assume that one uses Debian (or Ubuntu). But with Minikube and Skaffold, it’ll be not-that-much-work to re-think/fix this? Look: "Skaffold enabled us to develop independently of the platform on Linux, OSX, and Windows, with no platform specific logic required", said TNG GmbH in a testimonial on https://skaffold.dev.
K8s Operators. The ./scripts/ dir in talkyard-prod-one, in a way does the same as a K8s operator. And, in the distant future, there could be auto-upgrade, auto-backup, panic-power-off operators for [Ty running in K8s] written in Rust: https://old.reddit.com/r/rust/comments/mjsfcu/writing_kubernetes_operators_in_rust/. https://kubernetes.io/docs/reference/using-api/client-libraries/ — e.g. kube.rs. But not in Go.
It’s good to keep down the number of languages Ty uses, so there’re fewer things to learn, for new contributors. And, apparently there isn’t ever going to be a need for writing anything in Go? Not even K8s operators, although Go is the "official" operators language — see the Rust K8s Operators links above.