Multiple tilepaths splitted by ','. Hash based IDs of tilesets. Walking by tree with symlinks #92

asmyasnikov · 2020-02-12T11:40:28Z

Old single tilepath (without splitter ':') also worked

… (without splitter ':') also worked

asmyasnikov · 2020-02-12T11:47:15Z

I'm also rename handlers.NewFromBaseDir to handlers.NewFromFilenames because filenames from main.go are not used and filepath.Walk over tilePath code are duplicate (main.go and inside handlers.NewFromBaseDir).
Due to tilesets in different paths may be equal I'm replace plain id (by relative path) of each tileset to hash id of absolute tileset path

coveralls · 2020-02-12T11:50:16Z

Coverage decreased (-0.1%) to 95.92% when pulling 7943a56 on asmyasnikov:master into a613fe5 on consbio:master.

…ang < 1.12

asmyasnikov · 2020-02-12T12:25:00Z

some travis job are disconnected because job very long wait on queue =(

brendan-ward · 2020-02-12T15:37:11Z

@asmyasnikov Thanks for your PR!

Good catch on the duplicate code between main.go and handlers.NewFromBaseDir.

Help me understand your use case better - why is having multiple root directories better than having subdirectories within a single root directory?

I understand the need to generate IDs to get around possibly duplicate directory names; however, my strong preference is that we rely on the filesystem to ensure uniqueness, and have close parity between the subdirectories / mbtiles filenames and the resulting service IDs.

I found that we aren't handling symlinked subdirectories correctly, so I created #93 to track that. I think if we handled those properly, it would be possible for you to have multiple symlinked subdirectories for different collections of tilesets, provided that you assign each a unique symlink name within the directory passed to mbtileserver with the --dir option. This would also keep the Docker implementation more straightforward too.

asmyasnikov · 2020-02-12T15:48:20Z

My hardware solution is a single board computer (aka RaspberryPi or Odroid).
Im use two instances of mbtileserver with different http ports for two paths: persistent internal tilesets (on motherbard eMMC) and optional tilesets on external USB-flash/SD-card. User may manually change external tilesets on flash.
So, two URL paths of different mbtiles sources for client (browser) - bad way.
Symlinks are not helped in this situation because tilesets on external USB can be changed by user.
This proposal feature can help me for this situation.

asmyasnikov · 2020-02-12T15:55:35Z

Not found place for share my Dockerfile. Print in this topic (may be help you)

ARG BASE_GOLANG=golang
ARG BASE=busybox:glibc

FROM ${BASE_GOLANG} as build

RUN go get -d github.com/consbio/mbtileserver

ARG TARGET_OS=linux
ARG TARGET_ARCH=amd64
ARG TARGET_ARM=7

WORKDIR /go/src/github.com/consbio/mbtileserver

ENV GOOS=${TARGET_OS}
ENV GOARCH=${TARGET_ARCH}
ENV GOARM=${TARGET_ARM}

RUN CGO_ENABLED=1 go build -ldflags="-w -s" -o /go/bin/mbtileserver

RUN cp $(ldd /go/bin/mbtileserver | grep libdl.so | awk '{print $3}') /go/bin/libdl.so.2

FROM ${BASE}

ARG VERSION=None
ENV VERSION=$VERSION

COPY --from=build /go/bin/libdl.so.2 /lib/libdl.so.2
COPY --from=build /go/bin/mbtileserver /mbtileserver

ENTRYPOINT ["/mbtileserver"]

CMD []

I'm use busybox (lighter than alpine). My mbtileserver docker image for amd64 have only 17.1MB, for arm32v7 - 13MB

brendan-ward · 2020-02-12T16:45:17Z

@asmyasnikov thanks for the additional details. That is a really interesting use case; while we are happy that mbtileserver works great on the smallest cloud compute instances available, we never targeted single board computers. Thanks for your work on that front!

I'm assuming under your setup that it is possible to mount the flash drive to a directory name that is defined in advance? This seems like it would allow you to define 2 well-known source paths (one for the static tilesets and one for the optional ones) in such a way that they are unique, and avoid the need to generate IDs for them from a hash?

Or taking that even further, it seems theoretically possible to define a single local directory that includes symlinked subdirectories - one to your static tilesets, and one to the root folder of the tilesets on the flash drive. In this case, you only need to know the mount directory of the flash drive to symlink it, not the directory structure on the flash drive, right?

asmyasnikov · 2020-02-12T17:08:08Z

Symlinks available only on a some filesystems (of all possible) like a ext2/3/4. But symlinks a not work on vfat, exfat and other. My users may be plug USB-flash with unsupported (for symlinks) filesystem to single board computer. I'm foresee some problems with using symlinks for all cases.
If I use root path of tilesets with ext2/3/4 filesystem of-course I may create symlink from mountpoint (of external drive) to inside of root path. This can help me. But filepath.Walk on symlink path may be wrong work (needs to check for different filesystems and case of recursive symlinks path)
plain path of tilesets for user may be so long, contains not supported for URL characters, recursive or not safe (by security notes). I make hashId of tileset with fixed length, problems with unsupported characters in URL are except, this safely, collisions are minimal, hash provide constant links for users in different runtimes or kubernate cluster instances (instead random id)

asmyasnikov · 2020-02-12T17:13:53Z

On single board computers only mbtileserver are unobtrusive instead tilemill, tilestash, mapnik and other WMTS.
Thanks very much for mbtileserver =)

brendan-ward · 2020-02-12T17:31:45Z

Thanks for the additional information. Limited support for symlinks on different filesystems is definitely something to consider in our approach here. At minimum, we'd have to do testing with #93 to make sure that there is decent support for symlink traversal across various filesystems. I would hope that the standard go library handles those cases correctly.

The issues you describe with symlinks are certainly true as general cases, though for normal server-based environments, they could be entirely controlled prior to starting up mbtileserver (i.e., your subdirectories must be uniquely named, not have recursive symlinks, etc )

For your users, is it possible to have a well-defined path that is required on their flash drive in order to support the use of optional tilesets? Plus well-defined instructions for how to organize their optional tilesets? It seems like those would help mitigate some of the potential shortcomings of you using symlinks locally to point to the optional tile directory on the flash drive.

How are users consuming these tilesets?

How will they discover what tilesets are available? Is the /services endpoint sufficient for that?

We're using named subdirectories in production, so it would be a bit challenging to migrate existing uses to a hash-based ID system.

brendan-ward · 2020-02-12T17:55:14Z

Further question: how is mbtileserver being executed in this setup? Is it something that users call directly, something you wrap in a shell script, or some other way?

asmyasnikov · 2020-02-12T18:38:47Z

My devices do not nave GUI such as Gnome, KDE, Mate or other, using as real web-server. Users must connect to device by http from browser. User cannot run applications at its discretion. I proceed from the concept of "stupid user" (which don't understand bash, SSH and other utils)
I'm create docker-compose.yml file (with mbtileserver and nginx with reverse proxy) and provide automatically start docker-compose on startup of device. At this time all tile paths are must be defined. USB-flash must be plugged before startup (this requirment will be lifted after mbtileserver upgrade with fswatch feature. Other containers are also wait external storage on startup, each case I'm resolving different - WIP). Flash I mount atomatically to /opt/usb/. Internal tilesets (such as osm-planet...) I permanently place to /opt/mbtiles/. Root filesystem I freeze via read-only mounting (for safe after hard poweroff, reboots and other cases).
My requirments for USB-flash (for users): first partition must have directory "mbtiles" with tilesets. But "stupid user" may insert to this directory something which crash mbtileserver (recursive symlinks, bad filenames and more). The device should work anyway!
This is about device.
About my web-client. First I read data from /services path and further read data about each tileset. I'm provide for users dialog in which the user select of map of interest. After I'm apply tilepath such as .../x/y/z.png to map engene (leaflet, openlayers or mapbox-gl). This case cannot require update web-client code.
But I understood your problem with migrate using of mbtileserver with hashIDs.
I suggest next solution. I'm upgrade code (of pull request) for provide old tilepaths (with plain filenames). But this tilepath will not print out in /services request. Only backdoor for compatibility. Accordingly two tilepath handlers will lead to single tileset. Old clients are not to be upgrade for new mbtileserver. But if mbtileserver run with multiple tilepaths and equal tilesets filenames, this must be a tricky logic (such as relative path from first tilepath in slice of tilepaths, or do not add backdoor for second and next tilepaths or other solutions)
What about it? Your fears will disappear? =)

asmyasnikov · 2020-02-12T19:14:48Z

As solution of problem I may create different arg for slice of paths. --path arg (and TILE_PATH envvar) corresponds to base dir of tilesets and will be provide plain ids, tilesets from different arg with tilesets corresponds to hash IDs. All ids (plain and hash) will be present in /services request

brendan-ward · 2020-02-12T19:50:28Z

Ok, so it sounds like in your case, the root directories of the tilesets are well-defined and known at the time mbtileserver starts.

I'd need help to identify sufficient cases of bad paths or filenames to make sure we handle that correctly, but that feels like a separate issue that we should address outside the context of this PR, since it affects the server in general. Given a root path(s), server should be able to find all valid mbtiles files and start services for them; any paths that are malformed or otherwise bad should be skipped (with appropriate log messages) rather than causing a hard failure on server startup or reload.

Do you have examples of bad paths for filenames we can use for testing?

Likewise, for valid paths and filenames, we need to be able to generate valid URLs to them.

I'm not yet sure I like the idea of a separate argument that provides an alternative way of specifying tile directories and ID schemes, so please hold off on implementing that.

Instead, I see the following decision points, which are related but not directly coupled:

do we require a single tileset root directory, or do we expand this to allow multiple root directories?
do we automatically generate IDs, do we preserve (as best we can) IDs derived from the paths in the root directory or directories, or do we make this an option (e.g., --generate-ids)?

asmyasnikov · 2020-02-12T20:03:51Z

Bad filenames I cannot get you. I need search and test all my realy and invented external storages.
But you may generate bad url if tileset have big nesting. This is will fetch to URL great then 2000 characters. Long URLs cannot support by some or all of browsers or them versions (see https://serpstat.com/blog/how-long-should-be-the-page-url-length-for-seo/)

asmyasnikov · 2020-02-12T20:10:48Z

--generate-ids flag can be optional if we are use only root directory. With symlinks of course for my case =). As solution of some bad filenames (for example, spaces in filename - windows users are often use that) and long URLs.
If we are use multiple root paths, --generate-ids is required. Because may be collisions of filenames in different paths. But this way will be broken old tileset URLs.

brendan-ward · 2020-02-12T20:19:41Z

--generate-ids can be optional in either case. In the case of duplicate path / filenames without it, we can log an error and skip the duplicates. That way it is more of an opt-in feature when your use case requires it.

asmyasnikov · 2020-02-12T20:39:34Z

Ignore duplicate - wrong way. In two tilesets paths may be equal mbtiles file. For example, world.mbtiles, but different content (timestamps or other). Auto ignore cannot get from use needed map

asmyasnikov · 2020-02-12T20:43:54Z

hashes will help to user for get needed map.

brendan-ward · 2020-02-12T20:54:45Z

You are controlling startup (if I understood your description above), so for your specific case you can always use the --generate-ids option (if we expose it that way).

Can you please explain what you mean by this?

hashes will help to user for get needed map.

How are errors communicated back to the user about invalid tilesets from their flash drive?

Based on your description, I see an important distinction between:

server process always starts up and keeps running correctly, no matter what paths / tilesets are provided by user. This means skipping invalid paths or tilesets (we do now).
users can only select from valid tilesets

The first is about server stability, the second is about user interface. If users provide invalid paths / tilesets, ignore directions about how to name things, etc - what do they expect to happen? The server process should not crash, so other valid tilesets should still be available, but what about the invalid ones?

asmyasnikov · 2020-02-12T20:56:06Z

but in this case I will use --generate-ids.
So, can I make this flag? And logic for switch type of id (plain or hash)?

asmyasnikov · 2020-02-12T21:00:05Z

hashes creates from abspath and unique for equal filenames in different paths.

Can you please explain what you mean by this?

hashes will help to user for get needed map.

asmyasnikov · 2020-02-12T21:05:13Z

Users cannot view wrong tilesets and msgs about reasons. Yes, server must be a stable. In my solution wrong filenames and errors may see from portainer (docker admin web-GUI), but I'm make this way for exclusive cases.

brendan-ward · 2020-02-12T22:22:31Z

Redundant file walk and appropriate error handling handled in ab7ae02

Which should also make server startup more robust even if there are errors scanning tileset directories.

Per #94, I think we should use a URL-safe SHA1 for IDs instead of MD5. I have some opinions about how I'd like that API to look, so I'll try to take a first pass today.

asmyasnikov · 2020-02-13T03:47:59Z

I was tryed sha1 yesterday. URLs was bad (some characters are not good for URLs) and was different length. For URLs hash must be with predefined alphabet (only readable characters). After this experience I was changed algorithm to md5 (URLs was also bad) and result was wrapped on hexify. URLs acquired readable view and fix length

…with golang < 1.12" This reverts commit 4eb1d3c.

…tilepath (without splitter ':') also worked" This reverts commit b21ec14.

…links. WIP

brendan-ward · 2020-02-14T13:00:17Z

@asmyasnikov what do you think about splitting paths on ,? It seems like : would cause issues on Windows filesystems when referring to other drives, i.e., d:/data/tiles

asmyasnikov · 2020-02-14T14:57:15Z

this ok, not principially

asmyasnikov · 2020-02-14T15:01:05Z

I wrote simple scanner for walk directory tree with symlinks (pushed to my repo). Filenames are ok. But paths with symlinks are not applied in service set. Because this paths need repeatedly wrap for good applying =(

asmyasnikov · 2020-02-14T15:01:40Z

Task of symlinks are not easy =(

asmyasnikov · 2020-02-14T15:03:53Z

multiple paths looks easier

asmyasnikov · 2020-02-14T15:26:24Z

my last changes not break existing features, only refactoring (scanTilesets method and merge with resolving conflicts). scanTilesets make isolated code of walking by tree. Because in symlink case need call scanTilesets recursive and replace prefix of filepath

asmyasnikov · 2020-02-14T16:19:00Z

				id = url.PathEscape(p[:len(p)-len(e)])

url.PathEscape make URL for subdirs invalid. What's the point of this?

brendan-ward · 2020-02-14T17:06:55Z

id = url.PathEscape(p[:len(p)-len(e)])

was an attempt to escape out extra characters in path or filename, I didn't catch that it broke paths.

Note: I'm still working on the branch generate_ids, it does not yet have all the changes I'm considering there.

Thanks for your work investigating symlinks, it sounds like those may not be the good option I was hoping for.

Refactoring scan of tilesets given a base directory seems reasonable.

Overall, I'd like to keep the interface of NewFromBaseDir relatively intact - it shouldn't be in charge of splitting the base paths, that is a command-line concern.

One option might be to allow variadic path arguments, but they can only be the last argument, which creates a less-than-ideal function signature

func NewFromBaseDir(generateIDs bool, baseDirs ...string) (*ServiceSet, error) {
for _,baseDir:= range baseDirs {
...find tilesets, construct ids, and add to ServiceSet
}
...
}

Lightweight proof of concept of this in variadic_basedirs (note: splits on ','). I'm not sure I like the API yet.

Another option would be to provide a Merge() method on ServiceSet which would allow us to split the base directories, call NewFromBaseDir for each, then merge into a final ServiceSet. Not sure I like that option either.

A further option would be to provide a new method that allows us to construct the ServiceSet via New, then loop and add services to it from each directory via a new method, maybe AddFromBaseDir? I haven't fully thought this one out yet.

brendan-ward · 2020-02-14T17:51:29Z

@asmyasnikov I'm not sure what you mean by:

Environment variables may be not supported this concept

Do you mean the split on ','?

asmyasnikov · 2020-02-14T17:59:13Z

Yes, I was code ',' yet =)

asmyasnikov · 2020-02-14T18:03:04Z

Merge strategy in my branch already worked by default - new equal filepath replace old filepath in other base path. This is normal for me. First base dir - persistent, second base dir - removable. If removable media contains tileset with filename which equal with persistent tiles, persistent tileset will be forget. I was tested this case and it's ok.

brendan-ward · 2020-02-14T18:19:16Z

@asmyasnikov I'm having a hard time following some of your comments.

When you say

Merge strategy in my branch already worked by default

Do you mean your PR here was looping over tilesets in both directories and adding them to the same ServiceSet, that you were considering them "merged"?

Right now, the code does not warn you if you have multiple instances of the same ID, as up until previously that was not permitted because paths forced unique IDs. I think we should log an error message in that case, and preserve the first rather than the last tileset that has the duplicate ID.

Using the --generate-ids option avoids those issues, as expected.

asmyasnikov · 2020-02-14T18:28:41Z

Right now, the code does not warn you if you have multiple instances of the same ID, as up until previously that was not permitted because paths forced unique IDs. I think we should log an error message in that case, and preserve the first rather than the last tileset that has the duplicate ID.

You are write - error msg needed.
About priority of tilesets. As you wish =)

The code takes the form of my initial PR =)

Add feature - multiple tilepaths splitted by ':'. Old single tilepath…

b21ec14

… (without splitter ':') also worked

replace strings.ReplaceAll to strings.Replace for fix builds with gol…

4eb1d3c

…ang < 1.12

asmyasnikov changed the title ~~Add feature - multiple tilepaths splitted by ':'.~~ Add feature - multiple tilepaths splitted by ':'. Hash based IDs of tilesets Feb 12, 2020

asmyasnikov changed the title ~~Add feature - multiple tilepaths splitted by ':'. Hash based IDs of tilesets~~ Multiple tilepaths splitted by ':'. Hash based IDs of tilesets Feb 12, 2020

brendan-ward mentioned this pull request Feb 12, 2020

Add unique generated tileset IDs #94

Closed

amyasnikov added 3 commits February 13, 2020 08:51

Revert "replace strings.ReplaceAll to strings.Replace for fix builds …

d3c14e0

…with golang < 1.12" This reverts commit 4eb1d3c.

Revert "Add feature - multiple tilepaths splitted by ':'. Old single …

d92c63a

…tilepath (without splitter ':') also worked" This reverts commit b21ec14.

Merge branch 'master' of https://github.com/consbio/mbtileserver

844f3a9

brendan-ward mentioned this pull request Feb 13, 2020

Traverse symlinked directories within tilesets directory #93

Open

Add scanTilesets method for scanning base directory which include sym…

9328a33

…links. WIP

Added support for auto-generating IDs from tileset paths

66e8a15

Merge branch 'master' of https://github.com/consbio/mbtileserver

3885594

amyasnikov added 2 commits February 14, 2020 18:27

Remove debug print

5030af3

Remove debug print

31786f9

asmyasnikov changed the title ~~Multiple tilepaths splitted by ':'. Hash based IDs of tilesets~~ Multiple tilepaths splitted by ':'. Hash based IDs of tilesets. Walking by tree with symlinks Feb 14, 2020

amyasnikov added 3 commits February 14, 2020 18:40

multiple paths. WIP

c44cd33

Merge branch 'generate_ids' of https://github.com/consbio/mbtileserver

f94445b

multiple base dirs ok

7943a56

asmyasnikov changed the title ~~Multiple tilepaths splitted by ':'. Hash based IDs of tilesets. Walking by tree with symlinks~~ Multiple tilepaths splitted by ','. Hash based IDs of tilesets. Walking by tree with symlinks Feb 14, 2020

asmyasnikov closed this Feb 14, 2020

Multiple tilepaths splitted by ','. Hash based IDs of tilesets. Walking by tree with symlinks #92

Multiple tilepaths splitted by ','. Hash based IDs of tilesets. Walking by tree with symlinks #92

Conversation

asmyasnikov commented Feb 12, 2020

asmyasnikov commented Feb 12, 2020

coveralls commented Feb 12, 2020 • edited Loading

asmyasnikov commented Feb 12, 2020

brendan-ward commented Feb 12, 2020

asmyasnikov commented Feb 12, 2020 • edited Loading

asmyasnikov commented Feb 12, 2020 • edited Loading

brendan-ward commented Feb 12, 2020

asmyasnikov commented Feb 12, 2020 • edited Loading

asmyasnikov commented Feb 12, 2020 • edited Loading

brendan-ward commented Feb 12, 2020

brendan-ward commented Feb 12, 2020

asmyasnikov commented Feb 12, 2020 • edited Loading

asmyasnikov commented Feb 12, 2020 • edited Loading

brendan-ward commented Feb 12, 2020

asmyasnikov commented Feb 12, 2020

asmyasnikov commented Feb 12, 2020

brendan-ward commented Feb 12, 2020

asmyasnikov commented Feb 12, 2020

asmyasnikov commented Feb 12, 2020

brendan-ward commented Feb 12, 2020

asmyasnikov commented Feb 12, 2020 • edited Loading

asmyasnikov commented Feb 12, 2020

asmyasnikov commented Feb 12, 2020

brendan-ward commented Feb 12, 2020

asmyasnikov commented Feb 13, 2020 • edited Loading

brendan-ward commented Feb 14, 2020

asmyasnikov commented Feb 14, 2020

asmyasnikov commented Feb 14, 2020

asmyasnikov commented Feb 14, 2020

asmyasnikov commented Feb 14, 2020

asmyasnikov commented Feb 14, 2020 • edited Loading

asmyasnikov commented Feb 14, 2020

brendan-ward commented Feb 14, 2020

brendan-ward commented Feb 14, 2020

asmyasnikov commented Feb 14, 2020

asmyasnikov commented Feb 14, 2020

brendan-ward commented Feb 14, 2020

asmyasnikov commented Feb 14, 2020

coveralls commented Feb 12, 2020 •

edited

Loading

asmyasnikov commented Feb 12, 2020 •

edited

Loading

asmyasnikov commented Feb 12, 2020 •

edited

Loading

asmyasnikov commented Feb 12, 2020 •

edited

Loading

asmyasnikov commented Feb 12, 2020 •

edited

Loading

asmyasnikov commented Feb 12, 2020 •

edited

Loading

asmyasnikov commented Feb 12, 2020 •

edited

Loading

asmyasnikov commented Feb 12, 2020 •

edited

Loading

asmyasnikov commented Feb 13, 2020 •

edited

Loading

asmyasnikov commented Feb 14, 2020 •

edited

Loading