Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple tilepaths splitted by ','. Hash based IDs of tilesets. Walking by tree with symlinks #92

Closed
wants to merge 13 commits into from

Conversation

asmyasnikov
Copy link

Old single tilepath (without splitter ':') also worked

@asmyasnikov
Copy link
Author

I'm also rename handlers.NewFromBaseDir to handlers.NewFromFilenames because filenames from main.go are not used and filepath.Walk over tilePath code are duplicate (main.go and inside handlers.NewFromBaseDir).
Due to tilesets in different paths may be equal I'm replace plain id (by relative path) of each tileset to hash id of absolute tileset path

@coveralls
Copy link

coveralls commented Feb 12, 2020

Coverage Status

Coverage decreased (-0.1%) to 95.92% when pulling 7943a56 on asmyasnikov:master into a613fe5 on consbio:master.

@asmyasnikov
Copy link
Author

some travis job are disconnected because job very long wait on queue =(

@brendan-ward
Copy link
Collaborator

@asmyasnikov Thanks for your PR!

Good catch on the duplicate code between main.go and handlers.NewFromBaseDir.

Help me understand your use case better - why is having multiple root directories better than having subdirectories within a single root directory?

I understand the need to generate IDs to get around possibly duplicate directory names; however, my strong preference is that we rely on the filesystem to ensure uniqueness, and have close parity between the subdirectories / mbtiles filenames and the resulting service IDs.

I found that we aren't handling symlinked subdirectories correctly, so I created #93 to track that. I think if we handled those properly, it would be possible for you to have multiple symlinked subdirectories for different collections of tilesets, provided that you assign each a unique symlink name within the directory passed to mbtileserver with the --dir option. This would also keep the Docker implementation more straightforward too.

@asmyasnikov
Copy link
Author

asmyasnikov commented Feb 12, 2020

My hardware solution is a single board computer (aka RaspberryPi or Odroid).
Im use two instances of mbtileserver with different http ports for two paths: persistent internal tilesets (on motherbard eMMC) and optional tilesets on external USB-flash/SD-card. User may manually change external tilesets on flash.
So, two URL paths of different mbtiles sources for client (browser) - bad way.
Symlinks are not helped in this situation because tilesets on external USB can be changed by user.
This proposal feature can help me for this situation.

@asmyasnikov
Copy link
Author

asmyasnikov commented Feb 12, 2020

Not found place for share my Dockerfile. Print in this topic (may be help you)

ARG BASE_GOLANG=golang
ARG BASE=busybox:glibc

FROM ${BASE_GOLANG} as build

RUN go get -d github.com/consbio/mbtileserver

ARG TARGET_OS=linux
ARG TARGET_ARCH=amd64
ARG TARGET_ARM=7

WORKDIR /go/src/github.com/consbio/mbtileserver

ENV GOOS=${TARGET_OS}
ENV GOARCH=${TARGET_ARCH}
ENV GOARM=${TARGET_ARM}

RUN CGO_ENABLED=1 go build -ldflags="-w -s" -o /go/bin/mbtileserver

RUN cp $(ldd /go/bin/mbtileserver | grep libdl.so | awk '{print $3}') /go/bin/libdl.so.2

FROM ${BASE}

ARG VERSION=None
ENV VERSION=$VERSION

COPY --from=build /go/bin/libdl.so.2 /lib/libdl.so.2
COPY --from=build /go/bin/mbtileserver /mbtileserver

ENTRYPOINT ["/mbtileserver"]

CMD []

I'm use busybox (lighter than alpine). My mbtileserver docker image for amd64 have only 17.1MB, for arm32v7 - 13MB

@brendan-ward
Copy link
Collaborator

@asmyasnikov thanks for the additional details. That is a really interesting use case; while we are happy that mbtileserver works great on the smallest cloud compute instances available, we never targeted single board computers. Thanks for your work on that front!

I'm assuming under your setup that it is possible to mount the flash drive to a directory name that is defined in advance? This seems like it would allow you to define 2 well-known source paths (one for the static tilesets and one for the optional ones) in such a way that they are unique, and avoid the need to generate IDs for them from a hash?

Or taking that even further, it seems theoretically possible to define a single local directory that includes symlinked subdirectories - one to your static tilesets, and one to the root folder of the tilesets on the flash drive. In this case, you only need to know the mount directory of the flash drive to symlink it, not the directory structure on the flash drive, right?

@asmyasnikov
Copy link
Author

asmyasnikov commented Feb 12, 2020

  1. Symlinks available only on a some filesystems (of all possible) like a ext2/3/4. But symlinks a not work on vfat, exfat and other. My users may be plug USB-flash with unsupported (for symlinks) filesystem to single board computer. I'm foresee some problems with using symlinks for all cases.
  2. If I use root path of tilesets with ext2/3/4 filesystem of-course I may create symlink from mountpoint (of external drive) to inside of root path. This can help me. But filepath.Walk on symlink path may be wrong work (needs to check for different filesystems and case of recursive symlinks path)
  3. plain path of tilesets for user may be so long, contains not supported for URL characters, recursive or not safe (by security notes). I make hashId of tileset with fixed length, problems with unsupported characters in URL are except, this safely, collisions are minimal, hash provide constant links for users in different runtimes or kubernate cluster instances (instead random id)

@asmyasnikov
Copy link
Author

asmyasnikov commented Feb 12, 2020

On single board computers only mbtileserver are unobtrusive instead tilemill, tilestash, mapnik and other WMTS.
Thanks very much for mbtileserver =)

@brendan-ward
Copy link
Collaborator

Thanks for the additional information. Limited support for symlinks on different filesystems is definitely something to consider in our approach here. At minimum, we'd have to do testing with #93 to make sure that there is decent support for symlink traversal across various filesystems. I would hope that the standard go library handles those cases correctly.

The issues you describe with symlinks are certainly true as general cases, though for normal server-based environments, they could be entirely controlled prior to starting up mbtileserver (i.e., your subdirectories must be uniquely named, not have recursive symlinks, etc )

For your users, is it possible to have a well-defined path that is required on their flash drive in order to support the use of optional tilesets? Plus well-defined instructions for how to organize their optional tilesets? It seems like those would help mitigate some of the potential shortcomings of you using symlinks locally to point to the optional tile directory on the flash drive.

How are users consuming these tilesets?

How will they discover what tilesets are available? Is the /services endpoint sufficient for that?

We're using named subdirectories in production, so it would be a bit challenging to migrate existing uses to a hash-based ID system.

@brendan-ward
Copy link
Collaborator

Further question: how is mbtileserver being executed in this setup? Is it something that users call directly, something you wrap in a shell script, or some other way?

@asmyasnikov
Copy link
Author

asmyasnikov commented Feb 12, 2020

My devices do not nave GUI such as Gnome, KDE, Mate or other, using as real web-server. Users must connect to device by http from browser. User cannot run applications at its discretion. I proceed from the concept of "stupid user" (which don't understand bash, SSH and other utils)
I'm create docker-compose.yml file (with mbtileserver and nginx with reverse proxy) and provide automatically start docker-compose on startup of device. At this time all tile paths are must be defined. USB-flash must be plugged before startup (this requirment will be lifted after mbtileserver upgrade with fswatch feature. Other containers are also wait external storage on startup, each case I'm resolving different - WIP). Flash I mount atomatically to /opt/usb/. Internal tilesets (such as osm-planet...) I permanently place to /opt/mbtiles/. Root filesystem I freeze via read-only mounting (for safe after hard poweroff, reboots and other cases).
My requirments for USB-flash (for users): first partition must have directory "mbtiles" with tilesets. But "stupid user" may insert to this directory something which crash mbtileserver (recursive symlinks, bad filenames and more). The device should work anyway!
This is about device.
About my web-client. First I read data from /services path and further read data about each tileset. I'm provide for users dialog in which the user select of map of interest. After I'm apply tilepath such as .../x/y/z.png to map engene (leaflet, openlayers or mapbox-gl). This case cannot require update web-client code.
But I understood your problem with migrate using of mbtileserver with hashIDs.
I suggest next solution. I'm upgrade code (of pull request) for provide old tilepaths (with plain filenames). But this tilepath will not print out in /services request. Only backdoor for compatibility. Accordingly two tilepath handlers will lead to single tileset. Old clients are not to be upgrade for new mbtileserver. But if mbtileserver run with multiple tilepaths and equal tilesets filenames, this must be a tricky logic (such as relative path from first tilepath in slice of tilepaths, or do not add backdoor for second and next tilepaths or other solutions)
What about it? Your fears will disappear? =)

@asmyasnikov
Copy link
Author

asmyasnikov commented Feb 12, 2020

As solution of problem I may create different arg for slice of paths. --path arg (and TILE_PATH envvar) corresponds to base dir of tilesets and will be provide plain ids, tilesets from different arg with tilesets corresponds to hash IDs. All ids (plain and hash) will be present in /services request

@asmyasnikov asmyasnikov changed the title Add feature - multiple tilepaths splitted by ':'. Add feature - multiple tilepaths splitted by ':'. Hash based IDs of tilesets Feb 12, 2020
@asmyasnikov asmyasnikov changed the title Add feature - multiple tilepaths splitted by ':'. Hash based IDs of tilesets Multiple tilepaths splitted by ':'. Hash based IDs of tilesets Feb 12, 2020
@brendan-ward
Copy link
Collaborator

Ok, so it sounds like in your case, the root directories of the tilesets are well-defined and known at the time mbtileserver starts.

I'd need help to identify sufficient cases of bad paths or filenames to make sure we handle that correctly, but that feels like a separate issue that we should address outside the context of this PR, since it affects the server in general. Given a root path(s), server should be able to find all valid mbtiles files and start services for them; any paths that are malformed or otherwise bad should be skipped (with appropriate log messages) rather than causing a hard failure on server startup or reload.

Do you have examples of bad paths for filenames we can use for testing?

Likewise, for valid paths and filenames, we need to be able to generate valid URLs to them.

I'm not yet sure I like the idea of a separate argument that provides an alternative way of specifying tile directories and ID schemes, so please hold off on implementing that.

Instead, I see the following decision points, which are related but not directly coupled:

  1. do we require a single tileset root directory, or do we expand this to allow multiple root directories?
  2. do we automatically generate IDs, do we preserve (as best we can) IDs derived from the paths in the root directory or directories, or do we make this an option (e.g., --generate-ids)?

@asmyasnikov
Copy link
Author

Bad filenames I cannot get you. I need search and test all my realy and invented external storages.
But you may generate bad url if tileset have big nesting. This is will fetch to URL great then 2000 characters. Long URLs cannot support by some or all of browsers or them versions (see https://serpstat.com/blog/how-long-should-be-the-page-url-length-for-seo/)

@asmyasnikov
Copy link
Author

--generate-ids flag can be optional if we are use only root directory. With symlinks of course for my case =). As solution of some bad filenames (for example, spaces in filename - windows users are often use that) and long URLs.
If we are use multiple root paths, --generate-ids is required. Because may be collisions of filenames in different paths. But this way will be broken old tileset URLs.

@brendan-ward
Copy link
Collaborator

--generate-ids can be optional in either case. In the case of duplicate path / filenames without it, we can log an error and skip the duplicates. That way it is more of an opt-in feature when your use case requires it.

@asmyasnikov
Copy link
Author

Ignore duplicate - wrong way. In two tilesets paths may be equal mbtiles file. For example, world.mbtiles, but different content (timestamps or other). Auto ignore cannot get from use needed map

@asmyasnikov
Copy link
Author

hashes will help to user for get needed map.

@brendan-ward
Copy link
Collaborator

You are controlling startup (if I understood your description above), so for your specific case you can always use the --generate-ids option (if we expose it that way).

Can you please explain what you mean by this?

hashes will help to user for get needed map.

How are errors communicated back to the user about invalid tilesets from their flash drive?

Based on your description, I see an important distinction between:

  1. server process always starts up and keeps running correctly, no matter what paths / tilesets are provided by user. This means skipping invalid paths or tilesets (we do now).
  2. users can only select from valid tilesets

The first is about server stability, the second is about user interface. If users provide invalid paths / tilesets, ignore directions about how to name things, etc - what do they expect to happen? The server process should not crash, so other valid tilesets should still be available, but what about the invalid ones?

@asmyasnikov
Copy link
Author

asmyasnikov commented Feb 12, 2020

but in this case I will use --generate-ids.
So, can I make this flag? And logic for switch type of id (plain or hash)?

@asmyasnikov
Copy link
Author

hashes creates from abspath and unique for equal filenames in different paths.

Can you please explain what you mean by this?

hashes will help to user for get needed map.

@asmyasnikov
Copy link
Author

Users cannot view wrong tilesets and msgs about reasons. Yes, server must be a stable. In my solution wrong filenames and errors may see from portainer (docker admin web-GUI), but I'm make this way for exclusive cases.

@brendan-ward
Copy link
Collaborator

Redundant file walk and appropriate error handling handled in ab7ae02

Which should also make server startup more robust even if there are errors scanning tileset directories.

Per #94, I think we should use a URL-safe SHA1 for IDs instead of MD5. I have some opinions about how I'd like that API to look, so I'll try to take a first pass today.

@asmyasnikov
Copy link
Author

asmyasnikov commented Feb 13, 2020

I was tryed sha1 yesterday. URLs was bad (some characters are not good for URLs) and was different length. For URLs hash must be with predefined alphabet (only readable characters). After this experience I was changed algorithm to md5 (URLs was also bad) and result was wrapped on hexify. URLs acquired readable view and fix length

@brendan-ward
Copy link
Collaborator

@asmyasnikov what do you think about splitting paths on ,? It seems like : would cause issues on Windows filesystems when referring to other drives, i.e., d:/data/tiles

@asmyasnikov
Copy link
Author

this ok, not principially

@asmyasnikov
Copy link
Author

I wrote simple scanner for walk directory tree with symlinks (pushed to my repo). Filenames are ok. But paths with symlinks are not applied in service set. Because this paths need repeatedly wrap for good applying =(

@asmyasnikov
Copy link
Author

Task of symlinks are not easy =(

@asmyasnikov
Copy link
Author

multiple paths looks easier

@asmyasnikov
Copy link
Author

asmyasnikov commented Feb 14, 2020

my last changes not break existing features, only refactoring (scanTilesets method and merge with resolving conflicts). scanTilesets make isolated code of walking by tree. Because in symlink case need call scanTilesets recursive and replace prefix of filepath

@asmyasnikov asmyasnikov changed the title Multiple tilepaths splitted by ':'. Hash based IDs of tilesets Multiple tilepaths splitted by ':'. Hash based IDs of tilesets. Walking by tree with symlinks Feb 14, 2020
@asmyasnikov
Copy link
Author

				id = url.PathEscape(p[:len(p)-len(e)])

url.PathEscape make URL for subdirs invalid. What's the point of this?

@asmyasnikov asmyasnikov changed the title Multiple tilepaths splitted by ':'. Hash based IDs of tilesets. Walking by tree with symlinks Multiple tilepaths splitted by ','. Hash based IDs of tilesets. Walking by tree with symlinks Feb 14, 2020
@brendan-ward
Copy link
Collaborator

id = url.PathEscape(p[:len(p)-len(e)])

was an attempt to escape out extra characters in path or filename, I didn't catch that it broke paths.

Note: I'm still working on the branch generate_ids, it does not yet have all the changes I'm considering there.

Thanks for your work investigating symlinks, it sounds like those may not be the good option I was hoping for.

Refactoring scan of tilesets given a base directory seems reasonable.

Overall, I'd like to keep the interface of NewFromBaseDir relatively intact - it shouldn't be in charge of splitting the base paths, that is a command-line concern.

One option might be to allow variadic path arguments, but they can only be the last argument, which creates a less-than-ideal function signature

func NewFromBaseDir(generateIDs bool, baseDirs ...string) (*ServiceSet, error) {
for _,baseDir:= range baseDirs {
...find tilesets, construct ids, and add to ServiceSet
}
...
}

Lightweight proof of concept of this in variadic_basedirs (note: splits on ','). I'm not sure I like the API yet.

Another option would be to provide a Merge() method on ServiceSet which would allow us to split the base directories, call NewFromBaseDir for each, then merge into a final ServiceSet. Not sure I like that option either.

A further option would be to provide a new method that allows us to construct the ServiceSet via New, then loop and add services to it from each directory via a new method, maybe AddFromBaseDir? I haven't fully thought this one out yet.

@brendan-ward
Copy link
Collaborator

@asmyasnikov I'm not sure what you mean by:

Environment variables may be not supported this concept

Do you mean the split on ','?

@asmyasnikov
Copy link
Author

Yes, I was code ',' yet =)

@asmyasnikov
Copy link
Author

Merge strategy in my branch already worked by default - new equal filepath replace old filepath in other base path. This is normal for me. First base dir - persistent, second base dir - removable. If removable media contains tileset with filename which equal with persistent tiles, persistent tileset will be forget. I was tested this case and it's ok.

@brendan-ward
Copy link
Collaborator

@asmyasnikov I'm having a hard time following some of your comments.

When you say

Merge strategy in my branch already worked by default

Do you mean your PR here was looping over tilesets in both directories and adding them to the same ServiceSet, that you were considering them "merged"?

Right now, the code does not warn you if you have multiple instances of the same ID, as up until previously that was not permitted because paths forced unique IDs. I think we should log an error message in that case, and preserve the first rather than the last tileset that has the duplicate ID.

Using the --generate-ids option avoids those issues, as expected.

@asmyasnikov
Copy link
Author

Right now, the code does not warn you if you have multiple instances of the same ID, as up until previously that was not permitted because paths forced unique IDs. I think we should log an error message in that case, and preserve the first rather than the last tileset that has the duplicate ID.

You are write - error msg needed.
About priority of tilesets. As you wish =)

The code takes the form of my initial PR =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants