Skip to content
This repository has been archived by the owner on Mar 1, 2021. It is now read-only.

Penalize inner-link U-turns (builds on #83) #88

Merged
merged 5 commits into from
Dec 23, 2016

Conversation

stefanholder
Copy link
Contributor

Things to note

  • Requires Refactor unfavoring of virtual edges (#885) graphhopper#909
  • mm-uturn2.gpx matches without U-turns with gps_accuracy>=80. Otherwise, there is a U-turn at a real node, which we could address in a separate PR.
  • I squashed @kodonnell's first two commits from prevent inner-link U-turns #83 because the second commit fixes the line endings of the first commit.
  • Added a penalty to the path distance for each unfavored edge contained in the path.
  • Use non-normalized transition metric because the timeDiff==0 check didn't work with directed candidates and was broken anyway.
  • Shortest weighting does not use penalties, hence unfavoring virtual edges would not work there.
  • Implemented logging of candidates and paths. After fixing the remaining broken unit test, it probably makes sense to set the logging level to WARN for MapMatching.java.

When looking at the unit tests I was wondering why the map match results are only checked for the correct sequence of street names but not for the correct sequence of real nodes / real edges. Is there a reason for this?

@karussell, @kodonnell: When finally merging this PR, if you want to squash commits please only squash commits of the same author and please do not squash the entire PR so that it is still clear which commits were contributed by which author.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
.hints(new PMap().put(Parameters.CH.DISABLE, false))
.maxVisitedNodes(1000)
.hints(new PMap().put(Parameters.CH.DISABLE, false)
//TODO Fix that CH routing still uses default penalty of 300
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testUTurns for CH still fails because CH routing does not consider the set parameter value for Parameters.Routing.HEADING_PENALTY. @karussell, can you please help?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CH cannot consider this value in most cases due to theoretical limitations, but sometimes it works and you could try if here this would be the cases via put(CH.FORCE_HEADING, true) if not, then we would need to exclude the test for CH somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put(CH.FORCE_HEADING, true) did not work, so I excluded the test for CH.

*/
public double transitionLogProbability(double routeLength, double linearDistance,
double timeDiff) {
if (timeDiff == 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For GPX traces with equal timestamps, all transitions had a probability of 1 and hence transitions were not considered during map matching. With directed candidates the siutation got even worse because it could happen that the Viterbi algorithm chose a candidate with wrong direction because penalties from unfavored edges would still result in a transition probability of 1. In this case the resulting map matching path would take unnecessary detours.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed here I'd prefer a speed based limit where we break sequences (#87) if the speed between two candidates is too high. However, I think we can leave that until after this PR is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed here I would prefer that the user is able to specify if timestamps should be used. In some cases such as #13 there are just no (correct) timestamps available. However, checking speed between two candidates requires correct timestamps and should hence be optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because penalties from unfavored edges would still result in a transition probability of 1

@stefanholder would you mind to explain this to me? Do we just pick the distance then? But is this not somehow conflicting with the FastestWeighting we use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With conflicting I mean e.g. when we calculate the correct fastest path going on a motorway and then still prefer the track besides the motorway which is shorter (but slower), because the matchings were slightly closer to the track. But using the fastest weighting for me implies that we should prefer matching to fast roads even if there are no time stamps attached. So more generally I think we should even use path.getWeight() so that one could in theory use a completely different road preference (not sure if this is useful, so maybe we should stick to time/fastest)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So more generally I think we should even use path.getWeight()

Good idea. I think we're all agreed that it'd be good to allow the user to configure their transition metric preference (or at least base it on the algoOptions). Maybe a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But using the fastest weighting for me implies that we should prefer matching to fast roads even if there are no time stamps attached.

What I tried to explain in the original comment was that the timeDiff==0 check was broken and that the consequences would be even worse for directed candidates so this had to be fixed. The code change above makes sure that penalties from unfavored edges are also considered if all timestamps are equal.

With FastestWeighting and the route length transition metric we are currently using the map matcher favors the shortest routes out of all fastest routes between candidate pairs (independent of whether timestamps are equal or not). This means that the map matcher actually achieves a compromise between fast and short routes. Moreover, we are using the transition metric suggested and verified by Newson & Krumm. Changing this should be done in a separate PR and carefully evaluated. (This discussion actually belongs into #86.)

@stefanholder stefanholder changed the title Issue70 penalize paths Penalize inner-link U-turns (builds on #83) Dec 13, 2016
@stefanholder
Copy link
Contributor Author

The Travis CI build has a problem with the maven-failsafe-plugin, which should be unrelated to my changes. @karussell, can you please have a look?

@karussell
Copy link
Member

When finally merging this PR, if you want to squash commits please only squash commits of the same author and please do not squash the entire PR so that it is still clear which commits were contributed by which author.

I'm sorry. This happened before to @michaz too, which I'm very sorry about. I didn't indent to remove the authorship or something. We could explicitly add this here.

The problem also is that github offer this as default option which I should change now. Also squashing all commits into one is a lot simpler than making this author by author, maybe we should aim for just very few commits per PR and avoid squashing at all.

@stefanholder
Copy link
Contributor Author

stefanholder commented Dec 13, 2016

I'm sorry. This happened before to @michaz too, which I'm very sorry about. I didn't indent to remove the authorship or something.

No problem. Actually, it didn't happen to my commits but I know that the github squashing feature exist and I wanted to make you and other maintainers aware of the consequences.

maybe we should aim for just very few commits per PR and avoid squashing at all.

Sure, just let me know if you want me to squash my commits. For reviewing, it is sometimes better to have more commits, e.g. having refactorings in separate commits. This also allows to easily revert certain commits.

@karussell
Copy link
Member

Ok. I think the only improvement that squashing does is that a new feature is condensed into as few as possible commits so that it also can be easily reverted. But as I didn't use squashing ~6 months ago I think we can revert to this behaviour without tweaking authorship.

The Travis CI build has a problem with the maven-failsafe-plugin, which should be unrelated to my changes. @karussell, can you please have a look?

Looks like retriggering solved this, but we need a new GH core version deployed before.

Copy link
Member

@karussell karussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments

// Penalty in m for each U-turn performed in the path between two subsequent candidates
// Note that this penalty should roughly correspond to the penalty used for unfavored
// virtual edges because results get inconsistent otherwise.
private static final double VIRTUAL_NODE_U_TURN_PENALTY = 100;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try to make it consistent with the u-turn penalty used here or here?

I.e. using the same unit (seconds) and value. If both values should be similar, what happens if the heading penalty is changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current as well as the previous transition metric use path lengths in m. We could later try to change this to path times as discuss in #86 but this should be done in a separate PR.

What we could do now is to convert HEADING_PENALTY in s into a heading penalty in m by assuming a default speed. @karussell, do you have a suggestion how to set the default speed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with it in meters - as @stefanholder says, it makes sense given the transition metric. Maybe we should allow the user to change it, however?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now converting HEADING_PENALTY time into m using a conversion speed. This also allows the user to change the penalty by adjusting HEADING_PENALTY.

private final Graph routingGraph;
private final LocationIndexMatch locationIndex;
private double measurementErrorSigma = 50.0;
private double transitionProbabilityBeta = 0.00959442;
private double transitionProbabilityBeta = 2.0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary in this PR or should we create a new one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter needed to be adjusted because because the normalized transition metric was changed to the non-normalized transition metric. This was necessary because the normalized transition metric divides by the squared time difference but for some traces such as #13, all time stamps are equal. The previous way to deal with this doesn't work with directed candidates and also has other problems as explained above.

allQueryResults.addAll(qrs);
queryGraph.lookup(allQueryResults);

logger.info("================= Query results =================");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frequent logging should be avoided. Could we use a debug switch instead here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will use a static final variable that is checked before calling the logger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmh, but with many logging statements we make it more unreadable. Especially the core is just a library and should not log (or really only if necessary or informative and this then can be configured via log config). Logging is often (of course not always :) !!) a sign that certain parts are not easy to test - maybe we should fix this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually logger.debug should be called instead of logger.info for the logging in MapMatching.java. Then this logging can be easily configured via log config and we don't need a switch anymore (unless we are worried about performance).

IMO, the added logging statements make sense because they help to figure out what is really going on during map matching. This is always useful when a trace is not map matched as expected. I don't think that more tests would replace this need for logging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. logger.debug sounds reasonable to me. Furthermore we should create a good debugging experience e.g. using something like the MiniGraphUI which requires also to move this project into the core to avoid circle deps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will change this to logger.debug. A graph UI would be great indeed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ding for #78 (which I've already somewhat implemented privately) - though assuming #792 is going to happen soon, we may want to wait.

/**
* Find the possible locations of each qpxEntry in the graph.
*/
private List<List<QueryResult>> findGPXEntriesInGraph(List<GPXEntry> gpxList,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be named like findQueryResults or lookupGPXEntries, findGPXEntries implies that we return something like List<GPXEntry> (?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can take care of this. This will result in a separate commit because @kodonnell added this method and I don't want to change his commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be named like findQueryResults or lookupGPXEntries, findGPXEntries implies that we return something like List (?)

That's true, lookupGPXEntries is probably better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

queryGraph.getEdgeIteratorState(iter.getEdge(), iter.getAdjNode()));
}
}
assert virtualEdges.size() == 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we throw an exception (IllegalStateException?) here with a detailed message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can throw an exception here but IllegalStateException should only be used for violated preconditions. Its javadoc says:

Signals that a method has been invoked at an illegal or inappropriate time.

However, this is an internal error. I would just use RuntimeExcption. Is this ok with you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistakes there - I didn't realise assertions are ignored without the -ea switch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

public void testTransitionLogProbability() {
HmmProbabilities instance = new HmmProbabilities();
// see #13 for the real world problem
assertEquals(0, instance.transitionLogProbability(1, 1, 0), 0.001);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you removed this? Please see #13 and #75

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I needed to remove the timeDiff check (see above), which made this unit test obsolete.

wpt.put("x", extension.getQueryResult().getSnappedPoint().lon);
wpt.put("y", extension.getQueryResult().getSnappedPoint().lat);
wpt.put("timestamp", extension.getEntry().getTime());
wpt.put("x", extension.queryResult.getSnappedPoint().lon);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we enforce calling getQueryResult() and getEntry() instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kodonnell, can you please answer this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't worry me at all, and I don't know enough about Java to say which is good practice. So happy to leave to @karussell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this was actually my change. Same answer as here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm usually a friend of public final stuff except where we guess that we could use subclasses and method overwriting, but the problem here is that this breaks API compatiblity for no (good) reason IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I wasn't aware that GPXExtension is part of the public API. For me it was mainly the candidate representation but for the user it is the map matched GPX entry. I will then use getters for all fields in GPXExtension.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I wasn't aware that GPXExtension is part of the public API

Every public class is available from others and forms our public API. This will probably change in java 9

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -64,6 +64,7 @@
public void doPost(HttpServletRequest httpReq, HttpServletResponse httpRes)
throws ServletException, IOException {

logger.info("posted");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also here logs should be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karussell, @kodonnell: Should we remove this or change to logger.debug (change was done by @kodonnell)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to get rid of it - I shouldn't have left it in there. (I couldn't figure out how to get any debug output from running the GUI, and that was me trying, I think.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

+ ", gpxListIndex:" + gpxListIndex;
}

public QueryResult getQueryResult() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because queryResult is now public final. This achieves the same as this getter with less boilerplate code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I prefer this in terms of code cleanliness/development (though I'm not sure if it's good/bad practice etc.).

public final QueryResult queryResult;
public final boolean isDirected;
public final VirtualEdgeIteratorState incomingVirtualEdge;
public final VirtualEdgeIteratorState outgoingVirtualEdge;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like having the virtual edges public, can we reduce visibility or use EdgeIteratorState instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will change this to EdgeIteratorState.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@kodonnell
Copy link
Contributor

First - nice work @stefanholder! Looks like it was pretty involved, so thanks for sticking with it. If you'd like, I'm happy to review, though it may be a few days. @karussell - can you let me know whether you'd prefer me to review this, or work on #87?

if you want to squash commits

I still don't know what squashing is, so you're safe from me!

@karussell
Copy link
Member

@karussell - can you let me know whether you'd prefer me to review this, or work on #87?

I'd prioritize this one here but this should not be in my power - please do what you think is best or fun :) !

Copy link
Contributor

@kodonnell kodonnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure this will work - but I'm happy to be led by proof, and this does seem to work for mm-uturns1. However, @stefanholder can you verify you also get the same results as below for mm-uturns2?

mm-uturns2

That's my only real concern - otherwise just a few implementation details as below. It's not noted there, but @stefanholder - is it possible to filter the issue 70 OSM data to only include relevant stuff (as per here)?

@karussell - it'd be nice to test with #73 = ) I can manually merge stuff to try, but it'd be easier if it was in the repo etc.

if (penalizedEdges.contains(edge.getEdge())) {
totalPenalty += VIRTUAL_NODE_U_TURN_PENALTY;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of points:

  • shouldn't we only be checking first/last path edge? I can't see why the middle edges would ever be penalized - and it'll be faster if we avoid checking them.
  • wondering if a HashSet is overkill (and slow). At least we could use e.g. trove IntHashSet (on edge ID). Or, if penalizedEdges is length = 2 (or 4?) and we're only comparing two edges, a loop is probably faster (and avoids allocating a new hash etc.). Maybe that's premature optimization though.

Copy link
Member

@karussell karussell Dec 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if a HashSet is overkill (and slow). At least we could use e.g. trove IntHashSet (on edge ID)

HashSet is super fast for smaller sets (in Ks or even millions) and as we do not put all nodes in there it should be preferred because an array requires to be of the size of the graph which can be in the range of 100 millions.

What we could slightly improve is doing new HashSet(penalizedVirtualEdges.size()) ... but with this change we do not need to copy it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was more that we'll only ever have 2 (or 4?) entries in the set, and only calling 'contains' twice (i.e. first/last edge of path), so I'm not sure we need a hashset. (And, if we are using one, we might as well just hash the edge IDs, hence TIntHashSet. Or we could even use int[].) However, if it's premature optimization, I'm more than happy to ignore it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An array with just two entries and then calling contains should be indeed faster. Sorry I misunderstood this.

Copy link
Contributor Author

@stefanholder stefanholder Dec 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but with this change we do not need to copy it?

Good point, I will just use the set returned by QueryGraph.getUnfavoredVirtualEdges. @kodonnell, anything else would indeed be premature optimization IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we only be checking first/last path edge?

Good point, I changed this.

// get favored/unfavored edges:
VirtualEdgeIteratorState incomingVirtualEdge = j == 0 ? e1 : e2;
VirtualEdgeIteratorState outgoingVirtualEdge = j == 0 ? e2 : e1;
// create candidate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're ignoring the possibility of incomingVirtualEdge == outgoingVirtualEdge (i.e. a U-turn). @stefanholder - can you comment? My suspicion is it wouldn't change the results - but if that's the case, can we maybe make a note to explain to anyone else who notes this?

Copy link
Contributor Author

@stefanholder stefanholder Dec 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean it's possible that incomingVirtualEdge==outgoingVirtualEdge? I think it's not possible that the EdgeIterator returns the same edge twice and we are already asserting that we get exactly 2 virtual edges from the EdgeIterator. But if you want I can add another check here to make sure that incomingVirtualEdge!=outgoingVirtualEdge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that it's a perfectly valid route - I go in on the same edge I leave (i.e. U-turn). My gut feeling is we should include these as candidates (which will be penalized) ... I worry that specifically ignoring them might e.g. make U-turns impossible. However, it maybe just be an implementation thing, and we don't actually need to do it. So happy for you to make the call on this one.

Copy link
Contributor Author

@stefanholder stefanholder Dec 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we created a candidate with incomingVirtualEdge==outgoingVirtualEdge then this candidate would actually allow to perform a U-turn without a penalty by going to and from the virtual node through the other virtual edge pair.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. If it's not discussed elsewhere, and you think it worthwhile, could you update the comment just to note that we don't need to consider these cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment.

@stefanholder
Copy link
Contributor Author

However, @stefanholder can you verify you also get the same results as below for mm-uturns2?

I get the same results if I use issue-70.osm.gz as map data. The reason is that the road, the map matcher should match to, is not contained in the map file. The trace matches correctly with the entire Serbia map.

@stefanholder - is it possible to filter the issue 70 OSM data to only include relevant stuff (as per here)?

The size of issue-70.osm.gz is 11 KB, so is this really a concern? If yes, could you do this as another commit to branch issue70 and maybe also include the map data for mm-uturns2? I will then squash this commit into your first commit in this PR.

@stefanholder stefanholder force-pushed the issue70-penalize-paths branch from e4ad3a2 to 1ea7565 Compare December 19, 2016 08:36
@stefanholder
Copy link
Contributor Author

I updated my commits with the changes discussed above and squashed some commits. If you want to see only what has changed you can compare with branch issue70-penalize-paths-v0.1. The comparison doesn't work in the Github UI because both branches diverged but works in IntelliJ, for example.

Copy link
Member

@karussell karussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good to me.

I would like to get feedback from @michaz, do you (still) think this could/should be solved differently?

public GPXEntry getEntry() {
return entry;
/**
* Returns true if the snapped point is a virtual node, otherwise returns false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is isDirected the best wording here? At least the javadocs should be explicitely say 'true if the snapped point is a virtual node, otherwise it is a tower node and returns false'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that isDirected is the best wording here because there are directed and undirected candidates (GPXExtensions). Hence the javadoc (implicitly) says that all virtual nodes are directed and all real nodes are undirected.

Sure, I can change the javadoc to what you suggested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also suggest renaming it to isVirtual if "virtual node" is the GraphHopper core terminology.
In addition, we should clarify/resolve this terminology in QueryGraph, where virtual edges are introduced. Currently, neither 'directed' nor 'undirected' appears there. It says that four new edges are inserted: "base-snap, snap-base, [...]", which suggests directed edges, whereas edges in the base graph are undirected. That's a rather useful thing to know, and we should be more explicit about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point but from a map matching point of view, we either have directed or undirected candidates. Currently, all virtual nodes are transformed into directed candidates but we might later have directed candidates also for real nodes later as discussed before. So the method name isDirected would be more stable than isVirtual. Also the javadoc of this method is meant to be read in combination with the class javadoc, which should explain the whole picture.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the javadoc of isDirected.

}

/**
* Returns null if this GPXExtension is not directed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we instead throw an exception to enforce that isDirected is always called before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this is probably safer to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super-particular about these kinds of things. There's not a lot which can go wrong here -- the NullPointerException you will get is pretty clear in its meaning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the NullPointerException could be thrown much later than the IllegalStateException (potentially after null is passed a couple of times), which can make debugging hard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@michaz
Copy link
Member

michaz commented Dec 19, 2016

I would like to get feedback from @michaz, do you (still) think this could/should be solved differently?

No no, I I still like it. My concerns were more with putting other distance-like things (like time) instead of distance into the transition probabilities.

Just so I understand this correctly, and for future efforts: The reasons we are unfavoring the wrongly-directed virtual edges, rather than just not inserting them, are that

1a.) QueryGraph is the way it is.
2a.) We want to allow inner-link U-turns, just with low probability.

Similarly, the special case for "matching to a real node" is

1b.) because QueryGraph is the way it is.
2b.) because in the cases this happens, it may give better performance

3.) I think there may be some interference between unrelated things. We use one QueryGraph for everything, so every candidate introduces inner-link U-turn possibilities on routes between completely different candidates.

So, if we could tell QueryGraph to always create a pair of directed edge-matches (2 * (virtual node + 2 virtual edges)), it would fix 3.) and be more straight-forward, but at the cost of missing 2a) and 2b)?

@karussell
Copy link
Member

karussell commented Dec 19, 2016

2a.) We want to allow inner-link U-turns, just with low probability.

Yes, I think it is this case

Similarly, the special case for "matching to a real node" is

Here kind of both apply. We can also improve QueryGraph, but avoiding to create virtual nodes is good for performance especially for multiple hundreds of points. But we probably should test it before using it as an argument :)

We use one QueryGraph for everything, so every candidate introduces inner-link U-turn possibilities on routes between completely different candidates.

Here @stefanholder uses the 'unfavouring' stuff to clear and avoid interfering I think, still making QueryGraph stateful removes possibility to do matching on multiple threads (with the same QueryGraph).

So, if we could tell QueryGraph to always create a pair of directed edge-matches

You mean always creating virtual nodes&edges?

@stefanholder
Copy link
Contributor Author

stefanholder commented Dec 19, 2016

The reasons we are unfavoring the wrongly-directed virtual edges, rather than just not inserting them, are that
1a.) QueryGraph is the way it is.
2a.) We want to allow inner-link U-turns, just with low probability.

The main reason is that during candidate generation we don't know yet what the wrongly-directed virtual edges are. This is determined by the Viterbi algorithm after the router computed the penalized path lengths. Moreover, both directed candidates can be valid choices even without any U-turn if the road is not a dead end.

Similarly, the special case for "matching to a real node" is
1b.) because QueryGraph is the way it is.
2b.) because in the cases this happens, it may give better performance

I would say because we first wanted to address inner-link U-turns in this PR and not U-turns at real intersections. However, this shouldn't be hard to do in another PR if really needed.

3.) I think there may be some interference between unrelated things. We use one QueryGraph for everything, so every candidate introduces inner-link U-turn possibilities on routes between completely different candidates.

Additional virtual nodes in the QueryGraph from other candidates shouldn't be a problem because

  • Since virtual nodes retrieved by findNClosest are perpendicular to GPS positions, we don't have additional candidates for any GPS position
  • inner-link U-turns can only happen at the beginning or end of a path but not at virtual nodes from other candidates (otherwise this path wouldn't be the fastest or shortest including penalties from unfavored edges)

@michaz
Copy link
Member

michaz commented Dec 19, 2016

You mean always creating virtual nodes&edges?

I mean instead of the QueryGraph doing this

no

and then doing the direction-thing in the map matching code...

...maybe have the QueryGraph do this

yes

and then do nothing more in the map matching code.

And yes, treat all the matches like this. No special case for being EPSILON next to a real node.

@michaz
Copy link
Member

michaz commented Dec 19, 2016

Later, I mean. Not now.

@stefanholder
Copy link
Contributor Author

...maybe have the QueryGraph do this

Interesting idea, this would make virtual edges truly unidirectional. From my understanding of Graphhopper internals, this would mean splitting the flags field into two flags fields - one for each direction.

@stefanholder
Copy link
Contributor Author

stefanholder commented Dec 19, 2016

But as you wrote before, this would completely prevent inner-link U-turns instead of penalizing these. Since penalizing U-turns is more powerful than preventing U-turns, I think we should stick with penalties for inner-link U-turns.

@karussell
Copy link
Member

Interesting. I also favour penalizing u-turns over completely avoiding them. Also we would introduce a bit complexity due to two candidates '7a' and '7b' instead of one?

We could improve clarity in a later PR via fixing QueryGraph to use just two bidirectional edges (instead of 4 unidir. edges) and use the edge based traversal mode, then we could use the TurnWeighting to avoid u-turns without a special virtual edge handling in the map matching core I guess. The costs would be that the full algorithm will be roughly 2 times slower but for motor vehicles we have to consider turn costs anyway.

@michaz
Copy link
Member

michaz commented Dec 19, 2016

Allright, I'm just saying: The whole QueryGraph-virtual-node-virtual-edge-business was just what I found to be there when I did the prototype and noticed that GraphHopper generally "thinks undirectedly", (which I don't, I always think of road segments as two opposing directed edges, and storing them as one data record is just a compressed representation of that, which should ideally be abstracted away in the layer directly above data storage, wherever possible).

I did not intend it to model U-turns, this is just what happened.

If I had found a way in QueryGraph to get positions on directed-edge-like-things, I would have immediately used that.

@stefanholder
Copy link
Contributor Author

stefanholder commented Dec 19, 2016

then we could use the TurnWeighting to avoid u-turns without a special virtual edge handling in the map matching core I guess

Also with TurnWeighting we need two candidates per virtual node, one for each direction. This is because we don't know the correct direction at candidate generation (see above). Hence, I'm not sure what the benefit of TurnWeighting would be. Still it would be nice to support all traversal modes for map matching.

I did not intend it to model U-turns, this is just what happened.

Sure, that's what I thought.

@stefanholder stefanholder self-assigned this Dec 19, 2016
@karussell
Copy link
Member

but not for the correct sequence of real nodes / real edges, which is suboptimal when there are no street names such as here. Is there a reason for this?

I'm not 100% sure about the reason besides being simpler to debug. But it could be the following: although nodes&edges are stable if we do not change the source map, they could still change if something changes while import and checking just streets is more robust. The best solution which I would also like to include in the public web API would be to use OSM nodes which are okay to debug and relative stable (OSM way IDs are not that stable).

…hopper#70)

For GPX traces with equal timestamps, all transitions had a
probability of 1 and hence transitions were not considered during map
matching. With directed candidates the siutation got even worse
because it could happen that the Viterbi algorithm chose a candidate
with wrong direction because penalties from unfavored edges would still
result in a transition probability of 1. In this case the resulting
map matching path would take unnecessary detours.
@stefanholder stefanholder force-pushed the issue70-penalize-paths branch from 1ea7565 to 58417cd Compare December 20, 2016 12:39
@stefanholder
Copy link
Contributor Author

I updated my commits with the changes discussed above. The previous commits can be found in branch issue70-penalize-paths-v0.2.

Copy link
Member

@karussell karussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. The car example here (also the official example of the Dir API) is also fixed with this - there were 2 obstacles before and now is a perfect match - nice! (I used this extract for the pbf)

The problem is now how we measure quality (but @kodonnell opened already an issue for this). A smaller difference is an indicator but in this specific example the distance difference now is bigger!

@karussell
Copy link
Member

Please let me know a +1 and I'll merge @michaz & @kodonnell

@stefanholder
Copy link
Contributor Author

@karussell, your last comment got thumbs up from both @michaz and @kodonnell (there is no email notification for this). Not sure if this is what you meant with +1.

@karussell
Copy link
Member

Thanks!

there is no email notification for this

How ugly. They could at least give a notification in the web UI ... this is what discourse does (in general they have a much better notification system ...)

@karussell karussell merged commit d3cf21e into graphhopper:master Dec 23, 2016
@karussell
Copy link
Member

Merged - thanks again all involved :) !

@karussell karussell added this to the 0.9 milestone Dec 23, 2016
@stefanholder
Copy link
Contributor Author

How ugly.

Yes, it's probably better that all reviewers approve a review via github pull request reviews.

@kodonnell
Copy link
Contributor

As an aside, as #73 was merged before this (thanks @karussell), we've got the following two measurement results. (We can't easily go back further.)

commits:
--------------------------

3cd6476 [2016-12-20] Performance measurement suite (#73)
d3cf21e [2016-12-23] Merge pull request #88 from stefanholder/issue70-penalize-paths

measurements:
-------------

                              3cd6476             d3cf21e             
                              -------             -------             
location_index_match.max      10.756949           5.571446            
location_index_match.mean     0.3402709212        0.19900762400000002 
location_index_match.min      0.001734            0.001277            
location_index_match.sum      1701.354606         995.03812           
map_match.max                 1645.143509         2409.088762         
map_match.mean                127.85747168        274.09888926        
map_match.min                 1.564703            3.390409            
map_match.sum                 12785.747168        27409.888926        
measurement.count             5000                5000                
measurement.seed              123                 123                 
measurement.time              19071               36710               
measurement.totalMB           1529                1524                
measurement.usedMB            50                  37                  

So, roughly speaking, the locationIndex is a little under twice as fast, but the map-matching is a little over twice as slow.

@karussell
Copy link
Member

This looks strange. Did you run it twice and were the results consistent? Maybe there are some problems in the measurement suite. E.g. why should the location index be faster?

@kodonnell
Copy link
Contributor

Did you run it twice and were the results consistent?

No - I'd assumed all the warm-up and multiple testing would make it OK - as below, I was wrong = ) Not sure if it's a bug ... though maybe there could have been varying demands on the system (just my laptop) or some such. Anyway, I've run it multiple times, and consistently got results similar to below:

commits:
--------------------------

3cd6476 [2016-12-20] Performance measurement suite (#73)
d3cf21e [2016-12-23] Merge pull request #88 from stefanholder/issue70-penalize-paths

measurements:
-------------

                              3cd6476             d3cf21e             
                              -------             -------             
location_index_match.max      5.239315            5.63816             
location_index_match.mean     0.1950378346        0.19318890019999999 
location_index_match.min      0.001261            0.001284            
location_index_match.sum      975.189173          965.944501          
map_match.max                 1619.534054         2392.89043          
map_match.mean                127.55285619        271.12588989        
map_match.min                 1.563381            3.044777            
map_match.sum                 12755.285619        27112.588989        
measurement.count             5000                5000                
measurement.seed              123                 123                 
measurement.time              18018               35912               
measurement.totalMB           1526                1522                
measurement.usedMB            37                  37        

That said, about half the time the used MB is like above, and the other half, the latest commit is 50% higher.

So ... locationIndex is the same (as expected) though map matching is slower (which is expected, given we're increasing the number of candidates). For now, I'm guessing we're not too worried by the increased runtime (as it comes with a new feature)?

@karussell
Copy link
Member

No - I'd assumed all the warm-up and multiple testing would make it OK

It should. I expect the measurement suite is suboptimal somewhere

E.g. the while loop is something that I do not like much. With the new algorithm we should make every matching working I think and throw an error for problems.

For now, I'm guessing we're not too worried by the increased runtime (as it comes with a new feature)?

It depends a bit and if we understand the underlying issue. We should try as hard as possible to avoid slow down. Otherwise we end up in powerful but slow software at some point.

which is expected, given we're increasing the number of candidates

Are we increasing them?

@kodonnell
Copy link
Contributor

Are we increasing them?

My understanding is that every virtual node (a single candidate before) becomes (two) directed candidates (virtual node + direction). Assuming every node is a virtual one, then that's twice as many candidates, and four times as many transitions (which is likely to be the slow part, as it's the routing). The viterbi processing will also be slowed down due to the increased number of candidates (and may require more memory to store everything in the maps). At this stage, I'm assuming we're only seeing a 2.5x slowdown (as opposed to 4x or more) because not every candidate is virtual.

the while loop is something that I do not like much. With the new algorithm we should make every matching working I think and throw an error for problems.

Do you mean this one? Agreed, with #87 we should (?) hopefully never fail - at worst, we get returned a list of single point sequences. That said, another option is to tweak MiniPerfTest to (optionally) catch exceptions and count/report them in the final stats, and then we could exclude the while loop. I'd only do that if it was useful in other situations.

@karussell
Copy link
Member

Assuming every node is a virtual one, then that's twice as many candidates, and four times as many transitions

Ups, indeed. BTW: We should put the one-to-many "cache" to make this a lot faster (I hope). Was this meant with #81 ?

Do you mean this one?

yes

Agreed,with #87 we should (?) hopefully never fail

Then I'd prefer the procedure you describe (throwing vs. counting errors) and then (a slightly modified) measurement suite could also act as a quality indicator like in #89

@kodonnell
Copy link
Contributor

Was this meant with #81 ?

I'm not sure what you mean, sorry.

Then I'd prefer

OK, I'll try to remember to tweak it as part of #87.

@karussell
Copy link
Member

OK, I'll try to remember to tweak it as part of #87

Better in a separate PR

@stefanholder
Copy link
Contributor Author

stefanholder commented Jan 12, 2017

We should use a one-to-many Dijkstra algorithm, which only aborts after all target nodes have been found. I think this was meant with #82. @karussell: Is this also what you meant with one-to-many "cache" above?

@karussell
Copy link
Member

@karussell: Is this also what you meant with one-to-many "cache" above?

I meant exactly this, yes. But I'm not sure of intentions of the issue creator :)

@kodonnell
Copy link
Contributor

But I'm not sure of intentions of the issue creator :)

The intent of #82 was to use DijkstraOneToMany.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants