-
Notifications
You must be signed in to change notification settings - Fork 273
Penalize inner-link U-turns (builds on #83) #88
Penalize inner-link U-turns (builds on #83) #88
Conversation
.hints(new PMap().put(Parameters.CH.DISABLE, false)) | ||
.maxVisitedNodes(1000) | ||
.hints(new PMap().put(Parameters.CH.DISABLE, false) | ||
//TODO Fix that CH routing still uses default penalty of 300 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testUTurns for CH still fails because CH routing does not consider the set parameter value for Parameters.Routing.HEADING_PENALTY. @karussell, can you please help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CH cannot consider this value in most cases due to theoretical limitations, but sometimes it works and you could try if here this would be the cases via put(CH.FORCE_HEADING, true) if not, then we would need to exclude the test for CH somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put(CH.FORCE_HEADING, true) did not work, so I excluded the test for CH.
*/ | ||
public double transitionLogProbability(double routeLength, double linearDistance, | ||
double timeDiff) { | ||
if (timeDiff == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For GPX traces with equal timestamps, all transitions had a probability of 1 and hence transitions were not considered during map matching. With directed candidates the siutation got even worse because it could happen that the Viterbi algorithm chose a candidate with wrong direction because penalties from unfavored edges would still result in a transition probability of 1. In this case the resulting map matching path would take unnecessary detours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because penalties from unfavored edges would still result in a transition probability of 1
@stefanholder would you mind to explain this to me? Do we just pick the distance then? But is this not somehow conflicting with the FastestWeighting we use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With conflicting I mean e.g. when we calculate the correct fastest path going on a motorway and then still prefer the track besides the motorway which is shorter (but slower), because the matchings were slightly closer to the track. But using the fastest weighting for me implies that we should prefer matching to fast roads even if there are no time stamps attached. So more generally I think we should even use path.getWeight() so that one could in theory use a completely different road preference (not sure if this is useful, so maybe we should stick to time/fastest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So more generally I think we should even use path.getWeight()
Good idea. I think we're all agreed that it'd be good to allow the user to configure their transition metric preference (or at least base it on the algoOptions). Maybe a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But using the fastest weighting for me implies that we should prefer matching to fast roads even if there are no time stamps attached.
What I tried to explain in the original comment was that the timeDiff==0 check was broken and that the consequences would be even worse for directed candidates so this had to be fixed. The code change above makes sure that penalties from unfavored edges are also considered if all timestamps are equal.
With FastestWeighting and the route length transition metric we are currently using the map matcher favors the shortest routes out of all fastest routes between candidate pairs (independent of whether timestamps are equal or not). This means that the map matcher actually achieves a compromise between fast and short routes. Moreover, we are using the transition metric suggested and verified by Newson & Krumm. Changing this should be done in a separate PR and carefully evaluated. (This discussion actually belongs into #86.)
The Travis CI build has a problem with the maven-failsafe-plugin, which should be unrelated to my changes. @karussell, can you please have a look? |
I'm sorry. This happened before to @michaz too, which I'm very sorry about. I didn't indent to remove the authorship or something. We could explicitly add this here. The problem also is that github offer this as default option which I should change now. Also squashing all commits into one is a lot simpler than making this author by author, maybe we should aim for just very few commits per PR and avoid squashing at all. |
No problem. Actually, it didn't happen to my commits but I know that the github squashing feature exist and I wanted to make you and other maintainers aware of the consequences.
Sure, just let me know if you want me to squash my commits. For reviewing, it is sometimes better to have more commits, e.g. having refactorings in separate commits. This also allows to easily revert certain commits. |
Ok. I think the only improvement that squashing does is that a new feature is condensed into as few as possible commits so that it also can be easily reverted. But as I didn't use squashing ~6 months ago I think we can revert to this behaviour without tweaking authorship.
Looks like retriggering solved this, but we need a new GH core version deployed before. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my comments
// Penalty in m for each U-turn performed in the path between two subsequent candidates | ||
// Note that this penalty should roughly correspond to the penalty used for unfavored | ||
// virtual edges because results get inconsistent otherwise. | ||
private static final double VIRTUAL_NODE_U_TURN_PENALTY = 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current as well as the previous transition metric use path lengths in m. We could later try to change this to path times as discuss in #86 but this should be done in a separate PR.
What we could do now is to convert HEADING_PENALTY in s into a heading penalty in m by assuming a default speed. @karussell, do you have a suggestion how to set the default speed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with it in meters - as @stefanholder says, it makes sense given the transition metric. Maybe we should allow the user to change it, however?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now converting HEADING_PENALTY time into m using a conversion speed. This also allows the user to change the penalty by adjusting HEADING_PENALTY.
private final Graph routingGraph; | ||
private final LocationIndexMatch locationIndex; | ||
private double measurementErrorSigma = 50.0; | ||
private double transitionProbabilityBeta = 0.00959442; | ||
private double transitionProbabilityBeta = 2.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary in this PR or should we create a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parameter needed to be adjusted because because the normalized transition metric was changed to the non-normalized transition metric. This was necessary because the normalized transition metric divides by the squared time difference but for some traces such as #13, all time stamps are equal. The previous way to deal with this doesn't work with directed candidates and also has other problems as explained above.
allQueryResults.addAll(qrs); | ||
queryGraph.lookup(allQueryResults); | ||
|
||
logger.info("================= Query results ================="); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frequent logging should be avoided. Could we use a debug switch instead here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will use a static final variable that is checked before calling the logger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmh, but with many logging statements we make it more unreadable. Especially the core is just a library and should not log (or really only if necessary or informative and this then can be configured via log config). Logging is often (of course not always :) !!) a sign that certain parts are not easy to test - maybe we should fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually logger.debug should be called instead of logger.info for the logging in MapMatching.java. Then this logging can be easily configured via log config and we don't need a switch anymore (unless we are worried about performance).
IMO, the added logging statements make sense because they help to figure out what is really going on during map matching. This is always useful when a trace is not map matched as expected. I don't think that more tests would replace this need for logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. logger.debug sounds reasonable to me. Furthermore we should create a good debugging experience e.g. using something like the MiniGraphUI which requires also to move this project into the core to avoid circle deps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will change this to logger.debug. A graph UI would be great indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | ||
* Find the possible locations of each qpxEntry in the graph. | ||
*/ | ||
private List<List<QueryResult>> findGPXEntriesInGraph(List<GPXEntry> gpxList, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be named like findQueryResults or lookupGPXEntries, findGPXEntries implies that we return something like List<GPXEntry>
(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I can take care of this. This will result in a separate commit because @kodonnell added this method and I don't want to change his commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be named like findQueryResults or lookupGPXEntries, findGPXEntries implies that we return something like List (?)
That's true, lookupGPXEntries
is probably better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
queryGraph.getEdgeIteratorState(iter.getEdge(), iter.getAdjNode())); | ||
} | ||
} | ||
assert virtualEdges.size() == 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we throw an exception (IllegalStateException?) here with a detailed message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can throw an exception here but IllegalStateException should only be used for violated preconditions. Its javadoc says:
Signals that a method has been invoked at an illegal or inappropriate time.
However, this is an internal error. I would just use RuntimeExcption
. Is this ok with you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistakes there - I didn't realise assertions are ignored without the -ea switch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
public void testTransitionLogProbability() { | ||
HmmProbabilities instance = new HmmProbabilities(); | ||
// see #13 for the real world problem | ||
assertEquals(0, instance.transitionLogProbability(1, 1, 0), 0.001); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I needed to remove the timeDiff check (see above), which made this unit test obsolete.
wpt.put("x", extension.getQueryResult().getSnappedPoint().lon); | ||
wpt.put("y", extension.getQueryResult().getSnappedPoint().lat); | ||
wpt.put("timestamp", extension.getEntry().getTime()); | ||
wpt.put("x", extension.queryResult.getSnappedPoint().lon); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we enforce calling getQueryResult() and getEntry() instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kodonnell, can you please answer this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't worry me at all, and I don't know enough about Java to say which is good practice. So happy to leave to @karussell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this was actually my change. Same answer as here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm usually a friend of public final stuff except where we guess that we could use subclasses and method overwriting, but the problem here is that this breaks API compatiblity for no (good) reason IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I wasn't aware that GPXExtension is part of the public API. For me it was mainly the candidate representation but for the user it is the map matched GPX entry. I will then use getters for all fields in GPXExtension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I wasn't aware that GPXExtension is part of the public API
Every public class is available from others and forms our public API. This will probably change in java 9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -64,6 +64,7 @@ | |||
public void doPost(HttpServletRequest httpReq, HttpServletResponse httpRes) | |||
throws ServletException, IOException { | |||
|
|||
logger.info("posted"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here logs should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karussell, @kodonnell: Should we remove this or change to logger.debug (change was done by @kodonnell)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to get rid of it - I shouldn't have left it in there. (I couldn't figure out how to get any debug output from running the GUI, and that was me trying, I think.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
+ ", gpxListIndex:" + gpxListIndex; | ||
} | ||
|
||
public QueryResult getQueryResult() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because queryResult is now public final. This achieves the same as this getter with less boilerplate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - I prefer this in terms of code cleanliness/development (though I'm not sure if it's good/bad practice etc.).
public final QueryResult queryResult; | ||
public final boolean isDirected; | ||
public final VirtualEdgeIteratorState incomingVirtualEdge; | ||
public final VirtualEdgeIteratorState outgoingVirtualEdge; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not like having the virtual edges public, can we reduce visibility or use EdgeIteratorState instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will change this to EdgeIteratorState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
First - nice work @stefanholder! Looks like it was pretty involved, so thanks for sticking with it. If you'd like, I'm happy to review, though it may be a few days. @karussell - can you let me know whether you'd prefer me to review this, or work on #87?
I still don't know what squashing is, so you're safe from me! |
I'd prioritize this one here but this should not be in my power - please do what you think is best or fun :) ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure this will work - but I'm happy to be led by proof, and this does seem to work for mm-uturns1. However, @stefanholder can you verify you also get the same results as below for mm-uturns2?
That's my only real concern - otherwise just a few implementation details as below. It's not noted there, but @stefanholder - is it possible to filter the issue 70 OSM data to only include relevant stuff (as per here)?
@karussell - it'd be nice to test with #73 = ) I can manually merge stuff to try, but it'd be easier if it was in the repo etc.
if (penalizedEdges.contains(edge.getEdge())) { | ||
totalPenalty += VIRTUAL_NODE_U_TURN_PENALTY; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of points:
- shouldn't we only be checking first/last path edge? I can't see why the middle edges would ever be penalized - and it'll be faster if we avoid checking them.
- wondering if a HashSet is overkill (and slow). At least we could use e.g. trove IntHashSet (on edge ID). Or, if penalizedEdges is length = 2 (or 4?) and we're only comparing two edges, a loop is probably faster (and avoids allocating a new hash etc.). Maybe that's premature optimization though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if a HashSet is overkill (and slow). At least we could use e.g. trove IntHashSet (on edge ID)
HashSet is super fast for smaller sets (in Ks or even millions) and as we do not put all nodes in there it should be preferred because an array requires to be of the size of the graph which can be in the range of 100 millions.
What we could slightly improve is doing new HashSet(penalizedVirtualEdges.size())
... but with this change we do not need to copy it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was more that we'll only ever have 2 (or 4?) entries in the set, and only calling 'contains' twice (i.e. first/last edge of path), so I'm not sure we need a hashset. (And, if we are using one, we might as well just hash the edge IDs, hence TIntHashSet. Or we could even use int[].) However, if it's premature optimization, I'm more than happy to ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An array with just two entries and then calling contains should be indeed faster. Sorry I misunderstood this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but with this change we do not need to copy it?
Good point, I will just use the set returned by QueryGraph.getUnfavoredVirtualEdges
. @kodonnell, anything else would indeed be premature optimization IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we only be checking first/last path edge?
Good point, I changed this.
// get favored/unfavored edges: | ||
VirtualEdgeIteratorState incomingVirtualEdge = j == 0 ? e1 : e2; | ||
VirtualEdgeIteratorState outgoingVirtualEdge = j == 0 ? e2 : e1; | ||
// create candidate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're ignoring the possibility of incomingVirtualEdge == outgoingVirtualEdge (i.e. a U-turn). @stefanholder - can you comment? My suspicion is it wouldn't change the results - but if that's the case, can we maybe make a note to explain to anyone else who notes this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean it's possible that incomingVirtualEdge==outgoingVirtualEdge? I think it's not possible that the EdgeIterator returns the same edge twice and we are already asserting that we get exactly 2 virtual edges from the EdgeIterator. But if you want I can add another check here to make sure that incomingVirtualEdge!=outgoingVirtualEdge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that it's a perfectly valid route - I go in on the same edge I leave (i.e. U-turn). My gut feeling is we should include these as candidates (which will be penalized) ... I worry that specifically ignoring them might e.g. make U-turns impossible. However, it maybe just be an implementation thing, and we don't actually need to do it. So happy for you to make the call on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we created a candidate with incomingVirtualEdge==outgoingVirtualEdge then this candidate would actually allow to perform a U-turn without a penalty by going to and from the virtual node through the other virtual edge pair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. If it's not discussed elsewhere, and you think it worthwhile, could you update the comment just to note that we don't need to consider these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comment.
I get the same results if I use issue-70.osm.gz as map data. The reason is that the road, the map matcher should match to, is not contained in the map file. The trace matches correctly with the entire Serbia map.
The size of issue-70.osm.gz is 11 KB, so is this really a concern? If yes, could you do this as another commit to branch issue70 and maybe also include the map data for mm-uturns2? I will then squash this commit into your first commit in this PR. |
e4ad3a2
to
1ea7565
Compare
I updated my commits with the changes discussed above and squashed some commits. If you want to see only what has changed you can compare with branch issue70-penalize-paths-v0.1. The comparison doesn't work in the Github UI because both branches diverged but works in IntelliJ, for example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, looks good to me.
I would like to get feedback from @michaz, do you (still) think this could/should be solved differently?
public GPXEntry getEntry() { | ||
return entry; | ||
/** | ||
* Returns true if the snapped point is a virtual node, otherwise returns false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is isDirected
the best wording here? At least the javadocs should be explicitely say 'true if the snapped point is a virtual node, otherwise it is a tower node and returns false'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that isDirected
is the best wording here because there are directed and undirected candidates (GPXExtensions). Hence the javadoc (implicitly) says that all virtual nodes are directed and all real nodes are undirected.
Sure, I can change the javadoc to what you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also suggest renaming it to isVirtual
if "virtual node" is the GraphHopper core terminology.
In addition, we should clarify/resolve this terminology in QueryGraph
, where virtual edges are introduced. Currently, neither 'directed' nor 'undirected' appears there. It says that four new edges are inserted: "base-snap, snap-base, [...]", which suggests directed edges, whereas edges in the base graph are undirected. That's a rather useful thing to know, and we should be more explicit about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point but from a map matching point of view, we either have directed or undirected candidates. Currently, all virtual nodes are transformed into directed candidates but we might later have directed candidates also for real nodes later as discussed before. So the method name isDirected
would be more stable than isVirtual
. Also the javadoc of this method is meant to be read in combination with the class javadoc, which should explain the whole picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the javadoc of isDirected
.
} | ||
|
||
/** | ||
* Returns null if this GPXExtension is not directed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we instead throw an exception to enforce that isDirected
is always called before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this is probably safer to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super-particular about these kinds of things. There's not a lot which can go wrong here -- the NullPointerException
you will get is pretty clear in its meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the NullPointerException
could be thrown much later than the IllegalStateException
(potentially after null
is passed a couple of times), which can make debugging hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
No no, I I still like it. My concerns were more with putting other distance-like things (like time) instead of distance into the transition probabilities. Just so I understand this correctly, and for future efforts: The reasons we are unfavoring the wrongly-directed virtual edges, rather than just not inserting them, are that 1a.) Similarly, the special case for "matching to a real node" is 1b.) because 3.) I think there may be some interference between unrelated things. We use one So, if we could tell QueryGraph to always create a pair of directed edge-matches (2 * (virtual node + 2 virtual edges)), it would fix 3.) and be more straight-forward, but at the cost of missing 2a) and 2b)? |
Yes, I think it is this case
Here kind of both apply. We can also improve QueryGraph, but avoiding to create virtual nodes is good for performance especially for multiple hundreds of points. But we probably should test it before using it as an argument :)
Here @stefanholder uses the 'unfavouring' stuff to clear and avoid interfering I think, still making QueryGraph stateful removes possibility to do matching on multiple threads (with the same QueryGraph).
You mean always creating virtual nodes&edges? |
The main reason is that during candidate generation we don't know yet what the wrongly-directed virtual edges are. This is determined by the Viterbi algorithm after the router computed the penalized path lengths. Moreover, both directed candidates can be valid choices even without any U-turn if the road is not a dead end.
I would say because we first wanted to address inner-link U-turns in this PR and not U-turns at real intersections. However, this shouldn't be hard to do in another PR if really needed.
Additional virtual nodes in the QueryGraph from other candidates shouldn't be a problem because
|
I mean instead of the and then doing the direction-thing in the map matching code... ...maybe have the and then do nothing more in the map matching code. And yes, treat all the matches like this. No special case for being |
Later, I mean. Not now. |
Interesting idea, this would make virtual edges truly unidirectional. From my understanding of Graphhopper internals, this would mean splitting the flags field into two flags fields - one for each direction. |
But as you wrote before, this would completely prevent inner-link U-turns instead of penalizing these. Since penalizing U-turns is more powerful than preventing U-turns, I think we should stick with penalties for inner-link U-turns. |
Interesting. I also favour penalizing u-turns over completely avoiding them. Also we would introduce a bit complexity due to two candidates '7a' and '7b' instead of one? We could improve clarity in a later PR via fixing QueryGraph to use just two bidirectional edges (instead of 4 unidir. edges) and use the edge based traversal mode, then we could use the TurnWeighting to avoid u-turns without a special virtual edge handling in the map matching core I guess. The costs would be that the full algorithm will be roughly 2 times slower but for motor vehicles we have to consider turn costs anyway. |
Allright, I'm just saying: The whole I did not intend it to model U-turns, this is just what happened. If I had found a way in |
Also with TurnWeighting we need two candidates per virtual node, one for each direction. This is because we don't know the correct direction at candidate generation (see above). Hence, I'm not sure what the benefit of TurnWeighting would be. Still it would be nice to support all traversal modes for map matching.
Sure, that's what I thought. |
I'm not 100% sure about the reason besides being simpler to debug. But it could be the following: although nodes&edges are stable if we do not change the source map, they could still change if something changes while import and checking just streets is more robust. The best solution which I would also like to include in the public web API would be to use OSM nodes which are okay to debug and relative stable (OSM way IDs are not that stable). |
…hopper#70) For GPX traces with equal timestamps, all transitions had a probability of 1 and hence transitions were not considered during map matching. With directed candidates the siutation got even worse because it could happen that the Viterbi algorithm chose a candidate with wrong direction because penalties from unfavored edges would still result in a transition probability of 1. In this case the resulting map matching path would take unnecessary detours.
1ea7565
to
58417cd
Compare
I updated my commits with the changes discussed above. The previous commits can be found in branch issue70-penalize-paths-v0.2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. The car example here (also the official example of the Dir API) is also fixed with this - there were 2 obstacles before and now is a perfect match - nice! (I used this extract for the pbf)
The problem is now how we measure quality (but @kodonnell opened already an issue for this). A smaller difference is an indicator but in this specific example the distance difference now is bigger!
Please let me know a +1 and I'll merge @michaz & @kodonnell |
@karussell, your last comment got thumbs up from both @michaz and @kodonnell (there is no email notification for this). Not sure if this is what you meant with +1. |
Thanks!
How ugly. They could at least give a notification in the web UI ... this is what discourse does (in general they have a much better notification system ...) |
Merged - thanks again all involved :) ! |
Yes, it's probably better that all reviewers approve a review via github pull request reviews. |
As an aside, as #73 was merged before this (thanks @karussell), we've got the following two measurement results. (We can't easily go back further.)
So, roughly speaking, the locationIndex is a little under twice as fast, but the map-matching is a little over twice as slow. |
This looks strange. Did you run it twice and were the results consistent? Maybe there are some problems in the measurement suite. E.g. why should the location index be faster? |
No - I'd assumed all the warm-up and multiple testing would make it OK - as below, I was wrong = ) Not sure if it's a bug ... though maybe there could have been varying demands on the system (just my laptop) or some such. Anyway, I've run it multiple times, and consistently got results similar to below:
That said, about half the time the used MB is like above, and the other half, the latest commit is 50% higher. So ... locationIndex is the same (as expected) though map matching is slower (which is expected, given we're increasing the number of candidates). For now, I'm guessing we're not too worried by the increased runtime (as it comes with a new feature)? |
It should. I expect the measurement suite is suboptimal somewhere E.g. the while loop is something that I do not like much. With the new algorithm we should make every matching working I think and throw an error for problems.
It depends a bit and if we understand the underlying issue. We should try as hard as possible to avoid slow down. Otherwise we end up in powerful but slow software at some point.
Are we increasing them? |
My understanding is that every virtual node (a single candidate before) becomes (two) directed candidates (virtual node + direction). Assuming every node is a virtual one, then that's twice as many candidates, and four times as many transitions (which is likely to be the slow part, as it's the routing). The viterbi processing will also be slowed down due to the increased number of candidates (and may require more memory to store everything in the maps). At this stage, I'm assuming we're only seeing a 2.5x slowdown (as opposed to 4x or more) because not every candidate is virtual.
Do you mean this one? Agreed, with #87 we should (?) hopefully never fail - at worst, we get returned a list of single point sequences. That said, another option is to tweak MiniPerfTest to (optionally) catch exceptions and count/report them in the final stats, and then we could exclude the while loop. I'd only do that if it was useful in other situations. |
Ups, indeed. BTW: We should put the one-to-many "cache" to make this a lot faster (I hope). Was this meant with #81 ?
yes
Then I'd prefer the procedure you describe (throwing vs. counting errors) and then (a slightly modified) measurement suite could also act as a quality indicator like in #89 |
Better in a separate PR |
We should use a one-to-many Dijkstra algorithm, which only aborts after all target nodes have been found. I think this was meant with #82. @karussell: Is this also what you meant with one-to-many "cache" above? |
I meant exactly this, yes. But I'm not sure of intentions of the issue creator :) |
The intent of #82 was to use DijkstraOneToMany. |
Things to note
When looking at the unit tests I was wondering why the map match results are only checked for the correct sequence of street names but not for the correct sequence of real nodes / real edges. Is there a reason for this?
@karussell, @kodonnell: When finally merging this PR, if you want to squash commits please only squash commits of the same author and please do not squash the entire PR so that it is still clear which commits were contributed by which author.