#671 (UKF cleanup) leads to infinite loops and hanging of the whole state estimation #777
Comments
Thanks, I'll take a look. I think I replaced those. The old UKF logic wasn't really functioning properly as a UKF, so there are likely some parameters that need to be tuned now.
@peci1 Looking at the logs you posted, I don't see a single IMU measurement being received by the filter. You are attempting to produce a 3D state estimate, but your roll and pitch values have no reference. See this covariance matrix:
All of your absolute orientation variables have massive covariance values, presumably because (a) you have no roll/pitch reference, and (b) the correlation between yaw, roll, and pitch means that you are going to see yaw covariance increase as well. The correlation with the rest of the state variables will eventually lead to those exploding too. Looking at the bag you sent, the wheel encoder odometry is publishing for nearly 22 seconds before the IMU data even starts arriving. I echoed the first topic from each, then started playing the bag:
So you are asking the filter to produce a 3D state estimate, but starving it of 3D orientation measurements for over 20 seconds. That's going to make the covariance explode.
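(For intuition only, here is a minimal scalar sketch with made-up noise values, not robot_localization code, of why the covariance of an unmeasured orientation variable explodes: every prediction step adds process noise, and nothing shrinks it until a measurement of that variable finally arrives.)

```cpp
#include <cstdio>

// Illustrative scalar example (not robot_localization code): one state
// variable, e.g. roll, predicted at an assumed 30 Hz but never measured.
int main() {
  double P = 1e-9;            // assumed initial roll variance
  const double Q = 0.02;      // assumed per-step process noise for roll
  const int steps = 30 * 22;  // roughly 22 s of prediction-only cycles

  for (int i = 0; i < steps; ++i) {
    // Prediction only: P = F * P * F' + Q. With no roll measurement there is
    // never a correction step, so the variance grows monotonically.
    P += Q;
  }
  std::printf("roll variance after ~22 s without a measurement: %g\n", P);
  return 0;
}
```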
The missing IMU measurements in the log are a consequence of the reported problem. If I comment out receiving odom messages, the filter processes the IMU messages. If both are used, the filter gets stuck before the IMU measurements start coming. You are right that the IMU messages do not arrive for some time. Our IMU does bias estimation on startup and does not output anything until the biases are estimated. I didn't think this could be a problem because I'm fusing the absolute roll/pitch from the IMU, so when messages start coming, the covariances should quickly get to reasonable numbers. And they did before #671. As the robot is forced to be stationary during the bias estimation, all odom messages should read zero velocity. Therefore, the filter should not be forced to project non-zero velocities along the exploding roll and pitch. I see it could be better to start the filter only when IMU measurements are available. Is there a setting supporting this (except the explicit pause service)? I.e., can I mark a sensor as required (so that predictions are not done if the sensor is timed out)?
That's because the filter won't start until it receives its first measurement.
It might still do that if the filter was able to get to that point before things started freezing. For that reason, maybe try the branch I mentioned to see if it helps?
Yeah, the filter was effectively not properly implementing a UKF. Now it is, and I also found that it's more sensitive to this kind of thing. But I'd rather have the math be right, even if it means the filter has greater sensitivity to sensor measurement holidays. To be fair, 22 seconds is a very long time to go without any reference in one of your state variables.
Prediction is done using the filter's state, not the measurements. If there's a lot of instability in the covariance values, then I can see the "almost zero" state variables (they are really on the order of 1e-15) getting corrected via massive Kalman gains into non-zero values. That's not to say there isn't another issue lurking (and I'll poke around). It's just that I can see a legitimate cause for this behavior.
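(As a hedged illustration of that last point, using the plain scalar Kalman update rather than the package's actual code, and with assumed numbers: once the predicted variance P dwarfs the measurement noise R, the gain K = P / (P + R) approaches 1, so an "almost zero" state estimate gets yanked to whatever value the correction step supplies.)

```cpp
#include <cstdio>

// Scalar Kalman correction, for illustration only.
// x = state estimate, P = state variance, z = measurement, R = measurement noise.
int main() {
  double x = 1e-15;       // "almost zero" state variable, as in the log
  double P = 1e6;         // assumed variance after many uncorrected predictions
  const double z = 0.3;   // assumed measurement (or projected) value
  const double R = 0.01;  // assumed measurement noise

  const double K = P / (P + R);  // gain is essentially 1 when P >> R
  x = x + K * (z - x);           // the tiny state jumps almost exactly to z
  P = (1.0 - K) * P;             // variance collapses after the correction

  std::printf("K = %.8f, corrected x = %.6f, corrected P = %.6f\n", K, x, P);
  return 0;
}
```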
This does not exist, but it's a great idea (I think there might be a ticket for it?). I'd welcome a PR. :)
Yeah, one thing I am not happy about is this part of the logs:
But then in the next prediction, we get
So the roll and pitch get projected into -pi and pi, respectively. Will dig into that.
OK, I think I spotted an issue with how Euler angles are projected. Fix is pushed to the same branch.
No, that won't do what I thought. OK, I'll have to give the bag a go myself. Will try to find cycles for that.
Good, thanks a lot!
@peci1 can you do me a favour and add these parameters to your config file?
And then run and let me know how it goes.
Ok, do you mean on the real robot or just running with the bag file? The robot will not be available until next week probably...
I know it'll work on the bag file you sent, but if you have any others that caused failure, that would be great. But please at least verify on the bag you sent.
Ok, with the new parameters, it does not get stuck on the bag. However, the localization is still erratic when the IMU messages start coming. Also, NaNs are lurking somewhere (although not in any fatal way) - at least rviz says so when visualizing the odometries. I'll test on the real robot next week. Unrelated: do you plan to backport all these changes to melodic, too? I've just compiled the noetic-devel branch on my Melodic PC and it seems to work ok.
Are you by chance filtering out the "bad" values from the bag before you run again?
What are bad values? You mean the time when only odom is coming? No, I was now testing on the same bag that's mentioned in the original post. What fixed frame do you use in rviz to get this result? I used 'odom'.
Ah, I see, you mean the /imu_odom topic.
Good, when filtering out the old /imu_odom topic, I see exactly the result you show (when scaling up the covariances a lot in rviz). Do you have an explanation for the UKF parameter values that fixed it?
Well, I don't think it's really a fix, as it turns out. It will solve that particular issue, but that breaks other things (like our regression tests). I know the main issue; I just need to think of a solution. In the meantime, I'd either switch to the EKF or add the feature whereby we wait for all sensor inputs before starting the filter.
Thanks, I'll have a look. I got busy in the meantime, but I think I can do this soon.
@ayrton04 this while loop is a disaster waiting to happen. My project is affected by this bug after migration from ROS melodic to noetic. The robot_localization process is stuck at clampRotation().
@daniel-zawadzki-vay that change touches far more than the rotation clamping. Please feel free to submit a separate PR with that in isolation. However, that clamping logic has been around since the very early days of the package, so I'm not clear on why it's suddenly an issue. This package is deprecated right now, so while I will definitely merge a PR to address the angle wrapping, I don't have the cycles to iterate on the UKF in general right now.
Describe the bug
Running ukf_localization_node (or the nodelet) with #671 merged causes the node to stop working a while after startup. The responsible commit was found with git bisect.
To Reproduce
Steps to reproduce the behavior:
node config (file odom_localization.yaml):
Launch the node with:
Attaching to the running filter with gdb, it is always stuck here:
Notice the gigantic rotation argument of clampRotation(). Apparently, the while loop has a lot of work to do to get this number into the -2pi..+2pi range by subtracting 2pi at a time. Of course, using a modulo function would prevent it from getting stuck, but looking into the logfile, the state is apparently exploding, so that would not fix the underlying problem.

Here is the debug log of the filter: debug_log.zip Apparently, the state explodes near the end of the log file. No more records are added to this file once the filter gets stuck.
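(For reference, a minimal sketch of the two approaches under assumed names and bounds; this is not the package's actual clampRotation implementation. The loop-based clamp needs on the order of |rotation| / 2pi iterations, which for a value like the one above is effectively an infinite loop, while an fmod-based normalization runs in constant time.)

```cpp
#include <cmath>

namespace {
const double kPi = 3.14159265358979323846;
const double kTau = 2.0 * kPi;
}

// Loop-based clamping in the spirit of the reported hot spot: the iteration
// count grows with |rotation|, so an exploded state value stalls the filter.
double clampRotationLoop(double rotation) {
  while (rotation > kPi)  { rotation -= kTau; }
  while (rotation < -kPi) { rotation += kTau; }
  return rotation;
}

// Constant-time alternative using std::fmod; it returns the same wrapped
// angle regardless of how large the input is (though, as noted above, it
// would only hide the underlying state explosion, not fix it).
double clampRotationFmod(double rotation) {
  rotation = std::fmod(rotation, kTau);
  if (rotation > kPi)  { rotation -= kTau; }
  if (rotation < -kPi) { rotation += kTau; }
  return rotation;
}
```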
Version 2.6.3 does not result in this behavior; the filter runs normally.