You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The mctpd daemon was loaded. During the OS operation, mctpi2c* interfaces were repeatedly created and deleted.
After some time, a device was added via the SetupEndpoint d-bus interface. The device was added, but there was no communication with it via the mctpd daemon. Only rebooting the daemon and re-adding the device helped.
I enabled debug output (by adding "-v" to mctpd) and received the following entries (the log was cleaned up a bit):
Feb 28 18:49:46 local mctpd[30382]: Adding interface #13 mctpi2c61
Feb 28 18:49:46 local mctpd[30382]: mctpd: Warning, interface mctpi2c61 is down
...
Feb 28 18:49:54 local systemd-networkd[306]: mctpi2c61: Link UP
Feb 28 18:49:54 local systemd-networkd[306]: mctpi2c61: Gained carrier
...
Feb 28 18:49:58 local mctpd[30382]: Adding local eid 8 net 1
Feb 28 18:49:58 local mctpd[30382]: mctpd: emit_endpoint_added: /xyz/openbmc_project/mctp/1/8
Feb 28 18:49:58 local pldmd[187]: Successfully initialized platform manager
Feb 28 18:50:09 local systemd-journald[69]: Forwarding to syslog missed 163 messages.
Feb 28 18:50:09 local mctpd[30382]: mctpd: read_message got from sockaddr_mctp_ext eid 9 net 1 type 0x00 if 13 hw len 1 0x32 len 6
Feb 28 18:50:09 local mctpd[30382]: Adding neigh to peer eid 9 net 1 phys physaddr if 13 hw len 1 0x32 state 1
Feb 28 18:50:09 local mctpd[30382]: Adding route to peer eid 9 net 1 phys physaddr if 13 hw len 1 0x32 state 1
Feb 28 18:50:09 local mctpd[30382]: mctpd: BUG peer_route_update: Unknown ifindex 13
Feb 28 18:50:09 local mctpd[30382]: mctpd: Failed adding route for peer eid 9 net 1 phys physaddr if 13 hw len 1 0x32 state 1: No such device
Feb 28 18:50:09 local mctpd[30382]: mctpd: endpoint_query_addr: sendto(sockaddr_mctp_ext eid 9 net 1 type 0x00 if 0 hw len 0 0x) 2 bytes failed. No route to host
Feb 28 18:50:09 local mctpd[30382]: mctpd: Error getting endpoint types for peer eid 9 net 1 phys physaddr if 13 hw len 1 0x32 state 1. Ignoring error -113 No route to host
Then I analyzed and debugged mctpd and found the essence of the problem. Receiving information and working with the kernel (in terms of managing addresses, routes and neighbors) is done via Netlink sockets. For this purpose, the context structures [1] "mctp_nl *nl" and "mctp_nl *nl_query" are defined in the ctx context. They both contain a list of kernel interfaces [2], which are filled when creating the nl, nl_query contexts.
When mctpd is running and receiving information from the kernel about network interface changes, changes are made only to the list of interfaces in the nl context (when calling the mctp_nl_handle_monitor function). The list of interfaces in the nl_query context does not change, so it becomes obsolete. Since the nl_query context is used to execute mctp_nl_route_del/mctp_nl_route_add, peer_neigh_update, peer_route_update queries, these queries operate with an obsolete interface table. I conducted an experiment by replacing nl_query with nl in the above queries and mctpd began to work correctly. I do not know for what purposes the work with the kernel Netlink interface was done through 2 contexts. In addition, each context contains 2 sockets (sd, sd_monitor), although sd_monitor is not used for the nl_query context. It turns out that the work is carried out via 3 netlink sockets. Therefore, perhaps my method is not correct. I would like to emphasize separately that this is only about exchanging netlink messages with the kernel and has nothing to do with exchanging with other mctp devices.
I also considered other ways to solve this situation. Since there is no need to keep 2 tables of network interfaces in each context, there were other options:
In the second context (nl_query), replace the array of interfaces with a pointer to the array from the first context (quick fix).
When making changes to the array of interfaces of the first context, simultaneously make changes to the second context.
In our solution we decided to use small fix, described in first option:
In the second context (nl_query), replace the array of interfaces with a pointer to the array from the first context (quick fix).
I am attaching a patch with my changes (Github doesn't allow me to attach a .patch file, so I renamed it).
The mctpd daemon was loaded. During the OS operation, mctpi2c* interfaces were repeatedly created and deleted.
After some time, a device was added via the SetupEndpoint d-bus interface. The device was added, but there was no communication with it via the mctpd daemon. Only rebooting the daemon and re-adding the device helped.
I enabled debug output (by adding "-v" to mctpd) and received the following entries (the log was cleaned up a bit):
Then I analyzed and debugged mctpd and found the essence of the problem. Receiving information and working with the kernel (in terms of managing addresses, routes and neighbors) is done via Netlink sockets. For this purpose, the context structures [1] "mctp_nl *nl" and "mctp_nl *nl_query" are defined in the ctx context. They both contain a list of kernel interfaces [2], which are filled when creating the nl, nl_query contexts.
When mctpd is running and receiving information from the kernel about network interface changes, changes are made only to the list of interfaces in the nl context (when calling the mctp_nl_handle_monitor function). The list of interfaces in the nl_query context does not change, so it becomes obsolete. Since the nl_query context is used to execute mctp_nl_route_del/mctp_nl_route_add, peer_neigh_update, peer_route_update queries, these queries operate with an obsolete interface table. I conducted an experiment by replacing nl_query with nl in the above queries and mctpd began to work correctly. I do not know for what purposes the work with the kernel Netlink interface was done through 2 contexts. In addition, each context contains 2 sockets (sd, sd_monitor), although sd_monitor is not used for the nl_query context. It turns out that the work is carried out via 3 netlink sockets. Therefore, perhaps my method is not correct. I would like to emphasize separately that this is only about exchanging netlink messages with the kernel and has nothing to do with exchanging with other mctp devices.
I also considered other ways to solve this situation. Since there is no need to keep 2 tables of network interfaces in each context, there were other options:
[1] https://github.com/CodeConstruct/mctp/blob/main/src/mctpd.c#L190
[2] https://github.com/CodeConstruct/mctp/blob/main/src/mctp-netlink.c#L44
The text was updated successfully, but these errors were encountered: