Fix a rare race condition in SCardGetStatusChange()
Thanks to Maximilian Stein for the patch [Pcsclite-muscle] Rare race condition in SCardGetStatusChange() http://lists.infradead.org/pipermail/pcsclite-muscle/2018-April/001068.html " Hello all, recently we stumbled upon another (rare) race condition in the SCardGetStatusChange() function. I especially ask Maksim Ivanov and Florian Kaiser kindly to share their opinion about the problem and the proposed fix, since they did a good job in improving SCardGetStatusChange/SCardCancel in the recent past. The problem: The problem is basically the small gap between fetching the reader states from pcscd [winscard_clnt.c:1741 and 2119] and registering for event notifications [winscard_clnt.c:2070]. If a notification is sent in this time period, SCardGetStatusChange() will sleep until another event or the internal time-out is reached. Consider the following flow of execution and events: Precondition: [client] knows the current state of all readers from a previous call to SCardGetStatusChange() and no changes happened since then. Basically consider a thread that monitors the reader states which calls SCardGetStatusChange() in a loop. 1. [client] calls SCardGetStatusChange() and fetches the reader states from pcscd (line 1741). The fetched states are equal to the already known states. 2. [pcscd] notices a change in any terminal state (smartcard movement or number of connected clients to the smartcard changes) and sends a signal to all *registered* clients via EHSignalEventToClients(). [client] is not yet registered for event notifications, so EHSignalEventToClients() won't send anything. 3. [client] continues execution and compares the given reader states to the states fetched before [pcscd] noticed the change. The states are equal so it registers for event notifications (line 2070). No notification was sent, so [client] sleeps until the internal time-out (60 seconds) is reached. 4. [client] reaches the internal time-out and fetches the current reader states from [pcscd]. Now the states are different and SCardGetStatusChange returns. In the end the state change is noticed thanks to the internal polling mechanism, but in automated environments, 60 seconds is a very long time until a state change is detected. The error is most likely to occur if multiple readers are used and state changes are frequent. Even more likely it occurs if reader state polling is used by pcscd, i.e. when the IFD handler does not support asynchronous card event notification (no capability TAG_IFD_POLLING_THREAD_WITH_TIMEOUT). Then multiple readers can accumulate their events to be processed in a very small time frame. Suggestion for a fix: The proposed fix makes fetching the reader states and registering for event notifications an atomic action. The command CMD_WAIT_READER_STATE_CHANGE expected no return value anyway, so I made it return the reader states equal to CMD_GET_READERS_STATE. The action is protected by the ClientsWaitingForEvent_lock in eventhandler.c, which prevents parallel calls of MSGSignalClient() via EHSignalEventToClients(). This is necessary to prevent a signal before the reader states are sent, which would appear as garbage in the client socket. With the proposed fix, the client is registered for events after the reader states were fetched. So if any difference is found in the local and remote state (so that SCardGetStatusChange() returns) we have to unregister from events. This was not necessary before, but works just like unregistering after a timeout. This could be refined by checking why the loop was exited and only unregister if necessary. Unfortunately the proposed fix will slightly alter the internal protocol between libpcsclite and pcscd, breaking statically linked client applications with newer pcscd versions. Further related thoughts: I'm a bit uncertain if my proposed fix works nicely with SCardCancel(), because I can think of one very rare situation when SCardGetStatusChange() times out and unregisters from event notifications, then gets cancelled but is not informed about that. Then it re-registers for notifications, because no changes happened. Thus it will not returned despite it was cancelled. But this should have been an issue even before my fix. I think the notification mechanism could be improved by using "response headers" analogous to the server side, or just an additional field "command" in the data structs. This way every message related to reader state events could be identified by the client and handled respectively. As I understand it, some of the past issues were because of signal, cancel or stop-reader-state-change messages messing up the client socket data. With a command field it can be decided what data the client received, and what data is still expected to be received. Best regards Maximilian Stein "
Showing with 78 additions and 37 deletions