Diagnosing a WCF CommunicationException

A small handful of Logos Bible Software 4 users are reporting that they can’t sign in to our server. Signing in just involves transmitting the user name and password over HTTPS so that the server can issue an authentication cookie, much as if the user had accessed https://www.logos.com/login in a browser. Yet executing this method in the client code results in the following exception:

System.ServiceModel.CommunicationException: Could not connect to  
https://services.logos.com/. TCP error code 10045: The attempted  
operation is not supported for the type of object referenced.

Some quick searching reveals that “code 10045 represents WSAEOPNOTSUPP”, meaning, “The attempted operation is not supported for the type of object referenced. Usually this occurs when a socket descriptor to a socket that cannot support this operation is trying to accept a connection on a datagram socket.”

This isn’t very helpful, especially since WCF is meant to hide the details of the underlying sockets from the hapless application programmer who simply wants things to work, so I did some more digging. The CommunicationException has an inner WebException, which has an inner SocketException:

System.Net.Sockets.SocketException: The attempted operation is not
  supported for the type of object referenced 
    at System.Net.Sockets.Socket.get_ConnectEx() 
    at System.Net.Sockets.Socket.BeginConnectEx(EndPoint remoteEP,
       Boolean flowContext, AsyncCallback callback, Object state) 
    at System.Net.Sockets.Socket.UnsafeBeginConnect(EndPoint remoteEP,
       AsyncCallback callback, Object state) 
    at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure,
       Socket s4, Socket s6, Socket& socket, IPAddress& address,
       ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout,
       Exception& exception)

This gives the actual location of the error, which seems worth investigating. Disassembling the getter for the ConnectEx property (Socket.get_ConnectEx in the stack trace above) reveals that it uses double-checked locking to initialize a static delegate named s_ConnectEx by calling WSAIoctl. If that call fails, it throws a SocketException. The parameters passed to WSAIoctl are shown in Reflector as raw GUIDs and integers rather than symbolic identifiers, but searching the Visual C++ header files for the constants reveals that Socket is invoking the SIO_GET_EXTENSION_FUNCTION_POINTER command and asking for a function pointer to the ConnectEx function.
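For readers unfamiliar with the pattern, here is a minimal Python sketch of double-checked locking around a lazy lookup. It is an illustration only, not the .NET source: the class name LazyFunctionPointer and the resolve callback (standing in for the WSAIoctl call) are hypothetical, and the choice not to cache a failure is an assumption of this sketch.

```python
import threading


class LazyFunctionPointer:
    """Resolves a value at most once, using double-checked locking."""

    def __init__(self, resolve):
        # 'resolve' is a hypothetical stand-in for the WSAIoctl lookup.
        self._resolve = resolve
        self._lock = threading.Lock()
        self._value = None

    def get(self):
        # First check, without the lock: the cheap path once initialized.
        if self._value is None:
            with self._lock:
                # Second check, under the lock: another thread may have
                # completed the initialization while we were waiting.
                if self._value is None:
                    # If resolve() raises, nothing is cached and the
                    # exception propagates to the caller.
                    self._value = self._resolve()
        return self._value
```

In this sketch a failing lookup raises on every call, which matches the symptom described here: each sign-in attempt ends in the same exception.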

Now things are starting to make more sense. Winsock2 supports “layered service providers” (LSPs), which allow new functionality to be added to the Winsock stack in a modular way. An application can use the SIO_GET_EXTENSION_FUNCTION_POINTER command to retrieve a pointer to a function that it knows is implemented by an installed provider. In the case of ConnectEx, the provider is the standard Microsoft Winsock2 library, so the function should always be available. However, LSPs are chained together; presumably, if an installed LSP simply returns WSAEOPNOTSUPP because it doesn’t support the requested function, instead of passing the request to the next LSP in the chain, WSAIoctl will return WSAEOPNOTSUPP and Socket.ConnectEx will throw.
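The suspected failure mode can be modelled in a few lines of Python. This is a toy model of the hypothesis, not of the real Winsock provider interface: the class names, the ioctl method, and the string used in place of a function pointer are all made up for illustration.

```python
WSAEOPNOTSUPP = 10045  # Winsock error code quoted in the exception above


class BaseProvider:
    """Models the bottom of the chain, which implements ConnectEx."""

    def ioctl(self, function_name):
        if function_name == "ConnectEx":
            return "pointer-to-ConnectEx"  # placeholder for a real pointer
        raise OSError(WSAEOPNOTSUPP, "operation not supported")


class PassThroughLayer:
    """A well-behaved layer: forwards requests it does not handle itself."""

    def __init__(self, next_provider):
        self.next_provider = next_provider

    def ioctl(self, function_name):
        return self.next_provider.ioctl(function_name)


class BrokenLayer:
    """A misbehaving layer: rejects the request instead of forwarding it."""

    def __init__(self, next_provider):
        self.next_provider = next_provider

    def ioctl(self, function_name):
        raise OSError(WSAEOPNOTSUPP, "operation not supported")
```

With PassThroughLayer(BaseProvider()) the lookup for "ConnectEx" succeeds; put a BrokenLayer anywhere above the base provider and the caller sees error 10045 even though the base provider could have answered.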

One of our users experiencing this problem reported that his Winsock catalog listing (which can be displayed by executing “netsh winsock show catalog”, or by using AutoRuns) included lspcs.dll, which is an LSP for CyberSitter (an internet filtering program). When he uninstalled CyberSitter and switched to using Vista’s parental controls, Logos 4 was able to connect to our servers and sign in. It seems very likely that CyberSitter’s LSP has a bug that prevents .NET programs from accessing the ConnectEx function in the Microsoft Winsock stack. (Logos 4 is not the only program affected by this LSP.)
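If you need to check many machines, the catalog listing can be searched with a short script. The following Python sketch makes no assumption about the layout of the netsh output beyond the DLL file name appearing somewhere in it; the function names are my own, and the list of DLLs to look for is whatever you choose to pass in.

```python
import subprocess


def find_catalog_dlls(catalog_text, dll_names):
    """Return the names from dll_names that appear in the catalog text."""
    lowered = catalog_text.lower()
    # Case-insensitive substring match, since paths may differ in case.
    return [name for name in dll_names if name.lower() in lowered]


def read_winsock_catalog():
    """Run the command named above and return its output (Windows only)."""
    result = subprocess.run(
        ["netsh", "winsock", "show", "catalog"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On an affected machine, find_catalog_dlls(read_winsock_catalog(), ["lspcs.dll"]) would be expected to return a non-empty list.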

Posted by Bradley Grainger on November 16, 2009