#1046 closed defect (worksforme)

bitlbee hangs in `ssl_read` for google talk accounts

Reported by: gcr@…
Owned by:
Priority: normal
Milestone:
Component: BitlBee
Version: 3.0.6
Keywords: ssl, google talk, jabber, openssl
Cc:
IRC client+version: Client-independent
Operating System: Linux
OS version/distro:

Description

When I do this:

  • Compile bitlbee with openssl
  • Add a google talk account, with ssl
  • Connect to bitlbee
  • Wait, anywhere from 1 second to five minutes or so

Then this happens:

  • Bitlbee hangs: stops responding to pings from my IRC client. My client eventually times out and reconnects
  • The hang occurs in:
    ^C
    Program received signal SIGINT, Interrupt.
    0x00007ffff6c3e2d0 in __read_nocancel () from /usr/lib/libpthread.so.0
    (gdb) bt
    #0  0x00007ffff6c3e2d0 in __read_nocancel () from /usr/lib/libpthread.so.0
    #1  0x00007ffff7167b9a in sock_read () from /usr/lib/libcrypto.so.1.0.0
    #2  0x00007ffff7165689 in BIO_read () from /usr/lib/libcrypto.so.1.0.0
    #3  0x00007ffff7497e9a in ssl3_read_n () from /usr/lib/libssl.so.1.0.0
    #4  0x00007ffff7499555 in ssl3_read_bytes () from /usr/lib/libssl.so.1.0.0
    #5  0x00007ffff7495f6a in ssl3_read () from /usr/lib/libssl.so.1.0.0
    #6  0x000055555558c2c5 in ssl_read (conn=0x5555558414c0, buf=
        0x7fffffffdb60 " RIVMSG &bitlbee :blist\r\n", len=512) at ssl_openssl.c:208
    #7  0x000055555559d33a in jabber_read_callback (data=0x555555853200, fd=17, cond=
        B_EV_IO_READ) at io.c:169
    #8  0x000055555558218d in gaim_io_invoke (source=0x5555558640d0, condition=
        G_IO_IN, data=0x555555853b30) at events_glib.c:88
    #9  0x00007ffff7720845 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
    #10 0x00007ffff7720b78 in ?? () from /usr/lib/libglib-2.0.so.0
    #11 0x00007ffff7720f72 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
    #12 0x00005555555820ff in b_main_run () at events_glib.c:64
    #13 0x000055555557f857 in main (argc=4, argv=0x7fffffffe158) at unix.c:183
    (gdb)

Workaround:

  • Compiling with gnutls makes the problem go away.

My system: Arch Linux on x86-64, bitlbee 3.2

NON-DEVELOPER SPECULATION: It's almost as if gio calls the callback when there aren't enough bytes available. I notice that bitlbee always tries to read 512 SSL bytes at a time.

Change History (8)

comment:1 Changed at 2013-04-18T14:47:18Z by anonymous

another note: this hang was completely reproducible last night, preventing me from running bitlbee for more than about 10 minutes at a time.

comment:2 Changed at 2013-04-18T14:48:53Z by anonymous

Another note, possibly unrelated: " RIVMSG &bitlbee :blist\r\n" was the last command that I sent to bitlbee before it hung. (I kept spamming this to detect when the issue appeared.)

comment:3 Changed at 2013-04-18T14:59:01Z by gcr

A backtrace from "dx" in IRC: http://dpaste.com/1063481/

comment:4 Changed at 2013-04-18T17:01:18Z by dx

Apparently SSL_pending returns 0 during the hang. In my other tests, I managed to bring bitlbee back to life by using the gtalk web chat, which caused some bytes to be sent through the socket, which caused SSL_read to return.

^C
Program received signal SIGINT, Interrupt.
0xb7fdd424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7fdd424 in __kernel_vsyscall ()
#1  0xb7c7e703 in __read_nocancel () from /lib/libpthread.so.0
#2  0xb7d5c848 in sock_read () from /usr/lib/libcrypto.so.1.0.0
#3  0xb7d59d12 in BIO_read () from /usr/lib/libcrypto.so.1.0.0
#4  0xb7e8dca0 in ssl3_read_n () from /usr/lib/libssl.so.1.0.0
#5  0xb7e8f2fd in ssl3_read_bytes () from /usr/lib/libssl.so.1.0.0
#6  0xb7e8bc1d in ssl3_read () from /usr/lib/libssl.so.1.0.0
#7  0xb7ea5949 in SSL_read () from /usr/lib/libssl.so.1.0.0
#8  0x80030b70 in ssl_read (conn=0x800cecc8, buf=0xbffff1d4 " ", len=512) at ssl_openssl.c:208
#9  0x8004049b in jabber_read_callback (data=0x800d2218, fd=13, cond=B_EV_IO_READ) at io.c:169
#10 0x80027087 in gaim_io_invoke (source=0x800d5e00, condition=G_IO_IN, data=0x800cf390) at events_glib.c:88
#11 0xb7f5334e in ?? () from /usr/lib/libglib-2.0.so.0
#12 0xb7f12773 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#13 0xb7f12b10 in ?? () from /usr/lib/libglib-2.0.so.0
#14 0xb7f12f6b in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#15 0x80026fe8 in b_main_run () at events_glib.c:64
#16 0x80024a68 in main (argc=4, argv=0xbffff744) at unix.c:183
(gdb) up
#1  0xb7c7e703 in __read_nocancel () from /lib/libpthread.so.0

[...repeated several times, don't ask why...]

(gdb) up
#8  0x80030b70 in ssl_read (conn=0x800cecc8, buf=0xbffff1d4 " ", len=512) at ssl_openssl.c:208
208             st = SSL_read( ((struct scd*)conn)->ssl, buf, len );
(gdb) p SSL_pending(((struct scd*)conn)->ssl)
$10 = 0

comment:5 Changed at 2013-04-18T17:30:33Z by dx

13:49 < dx> wilmer: SSL_pending returns 0 before SSL_read, so you could just check for that before ssl_openssl.c:208
13:50 < dx> it's probably just a workaround instead of a proper fix, since i have no idea what's causing that function to be called with nothing pending...

14:20 < dx> nevermind, you can't just check for SSL_pending, the jabber_read_callback shouldn't be called at all
14:21 < dx> getting a notification from the GIOChannel that there's something to read, and reading 0 bytes usually means the connection is closed
14:22 < dx> so if i check SSL_pending there, not only it closes all the connections, but it also closes every connection that would block
14:22 < dx> so my patch gets me stuff like "msn - Login error: Error during Passport authentication: Empty HTTP reply"

comment:6 Changed at 2013-04-18T21:54:15Z by wilmer

Note that GnuTLS is the only fully supported SSL module in BitlBee. I'll try to fix OpenSSL when I have a chance, but that won't be before the weekend or possibly not before EoM.

comment:7 Changed at 2013-04-18T22:02:36Z by dalias

The problem seems to be that SSL data is arriving on the socket that does not translate into any output from SSL_read. There is code in ssl_openssl.c's ssl_read function to handle the equivalent of EAGAIN from SSL_read, but it's never used because the socket is in blocking mode. Commenting out line 193 in ssl_handshake seems to fix the problem. The only things I'm unclear on are (1) why gnutls works even with the same issue, and (2) whether ssl_write needs additional treatment to be reliable with a nonblocking socket. If so, a safe temporary fix would be to just switch to nonblocking mode before calling SSL_read, and switch back to blocking immediately afterwards, rather than enabling nonblocking mode all the time.
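
For illustration, the temporary fix suggested above (switch to nonblocking mode just for the read, then switch back) could look roughly like the sketch below. This is not BitlBee code: `set_nonblocking` and `read_once_nonblocking` are hypothetical helper names, and plain `read()` stands in for `SSL_read()` so the example builds without OpenSSL. It also assumes the descriptor was in blocking mode to begin with. With a real SSL connection the "no data yet" case would be reported through `SSL_get_error()` (for example `SSL_ERROR_WANT_READ`) rather than `errno`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: set (on != 0) or clear (on == 0) O_NONBLOCK on fd.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd, int on)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags);
}

/* Hypothetical helper: attempt one read that can never block, then put the
 * descriptor back into blocking mode (assumed to be its original mode).
 * Returns the number of bytes read, 0 on EOF, or -1 with errno set;
 * errno == EAGAIN / EWOULDBLOCK means "nothing to read right now". */
static ssize_t read_once_nonblocking(int fd, void *buf, size_t len)
{
    ssize_t n;
    int saved_errno;

    assert(fd >= 0);

    if (set_nonblocking(fd, 1) == -1)
        return -1;

    n = read(fd, buf, len);      /* stand-in for SSL_read() */
    saved_errno = errno;         /* keep read()'s errno across the restore */

    if (set_nonblocking(fd, 0) == -1)
        return -1;

    errno = saved_errno;
    return n;
}
```

A caller that gets -1 with `errno` set to `EAGAIN` would return to the main loop and wait for the next read event instead of hanging inside the read.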

comment:8 Changed at 2015-03-16T00:06:01Z by dx

Resolution: worksforme
Status: new → closed

Closing since no one has reported this happening again, and thanks to this ticket we've moved most distros to gnutls anyway.

Also holy shit this was two years ago. Where is my time going?
