Opened at 2013-04-18T14:46:22Z
Closed at 2015-03-16T00:06:01Z
#1046 closed defect (worksforme)
bitlbee hangs in `ssl_read` for google talk accounts
Reported by: | | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | BitlBee | Version: | 3.0.6 |
Keywords: | ssl, google talk, jabber, openssl | Cc: | |
IRC client+version: | Client-independent | Operating System: | Linux |
OS version/distro: | | | |
Description
When I do this:
- Compile bitlbee with openssl
- Add a google talk account, with ssl
- Connect to bitlbee
- Wait, anywhere from 1 second to five minutes or so
Then this happens:
- Bitlbee hangs: it stops responding to pings from my IRC client, which eventually times out and reconnects.
- The hang occurs in:
^C
Program received signal SIGINT, Interrupt.
0x00007ffff6c3e2d0 in __read_nocancel () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0x00007ffff6c3e2d0 in __read_nocancel () from /usr/lib/libpthread.so.0
#1  0x00007ffff7167b9a in sock_read () from /usr/lib/libcrypto.so.1.0.0
#2  0x00007ffff7165689 in BIO_read () from /usr/lib/libcrypto.so.1.0.0
#3  0x00007ffff7497e9a in ssl3_read_n () from /usr/lib/libssl.so.1.0.0
#4  0x00007ffff7499555 in ssl3_read_bytes () from /usr/lib/libssl.so.1.0.0
#5  0x00007ffff7495f6a in ssl3_read () from /usr/lib/libssl.so.1.0.0
#6  0x000055555558c2c5 in ssl_read (conn=0x5555558414c0, buf=0x7fffffffdb60 " RIVMSG &bitlbee :blist\r\n", len=512) at ssl_openssl.c:208
#7  0x000055555559d33a in jabber_read_callback (data=0x555555853200, fd=17, cond=B_EV_IO_READ) at io.c:169
#8  0x000055555558218d in gaim_io_invoke (source=0x5555558640d0, condition=G_IO_IN, data=0x555555853b30) at events_glib.c:88
#9  0x00007ffff7720845 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#10 0x00007ffff7720b78 in ?? () from /usr/lib/libglib-2.0.so.0
#11 0x00007ffff7720f72 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#12 0x00005555555820ff in b_main_run () at events_glib.c:64
#13 0x000055555557f857 in main (argc=4, argv=0x7fffffffe158) at unix.c:183
(gdb)
Workaround:
- Compiling with gnutls makes the problem go away.
My system: Arch Linux on x86-64, bitlbee 3.2
NON-DEVELOPER SPECULATION: It's almost as if gio calls the callback when there aren't enough bytes available. I notice that bitlbee always tries to read 512 SSL bytes at a time.
Attachments (0)
Change History (8)
comment:1 Changed at 2013-04-18T14:47:18Z by
comment:2 Changed at 2013-04-18T14:48:53Z by
another note, possibly unrelated: the " RIVMSG &bitlbee :blist\r\n" was the last command that I sent to bitlbee before it hangs. (I kept spamming this to detect when the issue appears)
comment:3 Changed at 2013-04-18T14:59:01Z by
A backtrace from "dx" in IRC: http://dpaste.com/1063481/
comment:4 Changed at 2013-04-18T17:01:18Z by
Apparently SSL_pending returns 0 during the hang. In my other tests, I managed to bring bitlbee back to life by using the gtalk web chat, which caused some bytes to be sent through the socket, which caused SSL_read to return.
^C
Program received signal SIGINT, Interrupt.
0xb7fdd424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7fdd424 in __kernel_vsyscall ()
#1  0xb7c7e703 in __read_nocancel () from /lib/libpthread.so.0
#2  0xb7d5c848 in sock_read () from /usr/lib/libcrypto.so.1.0.0
#3  0xb7d59d12 in BIO_read () from /usr/lib/libcrypto.so.1.0.0
#4  0xb7e8dca0 in ssl3_read_n () from /usr/lib/libssl.so.1.0.0
#5  0xb7e8f2fd in ssl3_read_bytes () from /usr/lib/libssl.so.1.0.0
#6  0xb7e8bc1d in ssl3_read () from /usr/lib/libssl.so.1.0.0
#7  0xb7ea5949 in SSL_read () from /usr/lib/libssl.so.1.0.0
#8  0x80030b70 in ssl_read (conn=0x800cecc8, buf=0xbffff1d4 " ", len=512) at ssl_openssl.c:208
#9  0x8004049b in jabber_read_callback (data=0x800d2218, fd=13, cond=B_EV_IO_READ) at io.c:169
#10 0x80027087 in gaim_io_invoke (source=0x800d5e00, condition=G_IO_IN, data=0x800cf390) at events_glib.c:88
#11 0xb7f5334e in ?? () from /usr/lib/libglib-2.0.so.0
#12 0xb7f12773 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#13 0xb7f12b10 in ?? () from /usr/lib/libglib-2.0.so.0
#14 0xb7f12f6b in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#15 0x80026fe8 in b_main_run () at events_glib.c:64
#16 0x80024a68 in main (argc=4, argv=0xbffff744) at unix.c:183
(gdb) up
#1  0xb7c7e703 in __read_nocancel () from /lib/libpthread.so.0
[...repeated several times, don't ask why...]
(gdb) up
#8  0x80030b70 in ssl_read (conn=0x800cecc8, buf=0xbffff1d4 " ", len=512) at ssl_openssl.c:208
208         st = SSL_read( ((struct scd*)conn)->ssl, buf, len );
(gdb) p SSL_pending(((struct scd*)conn)->ssl)
$10 = 0
comment:5 Changed at 2013-04-18T17:30:33Z by
13:49 < dx> wilmer: SSL_pending returns 0 before SSL_read, so you could just check for that before ssl_openssl.c:208
13:50 < dx> it's probably just a workaround instead of a proper fix, since i have no idea what's causing that function to be called with nothing pending...
14:20 < dx> nevermind, you can't just check for SSL_pending, the jabber_read_callback shouldn't be called at all
14:21 < dx> getting a notification from the GIOChannel that there's something to read, and reading 0 bytes usually means the connection is closed
14:22 < dx> so if i check SSL_pending there, not only it closes all the connections, but it also closes every connection that would block
14:22 < dx> so my patch gets me stuff like "msn - Login error: Error during Passport authentication: Empty HTTP reply"
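The failure mode dx describes comes from treating "nothing to read right now" the same as "connection closed". The following is only a minimal sketch of how the two can be told apart on a nonblocking descriptor, using plain POSIX `read()` as a stand-in for `SSL_read()`. The names `read_outcome`, `set_nonblocking` and `classify_read` are hypothetical and are not part of BitlBee. With OpenSSL the analogous distinction would presumably be made through `SSL_get_error()` (`SSL_ERROR_WANT_READ` versus `SSL_ERROR_ZERO_RETURN`), but that part is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration only; not BitlBee's API. */
enum read_outcome {
    READ_DATA,        /* n > 0: bytes were delivered */
    READ_WOULD_BLOCK, /* nothing available yet, connection still open */
    READ_CLOSED,      /* orderly shutdown by the peer (read returned 0) */
    READ_ERROR        /* any other failure */
};

/* Put fd into nonblocking mode. Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Perform one read() and classify the result. The raw return value of
 * read() is stored in *n_out. Assumes fd is already nonblocking.
 */
enum read_outcome classify_read(int fd, char *buf, size_t len, ssize_t *n_out)
{
    assert(buf != NULL && n_out != NULL);

    ssize_t n = read(fd, buf, len);
    *n_out = n;

    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_CLOSED;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK; /* must NOT be treated as a disconnect */
    return READ_ERROR;
}
```

A caller that gets `READ_WOULD_BLOCK` would simply return to the event loop and wait for the next readiness notification, instead of tearing the connection down.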
comment:6 Changed at 2013-04-18T21:54:15Z by
Note that GnuTLS is the only fully supported SSL module in BitlBee. I'll try to fix OpenSSL when I have a chance, but that won't be before the weekend or possibly not before EoM.
comment:7 Changed at 2013-04-18T22:02:36Z by
The problem seems to be that SSL data is arriving on the socket that does not translate into any output from SSL_read. There is code in ssl_openssl.c's ssl_read function to handle the equivalent of EAGAIN from SSL_read, but it's never used because the socket is in blocking mode. Commenting out line 193 in ssl_handshake seems to fix the problem. The only things I'm unclear on are (1) why gnutls works even with the same issue, and (2) whether ssl_write needs additional treatment to be reliable with a nonblocking socket. If so, a safe temporary fix would be to just switch to nonblocking mode before calling SSL_read, and switch back to blocking immediately afterwards, rather than enabling nonblocking mode all the time.
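The temporary fix suggested above (nonblocking only for the duration of the read) could look roughly like the sketch below. It is an illustration under assumptions, not the patch that was applied: plain `read()` stands in for `SSL_read()`, and the helper name `read_once_nonblocking` is hypothetical. In the real code the call in the middle would be `SSL_read()`, and a "would block" result would presumably surface through `SSL_get_error()` rather than `errno`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: do a single read with O_NONBLOCK set, then restore
 * the descriptor's original flags. Returns whatever read() returned and
 * leaves errno as read() set it. If the flags cannot be queried, set or
 * restored, -1 is returned with errno from fcntl().
 */
ssize_t read_once_nonblocking(int fd, void *buf, size_t len)
{
    assert(buf != NULL);

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    /* Stand-in for SSL_read(): returns -1/EAGAIN instead of hanging. */
    ssize_t n = read(fd, buf, len);
    int saved_errno = errno;

    /* Switch back to the original (blocking) mode right away. */
    if (fcntl(fd, F_SETFL, flags) == -1)
        return -1;

    errno = saved_errno;
    return n;
}
```

Because the original flags are restored before returning, writes on the same descriptor keep their blocking behaviour, which sidesteps the open question about ssl_write at the cost of two extra `fcntl()` calls per read.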
comment:8 Changed at 2015-03-16T00:06:01Z by
Resolution: | → worksforme |
---|---|
Status: | new → closed |
Closing since no one has reported this happening again, and thanks to this ticket we've moved most distros to gnutls anyway.
Also holy shit this was two years ago. Where is my time going?
another note: this hang was completely reproducible last night, preventing me from running bitlbee for more than about 10 minutes at a time.