#690 closed defect (fixed)

strip_html doesn't parse OTR messages

Reported by: ilf@…
Owned by: pesco
Priority: normal
Milestone:
Component: OTR
Version: devel
Keywords:
Cc:
IRC client+version: Client-independent
Operating System: Linux
OS version/distro:


Yay for the OTR merge!

However, on OTR encrypted messages HTML is not stripped, even though $strip_html = true.

It seems, strip_html is run before OTR decrypt, which should be changed.

Change History (34)

comment:1 Changed at 2010-10-15T21:53:24Z by wilmer

Component: BitlBee → OTR
Owner: set to pesco

Hm, not sure what this is. The stripping is *definitely* done after decrypting, I just verified that. What if you turn off stripping, do you get HTML &lt; &gt; stuff instead of < > brackets?

comment:2 Changed at 2010-10-16T10:17:17Z by ilf

Just checked it out. Nope, even with $strip_html = false, the brackets are < and >.

comment:3 Changed at 2010-10-16T10:56:25Z by ilf

I tried to debug it further with xmlconsole, but with xmlconsole added, the jabber account connection is closed when I send a message. This is bzr-devel-700 and happened three times.

comment:4 Changed at 2010-10-23T10:33:05Z by harry.bitlbee@…

I had a similar problem. I deleted my keys and recreated them and (I think?) it went away.

People who don't even have OTR installed would appear in red text, and surrounded by HTML tags (<FONT FACE="Helvetica" ABSZ=12 SIZE=3 BACK="#ffffff">message</FONT>, as one example..)

Only started to happen once I used OTR. It's a nice feature, but don't know why it's doing this. set strip_html = true as well.

comment:5 Changed at 2010-10-27T16:42:55Z by anonymous

Adium has a similar problem, and I think that's where this is coming from.

Adium has OTR built in, and the only time this problem has surfaced for me is when I was talking to someone who used Adium.

So far this has only happened with an MSN account. Has anyone here experienced it on other protocols as well?

comment:6 Changed at 2010-10-27T16:46:57Z by anonymous

I also think that Harry's problem "going away" may have something to do with the other party no longer automatically initiating OTR because his keys have changed.

If this is true, the problem itself is not bitlbee, and the best that can be done on this end is stripping this stuff out after it's received.

comment:7 in reply to:  6 Changed at 2010-10-27T18:17:44Z by harry.bitlbee@…

Replying to anonymous:

the other party no longer automatically initiating OTR because his keys have changed

Nah, they didn't have OTR even installed, and it happened to all the people who tried to contact me at that time (4+ people). :/

comment:8 Changed at 2010-10-27T19:47:57Z by ilf

I am experiencing this over XMPP.

comment:9 Changed at 2010-11-06T23:30:52Z by anonymous

I am also experiencing this while chatting with an adium user.

comment:10 Changed at 2010-11-15T15:01:52Z by anonymous

Experiencing this with Pidgin- and ICQ-users (I think latest version, do not know exactly) via oscar, also.

comment:11 Changed at 2011-02-10T14:57:54Z by lrm242

Same problem here, running bitlbee 3.0.1. While chatting w/ a user on Adium I'm seeing HTML, in particular I'm seeing: <FONT>msgmsgmsg</FONT>.

comment:12 Changed at 2011-02-21T22:41:17Z by matt@…

I've got the same issue. Seeing: <FONT FACE="Helvetica" ABSZ=10 SIZE=2>msg</FONT> every message can be quite annoying.

comment:13 Changed at 2011-03-07T18:59:37Z by kode54

Experiencing this problem with a Pidgin user, over XMPP, and only when OTR has been initiated on either end.

Also experiencing that my own attempts to send angle brackets results in the enclosed text being treated as an unknown HTML tag and thus not rendering anything for the Pidgin user on the other end. I don't know if that's OTR related or not.

comment:14 Changed at 2011-05-19T09:27:00Z by anonymous

I can also confirm this problem, in OTR with Adium.

comment:15 Changed at 2011-06-25T21:30:19Z by pesco

Status: new → accepted

does the problem persist when setting strip_html="always"?

comment:16 Changed at 2011-06-26T00:35:59Z by wilmer

The problem goes away then AFAIK.

I'm pretty sure the problem is that libotr adds the HTML. So possibly strip_html=always is pretty much exactly the desired behaviour here. Not sure if there are situations where this would strip too much though.

comment:17 Changed at 2011-06-26T15:55:46Z by pesco

even if libotr did that, the strip_html call after the filter_message_in hook should still apply. i would argue that somehow OTR causes OPT_DOES_HTML to be unset when it should be set, and thus strip_html isn't called, but i don't see how...

... after some thought i can imagine some scenarios as causes for the effect:

1) existence of other clients that send HTML in OTR messages on otherwise non-HTML protocols. e.g. on an ICQ connection, we do not call strip_html because we assume that ICQ doesn't do HTML. maybe there are clients that treat "OTR" like a separate protocol and always put HTML in the encrypted messages while we think of it as a transparent layer or wrapper where the message format is inherited from the underlying transport.

2) libpurple becoming confused about DOES_HTML. in the oscar module we just check whether we're dealing with an ICQ or AIM user and set the flag accordingly: ICQ => no HTML, AIM => does HTML. the purple module copies the flag from libpurple. maybe they do something more fancy like looking for <html> tags in actual message content. these would be hidden by OTR, obviously.

3) HTML being stripped by something other than the common strip_html call at the end of message processing. for instance, maybe the XMPP module would already remove markup from the message during parsing of the protocol XML? then it would stand to reason for it to not set DOES_HTML, but the sender would still encapsulate the markup inside the OTR "blackbox".
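To make scenario 1 concrete, here is a minimal Python sketch (BitlBee itself is C, and this is not its code). It assumes stripping is gated on a per-connection flag; `DOES_HTML`, `strip_html` and `deliver` are simplified stand-ins, and the handling of the `always` value is an assumption based on the observation in comment 16 that the problem goes away with that setting.

```python
import html
import re

DOES_HTML = 0x1  # illustrative flag value, standing in for OPT_DOES_HTML

def strip_html(text: str) -> str:
    """Crude tag stripper for illustration: drop tags, then decode entities."""
    return html.unescape(re.sub(r"<[^>]*>", "", text))

def deliver(text: str, conn_flags: int, strip_setting: str = "true") -> str:
    """Strip if the setting is 'always', or if it is 'true' and the
    connection is flagged as carrying HTML (assumed gating logic)."""
    if strip_setting == "always":
        return strip_html(text)
    if strip_setting == "true" and conn_flags & DOES_HTML:
        return strip_html(text)
    return text

decrypted = '<FONT FACE="Helvetica" SIZE=3>hello</FONT>'
print(deliver(decrypted, DOES_HTML))    # connection flagged as HTML: tags stripped
print(deliver(decrypted, 0))            # not flagged: tags reach the user
print(deliver(decrypted, 0, "always"))  # 'always' ignores the flag
```

If the gating really works like this, any HTML that arrives inside OTR on a connection without the flag is shown verbatim, which matches the reports above.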

comment:18 Changed at 2011-06-26T19:04:24Z by anonymous

Just got this...

<FONT FACE="Lucida Grande" ABSZ=11 SIZE=3>adium</FONT>

They were using adium on xmpp.

comment:19 Changed at 2011-06-26T22:21:22Z by pesco

a mailing list thread of relevance:
[edit: fixed link to point to the interesting mail in thread]

so i suspect that for whatever reason at least pidgin, and conceivably adium too, encrypt the HTML representation of the message and place the resulting ciphertext into both the HTML and plain part of the outgoing XMPP packet. while BitlBee (kind of understandably) assumes that it doesn't have to do strip_html on jabber, because it only looks at the plain part.

i'm further guessing that the behavior of pidgin-otr could be because it would be complicated or infeasible to do the encryption twice, for both the plain and HTML reps.

Last edited at 2011-06-27T00:07:01Z by pesco

comment:20 Changed at 2011-06-26T23:52:48Z by pesco

this is the sending hook used by pidgin-otr:

note that message is passed as one character string. i.e. no chance for the otr plugin to differentiate between plain and HTML representation, supporting suspicion described above. assuming following data flow: pidgin frontend composes HTML message, passes to libpurple. libpurple calls sending-im-msg hook, libotr encrypts HTML message. libpurple passes encrypted message to XMPP backend, XMPP backend performs HTML stripping on ciphertext (which is a no-op), ends up placing identical data into both plain and HTML parts of packet.
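The assumed data flow can be sketched in a few lines of Python. Everything here is hypothetical: `fake_otr_encrypt` is only base64 (not real OTR), and `send_im` is a toy model of the pipeline described above, not pidgin or libpurple code.

```python
import base64
import re

def fake_otr_encrypt(message: str) -> str:
    """Placeholder for encryption: base64 only, NOT real OTR."""
    return "?OTR:" + base64.b64encode(message.encode("utf-8")).decode("ascii")

def strip_tags(text: str) -> str:
    """Stand-in for the backend's HTML stripping."""
    return re.sub(r"<[^>]*>", "", text)

def send_im(html_message: str) -> dict:
    # 1. the frontend hands over one HTML string
    # 2. the sending hook sees only that string and encrypts it
    ciphertext = fake_otr_encrypt(html_message)
    # 3. the backend derives the plain part by stripping tags from the
    #    ciphertext, which contains none, so nothing changes
    return {"plain": strip_tags(ciphertext), "html": ciphertext}

packet = send_im("<FONT SIZE=3>hi</FONT>")
print(packet["plain"] == packet["html"])
print(base64.b64decode(packet["plain"][len("?OTR:"):]).decode("utf-8"))
```

A receiver that reads only the plain part still ends up with the original HTML once it decrypts.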

i think this is also exactly what happens in Adium and the cause of their re-opened bug (the more recent reports with XMPP, not the original issue with ICQ). will leave them a link here.

comment:21 Changed at 2011-06-27T00:46:45Z by pesco

so, here's what i think we should do to mitigate the problem:

given that it appears to be in fact strictly OTR-specific, it stands to reason to implement the solution in the OTR plugin and not clutter the rest of the code with it. to this end:

add an option otr_extra_strip_html that contains a list of protocols (or true for all, false for none) for which the OTR receive hook will perform strip_html if the global setting is enabled and the top level won't do it (i.e. DOES_HTML not set for this connection). default: jabber.

or maybe, if that's not too complicated to do, this could be a per-account setting with protocol-specific defaults (jabber: true, others: false).
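A rough Python sketch of how such an option could be evaluated. The comma-separated list syntax and the function names are assumptions made for illustration; nothing here reflects actual BitlBee code.

```python
def extra_strip_applies(setting: str, protocol: str) -> bool:
    """Interpret the proposed option: 'true', 'false', or a protocol list.
    The comma separator is an assumption; the proposal does not specify one."""
    value = setting.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    protocols = {p.strip() for p in value.split(",") if p.strip()}
    return protocol.lower() in protocols

def otr_should_strip(setting: str, protocol: str,
                     strip_html_enabled: bool, conn_does_html: bool) -> bool:
    """The OTR receive hook strips only if the global setting is on and the
    top level will not strip for this connection anyway."""
    if not strip_html_enabled or conn_does_html:
        return False
    return extra_strip_applies(setting, protocol)

print(otr_should_strip("jabber", "jabber", True, False))
print(otr_should_strip("jabber", "oscar", True, False))
```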

does that sound about right?

comment:22 Changed at 2011-06-27T10:18:17Z by pesco

in fact, i just realized we need to do more. if OTR-aware Jabber clients are expected to always treat the decrypted message as HTML, we must also always HTML-encode Jabber messages before OTR encryption. this is the reason for the disappearing angle brackets of comment 13.

so the option should actually be named something like otr_does_html and control which protocols/connections get both forced HTML encoding and decoding in OTR. global $strip_html=false should probably still override the decoding side, just to be consistent.

comment:23 Changed at 2011-06-27T12:29:41Z by wilmer

Sounds good. Does this mean that there's no way to predict if we can expect HTML from messages or not? Do other clients just guess about this? :-/

Sounds like it's even possible to get both from one connection..

Last edited at 2011-06-27T12:30:27Z by wilmer

comment:24 Changed at 2011-06-27T22:45:01Z by pesco

i had a bit of a thought about the situation on ICQ and let me go out on a limb here and guess: ICQ does HTML stripping server-side?!

i.e. "fancy" ICQ client connects to ICQ server, server notices somehow that this is a fancy client and expects it to send HTML. we connect to ICQ server as a "plain" client and server strips HTML from fancy people's messages for us. if we use OTR in this scenario, the fancy client would encapsulate its HTML in the encryption, effectively tunneling it past the stripping on the server.

comment:25 in reply to:  23 Changed at 2011-06-27T23:30:30Z by pesco

Replying to wilmer:

Does this mean that there's no way to predict if we can expect HTML from messages or not? Do other clients just guess about this? :-/

well, for Jabber it seems it is the de facto standard that OTR messages are always HTML. this is what pidgin-otr implements and that's kind of like the reference implementation. i don't think it's pretty but it's what we have right now. NB: i think this must go in the OTR spec somehow - all clients need to agree here.

of course we can have clients putting non-HTML-encoded text in there, but it's easy to argue that they are wrong unless the standard (de facto or not) changes. cf. the "disappearing HTML tags" issue above.

so that's the situation for Jabber. relatively clear -- just treat the encrypted channel as HTML, period.

ICQ, if it behaves the way i described above, is much worse. because there, clients are allowed to send as HTML or not, whichever they please, and we are never told! so in that case we really can never know what to expect inside an OTR packet and it will differ from person to person. so I guess the safest thing we can do here is to offer the option, just like for Jabber, and tell people to use it if they get HTML crap. i mean, a fancy solution might be some magic HTML autodetect, but that's for another ticket.

one more thing about ICQ, what to send in OTR messages, HTML-encoded or not? this is the same problem of course, just looking from the other direction. we're tunneling past any server-side mechanisms and have no way of knowing what the other side expects. if we send plain and they expect HTML, angle brackets and stuff will be swallowed. if we send HTML and they expect plain, they will see HTML entities. fortunately, if they send us HTML, they will also expect HTML, so the already discussed option is enough.
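The four sender/receiver combinations can be played through with a small Python model. `encode` and `display` are hypothetical helpers that only approximate what real clients do with HTML.

```python
import html
import re

def encode(text: str, sender_uses_html: bool) -> str:
    """Sender side: HTML-escape the typed text, or send it as-is."""
    return html.escape(text, quote=False) if sender_uses_html else text

def display(payload: str, receiver_expects_html: bool) -> str:
    """Receiver side: interpret the payload as HTML, or show it verbatim."""
    if receiver_expects_html:
        return html.unescape(re.sub(r"<[^>]*>", "", payload))
    return payload

typed = "1 <2> 3"
for sender_html in (False, True):
    for receiver_html in (False, True):
        shown = display(encode(typed, sender_html), receiver_html)
        print(f"send_html={sender_html} expect_html={receiver_html} -> {shown!r}")
```

Only the two matching combinations reproduce the typed text. Plain sent to an HTML receiver loses the bracketed part, and HTML sent to a plain receiver shows raw entities.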

sooo... i'll implement the option as described above, otr_does_html; per-account, default true for jabber, false for others.

comment:26 Changed at 2011-06-28T03:12:51Z by anonymous

otr settings should be per user because it depends on the client? (same with the rest of the otr stuff?)

comment:27 in reply to:  26 Changed at 2011-06-28T09:42:46Z by pesco

Replying to anonymous:

otr settings should be per user because it depends on the client? (same with the rest of the otr stuff?)

hm. maybe, there could be a per-user override. it would be useful i guess. but we don't have infrastructure for per-user settings so far, do we? all the OTR data like fingerprints etc. is kept by libotr, IIRC.

comment:28 Changed at 2011-06-28T11:56:54Z by Wilmer van der Gaast <wilmer@…>

We don't and we won't. I can't imagine other clients would have a setting like this either. Plus, it won't solve the problem as a person won't always be using the same client.

I wonder how other clients do this. They also have to figure out if they need to render HTML or not, if <stuff like this> should be shown as-is or rendered as HTML, etc.

what a lovely bunch of people BTW:

"Frankly, I care not a single whit what happens with HTML, and I don't see why anyone should put any effort into its handling until everything else works properly." has a whole debate about it. Great work guys, ignore the problem so you give it time to become even more of a fucking mess. :-(

comment:29 Changed at 2011-06-28T13:12:45Z by pesco

the followup thread on the dev list actually contains an enlightening post, by Ian Goldberg (main dev if i'm not completely mistaken):

especially this bit:

So here's my proposal, which is of course open for debate.

- Specify that OTR plaintext is UTF-8 text/xhtml.

- Clarify in the section on encrypting/decrypting that you need to
  convert your plaintext to/from that format.

unfortunately, the thread didn't reach a conclusion. so maybe we should just implement it like that, if we can find that it won't break interop with a ton of clients. but like i said, we would be no worse than pidgin-otr, if that means anything.

NB: one of the replies to that email made a case for the "inherit encoding from underlying transport" option on the standpoint of non-HTML clients being troubled by OTR carrying HTML over otherwise non-HTML protocols. even though up to now i thought that encoding inheritance was the intended way, i don't follow the view of the other solution being unduly complicated. we can see by our own example that "OTR always carries text/xhtml" is actually easier to work with in practice.

comment:30 Changed at 2011-06-29T01:47:02Z by pesco

implemented a global setting otr_does_html (default: true) that causes OTR plaintext to always be considered HTML. the message filters of the OTR plugin perform stripping/escaping as necessary. global $strip_html is still honored. tested with two BitlBee instances over Jabber.

please pull from usual place and close ticket if you think this solution is adequate.
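One possible reading of the behaviour described in this comment, as a Python sketch. This is not the merged C code; the function names, the parameters, and the use of the connection's HTML flag to decide whether anything needs doing are assumptions drawn from the discussion in this ticket.

```python
import html
import re

def strip_html(text: str) -> str:
    """Crude stand-in: drop tags, then decode entities."""
    return html.unescape(re.sub(r"<[^>]*>", "", text))

def otr_filter_in(plaintext: str, *, otr_does_html: bool,
                  strip_html_setting: bool, conn_does_html: bool) -> str:
    """Runs on decrypted incoming text."""
    if otr_does_html and strip_html_setting and not conn_does_html:
        # the top level will not strip for this connection, so do it here
        return strip_html(plaintext)
    return plaintext

def otr_filter_out(text: str, *, otr_does_html: bool,
                   conn_does_html: bool) -> str:
    """Runs on outgoing text before encryption."""
    if otr_does_html and not conn_does_html:
        # peers treat OTR plaintext as HTML, so escape what the user typed
        return html.escape(text, quote=False)
    return text

print(otr_filter_in("<b>hi</b> &amp; bye", otr_does_html=True,
                    strip_html_setting=True, conn_does_html=False))
print(otr_filter_out("a <b> c", otr_does_html=True, conn_does_html=False))
```

With the global strip setting off, the incoming filter leaves the text alone, which is how "global $strip_html is still honored" is interpreted here.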

comment:31 Changed at 2011-07-24T12:29:14Z by wilmer

Hrm, instead of duplicating the stripping code, what if you just set OPT_DOES_HTML for all accounts? OTR can't be (un)loaded at runtime and I don't intend to ever change that, so just setting that flag for all accounts gets you this behaviour for free.

The only problem is, *when* to do this. Do we have the hooks we need for that?

I'm fine with this patch too (don't want to get in your way for things that only affect OTR), just let me know.

comment:32 Changed at 2011-07-29T07:20:59Z by pesco

we would have to set DOES_HTML temporarily, only while processing incoming/outgoing OTR messages. i'm not sure we have the right hooks for that. my guess from the top of my head is that something would need to be rearranged in the code to accommodate.

apart from that, i'd see the DOES_HTML as pertaining to the underlying IM protocol from which the OTR channel is considered separate as per previous discussion. so twiddling that bit just to get some behavior we need inside the OTR channel to one particular user feels somewhat hackish. granted, so does (at least a little bit) the looking at the flag to see whether a given msg is already in HTML or not (i.e. do we need to do something to it); but arguably, that's what the flag is for, isn't it?

so in short, i'd keep it the way i did it.

comment:33 Changed at 2011-07-31T23:01:11Z by wilmer

Resolution: fixed
Status: accepted → closed

Oh yes, you're completely right. I forgot not all messages are OTR, definitely not a good solution then.

Merged in changeset:devel,803.

Thank you! I'll close this bug, maybe there are some more I should close though.

comment:34 Changed at 2015-03-24T16:06:47Z by devoid

..just a note

Seems there was a typo in the ML link that pesco posted. The correct link is the one below (777 instead of 772.)
