airynothing ([personal profile] airynothing) wrote in [site community profile] dw_suggestions, 2016-08-19 03:57 am

Fully threaded comment tracking emails using more than two References: headers

Title:
Fully threaded comment tracking emails using more than two References: headers

Area:
comment tracking emails

Summary:
Fully threaded mailreaders use the References: header in the email to build the tree structure of emails, expecting to find *all* direct ancestor messages' message-ids in the header. Currently comment notification emails only include up to two message-ids in the References: header -- one for the post, one for the top comment in the thread -- preventing the actual tree structure from being built by the mailreader. If all available parent-comment-of-parent-comment message IDs are included, a correct tree structure will be available in the mailreader.

Description:
I tracked a large comment meme and sent it to a fully threaded mailreader, only to find that the tree structure of the comments was not preserved. Threaded mailreaders use the References: header to build the tree, and all direct ancestor comments of the comment in question should be included. Currently in the email the only message-ids included in the References: header are for the post and the top-level comment. Result in the mailreader: chaos (due to the large size of the comment threads). Solution: include more references (all parent and parent-of-parent comments) in the References: header.
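
To make the request concrete, here is a minimal sketch of building the References: value from a comment's full ancestor chain. The `Comment` class and the example message-ids are hypothetical stand-ins for illustration, not Dreamwidth's actual data model or message-id format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    # Hypothetical comment record: its own message-id and a link to its parent.
    msgid: str
    parent: Optional["Comment"] = None


def references_header(entry_msgid: str, comment: Comment) -> str:
    """Return a References: value with the entry first, then every
    ancestor comment, oldest first (direct parent last)."""
    chain = []
    node = comment.parent
    while node is not None:
        chain.append(node.msgid)
        node = node.parent
    chain.reverse()  # walked parent-to-root, so flip to root-to-parent
    return " ".join([entry_msgid] + chain)


top = Comment("<c1@example.org>")
middle = Comment("<c2@example.org>", parent=top)
leaf = Comment("<c3@example.org>", parent=middle)
header_value = references_header("<e1@example.org>", leaf)
```

The notification for `leaf` would then carry the entry, the top comment, and the direct parent, in that order, which is what a threading mailreader needs to rebuild the tree.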

Poll #18022 Fully threaded comment tracking emails using more than two References: headers
Open to: Registered Users, detailed results viewable to: All, participants: 29


This suggestion:

Should be implemented as-is.
8 (27.6%)

Should be implemented with changes. (please comment)
0 (0.0%)

Shouldn't be implemented.
0 (0.0%)

(I have no opinion)
20 (69.0%)

(Other: please comment)
1 (3.4%)

denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)

[staff profile] denise 2017-02-21 06:41 pm (UTC)
Note: this is an older suggestion that just got posted as I clean up the queue.

I believe we don't do this because there's a restriction on how long that header can be in email. Part of the reason this was sitting in the queue is that I wanted to research that before the discussion happened. In the interest of getting the queue cleaned out, I'm posting without that research, but if someone knows more about mail headers (or wants to look into it for me), that would be great!
azurelunatic: Several toasted ham-and-cheese sandwiches. (thirty-five ham and cheese sandwiches)

[personal profile] azurelunatic 2017-02-22 10:56 pm (UTC)
I'm all for it if the technical bits work! And a scramble through the internet seems to suggest that a mail routing loop is more likely to screw up message delivery than a few extra in-reply-to.

So I think that at *minimum*, we should send references to:

* the entry
* the direct parent
* the top comment in the thread

And perhaps:

* a sensible number of grandparent comments (sensible number to be determined by whom?)
* a sensible number of older-sibling comments
* a sensible number of cousin-thread roots(?)


Looking at it, I see the potential for extra work Dreamwidth-side when sending out comments in super super deep comment threads, if the number of grandparent references isn't trimmed suitably when generating the notification.
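
One possible trimming rule, sketched under assumptions: always keep the entry, the top comment, and the direct parent, then fill in with the nearest ancestors up to a cap. The function name and the default cap of 8 are made up for illustration; the "sensible number" is still an open question.

```python
def trim_references(entry_id, ancestors, max_recent=8):
    """Trim an ancestor chain for the References: header.

    ancestors is oldest first: top comment first, direct parent last.
    Keeps the entry, the top comment, and the nearest ancestors
    (always including the direct parent).
    """
    if not ancestors:
        return [entry_id]
    keep = max(1, max_recent)        # never drop the direct parent
    top = ancestors[0]
    recent = ancestors[1:][-keep:]   # nearest ancestors, excluding the top comment
    return [entry_id, top] + recent


# A thread ten comments deep, trimmed to the three nearest ancestors.
chain = [f"<c{i}@example.org>" for i in range(1, 11)]
trimmed = trim_references("<e@example.org>", chain, max_recent=3)
```

Dropping from the middle like this matches what the threading write-up quoted further down says news readers already do, so threaded mailreaders should cope with the gap.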


My research journey is below.




Looking through RFC 822, I see:

* 3.4.8. - Folding Long Header Fields
So each header line wants to be about 65 (or 72) characters or fewer, or else be "folded" across several lines.

* 4.1 indicates (by way of 2.4) that there may be an indefinite number of in-reply-to or references lines.

* 4.6.2 and 4.6.3 indicate that message identifiers for in-reply-to and references must use the msg-id specification format.


HOWEVER, RFC 2822 is the new hotness.

It says, among other things:

2.1.1. Line Length Limits

There are two limits that this standard places on the number of
characters in a line. Each line of characters MUST be no more than
998 characters, and SHOULD be no more than 78 characters, excluding
the CRLF.

998 is for "things break"; 78 is for "this will look ugly on some displays".
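
Since the limit is per line rather than per header, a long References: field only needs to be folded at the whitespace between message-ids. A minimal sketch of that folding, hand-rolled for illustration (the function name is hypothetical, and a real mailer would normally let its mail library do this):

```python
def fold_references(msgids, limit=78):
    """Fold a References: header so each line stays within `limit`
    characters (excluding CRLF), breaking only between message-ids."""
    lines = []
    current = "References:"
    for msgid in msgids:
        too_long = len(current) + 1 + len(msgid) > limit
        if too_long and current != "References:":
            lines.append(current)
            current = " " + msgid   # continuation lines start with whitespace
        else:
            current += " " + msgid
    lines.append(current)
    return "\r\n".join(lines)


ids = ["<" + "a" * 30 + "@example.org>"] * 3
folded = fold_references(ids)
```

A single message-id longer than the limit can't be split, so this sketch leaves it on a line by itself and only the 998-character hard limit would matter in that case.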


2.2. Header Fields

Header fields are lines composed of a field name, followed by a colon
(":"), followed by a field body, and terminated by CRLF. A field
name MUST be composed of printable US-ASCII characters (i.e.,
characters that have values between 33 and 126, inclusive), except
colon. A field body may be composed of any US-ASCII characters,
except for CR and LF. However, a field body may contain CRLF when
used in header "folding" and "unfolding" as described in section
2.2.3. All field bodies MUST conform to the syntax described in
sections 3 and 4 of this standard.


2.2.1. Unstructured Header Field Bodies

Some field bodies in this standard are defined simply as
"unstructured" (which is specified below as any US-ASCII characters,
except for CR and LF) with no further restrictions. These are
referred to as unstructured field bodies. Semantically, unstructured
field bodies are simply to be treated as a single line of characters
with no further processing (except for header "folding" and
"unfolding" as described in section 2.2.3).

2.2.2. Structured Header Field Bodies

Some field bodies in this standard have specific syntactical
structure more restrictive than the unstructured field bodies
described above. These are referred to as "structured" field bodies.
Structured field bodies are sequences of specific lexical tokens as
described in sections 3 and 4 of this standard. Many of these tokens
are allowed (according to their syntax) to be introduced or end with
comments (as described in section 3.2.3) as well as the space (SP,
ASCII value 32) and horizontal tab (HTAB, ASCII value 9) characters
(together known as the white space characters, WSP), and those WSP
characters are subject to header "folding" and "unfolding" as
described in section 2.2.3. Semantic analysis of structured field
bodies is given along with their syntax.



http://stackoverflow.com/a/2721849 says there is no total header limit for the header body. The comments detail that they updated their answer after looking at 2822 as well as 822. Nobody has chimed in to name specific implementations that have a particularly restrictive header length limit.

https://technet.microsoft.com/en-us/library/bb124345(v=exchg.160).aspx says that Exchange admins can put a size limit on headers.

The default maximum value for headers (all fields) in a message sent *out* via Exchange 2016 is 256 KB, or 262,144 bytes. The length limit for an entry on Dreamwidth is 300,000 characters.

https://sourceforge.net/p/assp/wiki/SMTP_Session_Limits/ is an anti-spam SMTP proxy server settings page. I assume this is in bytes.
npSizeLocal: Message Size Limit Local Messages -- ASSP will treat outgoing messages larger than as 'No Processing' mail. default: 1
HeaderMaxLength: Maximum Header Size -- The maximum allowed header length, in bytes. At each mail hop header information is added by the mail server. A large mail header can indicate a mail loop. If the value is blank or 0 the header size will not be checked. default: 50000
So, 50ish kb.
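
For a rough sense of scale, here is a back-of-the-envelope estimate of how many message-ids would fit under those two limits. The 60-byte message-id length and the 4,000 bytes reserved for all the other headers are guesses for illustration, not measured values.

```python
def max_message_ids(limit_bytes, id_len=60, other_headers=4000):
    """Rough count of message-ids that fit under a total header size limit."""
    field_name = len("References:")
    per_id = id_len + 3  # worst case: CRLF plus one space of folding before each id
    available = limit_bytes - other_headers - field_name
    return max(0, available // per_id)


assp_default = max_message_ids(50_000)          # ASSP HeaderMaxLength default
exchange_default = max_message_ids(256 * 1024)  # Exchange 2016 outbound default
```

Under these guesses even the 50,000-byte limit leaves room for several hundred ancestors, far deeper than any comment thread is likely to nest.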


And, just for fun:

https://www.jwz.org/doc/threading.html

The In-Reply-To header was originally defined by RFC 822, the 1982 standard for mail messages. In 2001, its definition was tightened up by RFC 2822.

RFC 822 defined the In-Reply-To header as, basically, a free-text header. The syntax of it allowed it to contain basically any text at all. The following is, literally, a legal RFC 822 In-Reply-To header:

In-Reply-To: thirty-five ham and cheese sandwiches
So you're not guaranteed to be able to parse anything useful out of In-Reply-To if it exists, and even if it contains something that looks like a Message-ID, it might not be (especially since Message-IDs and email addresses have identical syntax.)


[I need a new icon. BRB...]

References:
The References header was defined by RFC 822 in 1982. It was defined in, effectively, the same way as the In-Reply-To header was defined: which is to say, its definition was pretty useless. (Like In-Reply-To, its definition was also tightened up in 2001 by RFC 2822.)

However, the References header was also defined in 1987 by RFC 1036 (section 2.2.5), the standard for USENET news messages. That definition was much tighter and more useful than the RFC 822 definition: it asserts that this header contain a list of Message-IDs listing the parent, grandparent, great-grandparent, and so on, of this message, oldest first. That is, the direct parent of this message will be the last element of the References header.

It is not guaranteed to contain the entire tree back to the root-most message in the thread: news readers are allowed to truncate it at their discretion, and the manner in which they truncate it (from the front, from the back, or from the middle) is not defined.

Therefore, while there is useful info in the References header, it is not uncommon for multiple messages in the same thread to have seemingly-contradictory References data, so threading code must make an effort to do the right thing in the face of conflicting data.

RFC 2822 updated the mail standard to have the same semantics of References as the news standard, RFC 1036.
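
On the receiving side, the "direct parent is the last element" rule can be sketched like this. The regular expression is a deliberately loose approximation of msg-id syntax, good enough for illustration only, not a full RFC 2822 parser.

```python
import re

# Loose approximation: anything in angle brackets without spaces or nested brackets.
MSGID = re.compile(r"<[^<>\s]+>")


def parse_references(value):
    """Return the message-ids found in a References: value, oldest first."""
    return MSGID.findall(value)


def direct_parent(value):
    """Return the last message-id (the direct parent), or None if there is none."""
    ids = parse_references(value)
    return ids[-1] if ids else None


parent = direct_parent("<e@example.org> <c1@example.org>\r\n <c2@example.org>")
nothing = direct_parent("thirty-five ham and cheese sandwiches")
```

The second call shows why threading code can't trust the header blindly: free text with no message-ids in it simply yields no parent.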


azurelunatic: A glittery black pin badge with a blue holographic star in the middle. (Default)

[personal profile] azurelunatic 2017-02-22 11:46 pm (UTC)
1,000 lines by default for this Cisco security appliance rule: http://www.cisco.com/c/en/us/support/docs/security/email-security-appliance/118495-technote-esa-00.html
Edited (be specific, Rev. Lunatic!) 2017-02-22 23:47 (UTC)
momijizukamori: Green icon with white text - 'I do believe in phosphorylation! I do!' with a string of DNA basepairs on the bottom (Default)

[personal profile] momijizukamori 2017-02-24 02:35 am (UTC)
I'm voting in favor, mostly in the hopes that these changes might lead to gmail handling comment notif threading better *g*