egg changed the topic of #kspacademia to: https://git.io/JqLs2 | Dogs are cats. Spiders are cat interferometers. | Document well, for tomorrow you may get mauled by a ネコバス. | <UmbralRaptor> egg|nomz|egg: generally if your eyes are dewing over, that's not the weather. | <ferram4> I shall beat my problems to death with an engineer. | We can haz pdf | Logs: https://esper.irclog.whitequark.org/kspacademia
_whitelogger has joined #kspacademia
<egg|matrix|egg>
It is not in XID_Start because it is in Pattern_Syntax. It is in Pattern_Syntax because of where it was encoded: the whole Supplemental Punctuation block (U+2E00..U+2E7F) is Pattern_Syntax.
<egg|matrix|egg>
It does not have punctuation-like semantics; it is really a letter. See its annotation "• used for Cyrillic yerik".
<egg|matrix|egg>
It was probably a mistake to encode it where it ended up.
<egg|matrix|egg>
See the proposal https://www.unicode.org/L2/L2007/07003r-n3194r-cyrillic.pdf: VERTICAL TILDE ⸯ is the spacing form of U+033E COMBINING VERTICAL TILDE, which represents the Cyrillic yerik, a character that functions similarly to the PAYEROK.
<egg|matrix|egg>
And CYRILLIC PAYEROK ꙿ is the spacing equivalent of U+A67D COMBINING CYRILLIC PAYEROK, used to replace an omitted yer and, later, also to break up consonant clusters.
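The contrast between the two spacing characters can be checked mechanically. This is a minimal sketch, assuming Python 3.9+ and its bundled `unicodedata`; it uses `str.isidentifier()` as a stand-in for XID_Start, since Python identifiers follow XID_Start/XID_Continue (PEP 3131). The helper name `describe` is hypothetical. Both VERTICAL TILDE and CYRILLIC PAYEROK are general category Lm, but only the payerok is accepted as an identifier start.

```python
import unicodedata

# The two spacing characters discussed above, plus their combining counterparts.
CHARS = ["\u2e2f", "\ua67f", "\u033e", "\ua67d"]

def describe(ch: str) -> tuple[str, str, bool]:
    """Hypothetical helper: (name, general category, usable as identifier start).

    On a one-character string, str.isidentifier() is True exactly when the
    character is in XID_Start (or is "_", which none of CHARS are).
    """
    return unicodedata.name(ch), unicodedata.category(ch), ch.isidentifier()

for ch in CHARS:
    name, gc, starts = describe(ch)
    print(f"U+{ord(ch):04X} {name}: gc={gc}, XID_Start={starts}")
```

The combining forms are Mn, so they can only appear after a start character; for example `("x" + "\u033e").isidentifier()` is accepted because Mn is part of XID_Continue.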
<egg|matrix|egg>
> If I understand that right, the Unicode 3.0 (through 4.0.1) definition of identifiers is unstable in some way, and U+2E2F is a prototype for that incompatibility as something introduced in 5.1. I don't really see the incompatibility problem, but at least I know it's an explicit omission
<egg|matrix|egg>
I would not put it exactly that way
queqiao has quit [Quit: Bridge terminating on SIGTERM]
egg|matrix|egg has quit [Quit: Bridge terminating on SIGTERM]
whitequark has quit [Quit: Bridge terminating on SIGTERM]
queqiao has joined #kspacademia
whitequark has joined #kspacademia
egg|matrix|egg has joined #kspacademia
<egg|matrix|egg>
Version 3.0 of the Unicode Standard recommended an identifier definition based on general category, and said you needed to reference a specific version of the standard for stability: https://www.unicode.org/versions/Unicode3.0.0/ch05.pdf pp. 134 sq.
<egg|matrix|egg>
Version 4.0 recommended the same definition, and said that if you had been using the 3.0 definition, you needed to add some characters that used to be in those general categories: https://www.unicode.org/versions/Unicode4.0.0/ch05.pdf p. 131.
<egg|matrix|egg>
If you did not heed the warnings about backward compatibility and just used the set of GCs, you were unstable. The scheme described in 4.0 was formalized as properties in 4.1, with Other_ID_Start and Other_ID_Continue holding the characters that used to be identifier characters. If you used the recommended GC-based definition in 3.0 or 4.0 and then switched to the recommended (X)ID definition in 4.1, you have no compatibility issues. If you picked up the set of GCs from 3.0 and use it with a later UCD, switching to XID may be incompatible because of U+2E2F.
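That incompatibility can be sketched by evaluating a purely general-category start predicate (letters plus Nl, roughly the 3.0/4.0 recommendation without the compatibility additions that later became Other_ID_Start) and XID_Start against the same, later UCD. This assumes Python 3.9+, uses `str.isidentifier()` as the XID_Start oracle, and the helper names are hypothetical. The GC-only side also picks up characters removed from XID_Start for NFKC-closure reasons, so U+2E2F is one entry among several rather than the only one.

```python
import sys
import unicodedata

# Start characters under a purely general-category definition, in the style
# of the Unicode 3.0/4.0 recommendation: all letters plus letter numbers.
GC_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

def gc_start(ch: str) -> bool:
    """Hypothetical helper: the GC-based identifier-start predicate."""
    return unicodedata.category(ch) in GC_START

def xid_start(ch: str) -> bool:
    """XID_Start via str.isidentifier() (PEP 3131); '_' is special-cased there."""
    return ch != "_" and ch.isidentifier()

def diff_against_xid() -> tuple[list[str], list[str]]:
    """Return (GC-only, XID-only) start characters for the bundled UCD."""
    gc_only, xid_only = [], []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        g, x = gc_start(ch), xid_start(ch)
        if g and not x:
            gc_only.append(ch)
        elif x and not g:
            xid_only.append(ch)
    return gc_only, xid_only

if __name__ == "__main__":
    gc_only, xid_only = diff_against_xid()
    print("Unicode", unicodedata.unidata_version)
    print("GC-based start, not XID_Start:", " ".join(f"U+{ord(c):04X}" for c in gc_only))
    print("XID_Start, not GC-based start:", " ".join(f"U+{ord(c):04X}" for c in xid_only))
```

The second list shows the other direction of the story: characters such as U+2118 SCRIPT CAPITAL P are identifier starts only because Other_ID_Start keeps them in.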
<SnoopJ>
egg|matrix|egg, and 2E2F was not added to (X)ID_Start with its introduction as an arbitrary "it doesn't really make sense as an identifier" choice, right? That's the only bit I'm foggy on, I think. Before discovering this issue I thought that the "derived from" in the definition of ID_Start was 1:1 with the GCs mentioned, but now I understand that it isn't (and IIUC this is more stable, because you're always tied to some particular version of the UCD to reference those properties).
<egg|matrix|egg>
> egg|matrix|egg, and 2E2F was not added to (X)ID_Start with its introduction as an arbitrary "it doesn't really make sense as an identifier" choice, right?
<egg|matrix|egg>
Wrong
<egg|matrix|egg>
See above: that is because it is Pattern_Syntax.
<egg|matrix|egg>
(which cannot be allowed to overlap with XID_Continue (XIDC) per the stability policy.)
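The disjointness of Pattern_Syntax and XID_Continue can be spot-checked for the block U+2E2F lives in. This is a sketch under stated assumptions: Python does not expose Pattern_Syntax, so the Supplemental Punctuation range U+2E00..U+2E7F is hard-coded as one of the Pattern_Syntax ranges, and membership in XID_Continue is tested by appending the character to a valid start and calling `str.isidentifier()`. The helper names are hypothetical.

```python
import unicodedata

# Assumption: U+2E00..U+2E7F (Supplemental Punctuation) is wholly Pattern_Syntax,
# as listed in PropList.txt; Python has no API for that property, so the range
# is hard-coded here.
SUPPLEMENTAL_PUNCTUATION = range(0x2E00, 0x2E80)

def continues_identifier(ch: str) -> bool:
    """True if ch is in XID_Continue, tested by appending it to a valid start."""
    return ("a" + ch).isidentifier()

def pattern_syntax_violations() -> list[str]:
    """Code points in the hard-coded Pattern_Syntax range that Python accepts
    inside identifiers; the stability guarantee implies this list is empty."""
    return [chr(cp) for cp in SUPPLEMENTAL_PUNCTUATION
            if continues_identifier(chr(cp))]

print(unicodedata.unidata_version, pattern_syntax_violations())
```

Note that this checks the continuation position, so U+2E2F is rejected even after a letter, whereas the Mn U+033E COMBINING VERTICAL TILDE is accepted there.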
<SnoopJ>
Oh, I missed that part of scrollback, sorry