#kspacademia on 2018-04-04 — irc logs at esper.irclog.whitequark.org

00:00 <egg|zzz|egg> bofh: um yes but that x^3-y is part of the cbrt implementation,

00:00 <egg|zzz|egg> bofh: how many layers of cbrt are you on,

00:01 <egg|zzz|egg> bofh: although maybe I could distribute the numerator to make it less cancelly than precisely the thing that I'm trying to make most cancelly

00:06 <bofh> egg|zzz|egg: ohh I see what you mean. I don't *think* you're losing too many bits there...

00:06 <egg|zzz|egg> bofh: I think that's why I'm not improving things with better guess + Householder 6 vs. Householder 10 and it's all at a ridiculous 0,9 ULPs

00:08 <bofh> It's possible (tho I maintain anything < 1ULP is good), actually. Hm. I actually wonder if just doing the *subtraction* in double-double suffices to help at all.

00:09 <egg|zzz|egg> bofh: also I appear to be alternating between the two decimal markers defined by CGPM 9 resolution 7/CGPM 22 resolution 10/ISO 31-0 3.3/ISO 80000-1 7.3.2 :-p

00:10 <egg|zzz|egg> bofh: nah any egg can do faithfully rounded, it's only interesting if you try to get close to correctly rounded

00:10 <egg|zzz|egg> bofh: the subtraction is exact

00:11 <egg|zzz|egg> the problem is not the rounding of the subtraction, it is the condition of the subtraction

00:12 <bofh> Ahh. Right.

00:13 <bofh> also I maintain what is interesting is faithfully rounded with optimal perf :p

00:14 <egg|zzz|egg> bofh: the true answer is probably that you need an army of numericists to do the end-to-end error analysis and get optimal perf for whatever precision you need

00:14 <egg|zzz|egg> (and optimal precision within that perf if magic numbers & polynomials are tunable)

00:15 <egg|zzz|egg> bofh: e.g. if it's the step resizing factor in adaptive stepsize then touching the FPU for your rootn is sinful

00:15 <egg|zzz|egg> since you multiply the result with 0,9 anyway :-p

00:16 <egg|zzz|egg> bofh: but, https://twitter.com/FioraAeterna/status/968884994167881728 (1), so sub-ULP accuracy is good

00:16 <kmath> <FioraAeterna> @f1ac5 there's a few other possibilities! ⏎ ⏎ 1. you have a fetish for floating point ⏎ 2. you need predictable wrongnes… https://t.co/1lxO0i9PIk

00:18 <bofh> also you're making me want to revisit and improve the accuracy of my Airy function implementation past ~3ULP worst-case error for negative values near zeroes but fucking hell that thing is a *nightmare*.

00:19 <egg|zzz|egg> bofh: uh so distributing that cancellation in the 6th order householder somehow makes it unfaithful Ꙩ_ꙩ

00:19 <egg|zzz|egg> did I just introduce a bigger cancellation,

00:19 <bofh> ...*wat*

00:19 <bofh> how the hell

00:20 <egg|zzz|egg> plz2halp with the error analysis of

00:20 <egg|zzz|egg> double const numerator =

00:20 <egg|zzz|egg> x * ((5 * x³ + 12 * y) * x⁶ - (12 * x³ + 5 * y) * y²);

00:20 <egg|zzz|egg> double const denominator = (7 * x³ + 42 * y) * x⁶ + (30 * x³ + 2 * y) * y²;

00:20 <egg|zzz|egg> return x - numerator / denominator;

00:20 <egg|zzz|egg> oh

00:20 <egg|zzz|egg> uh

00:20 <egg|zzz|egg> yeah I just made the cancellation worse

00:20 <egg|zzz|egg> *sobbing*

00:21 <bofh> first off why aren't you precomputing z = x^3 as an intermediate and using that in your numerator/denominator expressions? but otherwise... that numerator looks like it can have nasty cancellation in some cases.

00:21 <egg|zzz|egg> bofh: superscript digits are perfectly good characters in identifiers,

00:21 <bofh> ...

00:22 <egg|zzz|egg> seriously this is way more legible this way

00:23 <egg|zzz|egg> bofh: eggsample https://github.com/eggrobin/Principia/blob/cbrt-benchmarks/benchmarks/cbrt.cpp#L79-L101

00:23 <bofh> okay, so I should assume any high-order superscript digits actually have the common sub-powers precomputed?

00:23 <egg|zzz|egg> yeah, probably russian peasant

00:24 <egg|zzz|egg> (wait is it called that in english too or does the above sentence sound completely nonsensical)

00:25 <bofh> yeah, the russian peasant algo

00:25 <egg|zzz|egg> okay, it's called the same in french funnily enough

00:25 <egg|zzz|egg> (paysan russe)

00:27 <egg|zzz|egg> bofh: I mean it's just perfectly good C++, you should assume it's C++ and so the things that are identifiers are identifiers :-p

00:28 <egg|zzz|egg> bofh: and the numerator is not just cancelly in some cases, it's *actively trying to cancel* since you try to get x close to the cube root of y

00:30 <egg|zzz|egg> bofh: okay so basically what I'd like is a high-order iterate that is a rational function in (x-y), x, and y with positive coefficients

00:30 <egg|zzz|egg> bofh: also a pony

00:32 <bofh> Yeah I don't *think* you'll get that, particularly not the "only positive coeffs" bit I don't think.

00:34 <egg|zzz|egg> bofh: because the Householder stuff seems to be inherently heavily cancelly,

00:34 <egg|zzz|egg> there are methods other than householder though, kahan lists two of order 5

00:39 e_14159 has quit [Ping timeout: 198 seconds]

00:40 <bofh> egg|zzz|egg: I suspect it's just Householder composed with some smooth function, but am not sure.

00:45 e_14159 has joined #kspacademia

01:02 <egg|zzz|egg> bofh: hm but can it help with the cancelly bits

01:03 <egg|zzz|egg> bofh: like if you can pull off some trick like the conjugate multiplication thing maybe you can make this non-horrible

01:04 <egg|zzz|egg> gah we really need to get the cat in here for advice on this stuff

01:04 <bofh> egg|zzz|egg: I don't *think* this can be made non-cancellable via conjugate multiplication or similar, but let me think.

01:04 <egg|zzz|egg> maybe by linking moar catpics https://twitter.com/whitequark/status/980575216047833089

01:04 <kmath> <whitequark> https://t.co/jSmmXQdRfy

01:05 <egg|zzz|egg> !wpn котя

01:05 * Qboid gives котя a thoracic buffered truffle

01:33 ferram4 has quit [Read error: Connection reset by peer]

01:33 ferram4 has joined #kspacademia

01:48 SnoopJeDi has joined #kspacademia

03:14 <Ellied> https://twitter.com/6b766e/status/980315353782693888 (via @Aiden_Eldrich)

03:14 <kmath> <6b766e> https://t.co/8HQmE64KAE

05:00 <bofh> https://twitter.com/x86instructions/status/981395727036420107

05:00 <kmath> <x86instructions> VGATHERDPS - Summon Very Angry Bears (Circular Formation)

05:46 <UmbralRaptor> Hrm https://twitter.com/brettmor/status/981220505519767552

05:46 <kmath> <brettmor> Shout out to all of the grads who applied and *didn't* get a fellowship this year. You're in good company! https://t.co/Sq7qhYDg3J

06:31 Technicalfool_ has quit [Ping timeout: 383 seconds]

06:36 <egg|cell|egg> Bofh: I guess that's why the cat looks for a perfect cube in a table

06:41 <bofh> egg|cell|egg: oh?

06:42 <bofh> (I *extremely* dislike that table fwiw)

06:42 <egg|cell|egg> Bofh: for a perfect cube x³-y will be nice to compute

06:42 <egg|cell|egg> But what if you 0 the low bits of your guess?

06:43 <egg|cell|egg> Hmm

06:43 <egg|cell|egg> Nerd sniping intensifies

07:04 <bofh> egg|cell|egg: yes but now you're storing a large table for at best fractional ULP improvement, this is not better at all imho.

07:04 <egg|cell|egg> Bofh: but fractional ulps are good,

07:05 <egg|cell|egg> Bofh: but 0ing bits of your guess might work

07:06 <egg|cell|egg> How many sigbits can you have and still have X*X*X exact

07:08 <egg|cell|egg> !Wpn bofh

07:08 * Qboid gives bofh a coverage-guided condenser

07:11 <bofh> hrm

07:11 <bofh> isn't it always going to be exact in the absence of overflow or underflow?

07:14 <egg|cell|egg> Wat

07:15 <egg|cell|egg> Bofh: if it were exact there would be no cancellation issue

07:17 armed_troop has quit [Ping timeout: 182 seconds]

07:18 armed_troop has joined #kspacademia

07:18 <bofh> okay, right, but okay I'm actually having trouble thinking of when multiplication is inexact in the absence of over/underflow

07:19 <egg|cell|egg> When the result has more than 53 bits?

07:20 <egg|cell|egg> So I guess the guess must have 17 bits or so

07:30 <egg|zzz|egg> bofh: okay I actually forgot this part in the kahan paper, he covers this (that explains why I got shitty precision on his method) "must be accurate to almost, and *rounded* to at most, a third as many figures as the arithmetic carries"

07:31 <egg|zzz|egg> bofh: so if I do that maybe I don't get 0,89 ULPs where he claims 0,59

07:33 <bofh> Ahh, I see.

07:35 <bofh> I'm fairly certain the approximate_rootn.pdf guess has between 5 & 6 bits, & one Newton iterate won't put that above 12, so you're good?

07:36 <egg|zzz|egg> bofh: well still need to actually do the bitand with 0xFFFF'FFF0'0000'0000

07:37 <egg|zzz|egg> bofh: once I do that it is *a lot* better

07:37 <egg|zzz|egg> wow

07:37 <egg|zzz|egg> also what is up with the cat's implementation, it has comparatively pretty bad rounding?

07:38 <egg|zzz|egg> bofh: so, max ULPs is a bit noisy on 10 000 values, looking at frequency of incorrect rounding instead

07:38 <egg|zzz|egg> Atlas: 0.0861

07:38 <egg|zzz|egg> Householder 10: 0.0147

07:38 <egg|zzz|egg> Newton on the inverse + Householder 6: 0.0136

07:38 <egg|zzz|egg> Kahan 0.0011 (!!)

07:39 <egg|zzz|egg> Microsoft: 0.2803 (lol)

07:39 <bofh> wait what the shit, how did clobbering bits we don't care about *improve* the result? the multiplication of 17 bits^3 vs 53 bits^3 where 24 are garbage should have the same amount of useful bits: the garbage ones get shifted out in the multiply anyway.

07:40 <bofh> let alone that good of a result, damn.

07:41 <egg|zzz|egg> for max ULPs on my 10 000 test values:

07:41 <egg|zzz|egg> Atlas: 0.71814

07:41 <egg|zzz|egg> Householder 10: 0.57580

07:41 <egg|zzz|egg> Newton then Householder 6: 0.58524

07:41 <egg|zzz|egg> Kahan: 0.55148

07:41 <egg|zzz|egg> Microsoft: 1.2779 (lol)

07:42 <egg|zzz|egg> bofh: well because clobbering the bits means you have no error in x³-y instead of a massive cancellation

07:42 <egg|zzz|egg> bofh: so your correction term from x is much better

07:43 <egg|zzz|egg> otherwise your correction term is mostly rounding errors from the computation of x³

07:43 <bofh> ohh, right, rounding is a thing.

07:43 <bofh> duh.

07:43 <egg|zzz|egg> :D
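
A minimal sketch of the bit-clobbering trick discussed above, assuming IEEE 754 binary64 doubles. The function names are hypothetical and this is not the code from the linked repository. The mask is the one quoted at 07:36 (written there as 0xFFFF'FFF0'0000'0000): it keeps the sign, the 11 exponent bits, and 16 stored mantissa bits, so 17 significant bits, and 3 × 17 = 51 ≤ 53 means x * x * x incurs no rounding.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Zeroes the low 36 bits of the significand, leaving 17 significant bits
// (16 stored bits plus the implicit leading one).
double clobber_low_bits(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0xFFFFFFF000000000ULL;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Returns true if both multiplications in x * x * x are exact.
// std::fma(a, b, -p) with p = round(a * b) yields the exact rounding error
// of the product (absent overflow and underflow), so a zero means "exact".
bool cube_is_exact(double x) {
  double const x2 = x * x;
  double const x3 = x2 * x;
  return std::fma(x, x, -x2) == 0.0 && std::fma(x2, x, -x3) == 0.0;
}
```

With an exact x³, the subtraction x³ - y only carries the error of the guess itself, rather than the rounding error of the cube.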

07:44 <bofh> also I'm morbidly curious what MS does now, it's slower and not even accurate to <1ULP.

07:45 <egg|zzz|egg> bofh: I'm really surprised that atlas's method is that badly rounded; I'm actually using the C implementation from the ARM directory which uses his polynomial and table but is not by him, maybe it has additional errors?

07:56 <bofh> Possibly, I'm not sure. I wonder if expressions got replaced with mathematically equivalent but ill-conditioned ones or something, hm.

07:57 <bofh> (alternatively it's possible he didn't care at the time. he *did* say recently he needs to revisit it :P)

07:59 <egg|zzz|egg> now with 100 000 random values: incorrectness frequencies for the same: {0.08777, 0.01624, 0.01384, 0.001, 0.27913}

08:00 <egg|zzz|egg> and max errors: 0.72486, 0.60568, 0.58524, 0.56219, 1.3531

08:01 <bofh> Not bad.

08:09 <egg|zzz|egg> bofh: obviously the high householders from a shitty guess are worse than Kahan's 4th order iterate from a near-optimal guess

08:09 <egg|zzz|egg> bofh: not sure how to get a 17-bit guess without a division though :-/

08:11 <bofh> inverse cbrt then 1 Newton iterate then do x * z * z with clever masking?

08:11 <bofh> wait, I thought Kahan and your Householders all used the exact same starting guess code?

08:12 <egg|zzz|egg> bofh: Kahan does a Halley then a 4th order iterate

08:12 <egg|zzz|egg> and rounds to 17 bits after the Halley

08:12 <egg|zzz|egg> that gives you a 17 bits guess which is how you get almost always correct rounding

08:13 <bofh> Ahh. Actually I'm curious how Halley, round to 17, Halley compares.

08:15 <egg|zzz|egg> bofh: that's not going to be faithful

08:16 <egg|zzz|egg> bofh: the bit clobbering trickery is only relevant when you're near the limits of the precision, otherwise you're far from caring about rounding

08:18 <egg|zzz|egg> bofh: and Halley can't get you from 17 bits to all so you can't use it as your last step

08:26 <bofh> Right, Halley will be at best non-faithful, at worst 1-3 bits off outright.

08:29 ferram4 has quit [Ping timeout: 198 seconds]

08:35 <egg|zzz|egg> bofh: well >1 ULP off is what unfaithful means right?

08:35 <egg|zzz|egg> bofh: I wonder whether you can do 2 rounds of newton on the inverse faster by Estrining it as a single horrible polynomial, then use that as your guess for a 4th order iteration

08:37 <bofh> So I considered doing that before but never tried it, it seems worth it tho.

08:38 <egg|zzz|egg> *nerd sniping intensifies*

09:37 tawny has quit [Ping timeout: 190 seconds]

09:52 APlayer has joined #kspacademia

10:14 ferram4 has joined #kspacademia

10:19 ferram4 has quit [Ping timeout: 198 seconds]

12:03 awang has quit [Ping timeout: 182 seconds]

12:26 awang has joined #kspacademia

12:34 tawny has joined #kspacademia

13:07 tawny has quit [Ping timeout: 182 seconds]

14:20 <UmbralRaptor> uhm https://twitter.com/DrPhiltill/status/981506638912909313

14:20 <kmath> <DrPhiltill> #TFW you get a calendar invitation to interview an applicant in the middle of the night — 2:30 a.m. on Saturday mor… https://t.co/yntg412Euj

14:25 ferram4 has joined #kspacademia

14:30 ferram4 has quit [Ping timeout: 198 seconds]

14:34 ferram4 has joined #kspacademia

14:39 ferram4 has quit [Ping timeout: 198 seconds]

14:51 Snoozee has quit [Ping timeout: 190 seconds]

14:53 Snoozee has joined #kspacademia

14:53 Snoozee is now known as Majiir

14:57 Technicalfool has joined #kspacademia

14:59 <UmbralRaptor> Humanity Star git Incredible Journey'd? https://twitter.com/Marco_Langbroek/status/981538723841150977

14:59 <kmath> <Marco_Langbroek> (1/5) ⏎ Remember @TheHumanityStar, and how its builders @RocketLab claimed it would be visible for 9 months? While in… https://t.co/6Wt9fwy1Ia

15:43 <APlayer> Huh, what? It reentered already?

15:44 <APlayer> I was totally planning to photograph it somewhen in early summer, and kind of put it on my long term to-do list... And now it's gone, just like that, half a year earlier than I was told

15:44 <APlayer> Well, whatever

16:00 ferram4 has joined #kspacademia

16:05 tawny has joined #kspacademia

16:05 ferram4 has quit [Ping timeout: 198 seconds]

16:09 ferram4 has joined #kspacademia

16:18 APlayer has quit [Ping timeout: 383 seconds]

16:29 <UmbralRaptor> s/git/got/

16:29 <Qboid> UmbralRaptor meant to say: Humanity Star got Incredible Journey'd? https://twitter.com/Marco_Langbroek/status/981538723841150977

16:29 <kmath> <Marco_Langbroek> (1/5) ⏎ Remember @TheHumanityStar, and how its builders @RocketLab claimed it would be visible for 9 months? While in… https://t.co/6Wt9fwy1Ia

16:34 APlayer has joined #kspacademia

16:36 <iximeow> from that thread

16:36 <iximeow> >(* and to validate my orbital lifetime estimates for an upcoming launch)

16:36 <iximeow> :D

16:38 tawny has quit [Ping timeout: 190 seconds]

16:41 <UmbralRaptor> \o/ (I hope)

16:47 Majiir has quit [Ping timeout: 190 seconds]

16:47 <UmbralRaptor> https://twitter.com/mcclure111/status/981563070714646528

16:47 <kmath> <mcclure111> Why do we use floating point coordinates for game physics engines anyway. Why not fixed point

16:48 <SnoopJeDi> LOL

16:53 <UmbralRaptor> I think Doom used fixed point?

16:54 Snoozee has joined #kspacademia

16:54 Snoozee is now known as Majiir

16:58 <UmbralRaptor> !wpn Majiir

16:58 * Qboid gives Majiir a thorium symbol

17:32 <egg|zzz|egg> bofh: I am amused by that bit-clobbering trick to get an exact cube

17:33 <egg|zzz|egg> bofh: also I reinvented it this morning and then I found it in Kahan's thing, maybe I should read Kahan's thing more carefully :-p

17:36 * UmbralRaptor pokes Gaussian units in the random 4π.

17:36 <bofh> rofl. I mean same, since I completely overlooked it myself.

17:54 Technicalfool has quit [Ping timeout: 182 seconds]

18:15 tawny has joined #kspacademia

18:26 <egg|zzz|egg> !wpn rqou

18:26 * Qboid gives rqou a caffeinated metric with a radiator attachment

18:27 * rqou meows

18:30 UmbralRaptor is now known as NomalRaptor

18:33 <APlayer> Random (literally) question: You have a probability experiment with n different, equally likely outcomes, and you do it n times. I noticed that the probability of one specific outcome occurring at least once is very close to 2/3. Is there some sort of theorem or something for that?

18:34 <APlayer> Or is it a false assumption that the probability is next to 2/3?

18:38 <egg|zzz|egg> !wpn NomalRaptor

18:38 * Qboid gives NomalRaptor a toasted panzer

18:39 <NomalRaptor> So, E&M is this week, not last.

18:40 * egg|zzz|egg gives NomalRaptor a gold cabinet

18:40 <NomalRaptor> ???

18:41 <NomalRaptor> And I'm certain that one of the homework problems involved using a non-constructive proof to generate an equation for the potential around an object.

18:42 <SnoopJeDi> APlayer, the probability that you *don't* observe one of the outcomes is ((n-1)/n)**n, which in the lim n->∞ is 1/e

18:42 <SnoopJeDi> 1 - 1/e is close to 2/3

18:42 <SnoopJeDi> "close"

18:43 <NomalRaptor> ^

18:43 <NomalRaptor> e ≈ 3 ≈ π

18:43 <SnoopJeDi> n.b. ((n-1)/n)**n = (1 - 1/n)**n

18:43 <APlayer> π ≈ 4?

18:44 <egg|zzz|egg> rounding to nearest to 1 bit, yes

18:45 <NomalRaptor> π ≡ 3

18:45 * NomalRaptor ducks.

18:45 <egg|zzz|egg> wasn't it bofh who had tales of π = 1?

18:45 <APlayer> Wait, is lim n -> inf ((n-1)/n)^n not the definition of e?

18:46 <SnoopJeDi> (1 + 1/n)**n, not -

18:46 <SnoopJeDi> hence 1/e

18:47 <APlayer> Ah, yes

18:47 <APlayer> Alright, so what I was seeing was actually approaching 1/e and not 1/3, haha

18:47 <APlayer> Well, this is interesting

18:47 <SnoopJeDi> n.b. this is only true if these are independent events, but I assumed that's what you meant

18:48 <APlayer> Because I used to (for no reason at all, don't ask me why) intuitively assume the probability in such cases to be 1/2

18:48 <APlayer> And used that to estimate things in life

18:49 <APlayer> Will correct to 2/3 for estimation purposes and 1 - 1/e for calculation purposes

18:50 <APlayer> Thanks for the explanation!
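
The limit above is easy to check numerically. This is a small sketch assuming n independent trials with n equally likely outcomes; the function name is hypothetical.

```cpp
#include <cmath>

// Probability that one specific outcome out of n equally likely ones
// occurs at least once in n independent trials: 1 - (1 - 1/n)^n.
// As n grows this tends to 1 - 1/e, roughly 0.632.
double at_least_once(int n) {
  return 1.0 - std::pow(1.0 - 1.0 / n, n);
}
```

For a six-sided die rolled six times this gives about 0.665, which is where the "close to 2/3" impression comes from; the limiting value is slightly lower.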

18:51 <egg|zzz|egg> cat. https://twitter.com/stephentyrone/status/853029289444376576

18:51 <kmath> <stephentyrone> I have no idea what’s happening here, but it’s wonderful. https://t.co/7PVDGPnQnx

18:52 <egg|zzz|egg> bofh: those two are not tagged floatingpointwithatlas: https://twitter.com/stephentyrone/status/848172687268687873 https://twitter.com/stephentyrone/status/853959695224188929

18:52 <kmath> <stephentyrone> https://t.co/FTuvYktX9d

18:56 <egg|zzz|egg> NomalRaptor: well I gave Nomal a gold cabinet

18:56 <NomalRaptor> yay

19:05 <SnoopJeDi> APlayer, it vaguely reminds me of the Monty Hall problem but I'm not sure it's related

19:05 <APlayer> Was it the one with two goats and a car?

19:05 <SnoopJeDi> yea, that's the problem

19:05 <APlayer> Ah, I remember that

19:06 <APlayer> It was a fun one, though I couldn't solve it (not sure if I could now, but I had no knowledge about probabilities back then)

19:22 <awang> Hey guys, I have a present for you

19:23 <awang> https://imgur.com/a/1AGrS

19:23 <awang> I'm pretty sure this is supposed to be the EULA thing for an OS upgrade

19:23 <awang> I have no idea how it ended up like this

19:25 <APlayer> The Unicode gods were displeased with your sacrifice

19:26 * awang hurries to the store to buy more �

19:27 <APlayer> !u �

19:27 <Qboid> U+FFFD REPLACEMENT CHARACTER (�)

19:27 <APlayer> What

19:27 <APlayer> This was not informative at all, haha

19:28 <APlayer> Ah, wait, so this is /supposed/ to be a tofu?

19:30 <awang> Um

19:30 <awang> !u �

19:30 <Qboid> U+FFFD REPLACEMENT CHARACTER (�)

19:30 <awang> ^That's what it is on my computer, idk about tofu

19:32 <APlayer> Some random chinese site I found on a Google search: https://houkanshan.com/img/fontonload/fffd.png

19:48 egg|cell|egg has quit [Ping timeout: 383 seconds]

19:58 Technicalfool has joined #kspacademia

20:09 * NomalRaptor throws Mario head first at that page,

20:11 <SnoopJeDi> "Mac: um we're indistinguishable from PC now but we don't have chromatic aberration?"

20:13 <SnoopJeDi> tangentially: PlayerUnknown's Battlegrounds has some of the most satisfying rifle scopes I've used in a videogame now that they model parallax and spherical/chromatic aberration (although I don't play milsim so...?)

20:57 <egg|zzz|egg> bofh: I have something which is correctly rounded on 100 000 random values

20:57 <egg|zzz|egg> bofh: it is slightly slower than Kahan's method though

21:06 <egg|zzz|egg> bofh: okay wat double-newton on the inverse is actually slower than Halley despite its fdiv? how the hell did I write my double-newton to get that

21:11 <bofh> What on *earth*? I want to see how that works on your Sandy Bridge desktop too fwiw

21:12 <bofh> https://pbs.twimg.com/media/DZ8-1hWVwAA4cpY.jpg:orig I need this.

21:14 <egg|zzz|egg> bofh: I suspect I just wrote that polynomial badly and am getting unpleasant deps

21:16 <egg|zzz|egg> bofh: okay any obviousness here (the divisions in the constant obviously are evaluated at compile time so don't worry about those) https://github.com/eggrobin/Principia/blob/fa4d2d4/benchmarks/cbrt.cpp#L159-L167

21:17 <egg|zzz|egg> obviously an annoying thing is that you can't get to the polynomial til you've evaluated r³y

21:17 <egg|zzz|egg> hmm

21:17 <egg|zzz|egg> is it iacatime again

21:17 <egg|zzz|egg> i do not like iacatime

21:22 <bofh> iacatime?

21:23 <bofh> also sec, I have to run for a bit atm

21:24 <egg|zzz|egg> bofh: IACA

21:24 <egg|zzz|egg> IACA?

21:24 <Qboid> egg|zzz|egg: [IACA] => Intel® Architecture Code Analyzer

21:24 * egg|zzz|egg stares at the graph

21:24 <egg|zzz|egg> this tree is not very wide

21:26 <egg|zzz|egg> maybe I should look at the polynomial x * x * y directly

21:28 <egg|zzz|egg> oooor maybe looking at something that summons 32nd powers is not such a good idea

21:28 <egg|zzz|egg> but this chain of three muls at the end is shit

21:38 <egg|zzz|egg> bofh: IACA says the Halley is 53 cycles on Haswell, 55 on Sandy, whereas the above double-newton-inverse is 60 on both

21:59 <egg|zzz|egg> yeah I can bring it down a bit more but it just isn't a very nice polynomial tbh

22:04 * NomalRaptor sobs.

22:04 NomalRaptor is now known as UmbralRaptor

22:06 <egg|zzz|egg> bofh: okay further care brings it faster than Halley, sanity is restored

22:10 <bofh> egg|zzz|egg: okay yeah I figured. mind uploading the optimized expression? (just got back).

22:12 <egg|zzz|egg> bofh: https://github.com/eggrobin/Principia/blob/cbrt-benchmarks/benchmarks/cbrt.cpp#L171-L178

22:12 <egg|zzz|egg> bofh: still slower than non-Estrin Householder, but now faster than Kahan

22:12 <egg|zzz|egg> and better rounded

22:13 <egg|zzz|egg> I beat the wolf on all metrics, but beating the cat similarly might be harder

22:14 awang has quit [Ping timeout: 182 seconds]

22:15 awang has joined #kspacademia

22:20 <egg|zzz|egg> bofh: okay I arguably beat the cat with just 10th order householder, but that's not very well rounded

22:25 <egg|zzz|egg> bofh: I vaguely wonder whether we can compute the cube of the guess in integer arithmetic

22:33 <bofh> it's only ~6 bits precision, no? so your mantissa product becomes just a regular multiply plus some shifts. I *highly* doubt it'll be faster than the fp cube tho.

22:35 <egg|zzz|egg> bofh: for the 6-bit one you're nicely using your throughput with (r * y) * (r * r) so that's fine

22:35 <egg|zzz|egg> bofh: more annoying is x * x * x on the 17-bit one

22:36 <egg|zzz|egg> it's exact, but you can't do anything until you have this cube

22:36 <egg|zzz|egg> wait nevermind there are plenty of constant * y you can do

22:36 <egg|zzz|egg> bofh: anyway I should try Householder 5th instead of 6th to see if it's still correctly rounded

22:37 <egg|zzz|egg> bofh: amusingly, Householder 6th with Estrin is faster than Kahan's Hornery 4th order method?!

22:37 <egg|zzz|egg> Horner in main()

22:39 <egg|zzz|egg> !wpn UmbralRaptor

22:39 * Qboid gives UmbralRaptor a FITS radiography

22:39 <egg|zzz|egg> that makes a surprising amount of sense

22:41 APlayer has quit [Ping timeout: 383 seconds]

22:59 <bofh> egg|zzz|egg: I mean I'm not *too* surprised, Estrin is fast and I've come to the conclusion people need to use it much more.

23:00 <egg|zzz|egg> bofh: high order is good, estrin is good, forget everything you learned about hornering all the things and cubic splines and leapfrogging newtons

23:00 <egg|zzz|egg> :D

23:01 <bofh> https://twitter.com/FakeUnicode/status/981666365378412544 ACCURATE

23:01 <kmath> <FakeUnicode> ᴀ ⏎ 𝝖ͣᵃᴀ　A ⏎ 　ͣᴬAᴀ𝝖ͣ ⏎ 　　ͣ Aᴬᴀ　　 A ⏎ 　 ͣ ᴬAᴀͣᴬ𝝖ᴀ ⏎ 　　　𝝖ͣᴀᵃAₐ ᴬ ⏎ A　　ᴬᴀ·Aͣᴀ ᴬA ⏎ 　　　ᴬ Aₐᴀᵃ𝝖ᴀ𝝖　ͣ ⏎ 　　A 𝝖 ᴀͣᴬAₐᴀ ⏎ 　　　 A　𝝖 ᴬͣᴀ A ᴀ… https://t.co/Cmcz1Zosye

23:02 <bofh> egg|zzz|egg: so cubic splines always sucked for transcendental functions :P my issue with Estrin is it generally only started beconing useful for higher orders, which used to be not that good perf-wise. But now that they *are*...

23:03 <bofh> becoming*

23:03 <egg|zzz|egg> ΑАA

23:03 <egg|zzz|egg> bofh: well it can be useful for low orders tbh

23:03 <egg|zzz|egg> it's just that for low orders it's not called Estrin, it's called "as written in the monomial basis") :-p
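
To make the Horner versus Estrin comparison concrete, here is a sketch for a degree-7 polynomial with coefficients c[0] through c[7]. The function names and the choice of degree are illustrative assumptions, not the polynomials from the benchmark. Horner is one long dependency chain of multiply-adds, whereas Estrin evaluates independent subexpressions that a superscalar CPU can overlap, at the cost of a few extra multiplications.

```cpp
#include <array>

// Horner: seven dependent multiply-add steps, each waiting on the previous one.
double horner(std::array<double, 8> const& c, double x) {
  double result = c[7];
  for (int i = 6; i >= 0; --i) {
    result = result * x + c[i];
  }
  return result;
}

// Estrin: the four linear pieces are independent of each other and of the
// powers x², x⁴, so the dependency tree is wide and shallow.
double estrin(std::array<double, 8> const& c, double x) {
  double const x2 = x * x;
  double const x4 = x2 * x2;
  double const p01 = c[0] + c[1] * x;
  double const p23 = c[2] + c[3] * x;
  double const p45 = c[4] + c[5] * x;
  double const p67 = c[6] + c[7] * x;
  return (p01 + p23 * x2) + (p45 + p67 * x2) * x4;
}
```

The two schemes round differently in general, so their results can differ in the last bits; whether Estrin is actually faster depends on the target and should be measured.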

23:06 <egg|zzz|egg> OK I can get a bit more speed if I go down to 4th order Householder (5th performs as 6th), but then I'm not correctly-rounded anymore on 100 000 random inputs, I have 0,021 % incorrect roundings and 0,502 ULPs max on those test values (probably bigger max in general, also I don't have 2 sig dec on that percentage)

23:06 <egg|zzz|egg> bofh: not worth it for a perf improvement from 37ish ns to 36ish ns

23:07 <egg|zzz|egg> (Atlas's method and Householder 10 are 25ish)

23:08 <egg|zzz|egg> bofh: also Atlas's method is way worse than Householder 10 rounding-wise so basically it's all householders? until the cat invents something smarter

23:08 <egg|zzz|egg> bofh: I should ask the cat whether that 8,8 % makes sense or is the implementer screwing up his fancy polynomials though

23:09 <bofh> Yeah, I'm curious myself. Totally unsurprised Householder works well, you can imagine my confusion a few days ago when it seemed otherwise :P

23:10 <egg|zzz|egg> bofh: it's all about that 36-bit clobbering :-p

23:10 awang has quit [Ping timeout: 383 seconds]

23:10 <bofh> Apparently :P

23:11 <egg|zzz|egg> bofh: the implementation of Atlas's method is by https://twitter.com/jakevortex it seems

23:25 egg|phone|egg has joined #kspacademia

23:26 UmbralRaptor has quit [Quit: Bye]

23:26 UmbralRaptop has joined #kspacademia

23:29 <bofh> egg|zzz|egg: no idea who that is, but let me check over both atlas's code and the impl myself since I do know arm asm myself.

23:30 <egg|zzz|egg> bofh: the arm implementation is C, not arm asm

23:30 <egg|zzz|egg> that's why I use this one

23:31 <egg|zzz|egg> because feeding the intel asm for the whole function to MSVC is probably not going to be a pleasant experience

23:31 UmbralRaptor has joined #kspacademia

23:31 <bofh> egg|zzz|egg: so you linked me an impl in assembly, and I didn't scroll down far enough to see what ISA it was.

23:31 <UmbralRaptor> Jackson: Use the boundary conditions to find a unique solution. Also Jackson, let me give an example with a power series that's unlike the Taylor series you would actually find. https://photos.app.goo.gl/wOjc2OaefkSvqfYy1

23:31 <bofh> oh, it's x86_64 asm? rofl.

23:31 UmbralRaptor has quit [Client Quit]

23:31 <egg|zzz|egg> bofh: cbrtf.s is atlas's, for Intel

23:31 <bofh> that's easy why didn't I peer at it sooner

23:31 UmbralRaptor has joined #kspacademia

23:31 <egg|zzz|egg> cbrt.c is a reimplementation in C for ARM

23:31 <bofh> (I blame being busy and a bit exhausted)

23:32 <bofh> mind relinking cbrtf.s?

23:32 <egg|zzz|egg> https://github.com/simonbyrne/apple-libm/blob/master/Source/Intel/cbrtf.s

23:32 <egg|zzz|egg> https://github.com/simonbyrne/apple-libm/blob/master/Source/ARM/cbrt.c

23:32 <egg|zzz|egg> and I use the ARM implementation and compile it for intel because linking the intel one doesn't seem fun

23:33 UmbralRaptop has quit [Ping timeout: 198 seconds]

23:34 <bofh> Right, you're on windows--wait the calling convention for floats is ~the same, so you should be fine.

23:34 <bofh> UmbralRaptor: wait, what's the weird power series expansion that you're referring to?

23:36 <bofh> UmbralRaptor: like I'm seeing the exact series solution he mentions, multiplied by a Legendre polynomial in cos(\theta)

23:38 <egg|zzz|egg> bofh: not sure what the calling convention is, but also even if it works just figuring out how to link it in sounds unfun :-p

23:39 <bofh> just add the object file into the call to link.exe or w/e MSVC uses nowadays? :P

23:40 <UmbralRaptor> bofh: \frac{1}{x-x'} = \frac{1}{r_\gt} \sum_{l=0}^\infin \left( \frac{r_\lt}{r_\gt} \right)^l

23:42 <SilverFox> what is the best method of while(true) without maxing the CPU?

23:43 <bofh> \frac{r_\lt}{r_\gt} <- the fuck is this?

23:43 <egg|zzz|egg> r</r> ? this is sounding like HTML

23:43 <bofh> is that r_</r_>???

23:44 <bofh> UmbralRaptor: I'm not sure this makes any sense, how do r_< & r_> relate to x & x'?

23:44 <UmbralRaptor> It's that sum about 1/3 of the way down page 103.

23:44 <bofh> (like this is half my problem with Jackson, he *loves* doing complicated coordinate transformations without any intermediate steps and you're like WTF)

23:45 <egg|zzz|egg> bofh: I hear Euler otoh goes into painstaking detail and copies giant expressions where he changes one thing :-p

23:46 <bofh> OH FUCK NEVERMIND I HATE THIS NOTATION SO MUCH

23:46 <bofh> r_< is min(|x|,|x'|) and r_< is max(|x|,|x'|)

23:46 <bofh> er, that second one should be a r_>
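
For reference, the standard expansion this notation belongs to, written out with r_< = min(|x|, |x'|), r_> = max(|x|, |x'|), and γ the angle between x and x' (the symbol γ is an assumption here; the log only mentions a Legendre polynomial in the cosine of an angle):

```latex
\[
  \frac{1}{|\mathbf{x} - \mathbf{x}'|}
  = \sum_{l=0}^{\infty} \frac{r_<^{\,l}}{r_>^{\,l+1}} \, P_l(\cos\gamma)
\]
```

When x and x' are collinear, γ = 0 and P_l(1) = 1, so this reduces to the geometric series quoted at 23:40, which sums to 1/(r_> - r_<).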

23:48 <egg|zzz|egg> bofh: J. Twit. Numer. Anal. https://twitter.com/eggleroy/status/981679873537298432

23:48 <kmath> <eggleroy> @stephentyrone I seem to have found a method faster than Kahan’s & which correctly rounds on those 100 000 inputs:… https://t.co/ugvPFJLIQr

23:49 <egg|zzz|egg> (which this tweet is too short to contain?)

23:49 <Majiir> !wpn egg|zzz|egg

23:49 * Qboid gives egg|zzz|egg a Chomskyan COME FROM

23:49 <Majiir> !wpn hatbot

23:49 * Qboid gives hatbot a bipartite pharmacy

23:49 <egg|zzz|egg> !wpn Majiir

23:49 * Qboid gives Majiir an explosive Poké Ball/sole hybrid

23:49 <Majiir> RIP hatbot

23:50 <egg|zzz|egg> it's been a while since we've seen hatbot

23:50 <egg|zzz|egg> ;seen hatbot

23:50 <kmath> egg|zzz|egg: hatbot (hatbot!~hatbot@ec2-54-167-168-157.compute-1.amazonaws.com) was last seen joining in #bottorture at 2017-10-14 01:05:05 +0000

23:50 <bofh> hatbot?

23:50 <bofh> egg|zzz|egg: nice!

23:51 * UmbralRaptor stares at the lack of > and < in LaTeX.

23:53 <UmbralRaptor> Oh, they work directly. o_O

23:53 <bofh> https://twitter.com/bofh453/status/846826790530244608 still accurate

23:53 <kmath> <bofh453> OH: "I contend that the SI unit of suffering is the milliJackson."

23:53 * UmbralRaptor was expecting some special command was needed.

23:53 <UmbralRaptor> Pretty much.

23:54 <UmbralRaptor> AN EXAMPLE SHOULD NOT LEAD THE STUDENT TO ABANDON THE METHOD THAT YOU'RE SHOWING BECAUSE YOU SEEMINGLY CONTRADICTED THE ENTIRE POINT OF BOUNDARY CONDITIONS AND UNIQUENESS!

23:55 <UmbralRaptor> Relatedly, I probably failed that test.

23:57 UmbralRaptop has joined #kspacademia

23:57 <bofh> UmbralRaptor: like now that I'm done WTFing at the horrible abuse of notation I'm EXTRA-WTFING at his choice of example

23:58 <egg|zzz|egg> bofh: would the notation be improved by CJK

23:59 <egg|zzz|egg> bofh: also the cat has liked my tweet, is this peer review,

23:59 UmbralRaptor has quit [Ping timeout: 182 seconds]