#kspacademia on 2018-04-26 — irc logs at esper.irclog.whitequark.org

00:42 e_14159 has quit [Ping timeout: 198 seconds]

00:44 e_14159 has joined #kspacademia

01:07 UmbralRaptor has quit [Remote host closed the connection]

01:13 UmbralRaptop has joined #kspacademia

04:18 <UmbralRaptop> cat https://twitter.com/3PSboyd/status/989355778065686528

04:18 <kmath> <3PSboyd> Someone's a shoulder cat now. https://t.co/CmtQAh7RRq

05:33 egg|phone|egg has quit [Remote host closed the connection]

06:45 <egg> UmbralRaptop: hm, I don't plan on being there, but I guess I could go

06:46 <egg> UmbralRaptop: and Iskierka doesn't have a border to cross so that's easier

07:02 <egg> UmbralRaptop: ah right GIR2.AN.BAR seems to be another way of writing patru

07:29 <egg> !wpn UmbralRaptop

07:29 * Qboid gives UmbralRaptop a neap soliton

07:29 <egg> !wpn whitequark

07:29 * Qboid gives whitequark a transitive death

07:29 <egg> um

08:41 <egg> whitequark: why is your github profile pic still a black square

09:25 TonyC has joined #kspacademia

09:26 egg|phone|egg has joined #kspacademia

09:27 TonyC1 has quit [Ping timeout: 186 seconds]

09:47 egg|cell|egg has joined #kspacademia

09:47 egg|phone|egg has quit [Read error: Connection reset by peer]

09:49 egg|phone|egg has joined #kspacademia

09:51 egg|mobile|egg has joined #kspacademia

09:51 egg|phone|egg has quit [Read error: -0x1: UNKNOWN ERROR CODE (0001)]

09:51 egg|cell|egg has quit [Read error: Connection reset by peer]

09:53 <egg|work|egg> !wpn whitequark

09:53 * Qboid gives whitequark a Trojan state machine

09:57 <egg|work|egg> if i18n is internationalization, is i12n internationale

09:57 <egg|work|egg> s/2n/2e/g

09:57 <Qboid> egg|work|egg meant to say: if i18n is internationalization, is i12e internationale

11:18 <egg|work|egg> !wpn котя and the котяchrome kitten

11:18 * Qboid gives котя and the котяchrome kitten a nilpotent Balmer barber

11:19 <egg|work|egg> the nilpotent barber vanishes if he shaves himself sufficiently many times

11:29 egg|phone|egg has joined #kspacademia

11:29 egg|mobile|egg has quit [Read error: Connection reset by peer]

11:29 egg|cell|egg has joined #kspacademia

11:31 egg|phone|egg has quit [Read error: Connection reset by peer]

11:31 egg|cell|egg has quit [Read error: Connection reset by peer]

11:31 egg|phone|egg has joined #kspacademia

11:32 egg|cell|egg has joined #kspacademia

11:34 egg|mobile|egg has joined #kspacademia

11:34 egg|cell|egg has quit [Read error: -0x1: UNKNOWN ERROR CODE (0001)]

11:34 egg|phone|egg has quit [Ping timeout: 186 seconds]

12:13 egg|phone|egg has joined #kspacademia

12:13 egg|mobile|egg has quit [Ping timeout: 186 seconds]

12:14 egg|cell|egg has joined #kspacademia

12:14 egg|phone|egg has quit [Read error: Connection reset by peer]

12:15 egg|phone|egg has joined #kspacademia

12:15 egg|cell|egg has quit [Read error: Connection reset by peer]

12:17 egg|phone|egg has quit [Read error: Connection reset by peer]

13:51 <UmbralRaptop> @_@ https://twitter.com/FadAstra/status/989299350911180800

13:51 <kmath> <FadAstra> @arxiv Two groups wrote papers in 8 hours based upon #GaiaDR2 https://t.co/qN1S5CO0pg

14:01 kmath has quit [Ping timeout: 186 seconds]

15:00 APlayer has joined #kspacademia

15:06 <UmbralRaptop> Accurate https://twitter.com/ctrlcreep/status/989516747907706881

15:20 UmbralRaptop has quit [Quit: Bye]

15:21 UmbralRaptop has joined #kspacademia

15:35 <egg|work|egg> !wpn UmbralRaptop

15:35 * Qboid gives UmbralRaptop a contravariant polynomial

15:35 <egg|work|egg> !wpn whitequark

15:35 * Qboid gives whitequark a walrus

15:36 * egg|work|egg wonders whether котя will eat the walrus

15:36 <UmbralRaptop> !wpn egg|work|egg

15:36 * Qboid gives egg|work|egg an isothermal tarrasque

15:52 UmbralRaptor has joined #kspacademia

15:54 UmbralRaptop has quit [Ping timeout: 182 seconds]

17:38 * egg pets whitequark

17:44 <whitequark> hi

17:49 <egg> how are the cats

17:51 <egg> whitequark: if the cats were sequenced, what sort of things could we tell from their genomes?

18:00 <whitequark> still in hk

18:00 <egg> whitequark: do hk cats still flee?

18:01 <whitequark> i haven't seen any in a long while

18:11 <egg> hmm

18:12 <egg> whitequark: there are apparently cat cafes in hk

18:12 <egg> so you could look there i guess :-p

18:15 <whitequark> too shy

18:54 <SnoopJeDi> oh neat, today's speaker is one of the LIGO Nobel recipients and was also chair of COBE's science working group \o/

18:55 <UmbralRaptor> !

19:38 <APlayer> Are there any powers of two larger than 2⁰ that have an arbitrary first digit and only zeroes after that?

19:38 <APlayer> I don't think there are, are there?

19:38 <APlayer> Larger than 2³, even

19:41 <egg> whitequark: you or the cats

19:47 <whitequark> egg: me

19:47 <whitequark> APlayer: that means a power of two divisible by a power of ten

19:47 <whitequark> which is clearly absurd

19:47 <APlayer> Not just by "a" power of ten, but a specific power of ten

19:48 <APlayer> But alright, point taken, makes sense

19:48 <whitequark> 10=2*5, and no power of 2 is divisible by 5
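
A quick brute-force check of the argument above, as a sketch only: the helper name and the search bound of 2**2000 are arbitrary choices for illustration, and the divisibility argument itself is what covers all exponents.

```python
def is_digit_then_zeroes(n: int) -> bool:
    """True if n is written as one nonzero digit followed only by zeroes."""
    s = str(n)
    return s[0] != "0" and set(s[1:]) <= {"0"}

# Exponents k <= 2000 for which 2**k has the form d000...0 in decimal.
hits = [k for k in range(2001) if is_digit_then_zeroes(2 ** k)]

# The underlying reason: 2**k has no factor 5, so it is never a multiple of 10.
no_multiple_of_ten = all(pow(2, k, 10) != 0 for k in range(1, 2001))

print(hits, no_multiple_of_ten)
```

Only the single-digit powers 1, 2, 4 and 8 qualify, which matches the "larger than 2³" refinement of the question.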

19:48 * APlayer needs to fix his knowledge of primes, division and related subjects

19:49 <SnoopJeDi> it takes some practice before it's natural, imo

19:49 <SnoopJeDi> powers of 2 in particular are something people have dealt with often, though

19:51 <whitequark> i dunno, i don't think i touched that knowledge since primary school

19:52 <APlayer> But what if you have 2⁴ котяs?

19:52 <SnoopJeDi> whitequark, surely you've thought about powers of 2

19:52 <SnoopJeDi> or do you mean explicitly dealing with it in a classroom sense

19:53 <SnoopJeDi> vs adjacent to some unrelated task

19:54 <egg> there's nothing specific to 2 here, any number not divisible by 10 will do

19:56 <SnoopJeDi> yes, that's rather obvious

19:57 <SnoopJeDi> but, like most things mathematicians treat with, it's only obvious in hindsight, heh.

20:00 <egg> bofh: is there a difference in behaviour between those snippets? my compiler generates the former usually, but the latter if I do some intrinsics trickery to produce the thing that gets movqed https://hastebin.com/vayesamona.rb

20:04 <egg> (or whitequark or anyone who feels like looking at x86-64 nonsense)

20:05 <egg> wait I'm missing a chunk from the first one

20:06 <egg> yeah nevermind

20:11 <egg> bofh: https://hastebin.com/eradenabez.rb

20:12 <egg> (aside from the random mulsd at the beginning of the first one aaargh)

20:13 <egg> (and the mov rcx, 0xfffffff000000000 is also irrelevant)

20:14 <egg> mul vs. imul, and correspondingly different constants and shifts? Ꙩ_ꙩ why

20:15 <egg> oh derp _mm_cvtsi128_si64 returns a signed integer of course
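
One plausible reading of the mul vs. imul difference, sketched in Python: once the 64 bits are typed as a signed integer, division by 3 has to round toward zero for negative values, so a different multiply-high constant and fix-up are needed. The constants below are the usual ones for dividing by 3, not values copied from the pastes, and the function names are made up for this sketch.

```python
UNSIGNED_MAGIC = 0xAAAAAAAAAAAAAAAB  # ceil(2**65 / 3), pairs with an unsigned mul
SIGNED_MAGIC = 0x5555555555555556    # ceil(2**64 / 3), pairs with a signed imul

def as_signed64(bits: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement integer."""
    return bits - (1 << 64) if bits >> 63 else bits

def udiv3(bits: int) -> int:
    """Unsigned path: high half of the 128-bit product, then shift right by 1."""
    high = (bits * UNSIGNED_MAGIC) >> 64
    return high >> 1

def sdiv3(n: int) -> int:
    """Signed path: high half of the signed product, plus 1 when n is negative."""
    high = (n * SIGNED_MAGIC) >> 64     # Python's >> floors, like an arithmetic shift
    return high + (1 if n < 0 else 0)   # net effect: rounds toward zero, as C++ does

positive = 0x4000000000000000  # bit pattern of the double 2.0
negative = 0xC000000000000000  # bit pattern of the double -2.0

# Same answer while the sign bit is clear, different answers once it is set.
print(udiv3(positive) == sdiv3(as_signed64(positive)))
print(udiv3(negative) == sdiv3(as_signed64(negative)))
```

For the bit pattern of a positive double both paths agree, which fits the observation that the two snippets behave the same on the inputs that matter here.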

20:15 * egg stabs egg

20:15 <APlayer> Congratulations! You now have egg on a stick

20:26 <UmbralRaptor> s/котяs/котяс/

20:26 <bofh> egg: there's no reason to use intrinsics for scalar code imho.

20:28 <bofh> also those asm snippets in both cases are essentially ~equivalent perf-wise, wide imul and wide mul have ~identical thruput/latency on p. much any Intel/AMD.

20:28 <bofh> (the only case where the difference matters is in imuls small enough that the constant fits into its immediate field. but that's blatantly obviously not the case here).

20:28 <iximeow> in the last haste there's two mul in the first, only one in the second?

20:28 * iximeow rereads

20:30 <egg> bofh: well msvc refuses to emit sse2 pand etc. (even movq!) without intrinsics so I use that

20:31 <iximeow> ah nvm i see you mentioned that mulsd is unrelated

20:31 <egg> bofh: sadly the whole of C + Y / 3 is in the critical path and that I think I have to do with plain x86-64 stuff afaict

20:31 <egg> bofh: unless you can see a way to do it in sse2 or 3?

20:32 <egg> but without a 64-bit mul it seems infeasible :-/

20:33 * egg pokes _mm_mul_epu32 in the 32

20:33 <UmbralRaptor> mul_apin64

20:34 <bofh> egg: so like, even if you do it in sse2 it won't be faster, C + Y / 3 is fast in integer x86_64 >_>

20:34 <bofh> like, integer mul is 3 cycles latency *at most* since Nehalem, I think it might even be 2 now.

20:34 <bofh> and that's for all integer muls, incl. 64x64->128.

20:35 <egg> bofh: yes but you pay one cycle either way for the movqs

20:35 <bofh> yeah but that's all of 1 cycle, if that's actually seriously hurting your overall function perf, then your function is fast enough.

20:36 <bofh> :P

20:36 <egg> bofh: that's certainly noticeable for things like the clobbering to 16 bits

20:36 <bofh> for clobbering to 16 bits, sure, but that's b/c pand/andps makes sense to use there.

20:38 <egg> bofh: and for extracting the sign you need a couple of ands, and doing that in sse2 puts only one of them on the critical path

20:39 <egg> bofh: comparing https://github.com/eggrobin/Principia/blob/cbrt-benchmarks/benchmarks/cbrt.cpp#L114-L186 on my skylake laptop,

20:39 <egg> BM_EggCbrt 31594 ns 30762 ns 21333 +1.00000000000000000e+00; ∛2 = +1.25992104989487319e+00; -∛-2 = +nan

20:39 <egg> BM_EggSignedIntrinsicCbrt 32308 ns 32645 ns 24889 +1.00000000000000000e+00; ∛2 = +1.25992104989487319e+00; -∛-2 = +1.25992104989487319e+00

20:39 <egg> BM_EggSignedCbrt 33797 ns 32087 ns 22400 +1.00000000000000000e+00; ∛2 = +1.25992104989487319e+00; -∛-2 = +1.25992104989487319e+00

20:39 <bofh> wait, how on *earth* is extracting the sign on the critical path? you do it at the start of the function in the same path as the imul

20:39 <egg> bofh: wait what? don't I need to extract the sign before I do the linear approximation stuff?

20:40 <bofh> and then at the very end either OR it back in, or just do if (sign) x = -x before the return.

20:40 <egg> yeah I or it back in

20:40 <bofh> egg: uh the start of your f'n should be extract sign, absolute value, linear approx.

20:40 <bofh> and those can *all* be done in x86_64 from the initial movq.

20:43 <egg> bofh: oh the x86-64 stuff is superscalar too?

20:44 <bofh> I mean there are no data dependencies on the signbit once you extract it until the extreme end of the function...

20:45 <egg> bofh: yeah obviously

20:45 <egg> bofh: then again there's no reason not to do it in sse2 intrinsics (because then I get an m128i that I can just or at the end, instead of having to separately movq it back up)

20:46 <egg> bofh: but where to do abs is a good questino

20:46 <egg> s/tino/tion/ even

20:49 <egg> bofh: ah wait I have a dependency on abs y further down the line

20:49 <egg> bofh: so I'm better off doing that bitand in sse2 than movqing abs y back up too

20:54 <bofh> so like after the abs at the start of the code you *only* have a dependency on abs y, not y.

20:55 APlayer has quit [Read error: Connection reset by peer]

20:55 APlayer has joined #kspacademia

20:56 Technicalfool has joined #kspacademia

20:58 <egg> bofh: yes, but y comes to me in some xmm register, and I need abs y in one too

20:59 APlayer has quit [Ping timeout: 182 seconds]

20:59 <egg> bofh: so if I compute abs y from y with a pand/pandn I'm fine, otherwise I induce a (mild) dependency on the first movq and I need an additional movq that I wouldn't otherwise

21:00 <bofh> okay, I see your point.
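
The data flow agreed on above (set the sign aside, work on abs y, OR the sign back at the end) can be sketched on raw bit patterns. This is a minimal illustration, not the code in cbrt.cpp: the constant C below is an untuned stand-in (2/3 of the exponent bias, shifted into the exponent field), and the helper names are hypothetical.

```python
import struct

SIGN_MASK = 0x8000_0000_0000_0000
ABS_MASK = 0x7FFF_FFFF_FFFF_FFFF
# Hypothetical, untuned constant: 2/3 of the exponent bias (1023), placed in
# the exponent field. Not the C used in Principia.
C = 682 << 52

def to_bits(x: float) -> int:
    """Bit pattern of a double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(bits: int) -> float:
    """Double whose bit pattern is the given unsigned 64-bit integer."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def signed_cbrt_guess(y: float) -> float:
    """Rough cube root of a normal double; the sign is handled off the main chain."""
    bits = to_bits(y)
    sign = bits & SIGN_MASK      # set aside; nothing reads it until the final OR
    abs_bits = bits & ABS_MASK   # |y|: the only input to the rest of the chain
    guess = C + abs_bits // 3    # the "C + Y / 3" step on the critical path
    return from_bits(guess | sign)

print(signed_cbrt_guess(8.0), signed_cbrt_guess(-8.0))
```

With this stand-in constant the guess is only good to a few percent for normal inputs; a real implementation would refine it afterwards.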

21:05 <egg> bofh: amusingly (with haswell, and with the caveats of iaca occasionally being confused), the version with intrinsics is 120 cycles like the positives-only version (in practice and on skylake it's slightly slower, but noticeably less so than doing every integer operation through movqs, see above)

22:04 <egg> live, uh, lack of cat? https://www.youtube.com/watch?v=O4zKbwnojuY

22:46 <SnoopJeDi> um so apparently LIGO is limited at low frequencies by Brownian motion in the reflective optical coating on the test masses

22:55 <whitequark> uh

22:55 <SnoopJeDi> well, that's the way he phrased it anyhow. I liked it better as he rephrased it: losses in the material correspond directly with thermal noise

22:57 <SnoopJeDi> I didn't realize A+/Voyager were a thing, but apparently they're planning to implement a "squeezed light" (goddammit optics) technique this year to push down the radiative forcing noise

22:58 <egg> bofh: lolwtf, adding branches for rescaling to prevent over/underflow (and correctly handle subnormal values as a side effect) makes the benchmark (main branch) *faster*?!

22:58 <SnoopJeDi> surprising number of orders of magnitude left for a terrestrial GW observatory (if one believes the claims, but I think that's reasonable)

22:59 <egg> bofh: at least you're right that it doesn't slow things down at all :-p

22:59 <egg> all hail speggulative eggsecution

23:06 <Iskierka> for as long as we're allowed to use it

23:08 <egg> bofh: it's not like I tried doing anything smart either https://github.com/eggrobin/Principia/blob/cbrt-benchmarks/benchmarks/cbrt.cpp#L178-L188 (I need to refine and prove the values of smol and big, they're likely way too big and smol respectively)

23:35 <UmbralRaptor> Ellied: what sort of spectral resolution do your glasses get?