UmbralRaptor changed the topic of #kspacademia to: https://gist.github.com/pdn4kd/164b9b85435d87afbec0c3a7e69d3e6d | Dogs are cats. Spiders are cat interferometers. | Cosmism today! | Document well, for tomorrow you may get mauled by a ネコバス. | <UmbralRaptor> egg|nomz|egg: generally if your eyes are dewing over, that's not the weather. | <ferram4> I shall beat my problems to death with an engineer. | We can haz pdf
<egg|anbo|egg>
WTF unity's euler angle convention has ranges of 2π on all three angles
<egg|anbo|egg>
or maybe they're in [0, π/2] ∪ [3π/2, 2π]
egg|anbo|egg has quit [Remote host closed the connection]
<mofh>
I'm not sure which of those two is *more* confusing for an Euler angle impl.
egg|anbo|egg has joined #kspacademia
<egg|anbo|egg>
mofh: I can at least understand how you get to [0, π/2] ∪ [3π/2, 2π]
<egg|anbo|egg>
but yes, it's mad
<egg|anbo|egg>
(pitch/yaw/roll, so pitch has to be a half-turn centred on 0, if you make it go from 0 to π it no longer maps to usual aeronaval conventions there; then combine that with "all angles in [0, 2π]", and you get that mess)
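A tiny numeric sketch of how that union arises, assuming only that the engine wraps negative angles into [0, 2π); this is the arithmetic, not Unity's actual code:
```c
#include <math.h>
#include <stdio.h>

/* Wrap an angle from [-π, π) into [0, 2π), the way an engine that
   reports "all angles in [0, 2π]" plausibly does. */
static double wrap_to_2pi(double a) { return a < 0 ? a + 2 * M_PI : a; }

int main(void) {
  /* Pitch is a half-turn centred on 0, i.e. [-π/2, π/2]. */
  printf("%.3f\n", wrap_to_2pi(-M_PI / 2));  /* -> 4.712 = 3π/2 */
  printf("%.3f\n", wrap_to_2pi(-M_PI / 4));  /* -> 5.498 = 7π/4 */
  printf("%.3f\n", wrap_to_2pi(M_PI / 4));   /* -> 0.785 = π/4  */
  /* Negative pitches land in [3π/2, 2π), positive ones in [0, π/2]:
     together, [0, π/2] ∪ [3π/2, 2π). */
  return 0;
}
```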
<_whitenotifier-d13c>
[Principia] pleroy opened pull request #2392: Normalize the quaternions that we get from Unity - https://git.io/JeyHL
<whitequark>
SilverFox: well, a normal memory is a map from addresses to contents
<whitequark>
a CAM is a map from contents to addresses
<SilverFox>
im guessing CAM is content address memory or similar?
<whitequark>
so if a normal memory can be physically implemented by a multiplexer, then a CAM is implemented by a bunch of comparators and a one-hot to binary decoder
<whitequark>
(for example)
<whitequark>
yes, CAM is content-addressable memory
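A minimal software sketch of the two directions, with hypothetical helper names; the hardware versions are the multiplexer and comparator structures described above:
```c
#include <stdint.h>

#define ENTRIES 4

/* Normal memory: address in, contents out (a multiplexer in hardware). */
static uint8_t ram_read(const uint8_t mem[ENTRIES], unsigned addr) {
  return mem[addr];
}

/* CAM: contents in, address out. In hardware that's one comparator per
   entry feeding a one-hot-to-binary encoder; in software, a scan. */
static int cam_lookup(const uint8_t mem[ENTRIES], uint8_t value) {
  for (unsigned addr = 0; addr < ENTRIES; addr++)
    if (mem[addr] == value) return (int)addr;  /* match line -> address */
  return -1;  /* no entry holds this value */
}

int main(void) {
  uint8_t mem[ENTRIES] = {7, 42, 3, 9};
  return ram_read(mem, 1) == 42 && cam_lookup(mem, 42) == 1 ? 0 : 1;
}
```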
<egg|anbo|egg>
!choose food|not yet
<galois>
egg|anbo|egg: Your options: food, not yet. My choice: food
<egg|anbo|egg>
hm
<SilverFox>
yeah in machine code you basically just track where things are stored in ram via addresses, so all references to variable i are to go to address 0x01 or something
<SilverFox>
so cache would be that i -> address 0x01 and then retrieval of that data
<SilverFox>
?
<whitequark>
i have no idea what this question means
<whitequark>
a CPU doesn't care about variable names
<SilverFox>
right names are just superfluous it only cares about addresses, but with cache it wants to check if whatever variable i is, is still in cache eh
<SilverFox>
so it does that by searching the cache map for variable i?
<SilverFox>
what about address-address content mapping? where you index the cache via the ram address, and then retrieve contents from cache?
<SilverFox>
var i should in theory be at memory address XY because there's no reason for it to move around right, (i know it can happen in complex dynamic systems but whatever), so you can just look for thing of address XY in cache, and if cache miss then you have the address in ram easily at your disposal
<SilverFox>
and really the only cost there is time
<SilverFox>
or am I just being retarded
<whitequark>
the CPU doesn't know what variables are
<whitequark>
in general
<whitequark>
most CPUs have instructions that can refer to registers (usually only directly) and memory (directly or indirectly)
<whitequark>
on the CPUs that have caches, those address spaces are completely disjoint, so when talking about caches we only talk about memory
<SilverFox>
disjoint?
<whitequark>
you can't read or write a register's value using a memory access
<whitequark>
on some small CPUs registers are memory-mapped. 8051 and AVR, for example
<whitequark>
this tends to be a phenomenally bad idea if you want to scale things up, so more modern architectures just don't do it period
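A sketch of what memory-mapped registers mean in practice, assuming a classic AVR such as the ATmega328P, where the register file r0..r31 appears at data addresses 0x00..0x1F:
```c
#include <stdint.h>

/* On a classic AVR the 32 CPU registers are visible in the data address
   space, so a plain memory load can read a register's current value.
   Illustration only; build with avr-gcc for a real device. */
int main(void) {
  volatile uint8_t *r26 = (volatile uint8_t *)0x1A;  /* r26, a.k.a. XL */
  uint8_t x_low = *r26;  /* reading a register via a memory access */
  return x_low;          /* on most CPUs, no such aliasing exists */
}
```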
egg|anbo|egg has quit [Remote host closed the connection]
egg|anbo|egg has joined #kspacademia
<whitequark>
anyway. caches. the reason caches exist is that you can make memory that will quickly read at any address you give it, and that memory is super expensive (SRAM)
<whitequark>
you can also make memory that's super cheap but actually reading from it requires a complicated dance. it still gives you high *throughput* (in *some cases* you can get data from it almost as fast as you can from an equivalent SRAM array) but it also gives you, in general, high *latency* (unlike an SRAM array, sometimes you have to spend hundreds of times as much time waiting for a read to become possible as actually performing the read)
<whitequark>
so, DRAM.
<whitequark>
by looking at real programs you can conclude that most programs access the same memory locations over and over, so if we could somehow access recently-used data faster, the programs would get much faster overall, even though the worst-case speed stays just as bad
<whitequark>
a cache is composed of many small "windows" (cache lines) into the next level of the memory hierarchy (a next-level cache, main memory, or even a hard drive)
<whitequark>
when a cache receives a read request, it compares the address to see if it falls into any of the windows it currently maintains. (by making the windows power-of-2 sized and aligned, you can strength-reduce the in-bounds comparison to an equality comparison of just the high bits).
<whitequark>
if the request does indeed fall into one of the windows, it uses the data stored in that window to fulfill it. otherwise, it finds an unused window (it might have to evict the data from a used one first), fills it by making a window-sized read request to the next level of the hierarchy, and then proceeds as in the hit case.
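A sketch of that read path, with assumed sizes (8 lines of 64 bytes), a deliberately crude eviction choice, and a simulated array standing in for the next level of the hierarchy:
```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define N_LINES 8
#define LINE_SIZE 64

/* One "window" (cache line): remembers which aligned range of the next
   level it currently mirrors, plus a copy of that range's data. */
typedef struct {
  bool valid;
  uint64_t base;
  uint8_t data[LINE_SIZE];
} line_t;

/* Stand-in for the next level (next cache, DRAM, even a disk). */
static uint8_t backing[1 << 20];
static void next_level_read(uint64_t base, uint8_t out[LINE_SIZE]) {
  memcpy(out, &backing[base], LINE_SIZE);
}

static uint8_t cache_read(line_t lines[N_LINES], uint64_t addr) {
  /* Power-of-two windows: the in-bounds check strength-reduces to an
     equality comparison on the high bits. */
  uint64_t base = addr & ~(uint64_t)(LINE_SIZE - 1);
  for (int i = 0; i < N_LINES; i++)
    if (lines[i].valid && lines[i].base == base)
      return lines[i].data[addr - base];  /* hit */
  /* Miss: pick a window (evicting its old contents), fill it with one
     window-sized read from the next level, then serve the request. */
  line_t *victim = &lines[(addr / LINE_SIZE) % N_LINES];  /* crude pick */
  next_level_read(base, victim->data);
  victim->valid = true;
  victim->base = base;
  return victim->data[addr - base];
}

int main(void) {
  static line_t lines[N_LINES];  /* zero-initialized: all invalid */
  backing[12345] = 0xAB;
  return cache_read(lines, 12345) == 0xAB ? 0 : 1;
}
```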
<SilverFox>
okay im starting to get an idea here
<SilverFox>
there are 3 levels of L cache before RAM, then harddisk page file, eh
<whitequark>
you can have as many as you want in any sort of sequence, really
<SilverFox>
right, just setting the stage for example
<whitequark>
plenty of systems have L4 cache
<whitequark>
on something like eDRAM even
<whitequark>
(useful if your main memory could be even slower than DRAM, maybe because you're accessing it over some sort of network like InfiniBand)
<whitequark>
now, regarding what you said earlier about the only cost being time
<whitequark>
that's not actually the case if you have writes.
<SilverFox>
well I wasnt talking about writes, because of course write time is variable sometimes depending on medium and can be very long or a tedious process
<whitequark>
suppose you have two windows in one cache that refer to the same region in the next hierarchy level
<SilverFox>
if I dont understand reading from cache/ram im not going to understand writing to it
<whitequark>
if you have writes, those can get out of sync, and then very bad things will happen
<SilverFox>
I just want to understand the reading process before I get into writes
<whitequark>
sure, sounds good
<SilverFox>
you mentioned windows
<SilverFox>
is this like the columns and rows of DRAM?
<SilverFox>
or, SSDs?
<SilverFox>
one of the two
<whitequark>
no
<whitequark>
just in an abstract sense
<whitequark>
"a range of linear addresses"
<SilverFox>
ah, right
<SilverFox>
like how a "word" can be of any size, but is a singular thing in abstract
<SilverFox>
you said the cache compares the address with a table or lookup of some sorts, *what* address is being compared?
<SilverFox>
you have a byte from ram you need to get because we're getting variable i and ++ it or something
egg|anbo|egg has quit [Remote host closed the connection]
egg|anbo|egg has joined #kspacademia
<whitequark>
this is actually really complicated, because you could either implement it in a simple way which is very expensive in silicon, or in a tremendously hard-to-explain way that actually works well
<SilverFox>
well, lets go with simple, im sure they started there
<whitequark>
the simple way is to have n windows of size 2^k, each of which can point to any address range aligned to 2^k
<whitequark>
so if you have a w-bit address, you need n (w-k)-bit comparators
<whitequark>
and you compare the top (w-k) bits of the address with every window (cache line)
<whitequark>
note: no one in the entire history of computing has implemented this, afaik
<whitequark>
too expensive
<whitequark>
i mean, for general purpose memory accesses
<whitequark>
this is called "fully associative cache" and you sometimes see it used in really small caches elsewhere, like in TLBs
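The expense in concrete numbers, for assumed parameters w = 32, n = 64, k = 6 (i.e. a 4 KiB fully associative cache):
```c
#include <stdio.h>

/* Cost of the fully associative scheme: n comparators, each (w-k) bits
   wide, all of which must run in parallel on every single access. */
int main(void) {
  int w = 32, n = 64, k = 6;  /* 64 lines x 64 B = 4 KiB of cache */
  printf("comparators: %d, each %d bits wide\n", n, w - k);
  /* -> comparators: 64, each 26 bits wide */
  return 0;
}
```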
<SilverFox>
how can no one implement it if its implemented in really small caches
<SilverFox>
Ah
<whitequark>
the next simplest thing (which *is* common in really simple CPU caches for general purpose memory) is a direct-mapped cache
<SilverFox>
direct-mapped to what
<whitequark>
in it, you have 2^n windows of size 2^k, where window number i can only point to the 2^k-sized range at offset i·2^k within a region aligned to 2^(n+k)
<whitequark>
so you need just one (w-(n+k))-bit comparator: the n index bits select the only line the range could be in
<whitequark>
the downside is that, while in a fully associative cache, any aligned 2^k-sized range can go into any cache line, in a direct-mapped cache it can go into one fixed cache line
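A sketch of that lookup under the same assumed parameters (w = 32, k = 6, n = 6, so 64 lines): the index bits select the single candidate line, leaving one narrow tag comparison:
```c
#include <stdbool.h>
#include <stdint.h>

enum { K = 6, N = 6, LINES = 1 << N };  /* 64 lines of 64 bytes */

typedef struct { bool valid; uint32_t tag; } dm_entry;

/* Address layout: [ tag : w-(n+k) bits ][ index : n bits ][ offset : k ] */
static int direct_mapped_match(const dm_entry tags[LINES], uint32_t addr) {
  uint32_t index = (addr >> K) & (LINES - 1);  /* middle n bits pick a line */
  uint32_t tag   = addr >> (N + K);            /* top bits, one comparison  */
  if (tags[index].valid && tags[index].tag == tag)
    return (int)index;                         /* hit in the one candidate  */
  return -1;  /* miss: this range can only ever live at lines[index] */
}

int main(void) {
  static dm_entry tags[LINES];
  uint32_t addr = 0x0000ABCDu;
  tags[(addr >> K) & (LINES - 1)] =
      (dm_entry){.valid = true, .tag = addr >> (N + K)};
  return direct_mapped_match(tags, addr) >= 0 ? 0 : 1;
}
```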
<SilverFox>
yeah mate im lost here
<whitequark>
i don't really want to bother explaining it in detail
<whitequark>
it's an optimization that doesn't matter in the general scheme of things because the fundamental behavior is the same
<whitequark>
it's just cheaper to implement in silicon at the cost of making some access patterns slower
<whitequark>
and the whole thing being harder to understand
<SilverFox>
im lost on the whole thing, you brought in k like what is k
<SilverFox>
and what is w
<whitequark>
i said above what's w
<whitequark>
k determines the size of a single cache line
egg|anbo|egg has quit [Remote host closed the connection]
<whitequark>
the smaller k is, the finer the granularity of the cache, and the more expensive the cache is for a given total size
<whitequark>
on many modern systems k=6, so you have 64 byte cache lines
<whitequark>
er, k=9, since 64 bytes is 512 bits
egg|cell|egg has quit [Read error: Connection reset by peer]
egg|anbo|egg has joined #kspacademia
egg|cell|egg has joined #kspacademia
<whitequark>
actually, scratch that, my bit calculations are all wrong, ugh
<SilverFox>
the calcamalations dont matter, still dont get it
<whitequark>
don't get what
<SilverFox>
Yes
<whitequark>
ok i give up. find a book on computer architecture or something, this is a waste of time
<SilverFox>
oof
<SilverFox>
whitequark, watched an indian guy on youtube, I get direct mapping now
<whitequark>
\o/
<whitequark>
sorry i couldn't be of more help.
<whitequark>
i always found cache hierarchy super confusing myself
<whitequark>
or rather mostly cache implementations
<SilverFox>
naw im just rarted
<SilverFox>
but makes sense that caches have the same page size as ram, could be rather dumb trying to fit things where they dont fit
<kmath>
<bofh453> And finally, this horror across the road: #FDL2019 https://t.co/4MacGAqcFC
<SilverFox>
dude also says that pages are just straight copied because chances are the thing you want is in the same vicinity in ram because programs are localized
<SilverFox>
also he calls pages in cache "frames"
<SilverFox>
which maybe you were referring to as windows
<whitequark>
i'm not going to watch a 10 minute video
<SilverFox>
okay
<SilverFox>
what about a 5 minute one?
<whitequark>
i can read a transcript
<SilverFox>
okay well in this video, for the direct mapping, its like: cache has 128 entries, 0-127, RAM has 128 rows, with 32 columns, the index of the item in cache is what row it is in ram, and then there's a "tag" that is the column, and that way we can just tag x cache index to get the ram location
<SilverFox>
and can do the reverse, search via tag then cache index, to get item from cache
<whitequark>
aha yes that makes sense
<whitequark>
the reason cache line size matches RAM row size is that you can read an entire row in a single burst
<SilverFox>
yeah its easier that way
<whitequark>
it's not a hard requirement though, you might not e.g. know the row size when making the cache
<SilverFox>
you might not know it because two ram sticks can configure differently?
<whitequark>
something like that
<whitequark>
your CPU might not even include a DRAM controller at all
<SilverFox>
so a bit-implementation for comparison reasons would be using the "tag" as the high level bits, and then the lower level bits are the cache index?
<whitequark>
yep
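One concrete address split that way, assuming the video's 128 entries and, say, 32-byte lines (k = 5, n = 7):
```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
  uint32_t addr   = 0x12345678u;
  uint32_t offset = addr & 0x1F;         /* low k=5 bits: byte within line  */
  uint32_t index  = (addr >> 5) & 0x7F;  /* next n=7 bits: which cache entry */
  uint32_t tag    = addr >> 12;          /* high bits: stored and compared   */
  printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
  /* -> tag=0x12345 index=51 offset=24 */
  return 0;
}
```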
<SilverFox>
aight sick I get this now
<SilverFox>
god bless the indians
<SilverFox>
and russians for maths
<SilverFox>
both of them are the backbone of academia
<SilverFox>
whitequark, so how is this much different than my initial proposal of doing a 2D map of RAM address + item?
<SilverFox>
it doesnt seem much different
<whitequark>
i don't know, because your proposal wasn't clear enough for me to understand
<kmath>
<LeaksPh> This argument should hold unless there is a huge conspiracy where, for each of the planets, all processes (water de… https://t.co/rkshcNd2Sb
<SilverFox>
so you have a list of X : Y, where X = ram address and Y is the item you want to check cache for, you go down list X comparing the RAM address there to the thing you want, and then bam you get Y
<whitequark>
that sounds like a general description of how CAM works
<SilverFox>
my method sounds like it'd take out at least a good chunk of cache or whatever the map is stored on, just for indexing
<whitequark>
i have no idea what that means
<SilverFox>
where do you store the tag
<whitequark>
i think usually you have separate tag memory and data memory. tag memory being CAM, and data memory being just normal SRAM
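A sketch of that split, reusing the hypothetical cam_lookup idea from above: the tag array answers "which line?", and that answer indexes a plain data array (valid bits omitted for brevity):
```c
#include <stdint.h>

#define LINES 4
#define LINE_SIZE 64

/* Tag memory (CAM role): which address range does each line hold? */
static uint32_t tag_mem[LINES];
/* Data memory (plain SRAM role): the lines' contents. */
static uint8_t data_mem[LINES][LINE_SIZE];

/* The CAM match result is the row of the SRAM to read. */
static int tagged_read(uint32_t addr, uint8_t *out) {
  uint32_t tag = addr / LINE_SIZE;
  for (int line = 0; line < LINES; line++) {
    if (tag_mem[line] == tag) {                 /* CAM: contents -> address */
      *out = data_mem[line][addr % LINE_SIZE];  /* SRAM: address -> contents */
      return line;
    }
  }
  return -1;  /* miss */
}

int main(void) {
  tag_mem[2] = 0x1234;
  data_mem[2][5] = 0x5A;
  uint8_t v;
  return tagged_read(0x1234 * LINE_SIZE + 5, &v) == 2 && v == 0x5A ? 0 : 1;
}
```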
<SilverFox>
Ah
egg|anbo|egg has quit [Read error: Connection reset by peer]
egg|anbo|egg has joined #kspacademia
egg|anbo|egg has quit [Remote host closed the connection]