VITAS changed the topic of #spacedock to: Problems?: https://github.com/KSP-SpaceDock/SpaceDock/issues | Matrix/Riot Chat: https://im.52k.de +spacedock:52k.de Feel free to ask for help, we only bite a little bit! | If you want to help, please check https://github.com/KSP-SpaceDock/SpaceDock-Backend/issues/5 :) | <VITAS> inet users have the attentionspan of a squirrel...
Darklight[m] has quit [Quit: Idle timeout reached: 10800s]
DasSkelett[m] has joined #spacedock
<DasSkelett[m]> <span class="d-mention d-user">VITAS</span> what's your opinion on a 30% database size increase to get a decent performance boost?
DasSkelett[m] has quit [Quit: Idle timeout reached: 10800s]
godarklight[m] has joined #spacedock
<godarklight[m]> I thought it was negligable?, I didn't actually check though
DasSkelett[m] has joined #spacedock
<DasSkelett[m]> https://github.com/KSP-SpaceDock/SpaceDock/pull/281 <span class="d-mention d-user">HebaruSan</span> calculated the percentages
Darklight[m] has joined #spacedock
<Darklight[m]> Oh, I just realised that my benchmark wouldnt have tested the one thing we needed to test, search_mods
<DasSkelett[m]> search_mods wasn't involved in #281
<DasSkelett[m]> `search_mods()` wasn't involved in #281
<godarklight[m]> Err I was wrong sorry, yeah all the long endpoints are search_mods and there is quite an improvement there
<godarklight[m]> get_mods() does an empty search_mods()
<godarklight[m]> I should bench the one where weigh result is removed
<godarklight[m]> I can't imagine it would make a big difference unless sqlalchemy is lazy loading
<DasSkelett[m]> Well okay, `weigh_result()` does access `Mod.created` once, which was one of the newly indexed columns. But that's only a part of why `search_mods()` is so slow.
Darklight[m] has quit [Quit: Client limit exceeded: 6]
HebaruSan[m] has joined #spacedock
<HebaruSan[m]> Yeah separate tests of the score PR would be nice
<HebaruSan[m]> Yeah separate tests of the score PR would be nicve
<DasSkelett[m]> sqlalchemy is lazy-loading, as in it only requests the rows when it finally needs them. I spent some time confirming this myself.
<godarklight[m]> I mean in terms of "each mod in weigh results is another trip to the database" sense
<DasSkelett[m]> The current implementation does fetch all rows with all columns in one go, saves them in one big array and then iterates over it to calculate the scores.
<godarklight[m]> I thought I confirmed that wasn't the case, let me check ;)
<HebaruSan[m]> Doesn't the sqlalchemy stuff do some on-demand loading? I thought that was why #276 failed
<HebaruSan[m]> I.e., I was storing lazily loaded thunks in an array and trying to use them out of their proper context
<DasSkelett[m]> Well okay, maybe sqlalchemy doesn't load all rows at once in this case, but they're still stored in one big Python array/list in the end. Otherwise `sorted()` couldn't work.
<HebaruSan[m]> Yeah, sorting the whole array by definition requires loading all the data
* godarklight[m] posted a file: postgres.zip (139KB) < https://matrix.52k.de/_matrix/media/r0/download/52k.de/NlcDdNJqnrYIKhqZWnGMoGYk >
<godarklight[m]> I was right, I remember seeing this
<godarklight[m]> This is a single visit to the popular page
<godarklight[m]> It's worth looking at and then crying about :P
<godarklight[m]> My comment is "fucking object mappers", but I understand why it exists :P
<HebaruSan[m]> Wow that is what is live in prod right now 🤮
<HebaruSan[m]> Can sqlalchemy turn that off and just return all the data, if we know we're going to need it?
<HebaruSan[m]> Then again, we won't need it, if DasSk's PR goes thru
<DasSkelett[m]> Actuakky, that's not something I fix in my PR.
<HebaruSan[m]> Your PR does make it so we don't need to load every row in the db tho
<HebaruSan[m]> So on-demand loading can still make sense
<DasSkelett[m]> That's acceessing every single mod version there is, just to get the count of them, because we are using `len()` instead of `.count()`:
<godarklight[m]> Popular has to load all the mods and then uses the weighted_results thing
<HebaruSan[m]> Hmm, I may be thinking of your PR as already having `index=True`
<HebaruSan[m]> Are we planning to go with that?
<godarklight[m]> That's up to vitas
<HebaruSan[m]> No, I mean the score PR, separate from mine
<godarklight[m]> I think I want to test the score pr
<HebaruSan[m]> Currently I think it's not indexed, so it gets a performance boost by moving the sort from Python to postgres
<HebaruSan[m]> With an index the sort would be free
<godarklight[m]> The "not calling the database 10000 times per page" also would help, if it does that :P
<DasSkelett[m]> Okay guys, slowly. <span class="d-mention d-user">HebaruSan</span> yes, I would go with `index=True`.
<DasSkelett[m]> @godarklight, look at the log, that's not fetching every mod one for one (which indeed happens in a single call, right at the top), the spammy part is loading all mod **versions** of one mod after another, just to get the count (see my linked code lines).
<godarklight[m]> Ah ok
<DasSkelett[m]> *edit:* ~~Okay guys, slowly. <span class="d-mention d-user">HebaruSan</span> yes, I would go with `index=True`.
<DasSkelett[m]> @godarklight, look at the log, that's not fetching every mod one for one (which indeed happens in a single call, right at the top), the spammy part is loading all mod **versions** of one mod after another, just to get the count (see my linked code lines).~~ -> Okay guys, slowly. <span class="d-mention d-user">HebaruSan</span> yes, I would go with `index=True`.
<DasSkelett[m]> @godarklight, look at the log, that's not fetching every mod one for one (which indeed happens in a single call, right at the top), the spammy part is loading all mod **versions** and **media** of one mod after another, just to get the count (see my linked code lines).
<godarklight[m]> Holy quackin' duckshits
<godarklight[m]> I'll show you in a sec
<godarklight[m]> That looks much more healthier
<godarklight[m]> I don't know if celery did the precalculate though, but the order shouldn't matter for speed
<DasSkelett[m]> You have to wait 10 Minutes as of the current commit
<godarklight[m]> That's like a 1000x increase right? :P
<DasSkelett[m]> Pffft
<godarklight[m]> Heh, maybe not that much, but yeah, I like this hehe
<DasSkelett[m]> When we want to use 5 days (or preferably 1 day as Rocky suggested), we have to find a way to trigger it once after startup.
<HebaruSan[m]> Alembic?
<DasSkelett[m]> After restarts too
<HebaruSan[m]> It's *kind of* a data migration thing
<godarklight[m]> I'd do it every at least hourly, it doesn't matter because I assume the function will run quickly enough (I assume 20 seconds?)
<godarklight[m]> And you can trigger it from a oneshot in the flask router probably
<DasSkelett[m]> It has also be triggered after normal server restarts, not only after database migrations (which it does anyway).
<DasSkelett[m]> But I'd like to only trigger it once, not 6 times for every gunicorn instance. Is this possible in Flask?
<godarklight[m]> Does flask have somewhere obvious to put it, is there something like after_request? where we can call the celery function
<HebaruSan[m]> Why after a restart? Wouldn't the previous values still be in the db?
<DasSkelett[m]> *edit:* ~~But I'd like to only trigger it once, not 6 times for every gunicorn instance. Is this possible in Flask?~~ -> But I'd like to trigger it only once, not 6 times for every gunicorn instance. Is this possible in Flask?
<godarklight[m]> Maybe check a redis key
<DasSkelett[m]> > Why after a restart? Wouldn't the previous values still be in the db?
<DasSkelett[m]> Actually... You might have a point.
<HebaruSan[m]> I think it would be a good fit for alembic if we can make it work. Get it done as part of the upgrade, before the server goes live
<godarklight[m]> That way no dumb cron tasks
<godarklight[m]> It wouldn't need to be in redis, but somewhere shareable all instances have access to
<godarklight[m]> Also I forget how celery tasks are called but I think it is something to do with the .delay, my memory is hazy >.<
DasSkelett[m] has quit [Quit: Client limit exceeded: 6]
HebaruSan[m] has quit [Quit: Client limit exceeded: 6]
DasSkelett[m] has joined #spacedock
<DasSkelett[m]> Mhh, looks like you can't do `.count()` on backrefs, so the ModVersion spam has to stay for now :/ At least it will happen in the background, so yeah.
godarklight[m] has quit [Quit: Client limit exceeded: 6]
HebaruSan[m] has joined #spacedock
<HebaruSan[m]> I think we may be able to do something else about that
<DasSkelett[m]> Remove it from the score 😄
Darklight[m] has joined #spacedock
<Darklight[m]> Did the eager load thing help ^, I am not nearly familiar enough with postgres
<Darklight[m]> Oh
<Darklight[m]> Hey I can test that, I will go back to master and add it in
<HebaruSan[m]> Hmm, it does have `.join(Mod.versions)`, I must have been thinking of something else
<DasSkelett[m]> Ouch, the alembic upgrade script will be quite big, since I not only have to provide a skeleton for the `Mod` table, but also for all the tables that jave any backrefs that we access during `get_mod_score()`.
<HebaruSan[m]> Can't just call the existing code?
<DasSkelett[m]> Hmm, maybe importing the classes of the other tables works, since they don't change during the upgrade.
Darklight[m] has quit [Quit: Client limit exceeded: 6]
HebaruSan[m] has quit [Quit: Client limit exceeded: 6]
<DasSkelett[m]> Mmpf, now I can calculate the scores in alembic, however for whatever reason they don't get commited to the db (I do call `session.commit()`.
<DasSkelett[m]> Mmpf, now I can calculate the scores in alembic, however for whatever reason they don't get commited to the db (I do call `session.commit()`).
HebaruSan[m] has joined #spacedock
<HebaruSan[m]> I think maybe you have to do it through the `op` object
<DasSkelett[m]> Ha, I remember that answer! ;-)
<DasSkelett[m]> But he seams to alter `player.team` without using `op` in this answer.
<HebaruSan[m]> That looks important
<DasSkelett[m]> Oh no, he's creating the `Team` table there.
<HebaruSan[m]> Oh oh yeah
DasSkelett[m] has quit [Quit: Client limit exceeded: 6]
<HebaruSan[m]> Should your `class Mod` have a score property?
<HebaruSan[m]> Otherwise maybe it doesn't know that's an SQL column
DasSkelett[m] has joined #spacedock
<DasSkelett[m]> No, because the column doesn't exist yet. Sqlalchemy would throw an error that it can't find that column. Had that before.
<HebaruSan[m]> But you don't use it till after `op.add_clumn`, right?
<HebaruSan[m]> But you don't use it till after `op.add_column`, right?
<DasSkelett[m]> Sqlalchemy still tries to load it.
<HebaruSan[m]> I think that's basically it, assigning to `mod.score` isn't going to do anything SQL-related if it's not defined as as column
<DasSkelett[m]> But I thought it worked in #274 🤔
<HebaruSan[m]> Can you define a class property inside `upgrade`?
<HebaruSan[m]> `Mod.score = Column(Float, default=0, nullable=False, index=True)` ?
<DasSkelett[m]> Hehe, had the same idea, trying this right now.
HebaruSan[m] has quit [Quit: Client limit exceeded: 6]
* DasSkelett[m] uploaded an image: unknown.png (4KB) < https://matrix.52k.de/_matrix/media/r0/download/52k.de/urptoztjNOiRliYDaZKPfHVh >
<DasSkelett[m]> Yep, that's it 🎉
<DasSkelett[m]> Thanks!
DasSkelett[m] has quit [Quit: Client limit exceeded: 6]
HebaruSan[m] has joined #spacedock
<HebaruSan[m]> *edit:* ~~I think that's basically it, assigning to `mod.score` isn't going to do anything SQL-related if it's not defined as as column~~ -> I think that's basically it, assigning to `mod.score` isn't going to do anything SQL-related if it's not defined as a column
DasSkelett[m] has joined #spacedock
<DasSkelett[m]> Now I just have to update the score everywhere we change something that affects it.
<HebaruSan[m]> Mostly download
<HebaruSan[m]> Want me to do that?
<DasSkelett[m]> Oh no, shouldn't be take that long, thanks for the offer though.
<DasSkelett[m]> But I'd welcome if you can check that I didn't miss anything after I pushed it ^^
* HebaruSan[m] uploaded an image: unknown.png (11KB) < https://matrix.52k.de/_matrix/media/r0/download/52k.de/rZHDVDoSezsNajPZqUsIsxzI >
<HebaruSan[m]> > But I thought it worked in #274 🤔
<HebaruSan[m]> <span class="d-mention d-user">DasSkelett</span>
<HebaruSan[m]> The class def in #274 has the new column defined
<HebaruSan[m]> Maybe that's what you meant
<DasSkelett[m]> Uhm, hmm..
<DasSkelett[m]> Tested moving the column declaration in to the class definition, and it does indeed work. Why do I remember getting an error there?
<DasSkelett[m]> *edit:* ~~Oh no, shouldn't be take that long, thanks for the offer though.~~ -> Oh no, shouldn't take that long, thanks for the offer though.
HebaruSan[m] has quit [Quit: Client limit exceeded: 6]
Darklight[m] has joined #spacedock
<Darklight[m]> I am a little confused, why is this not a column?
<DasSkelett[m]> Why is what not a column?
godarklight[m] has joined #spacedock
<godarklight[m]> Nevermind, I am only loosely reading, distracted elsewhere now >.<
<DasSkelett[m]> Yeah it's probably easier to look at the changes in the PR after my next push.
Darklight[m] has quit [Quit: Client limit exceeded: 6]
HebaruSan[m] has joined #spacedock
<HebaruSan[m]> Well on the upside this is definitely helping me to understand how sqlalchemy works
godarklight[m] has quit [Quit: Client limit exceeded: 6]