RoundSparrow

RoundSparrow@lemmy.ml
Joined
245 posts • 767 comments

“Finnegans Wake is the greatest guidebook to media study ever fashioned by man.” - Marshall McLuhan, Newsweek Magazine, page 56, February 28, 1966.

I have never done LSD or any other illegal drugs, but I have read Finnegans Wake: www.LazyWake.com

Lemmy tester, “RocketDerp” is my username on GitHub

ild rants. i’ve seen you do this many times. you need to step back, relax, and not take technical feedback so personally

I’ve stepped back and watched them ignore the issue since May when all the servers were crashing. Every single Lemmy server was falling over while they ignored the PostgreSQL problems.

The mistakes are obvious and huge. These are not minor topics.

Your comments there are exceptionally aggressive. You accuse the developers of “hazing” you

I think they are hazing the entire World Wide Web, Reddit users, etc. How else can you explain such basic SQL problems that they have allowed to go on for so long?

It’s as if the mere concept of Redis or Memcache never existed… and nobody had ever heard of JOIN performance problems. If it isn’t extreme social hazing, what is it?

having a meltdown on github doesn’t help anybody.

I’m glad for you that mental control is so trivial and that you aren’t near death in your life from brain damage.

Go outside and take a breath

I just got back from dinner, and the months of hazing I’ve witnessed haven’t gone away. The level of social games being played with PostgreSQL in this project is beyond anything I’ve encountered in my 50+ years alive. And I’ve seen, first hand, Bill Gates and his team do all kinds of odd things to groups.

I am at a total loss to explain why such fundamentals of basic relational database design are avoided in this project. If it isn’t social hazing, what is it?

Of course, that would be an insane amount of work, especially if it would get ignored, but something to consider!

I already did an insane amount of work to populate a Lemmy database with over 10 million posts. Out of the box it is so incredibly slow that going through the normal API would take days to accomplish this; I had to rewrite the SQL TRIGGER logic to allow bulk inserts.

Here is my work on that:

DROP TRIGGER site_aggregates_post_insert ON public.post;


/*
TRIGGER will be replaced with per-statement INSERT only
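REFERENCING NEW TABLE AS new_rows exposes every inserted row as a transition table, so the function runs once per INSERT statement instead of once per row.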
*/
CREATE TRIGGER site_aggregates_post_insert
   AFTER INSERT ON public.post
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION site_aggregates_post_insert();


DROP TRIGGER community_aggregates_post_count ON public.post;


/*
TRIGGER will be replaced with per-statement INSERT only
*/
CREATE TRIGGER community_aggregates_post_count
   AFTER INSERT ON public.post
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION community_aggregates_post_count();


DROP TRIGGER person_aggregates_post_count ON public.post;


/*
TRIGGER will be replaced with per-statement INSERT only
*/
CREATE TRIGGER person_aggregates_post_count
   AFTER INSERT ON public.post
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION person_aggregates_post_count();



/*
TRIGGER will be replaced with per-statement INSERT only
no Lemmy-delete or SQL DELETE to be performed during this period.
*/
CREATE OR REPLACE FUNCTION public.site_aggregates_post_insert() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
   UPDATE site_aggregates SET posts = posts +
      (SELECT count(*) FROM new_rows WHERE local = true)
      WHERE site_id = 1
      ;

   RETURN NULL;
END
$$;


CREATE OR REPLACE FUNCTION public.community_aggregates_post_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
        UPDATE
            community_aggregates ca
        SET
            posts = posts + p.new_post_count
        FROM (
            SELECT count(*) AS new_post_count, community_id
            FROM new_rows
            GROUP BY community_id
             ) AS p
        WHERE
            ca.community_id = p.community_id;

    RETURN NULL;
END
$$;


/*
TRIGGER will be replaced with per-statement INSERT only
no Lemmy-delete or SQL DELETE to be performed during this period.
*/
CREATE OR REPLACE FUNCTION public.person_aggregates_post_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
        UPDATE
            person_aggregates personagg
        SET
            post_count = post_count + p.new_post_count
        FROM (
            SELECT count(*) AS new_post_count, creator_id
            FROM new_rows
            GROUP BY creator_id
             ) AS p
        WHERE
            personagg.person_id = p.creator_id;

    RETURN NULL;
END
$$;


/*
***********************************************************************************************
** comment table
*/


DROP TRIGGER post_aggregates_comment_count ON public.comment;


/*
TRIGGER will be replaced with per-statement INSERT only
*/
CREATE TRIGGER post_aggregates_comment_count
   AFTER INSERT ON public.comment
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION post_aggregates_comment_count();


-- IMPORTANT NOTE: this logic for INSERT TRIGGER always assumes that the published datestamp is now(), which was a logical assumption with general use of Lemmy prior to federation being added.
CREATE OR REPLACE FUNCTION public.post_aggregates_comment_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN

        UPDATE
            -- per statement update 1
            post_aggregates postagg
        SET
            comments = comments + c.new_comment_count
        FROM (
            SELECT count(*) AS new_comment_count, post_id
            FROM new_rows
            GROUP BY post_id
             ) AS c
        WHERE
            postagg.post_id = c.post_id;


        UPDATE
            -- per statement update 2
            post_aggregates postagg
        SET
            newest_comment_time = max_published
        FROM (
            SELECT MAX(published) AS max_published, post_id
            FROM new_rows
            GROUP BY post_id
             ) AS c
        WHERE
            postagg.post_id = c.post_id;

        UPDATE
            -- per statement update 3
            post_aggregates postagg
        SET
            newest_comment_time_necro = max_published
        FROM (
            SELECT MAX(published) AS max_published, post_id, creator_id
            FROM new_rows
            WHERE published > ('now'::timestamp - '2 days'::interval)
            GROUP BY post_id, creator_id
             ) AS c
        WHERE
            postagg.post_id = c.post_id
            AND c.creator_id != postagg.creator_id
            ;

    RETURN NULL;
END
$$;


DROP TRIGGER community_aggregates_comment_count ON public.comment;

CREATE TRIGGER community_aggregates_comment_count
   AFTER INSERT ON public.comment
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION public.community_aggregates_comment_count();


CREATE OR REPLACE FUNCTION public.community_aggregates_comment_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN

        UPDATE
            community_aggregates ca
        SET
            comments = comments + p.new_comment_count
        FROM (
            SELECT count(*) AS new_comment_count, pp.community_id
            FROM new_rows AS nr
            JOIN post AS pp ON nr.post_id = pp.id
            GROUP BY pp.community_id
             ) AS p
        WHERE
            ca.community_id = p.community_id
            ;

    RETURN NULL;

END
$$;


DROP TRIGGER person_aggregates_comment_count ON public.comment;

CREATE TRIGGER person_aggregates_comment_count
   AFTER INSERT ON public.comment
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION public.person_aggregates_comment_count();


CREATE OR REPLACE FUNCTION public.person_aggregates_comment_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN

        UPDATE
            person_aggregates personagg
        SET
            comment_count = comment_count + p.new_comment_count
        FROM (
            SELECT count(*) AS new_comment_count, creator_id
            FROM new_rows
            GROUP BY creator_id
             ) AS p
        WHERE
            personagg.person_id = p.creator_id;

    RETURN NULL;
END
$$;


DROP TRIGGER site_aggregates_comment_insert ON public.comment;

CREATE TRIGGER site_aggregates_comment_insert
   AFTER INSERT ON public.comment
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION public.site_aggregates_comment_insert();


CREATE OR REPLACE FUNCTION public.site_aggregates_comment_insert() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN

   UPDATE site_aggregates
      SET comments = comments +
         (
            SELECT count(*) FROM new_rows WHERE local = true
         )
      WHERE site_id = 1
      ;

    RETURN NULL;
END
$$;

With this in place, 300,000 posts a minute can be generated, and reaching 5 million or 10 million posts doesn’t take too long.
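
To give a concrete picture of what this enables, a minimal sketch (the column list is simplified and the literal values are placeholders, not the full Lemmy post schema):

/*
Hypothetical bulk insert: because every trigger above is FOR EACH STATEMENT,
this single statement fires each trigger exactly once for all 100,000 rows
(the whole batch is visible through the new_rows transition table).
Depending on schema version, additional NOT NULL columns may need values.
*/
INSERT INTO post (name, creator_id, community_id, local)
SELECT 'bulk benchmark post ' || i, 2, 2, true
FROM generate_series(1, 100000) AS i;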

OK, if you want to end now, I’ll be around tomorrow too.

I think we both suspect that the URL is using a different database name than the config file was. So it didn’t see the data we restored and started from scratch.
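
For reference, in a standard PostgreSQL connection URL the database name is the last path segment, so a mismatch there would quietly point Lemmy at a different (empty) database. Placeholder example, not your real credentials:

postgres://lemmy:EXAMPLE_PASSWORD@localhost:5432/lemmy

The trailing "/lemmy" is the database name; the older config-file style names the database in its own field.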

it would answer how this happened… but we do need to find the syntax for port in lemmy.hjson

(And then open a bug that the documentation isn’t exactly clear on that page I linked!)
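
Going from memory, the structured form is roughly the below; treat the field names (especially port) as an assumption to verify against the defaults.hjson in the Lemmy repo, and the values as placeholders:

database: {
  # field names recalled from memory - double-check against defaults.hjson
  user: "lemmy"
  password: "EXAMPLE_PASSWORD"
  host: "localhost"
  port: 5432
  database: "lemmy"
}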

As I said, I could see one of the communities I subscribed to in the stdout

But only one, right? See, federation is kind of automagic: if one single post came in for that community, it could very well go and create the community itself. On an empty database.

Now did you upgrade lemmy-ui and maybe run into problems there?

I think the safe thing to do at this point is to work with PostgreSQL directly, try to make sense of the data in 15, and figure out why Lemmy seems to have started as a virgin instance. But it’s going to take some time, and I need to take a couple of breaks… I will be around for the next 5 or 6 hours, but I need a 15-minute break and then a 30- to 40-minute break to travel to dinner (I’ll be online once I arrive).

pg_dumpall against 15 will give us EVERYTHING, and we can grep through that to figure out whether two different databases somehow got created. That’s what I think might have happened. I normally create a half-dozen different databases in 15 for testing federation locally (lemmy-alpha, beta, gamma, etc.).
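
Concretely, something along these lines (standard psql / pg_dumpall flags; adjust the port and user to your setup, and the dump file name is just a placeholder):

# list every database the PostgreSQL 15 cluster knows about
psql -p 5432 -U postgres -l

# dump the whole cluster, then grep for the databases it contains
pg_dumpall -p 5432 -U postgres -f full_cluster_dump.sql
grep -n "CREATE DATABASE" full_cluster_dump.sql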

I’m guessing the “/lemmy” at the end isn’t exactly how the .config worked prior to us shifting over to the URL scheme.

I’ve never switched a system from config.hjson (or whatever the file is) over to the URL scheme - maybe the /lemmy on the end is wrong?

It looks to me like Lemmy found an empty database and issued all the migrations of a new install…

So either the database URL we gave it was wrong, or the restore we did used the wrong parameters, etc.

And, like I mentioned, some confusion has already happened with your federation status, as I think it rushes out to register itself as a new server with the Lemmy network. And some data got in…

So… I’m not sure of the best way to figure this out. We could do a pg_dumpall of your PostgreSQL 15 data and then sift through it to see if we can make sense of how this happened?
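
Before the full dump, a quick sanity check we could run against each candidate database (connect with psql to each in turn; these are standard Lemmy table names):

-- e.g.  psql -p 5432 -d lemmy   and then:
SELECT count(*) FROM post;     -- the restored database should show the old post count
SELECT count(*) FROM person;   -- a fresh install will only have a handful of rows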

I can see only one of my Subscribed communities in there, though.

Ok, what probably happened there is that incoming federation triggered it to create the community on your ‘empty’ database. So again, we want to stop the lemmy_server service now to try to keep further data from going into this empty one.
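
To stop it right away, something like this depending on how it was deployed (the service / unit name is a guess; adjust to whatever yours is called):

# bare-metal / systemd install
sudo systemctl stop lemmy

# or, for a docker-compose install
docker compose stop lemmy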
