So here we are at the last of my series of four blogs. If you’ve picked up the other three: Blog 1 of 4: Aggregation, Blog 2 of 4: Checking text within a larger string and Blog 3 of 4: Primary key considerations, thanks for reading! And watch this space, I’m preparing some more, so please check back soon to read those too.

As I said before, I’ve always admired the desire of the PostgreSQL community to share knowledge and expertise. Following in that tradition I’d like to share a few pearls of PostgreSQL wisdom that I’ve picked up over the years as a PostgreSQL developer. Now let’s get started with blog 4.

4 of 4: Moving data around

In my career, I’ve seen different ways of moving data from one table to another. For example, the table where the original data is stored may be a temporary storage that is used later in a process driven by external events. So you need to store the data on event 1, use it on event 2 and remove it from the table where it was stored by event 1. But to determine which data was used in event 2, you need to store it again somewhere else. Typically it might look like this:

FUNCTION event2 (bunch of parameters)
1) select records from data stored in event1 (table1)
2) do some analysis/processing on this data in combination with the parameters
3) insert as 'evidence' the data and outcome for later analysis into table2
4) delete the records from table1
5) return for example a status of the analysis/processing

For me, this raises the following questions:

Can we improve the subsequent insert and delete?
What if event2 never comes, what happens to the data of table1?

To answer to question 1, yes we definitely can improve it! As I hinted in my previous blog “Primary Key Considerations” we can use a Common Table Expression “CTE” for both. For the first question we could write a job to move data from table1 to table2 but give an indication that event2 did not occur. When event2 is called, we do a similar move. For the function: “event2” lines 3) and 4) we will have the following structure:

WITH
    deletions AS
    (
    DELETE  FROM table1 t1
    WHERE   t1.external_identification = parameter_session_id
    RETURNING
            *
    ,       'DONE'               as event2_status   -- how did the data got into table2
    ,       (now() - entry_time) as duration        -- indcating the time between events
    )
INSERT  INTO table2
SELECT  *
FROM    deletions

To answer the second question, the WHERE-clause should be changed to e.g. a date/time based condition. Also the hardcoded 'DONE' as event2_status and duration (e.g. leave null) should be changed accordingly.

The reason why you would want to remove this from table1 is that event2 needs to be fast and the more records table1 contains, the slower event2 will eventually become. Another advantage is that a data analyst only needs to focus on the data in table2.

And what if you needed to retreive only a few records for maintenance? Here again we could put the “CTE” into action. If we wrote it as a select of those few records you are looking for in a big table and use this prepared selection as a base to do another query (DML e.g.), it’s likely the optimiser might make a decision you don’t want it to. It could end up in a join (hash/merge/nested loop) that’s far from ideal. I solved such a situation by using the MATERIALIZED keyword.

WITH
    lost_in_space AS
    MATERIALIZED
    (
    SELECT  *         -- star in heaven ;-)
    FROM    galaxy    -- possibly partitioned
    WHERE   ...       –- conditions where partitioning is not based upon ..
                      -- .. but result in a small number of records
    )
SELECT  ...
FROM    table1
JOIN    lost_in_space ON (...)
WHERE   ...

As the size of the “galaxy” increased over time (years), the batch became very slow. By only adding the keyword “MATERIALIZED”, the processing was reduced from hours to minutes. Ta dah!

And there you have it. That’s the end of my run of four short blogs on pearls of PostgreSQL coding wisdom. I hope you enjoyed them. Thanks for reading!

What are other people reading?

New minor releases of PostgresPURE, Q2 2023

by Michel Schöpgens | Jun 27, 2023 | PostgresPURE, PostgresPURE releases

June 20, 2023, the PostgreSQL community has made available the new minor releases of PostgreSQL 11 through 15. For us, this means that based on these new minor releases, we have to start working on compiling the new releases of PostgresPURE. Meanwhile, we have done...

Cortex 2023.1 is released

by Michel Schöpgens | May 19, 2023 | Cortex, Cortex releases

Introducing the new release of Cortex and future developments. It took some time and we had some delay but finally the new release of Cortex is there. The delay was mainly caused by use cases given by our clients, which gave us a lot of inspiration to extend and improve the capabilities of Cortex. …

Oracle to PostgreSQL migrations – A combination of science and art

by Michel Schöpgens | Mar 15, 2023 | Cortex, Home

For me, migrating Oracle databases to PostgreSQL is a combination of science and art. Science is designed to be objective and guided by data and art is subjective and heavily influenced by feelings and opinions. Traditionally, art and science are treated as two separate disciplines, but when they are studied together, it becomes clear how one influences the other.

Blog

PostgreSQL Expert Insight Series – Blog 4 of 4: Moving data around

4 of 4: Moving data around

What are other people reading?

New minor releases of PostgresPURE, Q2 2023

Cortex 2023.1 is released

Oracle to PostgreSQL migrations – A combination of science and art

Blog Categories

Want to know more?

Thanks for visiting us!