Hello from me
I’ve always admired the desire of the PostgreSQL community to share knowledge and expertise. Following in that tradition I’d like to share a few pearls of PostgreSQL wisdom that I’ve picked up over the years as a PostgreSQL developer. But before I start of, I’d like to point out that there are two ways to interpret the title PostgreSQL developer.
- Working at the core, extensions etc. that contribute to PostgreSQL as database to be used. Most of the time this is written in C.
- Working with the above to achieve the demands from the business. Creating tables, indexes, functions (SQL-language) but looking at performance of that as well.
I consider myself to be among the developers working with PostgreSQL.
Ok, let’s get started. My aim, over the course of this series of four blogs, is to help point you in the right direction when you’re faced with a problem. I can’t promise I’ll be able to give you the whole answer, but they should help get you started.
Now let’s get to the structure of this series. To start with, there are four subjects I would like to share with you. All four blogs have the background of approaching a task in an alterative way.
- Aggregation
- Checking specific text within a larger string
- Primary Key considerations
- Moving data around
I’ll support these subjects with examples from real life situations I’ve encountered. Of course, I’ve anonymised them or taken it to a higher level of abstraction. Now, onto our first blog.
1 of 4: Aggregation
You might think it’s pretty simple to get the last time something happened within a group of one or more attributes. You just write the query with a group by
for the non-aggregated columns, right? However, this may come at a cost as the query has to fetch all records related to the non-aggregated columns. Surely there’s a more efficient and faster way to do that?
Ok, so let’s assume that we have a table that contains the GPS points and the time being at that location. Now we can ask the question: When did you arrive at the end of the journey?
We could make a loop with (individual) cursors, but this is not my preference. Not only is it quite intensive to write in plpgsql, but it is also, in my opinion, harder to read and maintain, and probably much slower. We have a database that analyses the query for alternative execution plans. So let’s use that knowledge of the geniuses that have debated and implemented these strategies in the core of PostgreSQL itself. A big “thank you” to the contributors to the internal (open-source) code of PostgreSQL.
Back to the functional question from above…
The usual query would look like this:
SELECT trk.track_name
, MAX(trp.location_time)
FROM routes.track trk
JOIN routes.track_point trp USING (id_track)
WHERE trp.location_time
BETWEEN '2020-01-01'
AND '2020-06-01'
GROUP BY
trk.track_name
An alternative way could look like this:
SELECT DISTINCT ON (trk.track_name)
trk.track_name
, trp.location_time
FROM routes.track trk
JOIN routes.track_point trp USING (id_track)
WHERE trp.location_time
BETWEEN '2020-01-01'
AND '2020-06-01'
ORDER BY
trk.track_name
, trp.location_time DESC
The crux of this alternative is to put the non-aggregated columns in a distinct on (...)
‑clause and with additional ordering on the aggregated column.
This alternative only works if you want to find a minimum or maximum of the aggregated column and the order will then be ASC(ending) or DESC(ending) respectively. Now, the observant reader will have noticed there is a drawback – it only works for one aggregated column.
I can’t say that the latest version is faster, but it could be. At least, that was one of the lessons I learned over the years. So it’s worth a try if you come across such a functional question.
Another idea is to use a window-function with partitioning on the non-aggregated columns and using distinct if needed. I admit I’ve haven’t tried this out – but it would be something like:
SELECT DISTINCT
trk.track_name
, MAX(trp.location_time) OVER (PARTITION BY trk.track_name)
...
I haven’t investigated this alternative further because it feels more complex to understand its functional meaning. For future developers it’s harder to understand and maintain this code.
They will no doubt ask themselves, why was it done this way – was there a reason for it? In fact, even the creator of this code has forgotten the reason, so for that reason alone, perhaps it is too complex!
Make sure you check out Blog 2 of 4: Checking text within a larger string.
What are other people reading?
Cortex 2023.1 is released
Introducing the new release of Cortex and future developments. It took some time and we had some delay but finally the new release of Cortex is there. The delay was mainly caused by use cases given by our clients, which gave us a lot of inspiration to extend and improve the capabilities of Cortex. …
Oracle to PostgreSQL migrations – A combination of science and art (Part I)
For me, migrating Oracle databases to PostgreSQL is a combination of science and art. Science is designed to be objective and guided by data and art is subjective and heavily influenced by feelings and opinions. Traditionally, art and science are treated as two separate disciplines, but when they are studied together, it becomes clear how one influences the other.
PostgresPURE is updated for all minor releases
February 9, 2023, the PostgreSQL community has made available the new minor releases of PostgreSQL 11 through 15. For us, this means that based on these new minor releases, we have to start working on compiling the new releases of PostgresPURE. Meanwhile, we have done...