Posts with fundamental knowledge about Presto internals

The fundamentals: MPP and data distribution

8 minute read

One can say that Presto is an MPP (Massively Parallel Processing) application. In fact, I have never seen a data warehouse that did not follow this approach: Teradata, Netezza, Vertica, even Hive, and many more all belong to this class of software. The approach is not limited to data warehouses, either; it is typical of any distributed application that processes vast amounts of data and performs non-trivial, costly computation on it.

The fundamentals: join algorithms

9 minute read

In the previous post I explained how joins work from the user's point of view. Now it is time to go one step deeper and learn how they are actually computed. This is a very broad topic, so today we will only touch upon each of the join algorithms used in Presto, to understand what they do and when they are used.

You already know that, along one axis, a join can be INNER, OUTER, SEMI, etc. Join execution is an independent axis. It means that t...
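
To make the "two independent axes" idea concrete, here is a minimal in-memory sketch, assuming rows are simple `(key, value)` tuples. It is not Presto's implementation: the functions `nested_loop_join` and `hash_join` are hypothetical, single-threaded illustrations. Each one takes the join type (`"inner"` or `"left"`) as a parameter, showing that the execution algorithm and the join semantics can be chosen separately while producing the same result.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

Row = Tuple[int, str]                    # (join key, payload), hypothetical row shape
Joined = Tuple[int, str, Optional[str]]  # (key, left payload, right payload or None)


def nested_loop_join(left: List[Row], right: List[Row], kind: str = "inner") -> List[Joined]:
    """Compare every left row with every right row (O(n * m))."""
    out: List[Joined] = []
    for lk, lv in left:
        matched = False
        for rk, rv in right:
            if lk == rk:
                out.append((lk, lv, rv))
                matched = True
        # LEFT OUTER: keep unmatched left rows, padded with None (SQL NULL).
        if not matched and kind == "left":
            out.append((lk, lv, None))
    return out


def hash_join(left: List[Row], right: List[Row], kind: str = "inner") -> List[Joined]:
    """Build a hash table on the right side, then probe it with left rows."""
    table = defaultdict(list)
    for rk, rv in right:          # build phase
        table[rk].append(rv)
    out: List[Joined] = []
    for lk, lv in left:           # probe phase
        matches = table.get(lk, [])
        for rv in matches:
            out.append((lk, lv, rv))
        if not matches and kind == "left":
            out.append((lk, lv, None))
    return out


if __name__ == "__main__":
    orders = [(1, "a"), (2, "b"), (3, "c")]
    customers = [(1, "x"), (1, "y"), (3, "z")]
    for kind in ("inner", "left"):
        # Same join type, different algorithm, same rows.
        assert nested_loop_join(orders, customers, kind) == hash_join(orders, customers, kind)
        print(kind, hash_join(orders, customers, kind))
```

The nested loop needs no extra memory but scales with the product of both inputs; the hash join trades memory for the build-side table in exchange for a single pass over each input. Either can serve an INNER or a LEFT join.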

The fundamentals: types of joins in SQL

5 minute read

Joins are one of the most important parts of every database and of SQL itself, so it was obvious to me that this topic would appear very often on this blog. However, I thought my first technical article would be about something more advanced, like join reordering or at least cross join elimination. When thinking about what I could write, though, I realized that I would first need to explain the basic terminology so that we share a common vocabulary. You probably know most of what follows already, but even so I hope it will help you structure your knowledge and fill in any gaps.

It...