There is a new cool feature in Presto that will speed up some inequality joins.
Posts about internals
One can say that Presto is a MPP (Massively Parallel Processing) kind of application. Well, I have never seen a data warehouse which did not follow this approach. Teradata, Netezza, Vertica and even Hive and many many more, all of these belong to this class of software. It is not only typical for data warehouses, but also for any distributed application which is processing vast amount of data, doing non-trivial and very costly computation on it.
Cross join has a bad reputation. It is not that nobody likes it all the time. For example It is OK to use it from time to time. There are even some queries where there is no other way. All of it is totally acceptable, and nobody would complain if it would be only like that. However, cross join has a habit to occur at the least appropriate moment. And once it comes, nothing remains the same. Query usually becomes an order of magnitude slower, and this not something any of you would like to dream of.
Since the 0.162 and 0.157t (Teradata) version of Presto, there is a feature called unnecessary cross join elimination. Si...
In previous post I explained how join works from the user point of view. Now it is the right time to go one step deeper and learn how things are actually calculated. This a very broad topic, so today we are going to just touch upon every join algorithms used in Presto. To understand what they do and when they are used
You know already that on one axis join can be
Join execution is an independent axis. It means that t...
Joins are one of the most important parts of each database and SQL itself. So it was obvious to me that this topic is going to appear very often on this blog. Hovewer, I thought that my first technical article will be about something more advanced, like join reordering or at least cross join elimination. Although, when thinking about what I could write about I realized that I would need to explain the basic terminology, so we could find a common domain language. Probably, most of the things below you know already, but even so I hope this is going to help you structurize your knowledge and find the missing things.