We’ve finally launched our new platform, and, while things are not perfect, we’re in a place that gives me a little time off to do something other than work work.
I’ve been using a lot more SQL lately to work with our Postgres instance, and it’s made me realize how much better I could be: more efficient queries (read: understanding how the queries work), shorter queries, a better understanding of how to pull out all of the data I want in the fewest number of queries… the list goes on.
To address this, I’m continuing a book I took a pause on. It’s called *Curious Moon*, a fun take on learning SQL where you’re placed into the role of an intern at a space consulting agency, tasked with wading through some data on Enceladus after the former DBA dropped a bunch of data…
If you’re skeptical that such a book could be useful, know that the author advocates against ORMs and for writing SQL directly, and starts you off importing and cleaning up data from a CSV rather than downloading a pre-formed DB. The point is: this is not an “easy mode” book; rather, it tries to instill what I would consider best practices, based on my own experience.
*stepping off the soapbox*
With that said, I’m cataloging some of the tidbits I’ve learned from the book here:
`ilike`: like, but, like, case insensitive.
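A quick sketch of the difference, against a hypothetical `films` table (table and data are my own, not from the book):

```sql
-- 'like' is case sensitive, so this misses 'Moonraker'
select title from films where title like 'moon%';

-- 'ilike' matches regardless of case: 'Moonraker', 'moonshot', ...
select title from films where title ilike 'moon%';
```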
`WHERE title ~* '^A\d.*?'`: how did I not know that I could use regex like this before?
`to_tsvector(col)`: turn a column’s text into a fully searchable `tsvector`. Use `where col @@ to_tsquery('txt')`, for example, to search the field. More operators and functions are available; of note, the `websearch_to_tsquery` function could be interesting if building a “web searchable” app that accepts more natural web queries. The internals are also pretty interesting:
> The `to_tsvector` function internally calls a parser which breaks the document text into tokens and assigns a type to each token. For each token, a list of dictionaries (Section 12.6) is consulted, where the list can vary depending on the token type. The first dictionary that recognizes the token emits one or more normalized lexemes to represent the token. For example, `rats` became `rat` because one of the dictionaries recognized that the word `rats` is a plural form of `rat`. Some words are recognized as stop words (Section 12.6.1), which causes them to be ignored since they occur too frequently to be useful in searching …
Note: After playing around with this a bit, the behavior is not exactly what I expected. Example:

```sql
drop view customers;

create view customers as
  select to_tsvector(concat(u.first_name, ' ', u.middle_name, ' ', u.last_name)) as name
  from users u;

select * from customers where name @@ to_tsquery('joe');

        name
---------------------
 'camel':2 'joe':1
 'boateng':2 'joe':1
 'joe':1 'ledford':2
(3 rows)
```
After using it a bit more, I’ve realized that this field is not necessarily meant to be read, but simply to query on. For example, query on `name`, but only as a way to get to the other columns in the row.
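To make those searches fast rather than just convenient, the usual move (my own sketch here, not from the snippet above) is to index the same expression you query on. Note that an expression index requires the two-argument, immutable form of `to_tsvector`, with the text search configuration spelled out:

```sql
-- a GIN index over the same expression used in the where clause
create index users_name_search_idx
  on users
  using gin (to_tsvector('english', concat(first_name, ' ', middle_name, ' ', last_name)));

-- the query must use the matching expression (and config) for the planner to use the index
select *
from users
where to_tsvector('english', concat(first_name, ' ', middle_name, ' ', last_name))
      @@ to_tsquery('joe');
```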
`group by`: I’ve used `count` and `group by` before, but this solved a small problem I’ve been mulling over in the back of my head: counting the occurrences of a common attribute among rows. For example, if you had services that could be part of a package offering, and those services referenced a package, you could find the number of services sharing the same package with:

```sql
select count(1) as services, package_id
from services
group by package_id;
```
Note: You can also use `having` to narrow results on the count, e.g. `select count(1) ... having count(1) > 1;`
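Filling that out into a full query (column names taken from the `services` example above):

```sql
-- only packages that have more than one service attached
select count(1) as services, package_id
from services
group by package_id
having count(1) > 1;
```

`having` filters on the aggregated groups, which `where` can’t do, since `where` runs before the aggregation.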
`concat`: to search even more!

```sql
to_tsvector( concat(events.description, ' ', events.title) ) as search
```
`\H`: output as HTML!
Sargeable: “Search ARGument ABLE” queries can be optimized. Basically: can the query planner use an index on a column to satisfy the search? For example, if the column `description` had an index on it, `description like 'final%'` could be optimized, but `description like '%final'` would result in a sequential scan; the former is sargeable, the latter is non-sargeable.
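A way to see this for yourself (a sketch; assumes a `services` table, and note that a plain btree index only helps `like` prefixes when built with `text_pattern_ops` or under the C locale) is to ask the planner:

```sql
-- index suitable for left-anchored pattern matching
create index services_description_idx
  on services (description text_pattern_ops);

explain select * from services where description like 'final%';  -- can use the index
explain select * from services where description like '%final';  -- sequential scan
```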
Apparently I can use aliases in `order by` but not `where` clauses, because the `where` clause is evaluated before the `select` statement, but `order by` is evaluated after… which makes sense. Filter, then order.
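Concretely (my own toy example, not from the book):

```sql
-- works: 'order by' runs after 'select', so the alias exists by then
select price * quantity as total from line_items order by total;

-- fails with: column "total" does not exist — 'where' runs before 'select'
select price * quantity as total from line_items where total > 100;

-- the usual workaround: repeat the expression (or wrap it in a subquery)
select price * quantity as total from line_items where price * quantity > 100;
```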
Data science and analytics is about finding patterns and exploiting relationships that, at first, don’t seem apparent. Before you do any of that, however, you have to have good data. The whole process is like digging for gold: the first problem is knowing where to look.