people
Taming of the Subselect: Three ways to use this troublesome SQL Query
Senior Software Engineer/Engagement Manager
Taming of the Subselect: Three ways to use this troublesome SQL Query Senior Software Engineer/Engagement Manager

Subselects can be used in many ways: filtering, retrieving specific data, and creating custom joins. Writing effective SQL using subselects, though, can be frustrating. While SQL is ubiquitous in software engineering and is a valuable skill for testing, data analysis and reporting, nowadays a lot of SQL is abstracted in ORM frameworks such as Hibernate and ibatis, and many projects have adopted No-SQL databases. This means some developers have little or no experience in SQL. While this is not a bad trend, in and of itself, there are still many solid business cases for storing data in a traditional, relational database. With that in mind, lets explore three ways to use this troublesome SQL Query.

The following is a simplified example from the music domain that models Albums, Tracks, Recordings, and People to illustrate the uses of the subselect. Note: the recording table is versioned.

1. WHERE subselect

The most common way subselects are used is to filter data in the WHERE clause. Using subselects in this way allows for a subset of data to be selected from a table in order to filter the values in another table.

The following is an example of getting all the recordings that appear on albums with titles that start with ‘Best of ABBA’.

SELECT *
FROM subselect.recording r
WHERE recording_id in
(SELECT recording_id
FROM subselect.track t, subselect.album a
WHERE t.album_id=a.album_id AND a.title like 'Best of ABBA%')
ORDER BY recording_id, version;

Where clause subselect result set
Note: the same results could be obtained by joining the track and album tables to the recording table.

The features of this query:

  • Filters the recording data down to only those recordings from albums where the title starts with ‘Best of Abba’.
  • This subselect is quick and returns a single column of data that will be used to filter the recording results.
  • The subselect data does not show up in the final results
  • The subselect must only return one column of data
  • Good readability

2. JOIN subselect

JOINs offer another interesting way to filter data via subselect. In this case the subselect is joined to the query. This has the effect of filtering data while providing the query with values returned in the select statement of the inner query. Using subselects in this way allows the query to take multiple rows of similar data and transforming it into a columns.

The following is an example of getting all the recordings that appear on albums with titles that start with ‘Best of ABBA’, but ensuring that the latest version of the recording is returned.

SELECT r.recording_id, foo.current_version, r.title, r.artist, r.duration, t.position, a.title, a.artist
FROM subselect.recording r
JOIN subselect.track t ON t.recording_id=r.recording_id
JOIN subselect.album a ON t.album_id=a.album_id
JOIN (SELECT recording_id, MAX(version) AS current_version FROM subselect.recording r GROUP BY recording_id ORDER BY recording_id) foo
ON foo.current_version = r.version AND foo.recording_id=r.recording_id
WHERE a.title like 'Best of ABBA%' ORDER BY a.album_id;

Join clause subselect for data transforms

The features of this query:

  • Allows for multiple selected columns to be exposed to the main query
  • Provides easy filtering for versioned data
  • Filters data like a where clause
  • Specific use causes

3. SELECT subselect

The SELECT statement is another place we can use a subselect. Unlike the first two subselect examples, the SELECT clause’s subselects do not filter, but they do allow the results to be enhanced. The main thing to watch out for when using subselects is that data returned from the subselect must be returned as one column and one row. This can be done using an aggregation function.

The following combines the join clause subselect and the select clause subselect to return a report of all the current recordings that appear on albums with titles that start with ‘Best of ABBA’, while returning a pipe delimited list of persons attributed to recording.

SELECT
r.title,
r.artist,
r.duration,
r.recording_id,
a.title,
t.position,
(SELECT string_agg(name, ' | ') FROM subselect.person p, subselect.recording_person rp WHERE rp.person_id=p.person_id AND rp.recording_id=r.recording_id)
FROM subselect.recording r
JOIN subselect.track t ON t.recording_id=r.recording_id
JOIN subselect.album a ON t.album_id=a.album_id
JOIN (SELECT recording_id, MAX(version) AS current_version FROM subselect.recording r GROUP BY recording_id ORDER BY recording_id) foo
ON foo.current_version = r.version AND foo.recording_id=r.recording_id
WHERE a.title like 'Best of ABBA%' ORDER BY a.album_id;

In the above case, we are creating a result set that look more like a data report. The SELECT clause subselect is highly inefficient with respect to the JOIN subselect, but it provides us with rolled, delimited data that maybe more efficient than using an application to call back into the database for each row.

Note: it’s important to pick a delimiter that should not appear in the data.

The features of this query:

  • Allows for aggregated data in the select statement that is not easily joined.
  • Very inefficient. A subselect is called for each row returned from the outer query
  • All results must be rolled up into one row and one column.
  • Has very specific, but limited use case

Subselects are a very powerful tool to keep in your engineering toolbox, but knowing their uses and fallbacks are the difference between a fast useful query and a query that will leave you waiting and frustrated.

All examples where written in Postgres and may use functions that are native to Postgres.

Source can be found at https://github.com/djchi82/TamingOfTheSubselect

Alden Hart
Chief Technology Officer
Alden Hart Chief Technology Officer

Alden has more than 25 years of experience in technology development and product management.  He has designed and developed dozens of telecommunications, transaction processing, financial services, avionics, motion control, and other real-time and mission critical systems. He served as the VP of Software Development of Cybercash, the first Internet-based payment system, from start-up through its IPO.  Since then, he has focused on helping both Fortune 1000 and growth stage technology firms architect, build, and deploy first-of-their-kind technology platforms. He works frequently with private equity firms to evaluate potential investment and M&A opportunities.  Alden is also know for his ongoing work in open source motion control and desktop manufacturing; the TinyG project is used by major desktop 3D printers, milling machines, and other innovative products.  He holds a B.S. in Electrical Engineering from Duke University.

Rick Garvin
Managing Director
Rick Garvin Managing Director

Rick focuses on product development acceleration and product management for growth oriented technology companies.  Most often, he performs product & technology assessments and executes turnaround and technology acceleration for companies that create products or use technology as a key component of their service offerings. He frequently works as Virtual CTO, VP of Engineering, or VP of Product Development. He began his career at BTG working in signals intelligence and worked his way up to become VP of Engineering.  After BTG, Rick worked as the CTO for two technology start-ups that had successful exits. Rick holds a B.S. in Computer Science from the University of Maryland.

Frank Oelschlager
Managing Director
Frank Oelschlager Managing Director

Frank brings over 25 years of experience in bringing technology and products to market and building and scaling high performance technical teams.  Prior to joining Ten Mile Square, Frank served as the CTO or VP of Engineering for five successful startups, including SOAlogix, which was sold to SAP. At Ten Mile Square, Frank serves as Virtual CTO and Product Strategist for growth and Fortune 1000 companies. His clients work in a number of different markets, including eCommerce, information security, media, and education.  He also works frequently with clients, venture capital, and private equity firms to evaluate potential investment and M&A opportunities.

Hyun Ho Ro
Engagement Manager
Hyun Ho Ro Engagement Manager

Ho has more than 10 years experience engineering, developing, managing systems and projects. He currently manages projects at FINRA where he provides guidance on project delivery practices. Ho has worked with SiteSmith/MFN, TeleComputing, and UUNET. He is known for extreme golf and riding centuries on his bike. He has BS in Computer Information Systems from American University.

Jason Mao
Engagement Manager
Jason Mao Engagement Manager

Jason brings over 15 years of experience in DevOps where he upholds high engineering standards in operation, security, and development. He has a BS in Computer Science and a MBA from the University of Maryland. At Ten Mile Square, Jason applies systems architecture and engineering expertise to help CTOs and VPs of Engineering reduce cycle time and further modernize their infrastructure into highly scalable and performant technology assets. Prior to joining Ten Mile Square, Jason oversaw IT security responsibilities at NASA and held critical engineering roles in start-ups such as Sourcefire (now Cisco) and SunRocket.

Bill Lenoir
Sr. User Experience and Product Planning Consultant
Bill Lenoir Sr. User Experience and Product Planning Consultant

Bill has spent two decades on user experience design and management, interface design and development, and information architecture. He focuses on problems where the man-machine dialogue is dysfunctional or unclear. Bill has worked with small internet start-ups as well as large, established organizations such as Scripps Networks and FINRA. He is also known as a board game geek. He has a BA from American University in International Studies and also conducted post-graduate work there in International Economics.

Ryan Van Fleet
Ryan Van Fleet
Engagement Manager
Ryan Van Fleet
Ryan Van Fleet Engagement Manager

Ryan is an engagement manager and senior software developer with more than ten years of experience in architecting and build out high performance enterprise applications. As an Engagement Manager, Ryan manages teams that build and deploy large scale SaaS applications. He brings technical expertise in several key areas that are important to our clients, including enterprise transform and load, digital archiving, integration of COTS and open source technologies with custom developed components, and operations development and automation. Prior to joining Ten Mile Square, Ryan worked for T-Rex Corporation as a software engineer to support the management of Disability Cases for the Social Security Administration, including the initial proof of concept installation and operations of an enterprise correspondence system. Previous to T-Rex, Ryan worked for Tessella serving various big pharmaceutical corporations supporting drug discovery and digital archiving. Ryan was also a COTS engineer on a digital archive system that maintains and disseminates the digital records for the Executive Office of the President and the U.S. government. Ryan holds a B.S. in Physics from the University of Maryland, and a Degree in Music Business from Berklee School of Music.