šŸ’¾ Database Optimization Tips for Improved Query Performance

13/10/2023

wchr

wachira

@__wchr

šŸ’¾ Database Optimization Tips for Improved Query Performance

Hey, there programmer!

Got a database that's slower than a slug on sedatives?

Or it's giving tortoise pace energy?

Suffering from joins that make you want to split?

Well, I've got some tips to make you SQL sing!

But before we dig, in a disclaimer:

This guide assumes you've got basic SQL skills down. If you don't know your SELECT from your WHERE, you might want to brush up first. I'll be slinging around terms like indexing, partitioning and materialized views - so make sure you're fluent in SQL or this could get messy!
These practices can help you optimize your database queries and improve performance in real-world applications. However, their applicability may vary depending on the specific use case and database design.

Now let's optimize! I'll be dropping knowledge bombs to make your queries fast and nimble. With my tips, you can tune and tweak your database into a lean, mean-speed machine. Never again will you have to watch those painful loading screens when you run a query - with these tricks, you'll be cruising faster than a cheetah on an open highway!

So rev up those SQL skills and prepare for a high-speed joyride through the query optimization fast lane! This will take your database's performance from zero to sixty in no time flat. Now let's put the pedal to the metal, here's tip number one.

Create an index on huge tables (SQL)

When dealing with large tables, creating appropriate indexes can significantly improve query performance.

Use EXISTS() instead of COUNT() (SQL)

Using EXISTS() is more efficient than COUNT() when you need to check for the existence of related records.

SELECT specific columns over all columns in the table (SQL)

Here are some reasons why selecting specific columns is better than using SELECT *:

  • Performance - Retrieving all columns requires more I/O and memory even if they aren't needed. O(n) vs O(1).
  • Security - Exposing unnecessary sensitive data in columns is a risk. Select just what is needed.
  • Accuracy - All columns may include outdated data if not regularly referenced. Select ensures freshness.
  • Maintainability - Query won't break if new columns are added. SELECT * requires rework when tables change.
  • Readability - Makes the query intent clear by showing exactly what data is needed.

Query using SELECT *:

Better to be specific:

Avoid exposing unnecessary sensitive data:

Only include the required columns:

So in summary, listing columns explicitly improves the performance, security, accuracy, and maintainability of queries while also clarifying intent. SELECT * should be avoided.

Avoid subqueries in WHERE Clause (SQL)

There are a few reasons why it's better to avoid subqueries in the WHERE clause when possible:

  • Performance - Subqueries in the WHERE clause can often perform poorly since the subquery will be executed for each row in the outer query [O(NĀ²) ā€” Quadratic process]. This leads to repetitive execution and slow performance. JOINs are usually faster since they allow combining data in a single pass.
  • Readability - Complex nested subqueries can make the query hard to understand and maintain. JOINs often express the intent in a clearer way.
  • Limitations - Some databases put restrictions on subqueries in WHERE clauses, like not allowing correlation to outer query columns. JOINs don't have these limitations.

Here are some example queries that could be improved by removing the subquery:

Slow query with subquery:

Faster version with JOIN:

Query limited by correlation issue:

Version avoiding limitation with JOIN:

So in summary, avoiding subqueries in WHERE when possible by using JOINs can improve performance, and readability, and remove limitations.

Avoid SELECT DISTINCT where possible (SQL)

Here are a few reasons why it's better to avoid SELECT DISTINCT where possible:

  • Performance - DISTINCT requires sorting and eliminating duplicates, which adds overhead. O(nlogn) sorting complexity.
  • Readability - DISTINCT can obscure the query's intent and make it less clear.
  • Maintainability - Using DISTINCT may cover up poorly structured data or queries. Better to fix the root cause.
  • Accuracy - DISTINCT eliminates duplicates but also removes valid repeated values. This can distort data.

Better alternatives:

  • Use GROUP BY instead of DISTINCT to aggregate data. O(n) complexity.
  • Filter in the WHERE clause to remove duplicates.
  • Return unique column values only, like primary keys.
  • Use UNION to combine multiple distinct queries.

Example with DISTINCT:

Better with GROUP BY:

DISTINCT obscures the root issue:

Better to unique constraint and join:

So in summary, DISTINCT can hurt performance, understanding, and data accuracy. Better to use specific filtering, grouping, constraints, and joins instead.

Avoid SELECT DISTINCT where possible (SQL)

Here are some reasons why using WHERE instead of HAVING can be better:

  • Performance - WHERE filters rows before aggregation, reducing work. O(n) vs O(n^2).
  • Readability - WHERE filters on raw data, which is more intuitive. HAVING filters after aggregation.
  • Simplicity - WHERE works with all query types. HAVING only with GROUP BY queries.
  • Accuracy - HAVING conditions apply to grouped/aggregated data, not raw data. Can distort analysis.

Example of inefficient HAVING:

More efficient with WHERE:

HAVING obscures intent:

Clearer with WHERE:

So in summary, WHERE filters earlier, improves performance, keeps queries simple, and makes intent more clear. HAVING has niche uses for aggregated data but WHERE is better for general filtering.

Create joins with INNER JOIN(not WHERE) (SQL)

Here are some reasons why using INNER JOIN is better than filtering in the WHERE clause:

  • Readability - JOINs clearly show the connections between tables. WHERE clauses mix filters with join logic.
  • Maintainability - It's easier to modify joins when they are separated out. JOIN logic in WHERE is harder to change.
  • Flexibility - JOINs support outer joins and different types of connections. WHERE is limited to implicit inner joins.
  • Performance - JOINs allow the optimizer to better select the join order and strategy.

Example join in WHERE:

More readable with JOIN:

JOINs are flexible:

JOINs give the optimizer control over the execution plan. WHERE locks it in.

šŸ§  execution plan is a detailed blueprint of how the database engine will execute a specific SQL query

So in summary, JOINs improve readability, maintainability and flexibility while also allowing the optimizer to choose the best performance plan. WHERE clauses mix concerns and limit options.

Use LIMIT to sample query results (SQL)

Here are some reasons why using LIMIT to sample query results is beneficial:

  • Performance - LIMIT caps the result set size, reducing query time and resources needed. O(n) vs O(log n) or O(1) with LIMIT.
  • Responsiveness - Limits ensure queries return within a predictable time frame, keeping the app responsive.
  • Cost - Sampling reduces compute and memory costs for expensive queries on big data.
  • Accuracy - Full result sets with duplicate data can bias results. Sampling gives a more precise analysis.
  • Prototyping - LIMIT lets you quickly test queries during development before applying to the full dataset.

Query without LIMIT:

Faster sampling with LIMIT:

LIMIT improves responsiveness:

LIMIT makes queries affordable:

So in summary, LIMIT allows fast sampling from large data sources. This improves performance, cost, accuracy, and development speed.

Use UNION ALL instead of UNION (SQL)

Here are some reasons why UNION ALL is preferable to UNION:

  • Performance - UNION ALL just concatenates results. UNION sorts/deduplicates. O(n) vs O(n log n).
  • Accuracy - UNION removes duplicates, altering the true combined dataset. UNION ALL keeps all values.
  • Intention - UNION ALL makes it clear no deduplication is intended. UNION implies wanting distinct rows.
  • Flexibility - UNION ALL allows counting duplicates if needed. UNION always removes them.

Query using UNION:

Faster with UNION ALL:

UNION ALL keeps duplicates:

So in summary, UNION ALL improves performance by skipping deduplication. It also provides a more accurate combined dataset and clearer intent compared to UNION.

Use UNION where instead of WHERE ... or ... query (SQL)

Here are some reasons why using UNION can be better than OR conditions in WHERE clauses:

  • Performance - UNION results can be indexed/optimized separately. OR conditions often limit optimizations.
  • Readability - UNION makes it clear multiple separate queries are being combined. OR mixes unrelated conditions.
  • Maintainability - It's easier to modify one part of a UNION query without affecting others. OR clauses are tightly coupled.
  • Flexibility - UNION can utilize different WHERE, JOINs, sorting etc. in each query. OR has fixed logic.

Query with OR condition:

More optimized with UNION:

OR conditions can get complex:

Easier to maintain with UNION:

So in summary, UNION allows splitting the query into separate optimized parts, improving readability, flexibility and maintainability.

Raw Queries vs ORMs (General)

When Raw SQL trumps ORM abstractions

Object-relational mapping (ORM) frameworks simplify database communication within your application. However, for highly complex query logic or performance-critical sections, writing custom SQL statements is often more efficient. ORM abstractions may not optimize query execution, and hand-tuned SQL offers total control for maximum efficiency. When speed is crucial, customized SQL can outperform generic ORM queries.

When ORM abstractions trump Raw SQL

ORMs are not all that bad, they provide substantial advantages over raw SQL for database access in applications. By abstracting away the gritty details of interacting with a relational database, ORM enables reusable, database-agnostic code focused on business logic rather than data access. It enforces consistent style and safety best practices like input validation and SQL injection prevention that are easy to overlook with raw SQL. ORM handles many concerns like relationships, lazy loading, caching and migrations automatically to reduce boilerplate code. The tradeoff is some reduction in control and performance vs. tailored SQL queries.

However, for most applications, the productivity and portability gains of ORM far outweigh the costs, leading to cleaner and more maintainable code. With its emphasis on convention over configuration and abstraction of data access, ORM produces simpler code that focuses on what matters - delivering the required business functionality in a safe, consistent way with minimal hand-written data plumbing.

Optimizing database performance with custom SQL functions (General)

Unless you are using these databases SQLite, Microsoft Access, Microsoft SQL Server Express, Firebird and HSQLDB, most modern databases support UDFs(User Defined Functions).

Custom functions can be used to improve application performance, some examples include:

  • Summarization: pre-aggregating into a summary function is faster than querying individually.
  • Caching: One can establish a bespoke function tailored to the caching of results derived from a particularly intricate and resource-intensive database query.

Conclusion

That's it for now, I will be updating this blog on how you can make database interaction with your application more performant.

On another note, I built a cool CLI tool that will hopefully make your developer experience working with SQL databases much better

Check it out