Category: Short and Sweet

“I don’t really get the point of a right join if you can just do the same thing with a left join”

February 4, 2025

I was talking to a friend of mine who is learning some SQL, and they said something that I have seen come up multiple times when people learn SQL.

They said “Yeah, I need to study the join types more. They make sense to me but I want to be able to not reference my notes” and also “I don’t really get the point of a right join if you can do the same thing with a left join by just switching the table name.”

These are great points, and common questions that occur when first learning SQL.

Before I talk about LEFT/RIGHT JOINs, I want to mention that, in my experience, idiomatic SQL does not contain RIGHT or FULL OUTER joins, so ignore them.

That might be contentious among folks, but unless you have a large amount of code written in a specific way where a RIGHT or FULL OUTER join is the only thing that makes sense (in which case, fine, you’ve found the way), you generally just want a subquery or CTE to encapsulate your logic and maintain “one way” of thinking about the problem.

The second piece of the puzzle that helped me a lot was understanding that all JOINs are just specialized versions of CROSS JOINs that help communicate your intent.

If we start with the following example, every row in left is paired with every row in right. A classic CROSS JOIN.

-- "left" and "right" are quoted because LEFT and RIGHT are reserved words
SELECT *
FROM "left" AS l
CROSS JOIN "right" AS r

Now, a Cartesian product is rarely useful on its own (unless you are making a numbers table!), but it’s the basis for the next step: a predicate (here, a WHERE clause) that restricts the result to only the useful rows.

SELECT *
FROM "left" AS l
CROSS JOIN "right" AS r
WHERE
    l.id = r.left_id AND
    r.left_id IS NOT NULL

Some knowledgeable folks in the audience might wonder why I would filter on r.left_id IS NOT NULL, since in standard SQL the equality condition already excludes NULLs, but some SQL dialects and settings can match on NULLs!

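Filtering the CROSS JOIN this way returns exactly the rows an INNER JOIN would. A LEFT JOIN starts from those same matched rows and then adds back every row from left that found no match, with NULLs in the columns from right: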

SELECT *
FROM "left" AS l
LEFT JOIN "right" AS r ON
    l.id = r.left_id

The same applies to a RIGHT JOIN – just swap the main characters.

The INNER JOIN is even easier: it keeps only the rows that match on both sides, dropping unmatched rows from either table.

Now, this isn’t hugely different, but being able to call out LEFT, LEFT, LEFT, LEFT, INNER, LEFT as you read through a long procedure is a critical part of reading SQL. Swapping between RIGHT, LEFT, and INNER all the time will absolutely make a difference to your reviewer’s mental model, so expect them to call you out if you use a RIGHT JOIN for anything but fun.
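To see these relationships on actual rows, here is a small sketch using Python’s built-in sqlite3 module. It assumes SQLite rather than SQL Server and uses hypothetical tables left_table and right_table (tables literally named left and right would need quoting). It compares the filtered CROSS JOIN with an INNER JOIN, then rebuilds a LEFT JOIN by hand as the filtered CROSS JOIN plus the unmatched left rows padded with NULL.

```python
import sqlite3

# Hypothetical sample data standing in for the post's left and right tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table (id INTEGER);
    CREATE TABLE right_table (left_id INTEGER, label TEXT);
    INSERT INTO left_table VALUES (1), (2), (3);
    INSERT INTO right_table VALUES (1, 'a'), (1, 'b'), (2, 'c'), (NULL, 'd');
""")

def rows(sql: str) -> list:
    return conn.execute(sql).fetchall()

# CROSS JOIN + predicate: every pairing, filtered down to the matches.
cross_filtered = rows("""
    SELECT l.id, r.label
    FROM left_table AS l
    CROSS JOIN right_table AS r
    WHERE l.id = r.left_id AND r.left_id IS NOT NULL
    ORDER BY l.id, r.label
""")

inner = rows("""
    SELECT l.id, r.label
    FROM left_table AS l
    INNER JOIN right_table AS r ON l.id = r.left_id
    ORDER BY l.id, r.label
""")

left_join = rows("""
    SELECT l.id, r.label
    FROM left_table AS l
    LEFT JOIN right_table AS r ON l.id = r.left_id
    ORDER BY l.id, r.label
""")

# The LEFT JOIN rebuilt by hand: the filtered CROSS JOIN plus the left rows
# that found no partner, padded with NULL.
rebuilt_left = rows("""
    SELECT l.id, r.label
    FROM left_table AS l
    CROSS JOIN right_table AS r
    WHERE l.id = r.left_id
    UNION ALL
    SELECT l.id, NULL
    FROM left_table AS l
    WHERE NOT EXISTS (SELECT 1 FROM right_table AS r WHERE r.left_id = l.id)
    ORDER BY 1, 2
""")

print(cross_filtered == inner)    # True
print(left_join)                  # [(1, 'a'), (1, 'b'), (2, 'c'), (3, None)]
print(left_join == rebuilt_left)  # True
```

RIGHT JOIN is left out on purpose: SQLite only added it in version 3.39, and you would just swap the tables into a LEFT JOIN anyway.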

I hope you learned something from one of my early SQL insights, and that it helps you understand why there are multiple join types.

Choose A Simple System Over Most Other Things

January 31, 2025

Over the last decade, I’ve spent my time fixing performance issues, untangling conceptual models, cleaning up pipelines, and generally taking things apart so I can put them back together to go fast. Through all of it, one truism remains: systems that are simple to reason about are easier to maintain, debug, and scale.

I’ve been told this is obvious. Yet, every company I walk into has a mess of code—a tangled, aging rat’s nest where yesterday’s quick fix has hardened into today’s critical infrastructure. Grain by grain, small oversights compound into load-bearing pearls of complexity that nobody dares touch.

Often, the original developers are still around. They understand the improvements that could be made but lack the time, authority, or appetite to rewrite the system that’s keeping the lights on. And when I dig deeper, it’s rarely a personal failure—it’s a systemic one. The complexity is tolerated, even defended, because addressing it is seen as an unjustifiable cost.

So what can you do? The only real solution is prevention.

Guardrails Against Complexity:

Treat any one-off complexity as a blinking red warning light.
Before solving a problem with an overly complex solution, ask: Can we avoid solving this entirely?

Technical debt isn’t just about writing bad code—it’s about making bad bets. And the easiest way to win is to stop playing losing hands before they’re dealt.

Reading about Python’s Poetry

October 9, 2023

Poetry bills itself as “Python packaging and dependency management made easy” – I will dive in a bit more…

Installing and configuring

Poetry requires Python 3.8+ (that rules out a lot of older Python installs)
Has the classic “fun” insecure pipe-to-interpreter installer by default, curl -sSL https://install.python-poetry.org | python3 - (not my favorite, but it looks like Poetry has its own bootstrapping problem right now).
Uses the pyproject.toml format for configuration, with the tool.poetry, tool.poetry.dependencies, and build-system sections being the most important starting fields.
- Poetry has a pretty well-thought-out use of the pyproject format.

Commands and Usage

Initializing an existing project is as easy as poetry init; it will ask you questions about your package and get you bootstrapped.
You can add and remove packages with poetry add and poetry remove; at first blush it feels like using Cargo.
- On upgrade or change, Poetry removes old packages. There are a few GitHub issues about it, so if you are on Windows and want a faultless experience you might want to skip Poetry for now.
- This will also resolve the package versions to ensure compatibility: a clear positive, knowing your packages work together out of the box, but I have been burned by Conda before, so a package solver always gets some side-eye.
poetry install grabs the packages and installs them in your environment.
poetry update gets the latest compatible versions of the dependencies and writes them to your lock file.
poetry run runs the command within the current virtualenv.
- This combines with the tool.poetry.scripts section of the pyproject file: you can map a command name to a function in your package, then use poetry run special-command to run your special-command.
poetry shell spawns a shell in the virtual environment (really useful for testing random stuff).
There are a few more commands around lock files and more esoteric needs for the build system, so I will stop there for now.
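A tool.poetry.scripts entry maps a command name to a "module:function" string, for example special-command = "my_package.cli:main" (my_package.cli:main is a hypothetical name). As a rough sketch of what resolving such an entry involves, not Poetry’s own code, it comes down to importlib.import_module plus getattr; the standard library’s importlib.metadata.EntryPoint.load handles the full format for installed packages.

```python
import importlib
from typing import Any, Callable

def resolve_script(spec: str) -> Callable[..., Any]:
    """Resolve a "package.module:function" string to the callable it names.

    Simplified sketch: it handles only "module:attribute" and ignores extras
    and dotted attribute paths such as "module:obj.method".
    """
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# Stand-in for a hypothetical [tool.poetry.scripts] entry such as
#   special-command = "my_package.cli:main"
# using a stdlib function so the example runs anywhere.
special_command = resolve_script("json:dumps")
print(special_command({"ok": True}))  # {"ok": true}
```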

Other interesting differences from a simplified pip env

Poetry is much more active in managing environments than a simple pip+venv setup, and actively takes steps to activate/validate the version of Python and your current environment when running code.
There’s also more around the types of build artifacts you can emit, versioning, and dependency groups, which you generally wouldn’t have in the simpler tooling modes.

Final Thoughts

Overall, my first impression of Poetry is that it’s a very cool tool (if you are not on Windows). Once it’s set up, it seems like you’d have more luck adding new packages to existing projects without the “fun” of Python packaging issues suddenly arising in the wild (or, hopefully, in your full-featured test suite).

Because of the file deletion issue (and because I am on Windows most of the time), I am still going to stick with pip+venv. Any add/remove command has about a 50% chance of going south on the boxes I am using.

A Simple 5x Speed Up With My Django Testing

January 10, 2023

Figure: PyCharm Profile Stats, showing the built-in method _hashlib.pbkdf2_hmac taking 36.4% of the time.

More than a third of the time was taken by a hashlib function. My current test suite doesn’t take long (about 3 seconds on my slow machine), but every bit of iteration time is precious when you are working on your side project.

Before I wax poetic, here are the changes you’d make to your project’s settings.py:

 
# This is only needed if you don't already have a PASSWORD_HASHERS list in your settings 

PASSWORD_HASHERS = [
    'django.contrib.auth.hashers.PBKDF2PasswordHasher',
    'django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher',
    'django.contrib.auth.hashers.Argon2PasswordHasher',
    'django.contrib.auth.hashers.BCryptSHA256PasswordHasher',
]

# DO NOT USE INSECURE HASHING OUTSIDE OF DEBUG MODE OR YOU WILL GET HACKED
# All of your data will be stolen and all of your good works undone 
# Avoid having your company added to this list https://haveibeenpwned.com/
if DEBUG:
    PASSWORD_HASHERS.insert(0, 'django.contrib.auth.hashers.MD5PasswordHasher')

So what’s happening here?

Context

If you’re fairly new to Django, you might not know that the settings.py file controls the general configuration of your application and defines arbitrary values available from the settings module (a really useful feature!).

The DEBUG value is set earlier in the file, based on environment variables that I control for my test environments. If you want to learn more, check out the DEBUG documentation.

Most values that could exist in your settings file have sane defaults, but it can be a bit confusing that not everything is listed there at once. If you don’t have a PASSWORD_HASHERS list in your settings, Django will fall back to its built-in default list.

In this case, we’re defining the standard items and then inserting a new default hash option at the front of the list (during DEBUG mode only).

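This makes PASSWORD_HASHERS lead with a very fast and very weak hasher, MD5 (full list here). Django hashes new passwords with the first entry in the list and uses the rest only to check existing ones.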
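The same settings logic can be written as a plain-Python sketch that also shows DEBUG coming from an environment variable. The variable name DJANGO_DEBUG and both helper functions are hypothetical, not Django settings or APIs; the first-entry rule in the comments follows Django’s documentation for PASSWORD_HASHERS.

```python
import os

# The four hashers from the settings snippet above.
STANDARD_HASHERS = [
    "django.contrib.auth.hashers.PBKDF2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher",
    "django.contrib.auth.hashers.Argon2PasswordHasher",
    "django.contrib.auth.hashers.BCryptSHA256PasswordHasher",
]
FAST_INSECURE_HASHER = "django.contrib.auth.hashers.MD5PasswordHasher"

def debug_from_env(environ=None) -> bool:
    """Read DEBUG from a hypothetical DJANGO_DEBUG environment variable."""
    environ = os.environ if environ is None else environ
    return environ.get("DJANGO_DEBUG", "").strip().lower() in {"1", "true", "yes"}

def build_password_hashers(debug: bool) -> list[str]:
    """Return the hasher list, with MD5 first only when debug is on."""
    hashers = list(STANDARD_HASHERS)  # copy so the module-level list is untouched
    if debug:
        # Django hashes new passwords with the first entry; the remaining
        # entries are only used to verify passwords already stored with them.
        hashers.insert(0, FAST_INSECURE_HASHER)
    return hashers

DEBUG = debug_from_env()
PASSWORD_HASHERS = build_password_hashers(DEBUG)
```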

Using the PyCharm test UI, I found my slow machine’s test run went from 2987 ms to 526 ms, an improvement of more than 5x!

The observer effect is in play for these statistics, but the hashing process is clearly gone from the top of the stats:

Figure: PyCharm Profile Stats … Much Better, showing no hashing algorithm in the top items and a much faster result.

It’s worth repeating: never run insecure hashing algorithms such as MD5 in production. It’s the difference between your password being cracked in seconds and cracking it being impractical for decades or centuries.

It may seem weird, but being purposefully slow is an important feature of cryptographic hashes that you should not attempt to defeat.
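To put a rough number on “purposefully slow”, this sketch times one PBKDF2-SHA256 derivation against one salted MD5 digest using only hashlib. The iteration count is an assumption for illustration (Django’s own default changes between releases), the MD5 call only approximates what a salted MD5 hasher does, and absolute timings will vary by machine.

```python
import hashlib
import os
import time

def best_time(fn, repeat: int = 3) -> float:
    """Best-of-N wall-clock time for fn(), in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

password = b"correct horse battery staple"
salt = os.urandom(16)
ITERATIONS = 100_000  # assumed work factor, for illustration only

slow = best_time(lambda: hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS))
fast = best_time(lambda: hashlib.md5(salt + password).hexdigest())

print(f"PBKDF2-SHA256, {ITERATIONS:,} iterations: {slow * 1000:.1f} ms")
print(f"single salted MD5: {fast * 1_000_000:.1f} µs")
print(f"PBKDF2 is roughly {slow / max(fast, 1e-9):,.0f}x slower per guess")
```

That per-guess ratio is the whole point: it is what slows your test suite down, and it is exactly what slows an attacker down too.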

If you want to learn more about how cryptographic hash functions work check out Practical Cryptography For Developers.

Fixing – Error: “PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target”

June 16, 2022

While trying to connect PyCharm to a SQL Server data source with Windows Authentication, I ran into a slew of errors. Searching turned up a dearth of resources on this issue.

Some posts mention messing with your certificate store, but with a local server I don’t have any interest in configuring certificate-based auth.

From experience, I knew that a security error reported during SQL authentication is generally related to the certificate/encryption settings on the connection, and turning encryption off often works fine (don’t do this in production).

I found nothing useful in the SSL/TLS tab; clearly the standard “uncheck encryption” trick isn’t managed there.

Looking further, I found the Driver: section, where the MSSQL driver settings I needed are available.

Pick the advanced/driver settings.

Click the Advanced tab.

Update integratedSecurity to true and encrypt to false.

This fixed my authentication issues with the local server, and the connection went through.

Removing Microsoft’s News In The Taskbar

June 8, 2021

So in a “not this crap again” post: MSFT released a “News” update that takes up space on your taskbar and just lets Microsoft serve you more ads.

The slow boiling of the frog that is the modern treatment of privacy is so annoying, so here’s the registry path to disable it.

Cool that you get to open the registry instead of just being presented with a “Would you like a new informative tool by Microsoft?” prompt. It’s almost like they know that NOBODY wants this.

Remember: don’t just randomly run registry files. They are plain text documents interpreted in a special way, so crack them open and make sure you feel comfortable running them.

You can review/download the data here:

How do companies handle blue green deployments with their SQL Server Database?

May 5, 2021

An interesting discussion arose in the SQL Community Slack today around how to implement blue/green deploys.

If you’re not familiar, blue/green refers to a deployment strategy with at least two hosts for your services: you serve from green, deploy to blue, and slowly drain traffic from green to blue until blue becomes the new green.

This has consequences in terms of keeping the lights on for both services, potentially rolling back the traffic to the green node (if the blue deployment fails some tests) and identifying things like dead code/data paths.
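The traffic-shifting part of that description can be sketched in a few lines of Python. Everything here (BlueGreenRouter, the percentage steps, hashing request ids for sticky routing) is a hypothetical illustration rather than any particular load balancer’s API, and it deliberately leaves out the database, which is exactly the hard part the Slack question was about.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class BlueGreenRouter:
    """Hypothetical router: green is live, blue is the new deployment."""
    blue_percent: int = 0  # share of traffic (0-100) currently sent to blue

    def route(self, request_id: str) -> str:
        # Hashing the request id gives each caller a stable bucket 0-99,
        # so a caller does not bounce between pools at a fixed percentage.
        bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
        return "blue" if bucket < self.blue_percent else "green"

    def shift(self, step: int) -> None:
        """Drain another `step` percent of traffic from green to blue."""
        self.blue_percent = min(100, self.blue_percent + step)

    def rollback(self) -> None:
        """Blue failed its checks: send all traffic back to green."""
        self.blue_percent = 0

router = BlueGreenRouter()
for _ in range(4):
    router.shift(25)  # 25%, 50%, 75%, 100%

# Blue now serves everything and effectively becomes the new green.
print(router.route("user-42"))  # blue
```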

I was pondering how to answer the question in anything but the most generic way when this YouTube video by Kevin Feasel was posted, and it’s such a great resource that I’m reposting it.

Thanks @reid and Kevin!

Fixing Unicode Conversion Issues in XML documents, TRY_CONVERT returns question mark instead of NULL

April 7, 2021

An interesting question asked by @danthesqlman in #sqlhelp (sqlcommunity.slack.com)

Having issues with Unicode in my XML, tried using a try_convert(varchar,fieldname) but not returning NULL.
Set it to have a test on my box, and weird results.
declare @n nvarchar(10) = N’ניקודות‎’
select try_convert(varchar(10),@n)
This doesn’t return NULL, but ?????????
I’m curious what would I be doing wrong, or how can i locate unicode within XML easily

And then when people suggested individual character shredding –

XML documents in a table over 200k rows, 2mb xml each, could take hours to parse 1 character at a time

There were a few suggestions (my initial crap one was just dumping it to C#), but after a few jokes back and forth about SQL Server returning normal question marks from TRY_CONVERT, and how silly that was, the idea came up… why not just:

Replace all question marks with something unique? (I suggested a GUID)
Run the conversion and then do a reverse replace, updating the data in place.
Profit!

For a simple code example…

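DECLARE @magic_value NVARCHAR(36) = CONVERT(NVARCHAR(36), NEWID());

-- Swap existing question marks for the unique marker before converting.
-- VARCHAR(MAX) avoids truncation, since each '?' grows to 36 characters.
SELECT
  TRY_CONVERT
  (
    VARCHAR(MAX),
    REPLACE(tar.name, N'?', @magic_value)
  ) AS converted_name
FROM target_table AS tar;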

Any new question marks that exist in the output would be characters that failed the conversion process.

The test ran in ten minutes instead of a few hours… great!
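The same sentinel trick can be sketched outside SQL Server with Python, where encoding to a legacy code page with errors="replace" also substitutes "?" for characters it cannot represent. The cp1252 code page and the find_lossy_conversion helper are assumptions for illustration, standing in for the VARCHAR collation and the TRY_CONVERT query.

```python
import uuid

def find_lossy_conversion(text: str, encoding: str = "cp1252") -> tuple[str, bool]:
    """Convert text to a legacy code page and report whether anything was lost.

    Returns (converted_text, lost), where lost is True if at least one
    character could not be represented and was replaced by "?".
    """
    # A random hex marker plays the role of NEWID(): it will not plausibly
    # appear in the data and survives the conversion unchanged.
    sentinel = uuid.uuid4().hex
    protected = text.replace("?", sentinel)

    # errors="replace" turns every unencodable character into "?".
    converted = protected.encode(encoding, errors="replace").decode(encoding)

    # Genuine question marks are hidden behind the sentinel, so any "?"
    # present now came from a failed character.
    lost = "?" in converted

    # Reverse replace: put the original question marks back.
    return converted.replace(sentinel, "?"), lost

print(find_lossy_conversion("Is this ok?"))          # ('Is this ok?', False)
print(find_lossy_conversion("\u05e0\u05d9\u05e7?"))  # ('????', True)
```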

Also a fun followup on weird SQL Server homoglyph conversion issues in general – https://bertwagner.com/posts/how-unicode-homoglyphs-can-thwart-your-database-security/

Disable powershell update nag in one line

March 20, 2021

[System.Environment]::SetEnvironmentVariable("POWERSHELL_UPDATECHECK",0,[System.EnvironmentVariableTarget]::User)

To be clear – I think you should be updating your PowerShell regularly, however the HUGE WHITE BLOCK ACROSS MY ENTIRE SCREEN EVERY TIME I LAUNCH VISUAL STUDIO CODE ISN’T GREAT.

Hated that caps? Yeah, that’s basically my eyes every time I see this nag window inverting the colors across my ennntiiirrreee screen.

I checked the PowerShell repo and some of the one-liners posted didn’t work (and used SetEnvironmentVariableTarget, which was not a method I had?), so I wanted to make this easy in case you are getting frustrated with the PowerShell update check message, want it to go away, and don’t want to crack open the environment variables.

Now go update your PowerShell 🙂

Useful Django Bits

January 30, 2021

I have been busy working on some other non-SQL side projects, and I wanted to note some of the pieces of code I have been appreciating recently.

https://github.com/PaesslerAG/django-currentuser is a simple plugin that lets you reference the current user context in your models’ various functions. This greatly simplified some user management functions within my codebase, as I could express it all in the model.

https://www.django-rest-framework.org/ is a powerful framework on top of Django for building straightforward REST APIs. Django doesn’t implement object-level permissions out of the box, and I have been building out the next version of my codebase with DRF; it’s definitely a lot more pluggable than anything I designed.

https://pythoncircle.com/post/439/server-access-logging-in-django-using-middleware/ is an easy way to track user access: one migration adds a log table, and you get whatever you want out of each request flow. Be careful that you follow your GDPR/CCPA guidelines!

https://github.com/pennersr/django-allauth is something I have been investigating but it seems a bit much for my goals, I will come back and update more about this soon.