
Select Indistinct

Wrapping my head around object-level permissions with Django

A quick aside if you read my SQL stuff: I have been working on a side project in Python with Django (a web framework), with a focus on bringing some sanity to recipes.

For the site I needed to implement a system where a user owns recipes, and other users can't modify them.

Django has a built-in auth system that supports both user registration and model-level permissions (yay!), but that means naively all users could edit all recipes (boo!)

I did some quick searching because I was hopeful that I could reuse some of the well-thought-out components already available.

The big options I found were django-guardian, Rules, django-rest, or figuring it out myself.

Checking in with Rules, it looks like it's not so much a Django-specific implementation as a way to implement rules and predicates to get things done. It seems interesting, but it's not clear from the older examples provided how it's supposed to work with a modern Django, so I am going to skip it for now.

Checking in with django-guardian, I am seeing better, more relevant examples, but truthfully I just need a single owner as a permission.

On the face of it, a lot of these systems feel really complex for my purposes – it's cool to be the ultimate permission system, but I just need one that provides:

  1. Users who have permission on objects.
  2. Permission means you have all roles on the object (edit/delete.)
  3. Each ownable object must be assigned a creator/author on creation, who can then automatically edit/delete the thing.

Since I don't need a complex permission system, I don't care about meta-magic, and I just want a straightforward set of gates to manage, I decided to go off on my own and implement an author check.

I added an author field to my model; it is set to the current user if it's not already set, and then I use the is_current_authenticated_user() function from the django-currentuser package to tell whether the author matches the user on the current request.
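Here's a minimal sketch of that setup – the model and field names are illustrative rather than the project's actual code, and I've used the package's get_current_authenticated_user helper for both the assignment and the check:

# A minimal sketch, assuming hypothetical model/field names.
from django.conf import settings
from django.db import models
from django_currentuser.middleware import get_current_authenticated_user


class Recipe(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        null=True,
        editable=False,
        on_delete=models.CASCADE,
    )

    def save(self, *args, **kwargs):
        # Assign the author once, on first save only.
        if self.author is None:
            self.author = get_current_authenticated_user()
        super().save(*args, **kwargs)

    def owned_by_current_user(self):
        # The backend ownership gate: does this row's author match the
        # user attached to the current request?
        return self.author == get_current_authenticated_user()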

I also implemented a simple map behind the scenes that connects forms to models to pages, so that the add/edit functions can just do their work and pass the correct set of forms back to the correct set of pages. I will have to keep looking, because it wasn't clear how to infer the model's form class automatically without some mightier Python than I currently possess.
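The map itself doesn't need to be fancy – this is a hypothetical sketch (form, key, and template names are made up) of the shape I mean:

# Hypothetical wiring: each entry ties a model to its form class and
# the template that renders it.
from django import forms


class RecipeForm(forms.ModelForm):
    class Meta:
        model = Recipe  # the model from the sketch above
        fields = ["title"]


PAGE_MAP = {
    "recipe": {"model": Recipe, "form": RecipeForm, "template": "recipes/edit.html"},
}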

I disable the form fields as a UI nicety if you don't own the object (so you don't accidentally try to edit it), but to be sure I also prevent saving on the backend if things don't match up, and there's the added bonus of the CSRF protection built into Django.
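In view terms, the belt-and-suspenders check looks something like this sketch (again, names are hypothetical, and it leans on the Recipe/RecipeForm/PAGE_MAP sketches above):

# Disable fields for non-owners in the UI, and refuse the save on the
# backend regardless – the disabled fields are a nicety, not security.
from django.http import HttpResponseForbidden
from django.shortcuts import get_object_or_404, render


def edit_recipe(request, pk):
    recipe = get_object_or_404(Recipe, pk=pk)
    form = RecipeForm(request.POST or None, instance=recipe)
    is_owner = recipe.owned_by_current_user()

    if not is_owner:
        for field in form.fields.values():
            field.disabled = True

    if request.method == "POST":
        if not is_owner:
            return HttpResponseForbidden("You don't own this recipe.")
        if form.is_valid():
            form.save()

    return render(request, PAGE_MAP["recipe"]["template"], {"form": form})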

And so, after all that, I have a straightforward solution without even touching the rules code – it took me more time to noodle over third-party options than to sit down and write it.

Recipe Scraping with Python

I am working on a new project with the aim of scraping a wide variety of data sources, with targets as varied as arrest data and recipe websites.

I am going to catalog the project's process through its various issues, and try to encapsulate as much as possible so that if there are components others want to use, they can.

Here’s the process I am taking:

  1. Decide on a language.
    • What languages are well supported for web scraping?
    • Plenty, but of them I am the most familiar with Python.
  2. Decide on a feature set.
    • We are going with a headless HTML scraper for now; JavaScript will be supported in future versions.
    • We need to populate some basic datapoints for a recipe, but we'll progressively enhance sites that don't have the things we want.
  3. Build an initial targeted list of sites to scrape.
    • This was done by my partner in crime – more on that later.
    • >200 sites listed, targeting specific delicious recipes.
  4. Find some useful resources for scraping sites with Python.
  5. Construct a simple loop for testing/validating data (a sketch follows this list).
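For step 5, the loop really is simple – here's a rough sketch, assuming the target list lives in a sites.txt file with one URL per line (the real format may differ):

# Sketch of the validation loop; sites.txt is an assumed file name.
import requests

with open("sites.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        resp = requests.get(url, timeout=10)
        print(url, resp.status_code, len(resp.text))
    except requests.RequestException as exc:
        print(url, "FAILED:", exc)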

This Sunday’s execution:

  1. Get an isolated VM up.
  2. Install python3 and python3-venv on Debian.
  3. Find that there are actually a few good tools which implement this in a more general way: https://github.com/hhursev/recipe-scrapers
  4. Test that out on a specific site (sorry, none of those links just yet!)
  5. Find that recipe-scrapers actually supports it… automagically, even though it's not on their supported site list.
  6. Well… that was easy.
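For reference, the smoke test was only a few lines – this sketch uses a placeholder URL, since I'm not sharing the target sites yet:

# Placeholder URL, not the actual site under test.
from recipe_scrapers import scrape_me

scraper = scrape_me("https://example.com/some-recipe")

print(scraper.title())         # recipe name
print(scraper.total_time())    # total time
print(scraper.ingredients())   # list of ingredient strings
print(scraper.instructions())  # instruction text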

I didn't expect a great package to be available out of the box for the problem at hand, so kudos to these wonderful devs.

For my next post I will test combining spidering and downloading to create a cohesive “cookbook” from a target site.

Converting TIMESTAMP to VARCHAR in TSQL or Replacing fn_varbintohexstr

Just a quick one in case I forget: I had a scenario where I needed to replace fn_varbintohexstr while converting an existing product to something more Azure SQL DB friendly.

I did some googling and didn't like the results, especially since most of them return NVARCHAR(MAX) instead of what they generally should: just VARCHAR(length of data). In the case of my needs (timestamps), the value was always 18 characters.

The key was converting to binary first, and then using the style parameter 1 with CONVERT.

select convert(varchar(18), convert(varbinary(18), row_version), 1)
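For context, here's how that looks against a hypothetical table with a rowversion column (the table and column names are illustrative):

-- Style 1 keeps the 0x prefix, so the 8-byte rowversion always renders
-- as 2 + 16 = 18 characters.
SELECT CONVERT(VARCHAR(18), CONVERT(VARBINARY(18), row_version), 1) AS row_version_hex
FROM dbo.Orders;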

That’s all folks.

Dynamic UNPIVOT Reporting on Cardinality.

Today I was diving into some extremely wide tables, and I wanted to take a quick look at questions like "How many unique values does this table have in every column?"

This can be super useful if you have a spreadsheet of results, or a schema without effective normalization, and you want to determine which columns are the "most unique" – that is, which have high cardinality.

The GitHub gist is embedded at the bottom of the page, but I will run you through the code in case you want an explanation of how it works.

The procedure takes a schema and a table/view name and combines them – I could ask the user to do this for me, but there are parts where it's useful to have them separate, and I don't want to mess with split logic :p

The procedure is defined as a temporary stored procedure, as I didn't want to leave a trace when I disconnect.

I then define the dynamic UNPIVOT statement, which we will feed our dynamic column listing into.

Then we gather our columns (twice) – once for the COUNT(DISTINCT) expressions, and once for the column list we want to UNPIVOT.
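To make that concrete, here's a condensed sketch of the overall pattern – it skips the temporary-procedure wrapper and the NULL handling described below, and the object/variable names are illustrative:

-- Build one COUNT(DISTINCT ...) per column, then UNPIVOT the single
-- result row into (column_name, distinct_count) pairs.
DECLARE @schema SYSNAME = N'dbo', @table SYSNAME = N'MyTable';
DECLARE @object NVARCHAR(300) = QUOTENAME(@schema) + N'.' + QUOTENAME(@table);
DECLARE @counts NVARCHAR(MAX), @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

SELECT @counts = STUFF((SELECT N', COUNT(DISTINCT ' + QUOTENAME(c.name) + N') AS ' + QUOTENAME(c.name)
                        FROM sys.columns AS c
                        WHERE c.object_id = OBJECT_ID(@object)
                        FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, N''),
       @cols = STUFF((SELECT N', ' + QUOTENAME(c.name)
                      FROM sys.columns AS c
                      WHERE c.object_id = OBJECT_ID(@object)
                      FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, N'');

SET @sql = N'SELECT u.column_name, u.distinct_count
FROM (SELECT ' + @counts + N' FROM ' + @object + N') AS p
UNPIVOT (distinct_count FOR column_name IN (' + @cols + N')) AS u;';

EXEC sys.sp_executesql @sql;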

Running it against a tSQLt TestResult table, we can see the table has one distinct class, two distinct messages and results, and unique values in every other column (so that's where I will focus.)

I also utilize a @magic_value variable in the code (not shown) to deal with the fact that a NULL value in an UNPIVOT statement won't count as an occurrence. I want to disambiguate NULL from any particular value that might actually occur, so using something like -1 or a literal string 'NULL' would be inappropriate.

That’s it for today!

Grab the full code below –

Postgres Plugin Support Added in Azure Data Studio!!

I know I have been writing a lot about ADS recently, but this is even bigger than the Notebook announcement.

A Postgres plugin has been announced in the insider release of ADS, and it just works!

If the term Postgres is unfamiliar: PostgreSQL is one of the preeminent open source database solutions, and it is seeing wide adoption due to its quality and, of course, its price.

For those of you in the Postgres world, you have probably already bounced between a few SQL clients and, if you are like me, not been totally impressed with the results.

For those of you who are SQL Server data professionals and live in the comfortable world of SSMS and ADS, you may not know that a significant part of the world still runs queries on the command line, or is forced to rebuild their own tool to meet their needs in each language du jour.

I have been working on some side projects with Docker, Postgres, and a few other pieces of tech, so this announcement was perfect for me to test it out.

Configuration

Just like in my post on Notebooks, I am going to download the most recent release from the Insider's build, but I will keep the screenshots a little more brief.

Make sure you enable those preview features!

Jump to the extensions section and find the Postgres plugin. (Sidebar button)

I used a local connection to a Docker container hosting my Postgres instance, and Azure Data Studio connected on the first try!

Intellisense works great!

Notebooks instantly inferred my context from the Postgres connection! Sweet!

Even the explain button pulls the query plan 😀

I love this direction for Azure Data Studio and I hope we see more open source databases supported in the future!

Update: T-SQL Notebooks in Azure Data Studio

It’s only been ten days since my first post about ADS Notebooks, and the team has already pushed out some super useful updates.

Launching an Azure Data Studio Notebook is now almost instantaneous, and you are not prompted to install the other kernels unless you select them.

When reopening a file, you are no longer prompted to re-install a kernel and attempt configuration.

(Gif: Azure Data Studio Notebooks' behavior when reopening a notebook.)

Big thanks, ADS team – Notebooks are shaping up to be a super cool tool!

T-SQL Notebooks in Azure Data Studio

Update: This blog post is already out of date thanks to the hard work of the ADS team! Check my Azure Data Studio Notebook Update for more details on why you can skip my warnings about your internet connection.

I have been waiting for word about the new Notebook functionality in Azure Data Studio, and when I heard it was available in the insider build, I jumped in to take a look.

A Jupyter Notebook is a web application that lets you run code (often in different programming languages), return results, annotate your data, and, importantly, share the source-controlled results with your colleagues.

Alright, here’s what I did:

  • Grab the insider’s build from Github.
  • Install Azure Data Studio (ADS.)
  • Create a new Notebook (and have it download some stuff.)
  • Run some queries!

Keep in mind I am talking about the Insider's Build as of Thursday, March 7, 2019 – if you download it in the future, changes are likely to have occurred.

If you want to follow along, make SURE you have a solid internet connection available, planes do NOT count.

Jump to the ADS GitHub page.

Scroll to the latest insiders builds:

Extract or install, and launch Azure Data Studio.

Keep in mind you may want to disable updates when running the insider version, as it will try to update to the stable channel version. Thanks to Shawn Melton for the tip!

Install the tools by choosing File -> New Notebook.

The install process will warn you that it will take a while to download and configure the Jupyter environment (which includes Python and a few other dependencies.)

Every time you create a new notebook, you are going to download the required dependencies from scratch – this is to ensure you have an isolated environment per notebook, but it's worth keeping in mind for now.

Go get some coffee; it's going to be a bit.

Maybe time for that second cup.

When our new notebook pops up, you should immediately be offered the option of a Code (T-SQL) or a Text (markdown) box.

I wrote two example queries and added an animated gif (mostly because I was watching Chris on the SQL Tools team doing the same in the SQLBits presentation for ADS: https://sqlbits.com/Sessions/Event18/Introducing_Azure_Data_Studio).

All the data you return is saved into your notebook, so if you download my notebook you can see the same results I did, and you can replay it in your environment to see if it matches.

Going to end it there for now, but I am excited to see what people will begin passing around in notebooks for debugging, training, and demonstration purposes!

Some issues I am going to be keeping my eyeballs on over the next few weeks:


Removing non-numeric characters from a SQL Server field without UDFs or Regex

A great question came up in the SQL Community Slack today:

I saw this as a great modification on my earlier post, and wanted to show another way to implement the same solution.

Walking through the steps:

First, use sys.objects as our example “target” and assume the string has fewer than 2024 characters (master..spt_values isn't really reliable past that point.)

We substring out each character, compare it to the list of characters we accept, and then collapse the list back in on itself.
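Put together, the pattern looks roughly like this (a sketch, not the exact gist):

-- Explode each string with a numbers table, keep only the digits, and
-- collapse the survivors back together with FOR XML PATH.
SELECT o.name,
       (SELECT SUBSTRING(o.name, v.number, 1)
        FROM master..spt_values AS v
        WHERE v.type = 'P'
          AND v.number BETWEEN 1 AND LEN(o.name)
          AND SUBSTRING(o.name, v.number, 1) LIKE '[0-9]'
        ORDER BY v.number
        FOR XML PATH('')) AS digits_only
FROM sys.objects AS o;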

At the end, I pull out sys.objects for comparison so you can see what sort of results you would get. Enjoy!

Short and Sweet – Implementing simple character masking without UDFs

I’ve been on a bit of a hiatus with the move to California and figuring out my new life here.

I needed some inline T-SQL to mask a string based on a simple masking input, like '##xx##', where # returns the original data and x returns a literal x.

Here’s the code, actually a bit simpler than I thought it would be.

The first table would be the one you would be masking (I used sys.objects as an example), and if you wanted to store a masking string, you'd just want to make sure it was joined and projected into the substring instead of the @formatstring.
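Here's a sketch of that idea with a fixed @formatstring against sys.objects – positions marked # pass the original character through, and anything else emits the mask character itself:

-- Walk the format string with a numbers table and pick, per position,
-- either the real character or the mask character.
DECLARE @formatstring VARCHAR(100) = '##xx##';

SELECT o.name,
       (SELECT CASE SUBSTRING(@formatstring, v.number, 1)
                   WHEN '#' THEN SUBSTRING(o.name, v.number, 1)
                   ELSE SUBSTRING(@formatstring, v.number, 1)
               END
        FROM master..spt_values AS v
        WHERE v.type = 'P'
          AND v.number BETWEEN 1 AND LEN(@formatstring)
        ORDER BY v.number
        FOR XML PATH('')) AS masked
FROM sys.objects AS o;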

How long did that last PowerShell command take?

Today Thomas Rayner's post on Get-History reminded me of a one-liner I use to calculate time spent on the last command.

(Get-History)[-1].EndExecutionTime - (Get-History)[-1].StartExecutionTime
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 87
Ticks             : 870022
TotalDays         : 1.00696990740741E-06
TotalHours        : 2.41672777777778E-05
TotalMinutes      : 0.00145003666666667
TotalSeconds      : 0.0870022
TotalMilliseconds : 87.0022


You can select any single property from the output to get just the TotalSeconds, but I like this simple output for when I have to leave some work in progress and need to come back and check some time in the future.
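For example, to project it down to just the seconds:

# Same one-liner, reduced to a single property.
((Get-History)[-1].EndExecutionTime - (Get-History)[-1].StartExecutionTime).TotalSeconds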

If you are confused by this code and want further explanations, keep reading!

A typical Get-History call returns the following to Get-Member.

PS C:\temp> Get-History | Get-Member

   TypeName: Microsoft.PowerShell.Commands.HistoryInfo

Name               MemberType Definition
----               ---------- ----------
Clone              Method     Microsoft.PowerShell.Commands.HistoryInfo Clone()
Equals             Method     bool Equals(System.Object obj)
GetHashCode        Method     int GetHashCode()
GetType            Method     type GetType()
ToString           Method     string ToString()
CommandLine        Property   string CommandLine {get;}
EndExecutionTime   Property   datetime EndExecutionTime {get;}
ExecutionStatus    Property   System.Management.Automation.Runspaces.PipelineState ExecutionStatus {get;}
Id                 Property   long Id {get;}
StartExecutionTime Property   datetime StartExecutionTime {get;}

Ignoring the methods (as they are common) we see we have access to some useful properties:

  • The CommandLine, which contains the text typed in the console.
  • The ExecutionStatus, which tells you if your command was successful.
  • The StartExecutionTime and EndExecutionTime, which store the start time and end time of your running command.
  • The Id, which is just a sequential integer indicating its order in the history.

As is common in PowerShell, even though we see a single set of members and properties, the information represents just one item in the collection the command returns: one object per history row.

If you want to see an example of this, we can use PowerShell's comma/unary operator to roll our multiple elements into one bag of elements.

PS C:\temp> $many = Get-History # Assign all our Get-History rows to $many
PS C:\temp> $one = , (Get-History) # Prevent unrolling the items with the , 

PS C:\temp> $many.Count # I ran 18 previous commands when this was assigned
18
PS C:\temp> $one.count # This is a bag holding 18 items
1

This technique was new to me when I first started PowerShell, but it's a useful trick if you want to write a function that returns an object representing the entire list, instead of returning a series of elements. If you want to learn more about the comma operator, check the documentation in about_Operators.

The next step is to ask for an element of the list by index, which PowerShell supports with bracket notation: (list)[element_number], with 0 being the first element (as all sane programming languages choose) and negative numbers counting back from the end, so [-1] is the most recent command.

We use dot notation to access the properties of EndExecutionTime and StartExecutionTime, once each for our specific list item, (list)[element_number].property.

Lastly, we subtract one property from the other, (list)[element_number].property - (list)[element_number].property, which PowerShell determines behind the scenes is two dates being subtracted; it performs the required math and returns a System.TimeSpan representing the time between the two dates.

Special thanks to Contributing Editor John G Hohengarten for his thoughtful additions to this post.