December 2020 – Select Indistinct

A quick aside, if you read my SQL stuff: I have been working on a side project in Python with Django (a web framework), with a focus on bringing some sanity to recipes.

For the site I needed to implement a system where a user owns recipes, and other users can’t modify them.

In Django there’s a built-in auth model that supports both user registration and model-level permissions (yay!), but it means that, naively, any user with the edit permission could edit all recipes (boo!).

I did some quick searching because I was hopeful that I could reuse some of the well-thought-out components already available.

The big options I found were django-guardian, Rules, django-rest, or figuring it out myself.

Checking in with Rules, it looks like it’s not so much a Django-specific implementation as a way to implement rules + predicates to get things done. It seems interesting, but it’s not clear from the older examples provided how it’s supposed to work with a modern Django, so I am going to skip it for now.

Checking in with django-guardian, I am seeing better examples that are relevant, but truthfully all I need is a single owner per object as the permission.

On the face of it, I feel like a lot of these systems are really complex for my purposes. It’s cool to be the ultimate system, but I just need a system that provides:

Users who have permission on objects.
Permission means you have all roles on the object (edit/delete).
Each ownable object must have a creator/author at creation, who, once assigned, can automatically edit/delete the thing.

Since I don’t need a complex permission system, I don’t care about meta-magic, and I just want to design a straightforward set of gates to manage, I decided to go off on my own and work on implementing an author that’s checked.

I added an author field to my model. On save, the model sets it to the current user if it’s not already set, and then I use the is_current_authenticated_user() function from the django-currentuser package to tell if the author matches the user making the current request.
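The gate described above can be sketched without Django at all. This is a minimal, framework-free illustration, not the site's actual code: `Recipe`, `claim_if_unowned`, and `is_owner` are hypothetical names, and plain strings stand in for user objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recipe:
    """Stand-in for a Django model with an author field (hypothetical)."""
    title: str
    author: Optional[str] = None  # in Django this would be a ForeignKey to the user


def claim_if_unowned(recipe: Recipe, current_user: str) -> None:
    """Set the author on first save only; never overwrite an existing owner."""
    if recipe.author is None:
        recipe.author = current_user


def is_owner(recipe: Recipe, current_user: Optional[str]) -> bool:
    """True only when a logged-in user matches the stored author."""
    return current_user is not None and recipe.author == current_user
```

In a real project the two functions would most likely live in the model's save path and in the view, with the current user supplied by the request.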

I also implemented a simple map behind the scenes that connects forms to models to pages, so that the add/edit functions can just do their work and pass the correct set of forms back to the correct set of pages. I will have to keep looking, because it wasn’t clear how to infer the model’s form class automatically without some mightier Python than I currently possess.
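One way such a map could look is a plain dictionary keyed by model class. The sketch below is an assumption about the shape, not the real implementation: `Route`, `REGISTRY`, `route_for`, and the template path are all made up for illustration, and the empty classes stand in for a Django model and its form.

```python
from typing import Dict, NamedTuple


class Recipe:
    """Placeholder for a Django model."""


class RecipeForm:
    """Placeholder for the form class that edits a Recipe."""


class Route(NamedTuple):
    form_class: type
    template: str


# One explicit entry per ownable model: which form edits it, which page shows it.
REGISTRY: Dict[type, Route] = {
    Recipe: Route(RecipeForm, "recipes/edit.html"),
}


def route_for(model_class: type) -> Route:
    """Look up the form and page for a model, failing loudly if unregistered."""
    try:
        return REGISTRY[model_class]
    except KeyError:
        raise LookupError(f"no form registered for {model_class.__name__}") from None
```

An explicit registry is more typing than inferring the form class, but it keeps the add/edit functions generic and makes a missing entry an obvious error.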

I disable the form fields as a nicety in the UI if you don’t own the object (so you don’t accidentally try to edit it), but to be sure I also prevent saving on the backend if things don’t match up, and there’s the added bonus of free CSRF protection built into Django.
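The two layers, a cosmetic one in the UI and an enforcing one on the backend, might be sketched like this. It is a framework-free illustration under assumed names (`field_states`, `save_changes`); in Django the first part would be done on the form's fields and the second in the view or the model.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Recipe:
    title: str
    author: str


def field_states(recipe: Recipe, current_user: Optional[str],
                 field_names: Sequence[str] = ("title",)) -> Dict[str, dict]:
    """Per-field 'disabled' flags for rendering; purely cosmetic."""
    read_only = recipe.author != current_user
    return {name: {"disabled": read_only} for name in field_names}


def save_changes(recipe: Recipe, current_user: Optional[str],
                 changes: Dict[str, str]) -> Recipe:
    """Backend gate: refuse the write even if the browser sent one anyway."""
    if current_user is None or recipe.author != current_user:
        raise PermissionError("only the author may edit this recipe")
    for name, value in changes.items():
        setattr(recipe, name, value)
    return recipe
```

The disabled flags can be bypassed by anyone who edits the page, which is why the second function has to exist.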

And so after all that I have a straightforward solution without even touching the Rules code. It took me more time to noodle over third-party options than to just sit down and write it.

I am working on a new project with the aim of scraping a wide variety of data sources, with goals as various as arrest data and recipe websites.

I am going to catalog the project’s process through the various issues, and try to encapsulate as much as possible so that if there are components others want to use, they can.

Here’s the process I am taking:

Decide on a language.
- What languages are well supported for web scraping?
- Plenty, but of them I am the most familiar with Python.
Decide on feature set.
- We are going with a headless HTML scraper for now; JavaScript will be supported in future versions.
- We need to populate some basic datapoints for a recipe, but we’ll progressively enhance sites that don’t have the things we want.
Build initial targeted list of sites to scrape.
- This was done by my partner in crime, more on that later.
- >200 sites listed to target specific delicious recipes.
Find some useful resources for scraping sites with Python.
- Why go further than the tutorial for the tool you are using? https://docs.scrapy.org/en/latest/intro/tutorial.html
Construct simple loop for testing/validating data.
- For each site, download a recipe manually, format it by hand into the intermediate format we expect, and wire up a test against it; the test should fail at first, since we are not producing that output yet.
- Ensure that the scraped recipe correctly matches our expectations for the test.
- Find the right grain of complexity to store rules for custom sites – some sort of lookup + fallback scenario.
- Research the common formats for recipes –
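The "lookup + fallback" idea and the fixture test can be sketched together. Everything here is hypothetical: the host name, the parser functions, and the intermediate format (a dict with `title` and `ingredients`) are assumptions for illustration, and the generic parser is deliberately crude.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

Parser = Callable[[str], dict]


def parse_generic(html: str) -> dict:
    """Fallback rule: use the <title> text as the recipe title."""
    start = html.find("<title>")
    end = html.find("</title>")
    title = html[start + len("<title>"):end].strip() if -1 < start < end else ""
    return {"title": title, "ingredients": []}


def parse_example_site(html: str) -> dict:
    """Site-specific rule for a made-up site that prefixes titles with its name."""
    recipe = parse_generic(html)
    recipe["title"] = recipe["title"].split(" | ", 1)[-1]
    return recipe


# Lookup table of custom rules, keyed by host name.
SITE_PARSERS: Dict[str, Parser] = {
    "recipes.example.com": parse_example_site,
}


def parser_for(url: str) -> Parser:
    """Pick the custom parser for a host, or fall back to the generic one."""
    host = urlparse(url).hostname or ""
    return SITE_PARSERS.get(host, parse_generic)


def check_fixture(url: str, html: str, expected: dict) -> bool:
    """Compare a saved page against its hand-written expected recipe."""
    return parser_for(url)(html) == expected
```

Keeping the rules at the granularity of one function per host means most sites need no entry at all, and a failing fixture points at exactly one rule.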

This Sunday’s execution:

Get an isolated VM up.
Install python3 and python3-venv on Debian.
Find there’s actually a few good tools which implement this in a more general way: https://github.com/hhursev/recipe-scrapers
Test that out on a specific site (sorry, none of those links just yet!)
Find that recipe-scrapers actually supports it… automagically, even though it’s not on their supported site list.
Well… that was easy.
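One plausible reason an unlisted site can work is that many recipe pages embed schema.org Recipe data as JSON-LD, which a scraper can read without any site-specific rules. The sketch below shows that idea using only the standard library; it is not recipe-scrapers' code or API, the names `JsonLdCollector` and `find_recipe` are made up, and it ignores variations such as `@graph` wrappers or list-valued `@type`.

```python
import json
from html.parser import HTMLParser
from typing import List, Optional


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def find_recipe(html: str) -> Optional[dict]:
    """Return the first JSON-LD object whose @type is 'Recipe', if any."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail the whole page
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "Recipe":
                return item
    return None
```

On a page that carries this markup, fields such as `name` and `recipeIngredient` come back already structured, which would explain the "well… that was easy" experience.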

So I didn’t expect a great package to be available out of the box for the problem at hand, so kudos to these wonderful devs.

For my next post I will test combining spidering and downloading to create a cohesive “cookbook” from a target site.

Month: December 2020

Wrapping my head around object level permissions with Django

Recipe Scraping with Python