Half Baked: A Journey Through Legacy Code
Some ideas come from a really good intention. Some times these ideas are implemented and they work! But some times, especially over time, small things start to change in the ecosystem these ideas were fleshed out for and bugs start to appear. To make this situation a bit more dramatic, the bugs are on a platform you weren’t even around for when it was designed, built, tested, and used for its first two years.
I’ve written bad software, you’ve written bad software. Some times people just write bad software. Not only that, some people don’t even document their bad software. This is a huge pet peeve of mine because I’m always trying to write my code with the intent of some one else reading it later.
The other day I was notified about a bug in a proprietary CMS my company had built before I started working for them. I’ve seen the software in action and noticed some pretty weird bugs from an end-user perspective, but didn’t really take a look under the hood. The client seemed happy with the software so not a lot of effort was going to be added to the project. Then the aforementioned bug popped up.
Some Pun About Insects in Baked Goods
The Art Director at the company sent me a message with two links to the pages that were broken. I know the application is built off of CakePHP 2.5, but haven’t really looked at any of the source code. I figured “it was Cake, I would pick it up pretty quick.”
So I looked at the URL, which looked a little something like this:
That’s a very nice, clean, descriptive URL and I like it!
Seeing as the error page was from Cake itself, and not Apache, I figured I’d check the logs to see what is causing the error. I did some grep-foo on the file and found that the route was calling the
ArticlesController::view() method. That doesn’t fit the convention for controllers and routes in CakePHP, so I decided to look at the routes files and see how
ArticlesController was being called. So there must be some sort of manual route set for that specific action, right? Nope. But there is an interesting
App method call that I should look into.
App::import('Slugger.Lib', 'SlugRoute'); SlugRoute::connect();
If you are unfamiliar with CakePHP 2.x, it predates the wide spread acceptance of PSR-1 or PSR-4 autoloading standards. It uses a static class called App that acts primarily as an autoloader.
I was able to find a
Plugin folder called
Slugger and so I decided to investigate the
connect() method. Based on the
App::import() method, its loading the
SlugRoute class from the
Slugger.Lib namespace. This would be the equivalent of using
use Slugger\Lib\SlugRoute; in a more modern PHP setting.
Plugins – Half Baked Ideas?
The sole method in this class is in fact the
connect() method which is very basic. It calls upon the model to grab all the columns from the slugs table and connects them all to the main application’s
Router class. An interesting line in this method goes like this:
$route = json_decode($slug['Slug']['route'], true); Router::connect('/' . $slug['Slug']['name'], $route);
So for every row in the slugs table we are decoding JSON from a column, and associating it with a route name from the database. Time to look at this table and what kind of data is in there.
mysql> SHOW COLUMNS FROM slugs; +----------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | name | varchar(255) | NO | | NULL | | | route | varchar(255) | NO | | NULL | | | model | varchar(255) | NO | | NULL | | | model_id | int(11) | NO | | NULL | | +----------+--------------+------+-----+---------+----------------+
This is an extremely basic table. When I look at some examples of data that are in these columns, I see that the
name column is a full route, such as
news/personnel/slug-of-some-article, a JSON encoded CakePHP 2.x route array, the name of the Model itself, and the ID to pass what I assume the controller’s
view() method takes as a parameter – both pieces of data by the way are within the encoded route array. There’s a lot of redundant data being saved around the application.
Gluten Free Snacks
I prefer a very dense cake. Something like a coffee cake, or a pound cake. Hell, fruitcake isn’t even that bad. These dense cakes though are a terrible metaphor for how CakePHP applications should be built out. So I’m going to cut the cruft and get to the point. We can make the application logically faster by consolidating data together.
“But Dave! We must normalize everything!” Not a bad point, but really do you think we need a whole table for slugs when we can just have a slug field in the row of the data we’re looking up? I mean it could work well, but the current implementation kind of makes my head hurt.
Think of the request to response the application is making with the plugin.
- Client asks for URI
SlugRoute::connect()method looks through all of the slugs in the database, decodes the JSON string into an array (completely ignoring other data in the
slugstable…), and adds them as a route to the
Router. (Holy O(n^2) Batman!)
- The Dispatcher matches
/news/personnel/some-slug-to-an-articleto one of the routes, and loads the objects and proceed with the logic.
So I’m thinking that we can probably skip that second step entirely. I think I want to pass the route directly into the
ArticlesController::view() method. This saves an extra call to the database per every request cycle. So what else is affected by this plugin?
Turns out, just about everything.
Let Them Eat Cake!
I’m going to start off with finding out exactly what models are using the
Slugger plugin. Hey, I found a use for that
mysql> SELECT DISTINCT model FROM slugs; +-----------------+ | model | +-----------------+ | Page | | Article | | Bio | | ... (more) | +-----------------+
Nice, so I’m going to assume all of these models will have to be refactored in some way. If some of them are solely using the
Slugger plugin for the route, I think it may be easier to explicitly determine the route in the routes configuration. Some of these I know are generated during the use of content-management system. I am considering for each model that has a “name” field, I will make a
slug field. From there I will make the plugin more functional wherein it just generates slugs. What you do with the slug is none of the plugin’s business.
Bringing Cake to Pi(e) Party
Disclaimer: I’ve run out of shitty puns for section titles and this has absolutely nothing to do with the number pi, but rather the event of making a pie for Pi Day (March 14th, by the way).
I must admit that I did take some time away from this project to work on other things. Last week though, I decided to re-open this wound and see if I can figure out the true problem occurring within the platform: creating a
Bio will create an unpublished
Article. But you know what? So will publishing the
Bio. That’s right, the system is arbitrarily creating two articles of literally the same content (ripped out of the
CakePHP has these great callback method for performing an action before and after a bunch of events in the CakePHP request lifecycle. Remember, every PHP application is born and dies within the request-response cycle (this may change with the likes of ReactPHP…we’ll see). You’re a mother and murderer all at once.
Naturally I went to look up the
afterSave() method as that’s probably where the
Article was being generating.
And it was.
I can tell right away that this is a new method introduced by an ex-colleague of mine due to the fact that there is some directive comments throughout the lines. Bless her, she did me (and other future developers looking at this project) a huge favour.
There is a huge problem here though, where for every single save whether the
Bio is published or not, it creates an
Article. This is where the bug truly lays. So just like the theme of this post, the idea is great but the implementation was slightly short sighted. I truly believe this could have been avoided if there was adequate documentation for the application in the first place.
The application introduces a very basic versioning system with single characters as flags for the status of content. When things become published, a copy of the record is made with a new status character flag. Throughout the entire application, there are
'W'‘s littered about, being used for checks and logic. Thankfully they are slightly documented in the CakePHP
AppModel inside of a public array with the key (???) being the flag and a description of what it means as the value. If you ask me, it should be the other way around seeing as the flag is the value being stored and checked throughout the application. Here’s an example:
public $statuses = array( 'A' => 'Active' );
The first real problem here, is that its a mutable array in the parent class of every single model. This is extremely unsafe, as they can be changed at any time. Thankfully (but also not really because I had no idea what these characters even meant) this array is never actually used.
So I fleshed out these statuses and made wonderful constants, found all the magic instances of these statuses and replaced them with my wonderful constants.
const STATUS_ACTIVE = 'A';
Immutability is an absolutely beautiful thing. Not only that, each instance of magic characters now makes sense to read. Granted, I have not gone through all of the models and controller to which use these magic characters.
Finally, I introduced a very basic conditional to check the appropriate status of a
Bio that will only initiate the creation of an
Article once the
Bio has been published. Phew.
So despite my journey taking a wicked left turn when it should have been a subtle right, I was able to track down the issue with the application. I was able to leave a little bit of good behind for the next folks who have to read through it. I hope I left a positive impact on that codebase.
I tell the juniors I work with the same thing all the time: “Write code like somebody else is going to read it.” Mostly because some one else will. Nobody is perfect, bugs arise, and knowing the flow of an application will cut down on time (see: cost) when debugging and fixing an issue. Coding standards exist for a reason and should be followed so that when some one else in the company takes a look at a legacy project, they aren’t confused and baffled.
Document everything. Give methods and constants meaningful names. Make a chart, or a diagram about the problem the application solves. These are all things that are shouted at every young developer from the moment they start writing code. Take five minutes now to save fifteen later.