Preface
In 2019 there was a ruling in the USA that said e-commerce websites need to file sales tax for the state that an order is being shipped too. I work for an e-commerce company that is based in Canada, but sells into the USA and collects USD. Because our finance team is small, and not very well versed in filing USA taxes, we contracted the help of another company under our corporate umbrella that already sells and ships in the USA and can file our taxes for us.
Great!
My boss comes to me with these details and an XML document we were to conform our order data into. We were given the login details to a remote SFTP server where the XML document would be sent to. That's it. So, if we break it down the original scope of the project is:
- Generate an XML Document
- Upload it to an SFTP Server
So from my understanding, there was a system at the other end of that SFTP server that would process the XML documents I was going to be generating. There was no staging or testing environment at the time, either, so we had to test this up until the processing of the actual XML documents in Production.
Redundancy & Design Decisions
My boss was very adamant about making sure we had a fail-safe system, document backup redundancy, and utilizing every AWS feature that was being sold to us from our account rep. So we embarked on what every software developer does at some point in their career and over-engineered a system that would get us our XML Documents. We were enamored with AWS Lambda, the "serverless" code execution, SQS Queues, the AWS branded RabbitMQ services, and of course, S3 - object storage.
We are an e-commerce company and our platform of choice is Magento. I have a love-hate relationship with Magento that should be saved for another blog post. From Magento, now Adobe Commerce, we needed to make sure we were capturing the order details of US customers. Magento has its own framework, and within that framework they have an events systems. I am going to caveat right here that the Event system in Magento is deprecated and should not be used. Why it shouldn't be used is for another blog post on another day.
So I create an Observer
for the sales_order_created
event in Magento. This code will check all incoming orders into the system and check to see if the order came from our US Webstore. If it does, we will queue the Order ID in a database table to be processed later. This system works fine enough - other than Magento deprecating their events system - so we'll have to find a new way around this eventually.
Magento also has a CRON scheduler and runner within its framework. So here I would create two discrete CRON Jobs. One job would start processing the queue table record, generate the XML Document and persist it to disk, and store the file path of that document back into the queue table record. The second CRON job would look for those file paths, open the filem upload it to AWS S3, and then remove the record when it was all said an done. This allowed us to asynchronously generate XML documents and upload them into a new system and forget any of it ever happened.
AWS S3 allows us to run code on the Put Object event in the form of an AWS Lambda. So we decided once the XML document was in S3, we would send it to the AWS SQS Queue. We then created another Lambda that would run on a schedule where it would read messages off the SQS Queue and upload the XML Document to the SFTP Service.
Why do it like this? Because we didn't want to hammer the shoddy SFTP service with large volumes of connections or documents. This also let us requeue the message if there was a failure sending the XML Document to the SFTP service. Was this a requirement for the project? No. It's a system we invented for requirements we imposed on ourselves. The stakeholders did not know or care about this at all.
We Tested on Prod
Early in the project there was no staging or testing environment. I reached out to the other team and mentioned this may be a good thing to set up to ensure we can send all of our testing orders on our Staging environment into a testing system of theirs and make sure we have everything up to snuff. However, this did not happen for a while. We built our staging infrastructure and tested it all up into pushing the XML Documents into an SFTP service of our own we hosted on AWS EC2. We had no idea if these documents would even be valid until we deployed to production and started seeing orders flow in.
So that's what we did.
This is when the feedback loop was established between our teams. The orders would flow in, they would fail the sanity checks in their accounting service, and we would have to make the code changes to adjust. My plea to have them set up a staging environment was finally acknowledged after a short while of having these email discussions back and forth. We were able to disable the service in Production, make our changes, test on staging, and deploy to production when we were confident.
The big problem with this is that we were a small software team with a myriad of problems and projects to work on. This small document push project was very low in my personal priorities. It was sold to me, and I think my boss at the time, as a small project that we could set and forget. I don't think my perspective was ever properly relayed to the other team. This could have been something we discussed at the onset of the project. Every issue on their end would be super high priority, and we would receive extremely rude emails insisting we fix the problem right away.
I had to explain to them, many times, that we had a development process. We created tickets, prioritized those tickets, deployed them to lower environments and tested, and then scheduled releases. I'm not sure what other software teams they would have worked with in the past, but we could, and refused to, just fix things in production.
Added Complexity
A short time later, the business decided to embark on selling on Amazon Marketplace. One thing they do not tell you about being a Marketplace seller is that you need to file your sales taxes yourself. We exclusively sell these products to American customers so of course, we ned to tap into our tax reporting system we designed above.
Before we started the journey of actually automating this process, unbeknownst to me, the other team was manually curating these orders and their financial data and reporting the taxes themselves. This easily became a grueling process as we started to see the volume of sales increase. This is the kind of problem software development should provide a solution for.
Moving the Amazon Marketplace orders into this SFTP service was handed off to a few different developers below me. I was able to guide them and help them make design decisions about the process. Luckily, we were able to tap into our over-engineered AWS proprietary system of backing up the XML documents on S3, and sending them over to the SFTP using Lambda. All we had to do now is generate the XML document and put it in the S3 bucket.
Let's just add another S3 bucket and another Lambda. Buy more and more into the AWS ecosystem. This time, we would upload the Sales Report from the Amazon Seller Central account - which a daily generated report in the form of a TSV (not a CSV, a TSV), parse that file out into discrete XML documents and then upload to our backup bucket. The complexity of the TSV is that each order item is a row in the data, so the code would need to parse each row, and combine each matching order_id
into a single Order
object that is then transformed into the valid XML.
The valid XML was a problem. Again all we had was examples of the thousands of automatically generated XML documents and that original sample document we built the whole original system off of. Luckily though, this time we have a staging environment for their systems so that we can actually test. The problem is that we don't have direct access to it so each failure has to be discussed, one after the other, for each XML document we send.
- "We need the state to be in all caps"
- "We need a Shipping type"
- "We need a shipping charge"
- "We need a shipping date"
- "There can not be any
null
nodes in the document." - "Those
null
nodes have to exist, but have a blank space in their value"
And it goes on and on.
Have Meetings
Being a software developer, having worked on this particular project off and on for almost four years, you could say that I was frustrated. I started thinking back about what could have made this entire thing less painful. "This was supposed to be a small project!" I yell to myself as I upload yet another TSV into the staging environment, waiting for the email from the other side to tell me if it worked or not. And I think I've concluded that there were some pretty big missteps along the way that felt very small at the time.
The first thing is that I didn't know who the stakeholders were. Literally, I did not know their names or titles. We were dropping documents off in a little black box and that was that. If I could go back to 2019 I would have liked to arrange a meeting with them. These are the questions I would have asked them from the onset:
- Who are you?
- What do you do?
- What is this system for?
- How does this system work?
- How much control do we have other the system?
- How do you propose we go about testing the system?
- What is the priority for this project to your team?
- How would you like to structure feedback for my team?
- What are the requirements for the data?
I think if we had a kickoff meeting where I could ask those questions and gather those requirements, we could have ended up with a much better relationship and delivered a much better experience for the other team. All of those questions can be fielded, answered, and documented within an hour. From there, my team can work on the design of the system as we see fit.
There seems to be this belief that meetings are not important. I disagree - for the most part. Meetings should have an objective, they should have rigid timelines, and they should benefit all parties involved. I have absolutely been a part of "Could have been an email" meetings, but I have also sent those emails and found they were not as effective. The goal with a project kickoff meeting is to get an understanding of the common goal and establish the stakeholders expectations.
These questions being answered would have likely given us more insight into the overall goal of the system, when we needed to add more complexity to that same system. Whether it's a special surcharge to shipping to Colorado, including our coupon or reward points discounts, or adding a whole other channel of revenue all together. We would be in a much better position to make those additions with less friction.
Another thing this meeting could facilitate is a clear explanation of how the Software Development Lifecycle for my team actually works. So instead of every issue becoming urgent "PLEASE FIX NOW" emails, we could temper expectations that we need to:
- Find the bug in the code
- Write the changes
- Test the changes
- Schedule a release for the changes
Understanding different business units, how they organize their work, and how they determine something as "done" is extremely important to the success of a project. I think I truly understand this now, as after four years on this simple project:
- Generate an XML Document
- Upload it to an SFTP Server
We are still routinely dealing with issues.
Conclusion
I think I am a more mature developer now than I was in 2019. I would push to take the time to have these kickoff meetings, to understand the other party and their motives more. It is a crucial part of being a software developer that I often see overlooked when I talk about the profession with my colleagues. Too often are we talking about languages, frameworks, and new shiny tools. At the end of the day, I am here to solve a problem, and build a system for someone else to use. I should make sure I know as much about the problem and the system before I start writing a single line of code.