(Replying to PARENT post)
re: Write Your Tests
I've never been successful with this. Sure, write (backfill) as many tests as you can.
But the legacy stuff I've adopted or resurrected has been a complete unknown.
My go-to strategy has been blackbox (comparison) testing. Capture as much input & output as I can. Then use automation to diff output.
I wouldn't bother to write unit tests, etc., for code that is likely to be culled or replaced.
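Roughly, the diffing automation can be as small as this (a sketch only; "legacy_app", "new_app", and the captured-inputs folder are placeholder names for whatever you're comparing):

    import subprocess
    import sys
    from pathlib import Path

    def run(binary: str, input_file: Path) -> str:
        # Capture whatever the program prints for this input.
        result = subprocess.run([binary, str(input_file)],
                                capture_output=True, text=True, timeout=60)
        return result.stdout

    def compare_all(input_dir: str) -> int:
        mismatches = 0
        for input_file in sorted(Path(input_dir).glob("*.txt")):
            old_out = run("./legacy_app", input_file)
            new_out = run("./new_app", input_file)
            if old_out != new_out:
                mismatches += 1
                print(f"DIFF {input_file}")
        return mismatches

    if __name__ == "__main__":
        sys.exit(1 if compare_all("captured_inputs") else 0)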
re: Proxy
I've recently started doing shadow testing, where the proxy is a T-split router, sending mirror traffic to both old and new. This can take the place of blackbox (comparison) testing.
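In its simplest form the T-split is just a small shim in front of the old system. A sketch, with made-up ports, GET-only handling, and synchronous mirroring (a real setup would mirror asynchronously and pass through status codes and headers):

    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PRIMARY = "http://localhost:8001"   # old system: its response is returned
    SHADOW  = "http://localhost:8002"   # new system: response only compared/logged

    class MirrorHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            primary_body = urllib.request.urlopen(PRIMARY + self.path).read()
            try:
                shadow_body = urllib.request.urlopen(SHADOW + self.path).read()
                if shadow_body != primary_body:
                    print(f"MISMATCH {self.path}")
            except Exception as exc:          # shadow failures must never hurt users
                print(f"SHADOW ERROR {self.path}: {exc}")
            self.send_response(200)           # status/header passthrough omitted for brevity
            self.end_headers()
            self.wfile.write(primary_body)    # the client only ever sees the old system

    if __name__ == "__main__":
        HTTPServer(("", 8000), MirrorHandler).serve_forever()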
re: Build numbers
The first step on any project is to add build numbers. Semver is marketing, not engineering. Just enumerate every build attempt, successful or not. Then automate the builds, testing, deploys, etc.
Build numbers really help with defect tracking and differential debugging. Every ticket gets fields for "found", "fixed", and "verified". Caveat: I don't know if my old-school QA/test methods still apply in this new "agile" DevOps (aka "winging it") world.
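The build-number part can literally be a thin wrapper around whatever the build command already is (a sketch; "make all" and the file names are placeholders):

    import json
    import subprocess
    from datetime import datetime, timezone
    from pathlib import Path

    COUNTER = Path("build_number.txt")

    def next_build_number() -> int:
        n = int(COUNTER.read_text()) + 1 if COUNTER.exists() else 1
        COUNTER.write_text(str(n))        # recorded *before* building: failed attempts count too
        return n

    if __name__ == "__main__":
        build = next_build_number()
        result = subprocess.run(["make", "all"])
        Path("build_info.json").write_text(json.dumps({
            "build": build,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "succeeded": result.returncode == 0,
        }))
        raise SystemExit(result.returncode)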
(Replying to PARENT post)
- Get a local build running first.
Often, a complete local build is not possible. There are tons of dependencies - databases, websites, services, etc. - and every developer has only part of them on their machine. Releases are hard to do.
I once worked for a telco company in the UK where the deployment of the system looked like this: (Context: Java Portal Development) One dev would open a zip file and pack all the .class files he had generated into it, and email it to his colleague, who would then do the same. The last person in the chain would rename the file to .jar and then upload it to the server. Obviously, this process was error prone and deployments happened rarely.
I would argue that getting everything to build on a central system (some sort of CI) is useful as well, but before changing, testing, DB freezing, or anything else is possible, you should try to have everything you need on each developer's machine.
This might be obvious to some, but I have seen it ignored every once in a while. When you can't even build the system locally, freezing anything, testing anything, or changing anything will be a tedious and error-prone process...
(Replying to PARENT post)
[1] https://www.safaribooksonline.com/library/view/working-effec...
(Replying to PARENT post)
FWIW, if you want to have a look at a reasonably complex code base being broken up into maintainable modules of modernized code, I rewrote Knockout.js with a view to creating version 4.0 with modern tooling. It is now in alpha, maintained as a monorepo of ES6 packages at https://github.com/knockout/tko
You can see the rough transition strategy here: https://github.com/knockout/tko/issues/1
In retrospect it would've been much faster to just rewrite Knockout from scratch. That said, we've kept almost all the unit tests, so there's a reasonable expectation of backwards compatibility with KO 3.x.
(Replying to PARENT post)
I work on my own side projects, read lots of other people's code on github and am always looking to improve myself in my craft outside of work, but I worry it's not enough.
(Replying to PARENT post)
I'd love to hear a more balanced view on this. I think this idea is preached as gospel when dealing with legacy systems. I absolutely understand that the big rewrite has many disadvantages. But surely some codebases have characteristics that make a rewrite the better option. I'm going to go against the common wisdom (and the wisdom I've practiced until now) and rewrite a program I maintain that is:
1. Reasonably small (10k LOC, with large parts duplicated or copied with only minor variable changes).
2. Barely working. Most users cannot get the program working because of the numerous bugs. I often can't reproduce their bugs, because I get bugs even earlier in the process.
3. No test suite.
4. Plenty of very large security holes.
5. I can deprecate the old version.
I've spent time refactoring this (maybe 50 hours), but that seems crazy because it's still a pile of crap, and at 200 hours I don't think it would look that different. I doubt it would take 150 hours for a full rewrite.
Kindly welcoming dissenting opinions.
(Replying to PARENT post)
I am now working on a Node.js app and I find it really hard to make any changes. Even typos introduced when renaming a variable often go undetected unless you have perfect test coverage.
This is not even a large codebase and I already find it hard to manage. Maybe I have been using typed languages for so long that my instincts don't apply to dynamic languages, but I seriously wonder how one could maintain a large JavaScript codebase.
(Replying to PARENT post)
Then the upper management appointed a random guy to do a "Big Bang" refactor: it has been failing miserably (it is still going on, doing way more harm than good). Then it all started to go really bad... and I quit and found a better job!
(Replying to PARENT post)
A huge issue with sticking to an old codebase for such a long time is that it gets older and older. New talent doesn't want to manage it and leaves, so you're stuck with the same old people who implemented the codebase in the first place. Sure, they were smart, knowledgeable people in the year 2000, but think of how fast technology changes. Change, adapt, or die.
(Replying to PARENT post)
I've done this sort of work quite a number of times, and I've made mistakes and learned what works.
It's actually the most difficult part to navigate successfully. If you already have management's trust (i.e., you have the political power in your organization to push a deadline or halt work), you're golden and all of the things mentioned in the OP are achievable. If not, you're going to have to make huge compromises. Front-load high-visibility deliverables and make sure they get done. Prove that it's possible.
Scenario 1) I came in as a sub-contractor to help spread the workload (from 2 to 3) building out a very early-stage application for dealing with medical records. I came in and saw the codebase was an absolute wretched mess: DB schema full of junk, wide tables, broken and leaking API routes. I spent the first two weeks just bulletproofing the whole application backend and whipping it into shape, then added new features for a little while, and was fired shortly afterwards.
Lesson: Someone else was paying the bills and there wasn't enough visibility/show-off factor for the work I was doing so they couldn't justify continuing to pay me. It doesn't really matter that they couldn't add new features until I fixed things. It only matters that the client couldn't visibly see the work I did.
Scenario 2) I was hired on as a web developer to a company and it immediately came to my attention that a huge, business-critical ETL project was very behind schedule. The development component had a due date three weeks preceding my start date and they didn't have anyone working on it. I asked to take that on, worked like a dog on it and knocked it out of the park. The first three months of my work there immediately saved the company about a half-million dollars. Overall we launched on time and I became point person in the organization for anything related to its data.
Lesson: Come in and kick ass right away and you'll earn a ton of trust in your organization to do the right things the right way.
(Replying to PARENT post)
1) "Do not fall into the trap of improving both the maintainability of the code or the platform it runs on at the same time as adding new features or fixing bugs."
Thanks. However, in many situations this is simply not possible, because the business is not there yet, so you need to keep adding new features and fixing bugs. And still, the code base has to be improved. Impossible? Almost, but we're paid for solving hard problems.
2) "Before you make any changes at all write as many end-to-end and integration tests as you can."
Sounds cool, except in many cases you have no idea how the code is supposed to work. Writing tests for new features and bugfixes is good advice (but that goes against other points the OP makes).
3) "A big-bang rewrite is the kind of project that is pretty much guaranteed to fail."
No, it's not. Especially if you're rewriting parts of it at a time as separate modules.
My problem with the OP is really that it tells you how to improve a legacy codebase given no business or time pressure.
(Replying to PARENT post)
> Before you make any changes at all write as many end-to-end and integration tests as you can.
I'm beginning to see this as a failure mode in and of itself. Once you give people E2E tests, it's the only kind of test they want to write. It takes about 18 months for the wheels to fall off, so it can look like a successful strategy. What they need to do is learn to write unit tests, but for that you have to break the code up into little chunks. That doesn't match their aesthetic sense, so it feels juvenile and contrived. The ego kicks in and you think you're smart enough that you don't have to eat your proverbial vegetables.
The other problem is that E2E tests are slow, they're flaky, and nobody wants to think about how much they cost in the long run because it's too painful to look at. How often have you seen two people huddled over a broken E2E test? Multiply the cost of rework by 2.
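For anyone who hasn't seen it done, "breaking the code up into little chunks" can be as unglamorous as pulling one pure rule out of a tangled handler so it can be tested in milliseconds, with no browser or database involved. An invented example, pytest-style:

    def late_fee(days_overdue: int, balance: float) -> float:
        """Pure business rule, extracted from a hypothetical order handler."""
        if days_overdue <= 0:
            return 0.0
        return round(min(balance * 0.015 * days_overdue, 50.0), 2)

    def test_no_fee_when_not_overdue():
        assert late_fee(0, 100.0) == 0.0

    def test_fee_is_capped():
        assert late_fee(365, 10_000.0) == 50.0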
(Replying to PARENT post)
> "add a single function to increment these counters based on the name of the event"
While the sentiment is a good one, I would warn against introducing counters in the database like this and incrementing them on every execution of a function. If transaction volumes are high, then depending on the locking strategy in your database, this could lead to blocking and lock contention. Operations that could previously execute in parallel and independently now have to compete for a write lock on the shared counter, which could slow down throughput. In the worst case, if there are scenarios where two counters can be incremented inside different transactions but in different orders (not inconceivable in legacy code), then you could introduce deadlocks.
Adding database writes to a legacy codebase is not without risk.
If volumes are low you might get away with it for a long time, but a better strategy would probably be just to log the events to a file and aggregate them when you need them.
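A minimal sketch of the log-and-aggregate approach (file name and event names are placeholders):

    import json
    import time
    from collections import Counter

    def record_event(name: str, log_path: str = "events.log") -> None:
        # Short appends to a file rarely interleave in practice, so concurrent
        # writers don't contend the way they would on a hot counter row.
        with open(log_path, "a") as f:
            f.write(json.dumps({"event": name, "ts": time.time()}) + "\n")

    def aggregate(log_path: str = "events.log") -> Counter:
        counts = Counter()
        with open(log_path) as f:
            for line in f:
                counts[json.loads(line)["event"]] += 1
        return counts

    # record_event("invoice_created")  ...later:  print(aggregate())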
(Replying to PARENT post)
(Replying to PARENT post)
https://alwaystrending.io/articles/software-engineer-enterta...
(Replying to PARENT post)
I highly disagree about the order of work. That guy wants to change the platform, redo the architecture, and refactor everything before he starts to fix bugs. That's a recipe for disaster.
It's not possible to refactor anything while you have no clue about the system. You will change things you don't understand, only to break features and add new bugs.
You should start by fixing bugs, with a preference for long-standing, simple issues, like "add validation on that form so the app doesn't crash when the user gives a name instead of a number". Ask the users for a history of simple issues.
That delivers immediate value. It quickly earns you credit with the stakeholders and the users, and you learn the internals by doing before you attempt any refactoring.
(Replying to PARENT post)
The idea is a good one, but the specific suggested implementation... hasn't he heard of statsd or Kibana?
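To be fair to the OP, the statsd version is barely more code than a counter table; the wire format is just a UDP line like "name:1|c". A sketch (localhost:8125 is the conventional statsd address; adjust for your setup):

    import socket

    _sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(metric: str, value: int = 1, addr=("127.0.0.1", 8125)) -> None:
        _sock.sendto(f"{metric}:{value}|c".encode(), addr)   # "|c" marks a counter

    # incr("legacy_app.orders.submitted")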
(Replying to PARENT post)
I have done a number of serious refactorings myself, and good tests do me a huge favor, even though I have to grit my teeth for a few days to a few weeks to get them in place.
(Replying to PARENT post)
(Replying to PARENT post)
(Replying to PARENT post)
This type of situation is usually a red flag that the company's management doesn't understand the value of maintaining software until they absolutely have to. That, in itself, is an indicator of what they think of their employees.
(Replying to PARENT post)
(Replying to PARENT post)
My actions are usually these:
* Fix the build system, automate the build process, and produce regular builds that get deployed to production. It's incredible that some people still don't understand the value of a repeatable, reliable build. In one project, in order to build the system you had to know which makefiles to patch and which broken parts of the project to disable at that particular time. And then they deployed it and didn't touch it for months. The next time you needed to build or deploy, it was impossible to know what had changed or whether you were even building the same thing.
* Fix all warnings. Usually there are thousands of them, and they get ignored because "hey, the code builds, what else do you want." The warning-fixing step lets you see how fucked up some of the code is.
* Start writing unit tests for things you change, fix or document. Fix existing tests (as they are usually unmaintained and broken).
* Fix the VCS and enforce a sensible review process and history maintenance. Otherwise nobody has a way of knowing what changed, when, and why. Actually, not all parts of the project may even be in the VCS. Code, configs, and scripts can be lying around on individual dev machines, which is impossible to discover without a repeatable build process. Also, there are usually a bunch of branches in various states of staleness that were used to deploy code to production. The codebase may have diverged significantly. It needs to be merged back into the mainline, and a development process needs to be enforced that prevents this from happening in the future.
Worst of all is that in the end very few people would appreciate this work. But at least I get to keep my sanity.
(Replying to PARENT post)
I'm loath to give examples so as not to constrain your thinking, but, for example, imagine a bunch of hairy Perl had been built to crawl web sites as part of whatever they're doing, and it just so happens that these days curl or wget do more, do it better, and with fewer bugs than everything they had built. (Think of your own examples here, anything from machine vision to algebraic computation, whatever you want.)
In fact isn't this the case for lots and lots of domains?
For this reason I'm kind of surprised that the "big bang rewrite" is written off so easily.
(Replying to PARENT post)
A codebase that is effectively non-existent: the previous attempts were done with MS BI (SSIS) tools (for all the things SSIS is not for) and/or SQL stored procedures, with no consistency in coding style or documentation, over 200 databases (sometimes 3 per process that exist only to house a handful of stored procedures), complete developer turnover about every 2 years, and senior leadership in the organization clueless about any technology.
As you look at ~6000 lines in a single stored procedure, you fight the urge to light the match and give it some TLC (Torch it, Level it, Cart it away) and start over with something new.
Moral of the story: as you build and replace things, stress to everyone to "concentrate on getting it right instead of getting it done!" so you don't add to the steaming pile.
(Replying to PARENT post)
(Replying to PARENT post)
I really mean it: a whole lot of programmers simply don't read the codebase before starting a task. Guess the result, especially in terms of frustration.
(Replying to PARENT post)
^ Yes and no. That might take forever, and the company might be struggling with cash. I would instead consider adding a metrics dashboard. Basically: find the key points - payments sent, payments cleared, new user, returning user, store opened, etc. This isn't as good as a nice integration suite, but if a client is hard up on cash and needs help, this can be set up in hours. With this in place, after adding/editing code you can calm investors/CEOs. Alternatively, if it's a larger corp it will be time-strapped - then push for the same thing :)
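A bare-bones sketch of what I mean (event names are invented; the counts live in memory only, so a restart wipes them, but it's enough to put a chart in front of a CEO):

    import json
    import threading
    from collections import Counter
    from http.server import BaseHTTPRequestHandler, HTTPServer

    counts = Counter()

    def track(event: str) -> None:
        counts[event] += 1            # call at the key points: track("payment_cleared")

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(counts).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    def serve_metrics(port: int = 9100) -> None:
        # Background thread so the existing app keeps running untouched.
        server = HTTPServer(("", port), MetricsHandler)
        threading.Thread(target=server.serve_forever, daemon=True).start()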
(Replying to PARENT post)
I completely agree with the sentiment that scoping the existing functionality and writing a comprehensive test suite is important - but how should you proceed when the codebase is structured in such a way that it's almost impossible to test specific units in isolation, or when the system is hardcoded throughout to e.g. connect to a remote database? As far as I can see it'll take a lot of work to get the codebase into a state where you can start doing these tests, and surely there's a risk of breaking stuff in the process?
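One low-risk way in is the classic "seam" move: wrap the hardcoded connection behind a constructor parameter whose default preserves today's behavior, so tests can substitute a fake while the production code path stays untouched. A sketch with invented names:

    # Before: the module connected straight to the remote host at import time,
    # e.g.  conn = remote_db.connect("prod-db.internal")   <- untestable as-is

    class OrderRepository:
        def __init__(self, conn=None):
            # Default preserves the legacy behavior; tests pass a stub instead.
            self._conn = conn if conn is not None else _default_connection()

        def overdue_orders(self):
            return self._conn.query("SELECT id FROM orders WHERE due_date < now()")

    def _default_connection():
        # In the real codebase this returns the existing hardcoded remote
        # connection; left abstract here on purpose.
        raise NotImplementedError

    # In a test, no remote database needed:
    class FakeConnection:
        def query(self, sql):
            return [1, 2, 3]

    def test_overdue_orders_uses_injected_connection():
        repo = OrderRepository(conn=FakeConnection())
        assert repo.overdue_orders() == [1, 2, 3]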
(Replying to PARENT post)
The key is an engaged business unit, clear requirements, and time on the schedule. Obviously if one or more of these things sounds ridiculous then the odds of success are greatly diminished. It is much easier if you can launch on the new platform a copy of the current system, not a copy + enhancements, but I've been on successful projects where we launched with new functionality.
(Replying to PARENT post)
(Replying to PARENT post)
Freezing a whole system is practically impossible. What you usually get is a "piecewise" freeze: a small portion of the system that doesn't change for a given period.
The real challenge is: how do you split your project into pieces of functionality that are reasonably sized and replaceable independently of each other?
There is definitely no silver bullet for how to do this.
(Replying to PARENT post)
edit: I'm being a little snarky, but the assumptions here are just too much. This is all best-case-scenario stuff that doesn't translate very well to the vast majority of situations it's ostensibly aimed at.
(Replying to PARENT post)
At my last gig we used this exact strategy to replace a large ecommerce site piece by piece. Being able to slowly replace small pieces and AB test every change was great. We were able to sort out all of the "started as a bug, is now a feature" issues with low risk to overall sales.
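For anyone who hasn't done this before, the routing shim can be very small. A sketch with invented function names, where the old path stays as both the control group and the fallback:

    import hashlib

    ROLLOUT_PERCENT = 10      # start small, ratchet up as the metrics stay clean

    def legacy_pricing(sku: str) -> float:
        return 9.99           # stands in for the old code path

    def new_pricing_service(sku: str) -> float:
        return 9.99           # stands in for the freshly rewritten piece

    def in_new_bucket(user_id: str) -> bool:
        # Hash so the same user always lands in the same bucket.
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        return int(digest, 16) % 100 < ROLLOUT_PERCENT

    def product_price(user_id: str, sku: str) -> float:
        if in_new_bucket(user_id):
            try:
                return new_pricing_service(sku)
            except Exception:
                pass          # any surprise in the new piece falls back to legacy
        return legacy_pricing(sku)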
(Replying to PARENT post)
Really? Are there no circumstances under which this would be appropriate? It seems to me this makes assumptions about the baseline quality of the existing codebase. Surely sometimes buying a new car makes more sense than trying to fix up an old one?
(Replying to PARENT post)
(Replying to PARENT post)
(Replying to PARENT post)
I'm actually quite curious; how long does this process typically take you?
What are the most relevant factors on which it scales? Messiness of existing code? Number of modules/LOC? Existing test coverage?
(Replying to PARENT post)
> write as many end-to-end and integration tests as you can
and
> make sure your tests run fast enough to run the full set of tests after every commit
(Replying to PARENT post)
(Replying to PARENT post)
(Replying to PARENT post)
---
1. Find out which functionality is still used and which functionality is critical
Management will always say "all of it". The problem is that what they're aware of is usually the tip of the iceberg in terms of what functionality is supported. In most large legacy codebases, you'll have major sections of the application that have sat unused or disabled for a couple of decades. Find out what users and management actually think the application does and why they're looking to resurrect it. The key is to make sure you know what is business critical functionality vs "nice to have". That may happen to be the portions of the application that are currently deliberately disabled.
Next, figure out who the users are. Are there any? Do you have any way to tell? If not, if it's an internal application, find someone who used it in the past. It's often illuminating to find out what people are actually using the application for. It may not be the application's original/primary purpose.
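One cheap trick for answering "is anyone actually using this?" without relying on anyone's memory: tag the suspected entry points and log every call for a month or two before deciding anything. A sketch (names are illustrative):

    import functools
    import logging
    import time

    logging.basicConfig(filename="feature_usage.log", level=logging.INFO)
    usage_log = logging.getLogger("feature_usage")

    def track_usage(feature: str):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                usage_log.info("%s used at %s", feature,
                               time.strftime("%Y-%m-%d %H:%M:%S"))
                return fn(*args, **kwargs)
            return wrapper
        return decorator

    @track_usage("quarterly_reconciliation_report")
    def generate_reconciliation_report():
        ...                   # existing legacy code, unchanged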
---
2. Is the project under version control? If not, get something in place before you change anything.
This one is obvious, but you'd be surprised how often it comes up. Particularly at large, non-tech companies, it's common for developers to not use version control. I've inherited multi-million line code bases that did not use version control at all. I know of several others in the wild at big corporations. Hopefully you'll never run into these, but if we're talking about legacy systems, it's important to take a step back.
One other note: If it's under any version control at all, resist the urge to change what it's under. CVS is rudimentary, but it's functional. SVN is a lot nicer than people think it is. Hold off on moving things to git/whatever just because you're more comfortable with it. Whatever history is there is valuable, and you invariably lose more than you think you will when migrating to a new version control system. (This isn't to say don't move, it's just to say put that off until you know the history of the codebase in more detail.)
---
3. Is there a clear build and deployment process? If not, set one up.
Once again, hopefully this isn't an issue.
I've seen large projects that did not have a unified build system, just a scattered mix of shell scripts and isolated makefiles. If there's no way to build the entire project, it's an immediate pain point. If that's the case, focus on the build system first, before touching the rest of the codebase. Even for a project with excellent processes in place, reviewing the build system in detail is not a bad way to start learning the overall architecture of the system.
More commonly, deployment is a cumbersome process. Sometimes cumbersome deployment may be an organizational issue, and not something that has a technical solution. In that case, make sure you have a painless way to deploy to an isolated development environment of some sort. Make sure you can run things in a sandboxed environment. If there are organizational issues around deploying to a development setup, those are battles you need to fight immediately.
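The end state to aim for is one obvious entry point that does the whole thing. A sketch (every command and path below is a placeholder for whatever the project actually needs):

    import subprocess
    import sys

    STEPS = [
        ["make", "clean"],
        ["make", "all"],
        ["make", "test"],
        ["rsync", "-a", "build/", "sandbox-host:/opt/app/"],   # sandbox, never prod
    ]

    def main() -> int:
        for step in STEPS:
            print("+", " ".join(step))
            if subprocess.run(step).returncode != 0:
                return 1      # stop at the first failing step
        return 0

    if __name__ == "__main__":
        sys.exit(main())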
(Replying to PARENT post)
(Speaking from experience from work)
(Replying to PARENT post)
I don't agree with this. People can't write proper coverage even for a code base that they 'fully understand'. You will most likely end up writing tests for very obvious things or low-hanging fruit; the unknowns will still seep through at one point or another.
Forget about refactoring code just to comply with your tests and breaking the rest of the architecture in the process. It will pass your 'test' but will fail in production.
What you should be doing is:
1. Perform architecture discovery and documentation (helps you with remembering things).
2. Look over last N commits/deliverables to understand how things are integrating with each other. It's very helpful to know how code evolved over time.
3. Identify your roadmap and what sort of impact it will have on the legacy code.
4. Commit to the roadmap. Understand the scope of the impact of anything you add/remove. Account for code, integrations, caching, database, and documentation.
5. Don't forget about things like jobs and anything that might be pulling data from your systems.
Identifying what will be changing and adjusting your discovery to accommodate those changes as you go is a better approach from my point of view.
By the time you reach the development phase that touches 5% of the architecture, your knowledge of the other 95% of the design will be useless, and in six months you will have forgotten it anyway.
You don't cut a tree with a knife to break a branch.
(Replying to PARENT post)
If it's code that has been running successfully in production for years, be humble.
Bugfixes, shortcuts, constraints - all are real life and prevent perfect code and documentation under pressure.
The team at Salesforce.com is doing a massive re-platforming right now with their switch to Lightning. Should provide a few good stories, switching over millions of paying users, not fucking up billions in revenue.
(Replying to PARENT post)
I don't disagree at all, but I think the more valuable advice would be to explain how this can be done at a typical company.
In my experience, "feature freeze" is unacceptable to the business stakeholders, even if it only has to last for a few weeks. And for larger-sized codebases, it will usually be months. So the problem becomes explaining why you have to do the freeze, and you usually end up "compromising" and allowing only really important, high-priority changes to be made (i.e. all of them).
I have found that focusing on bugs and performance is a good way to sell a "freeze". So you want feature X added to system Y? Well, system Y has had 20 bugs in the past 6 months, and logging in to that system takes 10+ seconds. So if we implement feature X we can predict it will be slow and full of bugs. What we should do is spend one month refactoring the parts of the system which will surround feature X, and then we can build the feature.
In this way you avoid ever "freezing" anything. Instead you are explicitly elongating project estimates in order to account for refactoring. Refactor the parts around X, implement X. Refactor the parts around Z, implement Z. The only thing the stakeholders notice is that development pace slows down, which you told them would happen and explained the reason for.
And frankly, if you can't point to bugs or performance issues, it's likely you don't need to be refactoring in the first place!