👤darwhy🕑8y🔼653🗨️293

(Replying to PARENT post)

> Do not fall into the trap of improving both the maintainability of the code or the platform it runs on at the same time as adding new features or fixing bugs.

I don't disagree at all, but I think the more valuable advice would be to explain how this can be done at a typical company.

In my experience, a "feature freeze" is unacceptable to the business stakeholders, even if it only has to last a few weeks. And for larger codebases, it will usually be months. So the problem becomes explaining why you have to do the freeze, and you usually end up "compromising" by allowing only really important, high-priority changes to be made (i.e. all of them).

I have found that focusing on bugs and performance is a good way to sell a "freeze". So you want feature X added to system Y? Well, system Y has had 20 bugs in the past 6 months, and logging in to that system takes 10+ seconds. So if we implement feature X we can predict it will be slow and full of bugs. What we should do is spend one month refactoring the parts of the system which will surround feature X, and then we can build the feature.

In this way you avoid ever "freezing" anything. Instead you are explicitly elongating project estimates in order to account for refactoring. Refactor the parts around X, implement X. Refactor the parts around Z, implement Z. The only thing the stakeholders notice is that development pace slows down, which you told them would happen and explained the reason for.

And frankly, if you can't point to bugs or performance issues, it's likely you don't need to be refactoring in the first place!

👤apeace🕑8y🔼0🗨️0

(Replying to PARENT post)

Sound advice.

re: Write Your Tests

I've never been successful with this. Sure, write (backfill) as many tests as you can.

But the legacy stuff I've adopted / resurrected has been a complete unknown.

My go-to strategy has been blackbox (comparison) testing. Capture as much input & output as I can. Then use automation to diff output.

I wouldn't bother to write unit tests etc. for code that is likely to be culled or replaced.
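A minimal sketch of the capture-and-diff idea in Python (the function names and toy stand-ins are just illustrative; in practice old_fn/new_fn would be subprocess calls, HTTP requests, etc.):

```python
import difflib

def comparison_test(old_fn, new_fn, captured_inputs):
    """Run both implementations over captured inputs and collect any diffs."""
    failures = []
    for inp in captured_inputs:
        old_out = str(old_fn(inp))
        new_out = str(new_fn(inp))
        if old_out != new_out:
            diff = "\n".join(difflib.unified_diff(
                old_out.splitlines(), new_out.splitlines(),
                fromfile="old", tofile="new", lineterm=""))
            failures.append((inp, diff))
    return failures

# Toy stand-ins for the old and new systems:
legacy = lambda x: x * 2
rewrite = lambda x: 7 if x == 3 else x * 2   # seeded regression
print(comparison_test(legacy, rewrite, [1, 2, 3]))
```

The point is that the captured corpus, not a spec, is the oracle: any divergence from the legacy output is flagged for a human to judge.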

re: Proxy

I've recently started doing shadow testing, where the proxy is a T-split router, sending mirror traffic to both old and new. This can take the place of blackbox (comparison) testing.
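The T-split can be sketched like this (callables stand in for the two backends; in a real deployment this logic lives in the proxy layer, e.g. an nginx or Envoy mirror):

```python
def shadow_route(request, primary, shadow, record_diff):
    """Serve from the old system, mirror to the new, diff offline."""
    live_response = primary(request)          # the user only ever sees this
    try:
        shadow_response = shadow(request)     # fire the mirrored copy
        if shadow_response != live_response:
            record_diff(request, live_response, shadow_response)
    except Exception as exc:                  # shadow failures must never hurt users
        record_diff(request, live_response, f"shadow error: {exc}")
    return live_response
```

The key design point is the asymmetry: the shadow's response and even its crashes are only recorded, never returned.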

re: Build numbers

First step to any project is to add build numbers. Semver is marketing, not engineering. Just enumerate every build attempt, successful or not. Then automate the builds, testing, deploys, etc.

Build numbers can really help with defect tracking and differential debugging. Every ticket gets fields for "found", "fixed", and "verified". Caveat: I don't know if my old school QA/test methods still apply in this new "agile" DevOps (aka "winging it") world.

👤specialist🕑8y🔼0🗨️0

(Replying to PARENT post)

I'd add a prerequisite to the top of this list:

- Get a local build running first.

Often, a complete local build is not possible. There are tons of dependencies, such as databases, websites, and services, and every developer has only a part of the system on their machine. Releases are hard to do.

I once worked for a telco in the UK where deployment of the system looked like this (context: Java portal development): one dev would open a zip file, pack all the .class files he had generated into it, and email it to his colleague, who would then do the same. The last person in the chain would rename the file to .jar and upload it to the server. Obviously, this process was error-prone and deployments happened rarely.

I would argue that getting everything to build on a central system (some sort of CI) is useful as well, but before changing, testing, DB freezing, or anything else is possible, you should try to have everything you need on each developer's machine.

This might be obvious to some, but I have seen this ignored every once in a while. When you can't even build the system locally, freezing anything, testing anything, or changing anything will be a tedious and error-prone process...

👤cessor🕑8y🔼0🗨️0

(Replying to PARENT post)

This is a good high-level overview of the process. I highly recommend that engineers working in the weeds read "Working Effectively with Legacy Code" [1], as it has a ton of patterns you can implement, and more detailed strategies for some of the code changes hinted at in this article.

[1] https://www.safaribooksonline.com/library/view/working-effec...

👤taude🕑8y🔼0🗨️0

(Replying to PARENT post)

I mostly agree with this - bite-sized chunks is really the main ingredient to success with complex code base reformations.

FWIW, if you want to have a look at a reasonably complex code base being broken up into maintainable modules of modernized code, I rewrote Knockout.js with a view to creating version 4.0 with modern tooling. It is now in alpha, maintained as a monorepo of ES6 packages at https://github.com/knockout/tko

You can see the rough transition strategy here: https://github.com/knockout/tko/issues/1

In retrospect it would've been much faster to just rewrite Knockout from scratch. That said, we've kept almost all the unit tests, so there's a reasonable expectation of backwards compatibility with KO 3.x.

👤bmh_ca🕑8y🔼0🗨️0

(Replying to PARENT post)

How does one get better if they only ever work in code bases that are steaming piles of manure? So far I've worked at two places and the code bases have been in this state to an extreme. I feel like I've been in this mode since the very beginning of my career and am worried that my skill growth has been negatively impacted by this.

I work on my own side projects, read lots of other people's code on github and am always looking to improve myself in my craft outside of work, but I worry it's not enough.

👤_virtu🕑8y🔼0🗨️0

(Replying to PARENT post)

> Do not ever even attempt a big-bang rewrite

I'd love to hear a more balanced view on this. I think this idea is preached as gospel when dealing with legacy systems. I absolutely understand that the big rewrite has many disadvantages. But surely there are codebases with characteristics such that a rewrite is better. I'm going to go against the common wisdom, and the wisdom I've practiced until now, and rewrite a program I maintain that is:

1. Reasonably small (10k LOC, with large parts duplicated or repeated with only minor variable changes).

2. Barely working. Most users cannot get the program working because of the numerous bugs. I often can't reproduce their bugs, because I get bugs even earlier in the process.

3. No test suite.

4. Plenty of very large security holes.

5. I can deprecate the old version.

I've spent time refactoring this (maybe 50 hours), but that seems crazy because it's still a pile of crap, and at 200 hours I don't think it would look that different. I doubt it would take 150 hours for a full rewrite.

Kindly welcoming dissenting opinions.

👤kentt🕑8y🔼0🗨️0

(Replying to PARENT post)

How do people handle this in dynamic languages like JavaScript? I have done a lot of incremental refactoring in C++ and C# and there the compiler usually helped to find problems.

I am now working on a node.js app and I find it really hard to make any changes. Even typos when renaming a variable often go undetected unless you have perfect test coverage.

This is not even a large code base and I already find it hard to manage. Maybe I have been using typed languages for so long that my instincts don't apply to dynamic languages, but I seriously wonder how one could maintain a large JavaScript codebase.

👤maxxxxx🕑8y🔼0🗨️0

(Replying to PARENT post)

I used to work on a messy legacy codebase. I managed to clean it, little by little, even though most of my colleagues and the management were a bit afraid of refactoring. It wasn't perfect but things kinda worked, and I had hope for this codebase.

Then the upper management appointed a random guy to do a "Big Bang" refactor: it has been failing miserably (it is still going on, doing way more harm than good). Then it all started to go really bad... and I quit and found a better job!

👤lbill🕑8y🔼0🗨️0

(Replying to PARENT post)

Big bang rewrites are needed in order to move forward faster.

A huge issue with sticking to an old codebase for such a long time is that it gets older and older. New talent doesn't want to manage it and leaves, so you're stuck with the same old people who implemented the codebase in the first place. Sure, they were smart, knowledgeable people in the year 2000, but think of how fast technology changes. Change, adapt, or die.

👤OutsmartDan🕑8y🔼0🗨️0

(Replying to PARENT post)

All of this seems to focus on the code, after glossing over the career management implications in the first paragraph.

I've done this sort of work quite a number of times and I've made mistakes and learned what works there.

It's actually the most difficult part to navigate successfully. If you already have management's trust (i.e., you have the political power in your organization to push a deadline or halt work), you're golden and all of the things mentioned in the OP are achievable. If not, you're going to have to make huge compromises. Front-load high-visibility deliverables and make sure they get done. Prove that it's possible.

Scenario 1) I came in as a sub-contractor to help spread the workload (from 2 to 3) building out a very early-stage application for dealing with medical records. I came in and saw the codebase was an absolute wretched mess. DB schema full of junk, wide tables, broken and leaking API routes. I spent the first two weeks just bulletproofing the whole application backend and whipping it into shape before adding new features for a little while and being fired shortly afterwards.

Lesson: Someone else was paying the bills and there wasn't enough visibility/show-off factor for the work I was doing so they couldn't justify continuing to pay me. It doesn't really matter that they couldn't add new features until I fixed things. It only matters that the client couldn't visibly see the work I did.

Scenario 2) I was hired on as a web developer to a company and it immediately came to my attention that a huge, business-critical ETL project was very behind schedule. The development component had a due date three weeks preceding my start date and they didn't have anyone working on it. I asked to take that on, worked like a dog on it and knocked it out of the park. The first three months of my work there immediately saved the company about a half-million dollars. Overall we launched on time and I became point person in the organization for anything related to its data.

Lesson: Come in and kick ass right away and you'll earn a ton of trust in your organization to do the right things the right way.

👤busterarm🕑8y🔼0🗨️0

(Replying to PARENT post)

The OP has so much reasonable, smart-sounding advice that doesn't work in the real world.

1) "Do not fall into the trap of improving both the maintainability of the code or the platform it runs on at the same time as adding new features or fixing bugs."

Thanks. However, in many situations this is simply not possible, because the business is not there yet, so you need to keep adding new features and fixing bugs. And still, the code base has to be improved. Impossible? Almost, but we're paid to solve hard problems.

2) "Before you make any changes at all write as many end-to-end and integration tests as you can."

Sounds cool, except in many cases you have no idea how the code is supposed to work. Writing tests for new features and bugfixes is good advice (but that goes against other points the OP makes).

3) "A big-bang rewrite is the kind of project that is pretty much guaranteed to fail."

No, it's not. Especially if you're rewriting parts of it at a time, as separate modules.

My problem with the OP is really that it tells you how to improve a legacy codebase given no business and time pressure.

👤sz4kerto🕑8y🔼0🗨️0

(Replying to PARENT post)

It's my turn to disagree with something in the article.

> Before you make any changes at all write as many end-to-end and integration tests as you can.

I'm beginning to see this as a failure mode in and of itself. Once you give people E2E tests, it's the only kind of test they want to write. It takes about 18 months for the wheels to fall off, so it can look like a successful strategy. What they need to do is learn to write unit tests, but that means breaking the code up into little chunks. That doesn't match their aesthetic sense, so it feels juvenile and contrived. The ego kicks in and you think you're smart enough that you don't have to eat your proverbial vegetables.

The other problem is that E2E tests are slow, they're flaky, and nobody wants to think about how much they cost in the long run because it's too painful to look at. How often have you seen two people huddled over a broken E2E test? Multiply the cost of rework by 2.

👤hinkley🕑8y🔼0🗨️0

(Replying to PARENT post)

It is great to see more people sharing their strategies for managing legacy codebases. However, I thought it might be worth commenting on the suggestion about incrementing database counters:

> "add a single function to increment these counters based on the name of the event"

While the sentiment is a good one, I would warn against introducing counters in the database like this and incrementing them on every execution of a function. If transaction volumes are high, then depending on the locking strategy in your database, this could lead to blocking and lock contention. Operations that could previously execute in parallel, independently, now have to compete for a write lock on this shared counter, which could slow down throughput. In the worst case, if there are scenarios where two counters can be incremented inside different transactions but in different orders (not inconceivable in a legacy codebase), then you could introduce deadlocks.

Adding database writes to a legacy codebase is not without risk.

If volumes are low you might get away with it for a long time, but a better strategy would probably be just to log the events to a file and aggregate them when you need them.

👤stephenwilcock🕑8y🔼0🗨️0

(Replying to PARENT post)

Are there businesses building automation and tooling for working with legacy codebases? It seems like a really good "niche" for a startup. The target market grows faster every year :)
👤artursapek🕑8y🔼0🗨️0

(Replying to PARENT post)

Sometimes your inner desires to rewrite it from scratch can be overwhelming.

https://alwaystrending.io/articles/software-engineer-enterta...

👤mfrisbie🕑8y🔼0🗨️0

(Replying to PARENT post)

Agreed about the prerequisites: adding some tests, reproducible builds, logs, basic instrumentation.

Highly disagree about the order of coding. That guy wants to change the platform, redo the architecture, and refactor everything before he starts to fix bugs. That's a recipe for disaster.

It's not possible to refactor anything while you have no clue about the system. You will change things you don't understand, only to break the features and add new bugs.

You should start by fixing bugs, with a preference for long-standing simple issues, like "adding validation on that form, so the app doesn't crash when the user gives a name instead of a number". Check with users for a history of simple issues.

That delivers immediate value, which will quickly earn you credit with the stakeholders and the users. And you learn the internals by doing, before you attempt any refactoring.

👤user5994461🕑8y🔼0🗨️0

(Replying to PARENT post)

> add instrumentation. Do this in a completely new database table, add a simple counter for every event that you can think of and add a single function to increment these counters based on the name of the event.

The idea is a good one but the specific suggested implementation... hasn't he heard of statsd or Kibana?
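For what it's worth, the statsd route is barely more code than the database counter. A counter increment is a fire-and-forget UDP datagram in the plain-text statsd format (`name:1|c`); host and port below are the conventional defaults and may differ in your setup:

```python
import socket

def statsd_incr(metric, host="127.0.0.1", port=8125):
    """Fire-and-forget a statsd counter increment over UDP.

    UDP means a down or slow metrics server can never block or break
    the legacy code path that emits the event.
    """
    payload = f"{metric}:1|c".encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

That non-blocking property is exactly why it avoids the locking concerns raised elsewhere in this thread about counters in the transactional database.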

👤SideburnsOfDoom🕑8y🔼0🗨️0

(Replying to PARENT post)

Healthcare.gov is a good example, although not a legacy codebase. Anyway, I think fixing small bugs and writing tests are the best way to learn how to work with a legacy system. This lets me see which components are easier to rewrite/refactor/add more logging and instrumentation to. The business cannot wait months for a bug fix just for the sake of making a better codebase. But I agree that database changes should be kept as minimal as possible. Also, overcommunicate with the downstream customers of your legacy system. They may be using your interface in an unexpected manner.

I have done a number of serious refactorings myself, and good tests do me a huge favor, even when I have to grit my teeth for a few days to a few weeks.

👤yeukhon🕑8y🔼0🗨️0

(Replying to PARENT post)

This should be one of the first tasks that any aspiring career programmer takes on. It's an essential experience in making a professional.
👤moonbug🕑8y🔼0🗨️0

(Replying to PARENT post)

Great advice. Writing integration tests or unit tests around existing functionality is extremely important but unfortunately might not always be feasible given the time, budget, or complexity of the code base. I just completed a new feature for an existing and complex code base but was given the time to write an extensive set of end-to-end integration tests covering most scenarios before starting my coding. This proved invaluable once I started adding my features to give me confidence I wasn't breaking anything and helped find a few existing bugs no one had caught before!
👤weef🕑8y🔼0🗨️0

(Replying to PARENT post)

Yeah, I've done this. It's frustrating and easy to burn out doing it because progress seems so arbitrary. Legacy upgrades are usually driven by large problems or the desire to add new features. Getting a grip on the code base while deflecting those desires can be hard.

This type of situation is usually a red flag that the company's management doesn't understand the value of maintaining software until they absolutely have to. That, in itself, is an indicator of what they think of their employees.

👤deedubaya🕑8y🔼0🗨️0

(Replying to PARENT post)

WRT architecture: In my experience, you would be lucky if you are free to change the higher level structure of the code without having to dive deeply into the low-level code. Usually, the low-level code is a tangle of pathological dependencies, and you can't do any architectural refactoring without diving in and rooting them out one at a time (I was pulling up ivy this weekend, so I was primed to make this comment!)
👤mannykannot🕑8y🔼0🗨️0

(Replying to PARENT post)

I was in this situation more than once.

My actions are usually these:

* Fix the build system, automate the build process, and produce regular builds that get deployed to production. It's incredible that some people still don't understand the value of a repeatable, reliable build. In one project, in order to build the system you had to know which makefiles to patch and disable the parts of the project which were broken at that particular time. And then they deployed it and didn't touch it for months. The next time you needed to build/deploy, it was impossible to know what had changed or whether you were even building the same thing.

* Fix all warnings. Usually there are thousands of them, and they get ignored because "hey, the code builds, what else do you want." The warning-fixing step lets you see how fucked up some of the code is.

* Start writing unit tests for things you change, fix or document. Fix existing tests (as they are usually unmaintained and broken).

* Fix the VCS and enforce a sensible review process and history maintenance. Otherwise nobody has a way of knowing what changed, when, and why. Actually, not all parts of the project may even be in the VCS. Code, configs, and scripts can be lying around on individual dev machines, which is impossible to discover without the repeatable build process. Also, there are usually a bunch of branches in various states of staleness which were used to deploy code to production, and the codebase may have diverged significantly. It needs to be merged back into the mainline, and a development process needs to be enforced that prevents this from happening in the future.

Worst of all is that in the end very few people would appreciate this work. But at least I get to keep my sanity.

👤alexeiz🕑8y🔼0🗨️0

(Replying to PARENT post)

This says, near the end, "Do not ever even attempt a big-bang rewrite", but aren't a LOT of legacy in-house projects completely blown out of the water by well-maintained libraries, in popular, modern languages, that already exist? (In some cases these might be commercial solutions, but for which a business case could be made.)

I'm loath to give examples so as not to constrain your thinking, but, for example, imagine a bunch of hairy Perl had been built to crawl web sites as part of whatever they're doing, and it just so happens that these days curl or wget do more, do it better, and are less buggy than everything they had built. (Think of your own examples here, anything from machine vision to algebraic computation, whatever you want.)

In fact isn't this the case for lots and lots of domains?

For this reason I'm kind of surprised the "big bang rewrite" is written off so easily.

👤logicallee🕑8y🔼0🗨️0

(Replying to PARENT post)

Sometimes you get an entire septic tank full of...

A code base that is practically non-existent, as the previous attempts were done with MS BI (SSIS) tools (used for all the things SSIS is not for) and/or SQL stored procedures, with no consistency in coding style or documentation, over 200 databases (sometimes 3 per process, existing only to house a handful of stored procedures), complete developer turnover about every 2 years, and senior leadership in the organization clueless about any technology.

As you look at ~6,000 lines in a single stored procedure, you fight the urge to light the match and give it some TLC (Torch it, Level it, Cart it away) and start over with something new.

Moral of the story: as you build and replace things, stress to everyone to "concentrate on getting it Right, instead of getting it Done!" so you don't add to the steaming pile.

👤iamNumber4🕑8y🔼0🗨️0

(Replying to PARENT post)

Regarding instrumentation and logging - this can also be used to identify areas of the codebase that can possibly be retired. If it is a legacy application, there are likely areas that aren't used any longer. Don't focus on tests or anything in these areas and possibly deprecate them.
👤matt_s🕑8y🔼0🗨️0

(Replying to PARENT post)

From what I've seen, the most common mistake when starting work on a new codebase is not reading it all before making any change.

I really mean it: a whole lot of programmers simply don't read the codebase before starting a task. Guess the result, especially in terms of frustration.

👤quadcore🕑8y🔼0🗨️0

(Replying to PARENT post)

> Before you make any changes at all write as many end-to-end and integration tests as you can.

^ Yes and no. That might take forever, and the company might be struggling with cash. I would instead consider adding a metrics dashboard. Basically, find the key points: payments sent, payments cleared, new user, returning user, store opened, etc. This isn't as good as a nice integration suite, but if a client is hard up for cash and needs help, it can be set up in hours. With this in place, after adding/editing code you can calm investors/CEOs. Alternatively, if it's a larger corp it will be time-strapped instead; then push for the same thing :)

👤ransom1538🕑8y🔼0🗨️0

(Replying to PARENT post)

Any advice on what steps to take when the legacy codebase is incredibly difficult to test?

I completely agree with the sentiment that scoping the existing functionality and writing a comprehensive test suite is important - but how should you proceed when the codebase is structured in such a way that it's almost impossible to test specific units in isolation, or when the system is hardcoded throughout to e.g. connect to a remote database? As far as I can see it'll take a lot of work to get the codebase into a state where you can start doing these tests, and surely there's a risk of breaking stuff in the process?

👤lol768🕑8y🔼0🗨️0

(Replying to PARENT post)

I've been a part of several successful big-bang rewrites, and several unsuccessful ones, and saying that if you're smart they're not on the table is just flat out wrong.

The key is an engaged business unit, clear requirements, and time on the schedule. Obviously if one or more of these things sounds ridiculous then the odds of success are greatly diminished. It is much easier if you can launch on the new platform a copy of the current system, not a copy + enhancements, but I've been on successful projects where we launched with new functionality.

👤pc86🕑8y🔼0🗨️0

(Replying to PARENT post)

Can't say I agree with the big bang rewrite part necessarily - at my last job, I found myself having to do significant refactors. The reason was that each view had its own concept of a model for interacting with various objects, which resulted in a lot of different bugs from one off implementations. My refactor had some near term pain of having to fix various regressions I created, but ultimately it led to much better long term maintenance.
👤Bahamut🕑8y🔼0🗨️0

(Replying to PARENT post)

I agree with most of this, though I think it doesn't dive into the main problem:

Freezing a whole system is practically impossible. What you usually get is a "piecewise" freeze. As in: you get to have a small portion of the system to not change for a given period.

The real challenge is: how can you split your project in pieces of functionalities that are reasonably sized and replaceable independently from each other.

There is definitely no silver bullet for how to do this.

👤d--b🕑8y🔼0🗨️0

(Replying to PARENT post)

> How to Improve a Legacy Codebase When You Have Full Control Over the Project, Infinite Time and Money, and Top-Tier Developers

edit: I'm being a little snarky here, but the assumptions here are just too much. This is all best-case scenario stuff that doesn't translate very well to the vast majority of situations it's ostensibly aimed at.

👤alexwebb2🕑8y🔼0🗨️0

(Replying to PARENT post)

>Use proxies to your advantage

At my last gig we used this exact strategy to replace a large ecommerce site piece by piece. Being able to slowly replace small pieces and AB test every change was great. We were able to sort out all of the "started as a bug, is now a feature" issues with low risk to overall sales.

👤kevan🕑8y🔼0🗨️0

(Replying to PARENT post)

> Do not ever even attempt a big-bang rewrite

Really? Are there no circumstances under which this would be appropriate? It seems to me this makes assumptions about the baseline quality of the existing codebase. Surely sometimes buying a new car makes more sense than trying to fix up an old one?

👤safek🕑8y🔼0🗨️0

(Replying to PARENT post)

Another thing you can do is start recording all requests that cause changes to the system in an event store (à la event sourcing). Once you have this in place, you can use the event stream to project a new read model (e.g. a new, coherent database structure).
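The projection step is just a replay over an append-only stream. A minimal sketch in Python (the event shapes and the balances read model are invented for illustration):

```python
def project(events):
    """Replay an event stream into a fresh read model.

    Here the read model is a dict of account balances; in practice it
    would be rows in your new, coherent database schema.
    """
    balances = {}
    for event in events:
        if event["type"] == "account_opened":
            balances[event["id"]] = 0
        elif event["type"] == "deposited":
            balances[event["id"]] += event["amount"]
    return balances
```

Because the stream is append-only, you can throw a bad read model away and re-project it under a better schema at any time, which is exactly what makes this attractive for migrating off a legacy database structure.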
👤macca321🕑8y🔼0🗨️0

(Replying to PARENT post)

The biggest problem in improving a legacy codebase is that the people involved with it have been there too long and are still using old techniques, and as a new developer you cannot change them; they will change you, which means it's hard to improve.
👤jhgjklj🕑8y🔼0🗨️0

(Replying to PARENT post)

> Yes, but all this will take too much time!

I'm actually quite curious; how long does this process typically take you?

What are the most relevant factors on which it scales? Messiness of existing code? Number of modules/LOC? Existing test coverage?

👤rattray🕑8y🔼0🗨️0

(Replying to PARENT post)

Genuinely would like to know how anyone has managed to do both of:

> write as many end-to-end and integration tests as you can

and

> make sure your tests run fast enough to run the full set of tests after every commit

👤jscn🕑8y🔼0🗨️0

(Replying to PARENT post)

Thanks for posting, some excellent high-level advice.
👤btbuildem🕑8y🔼0🗨️0

(Replying to PARENT post)

Stick around that startup long enough and this is a good set of things to do with your own code.
👤jefurii🕑8y🔼0🗨️0

(Replying to PARENT post)

I agree with everything said, but I think they assumed a well-maintained and highly functional legacy codebase. In my experience, there are a few steps before any of those.

---

1. Find out which functionality is still used and which functionality is critical

Management will always say "all of it". The problem is that what they're aware of is usually the tip of the iceberg in terms of what functionality is supported. In most large legacy codebases, you'll have major sections of the application that have sat unused or disabled for a couple of decades. Find out what users and management actually think the application does and why they're looking to resurrect it. The key is to make sure you know what is business critical functionality vs "nice to have". That may happen to be the portions of the application that are currently deliberately disabled.

Next, figure out who the users are. Are there any? Do you have any way to tell? If not, if it's an internal application, find someone who used it in the past. It's often illuminating to find out what people are actually using the application for. It may not be the application's original/primary purpose.

---

2. Is the project under version control? If not, get something in place before you change anything.

This one is obvious, but you'd be surprised how often it comes up. Particularly at large, non-tech companies, it's common for developers to not use version control. I've inherited multi-million line code bases that did not use version control at all. I know of several others in the wild at big corporations. Hopefully you'll never run into these, but if we're talking about legacy systems, it's important to take a step back.

One other note: If it's under any version control at all, resist the urge to change what it's under. CVS is rudimentary, but it's functional. SVN is a lot nicer than people think it is. Hold off on moving things to git/whatever just because you're more comfortable with it. Whatever history is there is valuable, and you invariably lose more than you think you will when migrating to a new version control system. (This isn't to say don't move, it's just to say put that off until you know the history of the codebase in more detail.)

---

3. Is there a clear build and deployment process? If not, set one up.

Once again, hopefully this isn't an issue.

I've seen large projects that did not have a unified build system, just a scattered mix of shell scripts and isolated makefiles. If there's no way to build the entire project, it's an immediate pain point. If that's the case, focus on the build system first, before touching the rest of the codebase. Even for a project with excellent processes in place, reviewing the build system in detail is not a bad way to start learning the overall architecture of the system.

More commonly, deployment is a cumbersome process. Sometimes cumbersome deployment may be an organizational issue, and not something that has a technical solution. In that case, make sure you have a painless way to deploy to an isolated development environment of some sort. Make sure you can run things in a sandboxed environment. If there are organizational issues around deploying to a development setup, those are battles you need to fight immediately.

👤jofer🕑8y🔼0🗨️0

(Replying to PARENT post)

Delete it...

(Speaking from experience from work)

👤crankyadmin🕑8y🔼0🗨️0

(Replying to PARENT post)

> Before you make any changes at all write as many end-to-end and integration tests as you can.

I don't agree with this. People can't write proper coverage even for a code base they 'fully understand'. You will most likely end up writing tests for very obvious things or low-hanging fruit; the unknowns will still seep through at one point or another.

Forget about refactoring code just to comply with your tests and breaking the rest of the architecture in the process. It will pass your 'test' but will fail in production.

What you should be doing is:

1. Perform architecture discovery and documentation (helps you with remembering things).

2. Look over last N commits/deliverables to understand how things are integrating with each other. It's very helpful to know how code evolved over time.

3. Identify your roadmap and what sort of impact it will have on the legacy code.

4. Commit to the roadmap. Understand the scope of the impact of anything you add/remove. Account for code, integrations, caching, database, and documentation.

5. Don't forget about things like jobs and anything that might be pulling data from your systems.

Identifying what will be changing and adjusting your discovery to accommodate those changes as you go is a better approach from my point of view.

By the time you reach the development phase that touches 5% of the architecture, your knowledge of the other 95% of the design will be useless, and in six months you will have forgotten it anyway.

You don't cut a tree with a knife to break a branch.

👤korzun🕑8y🔼0🗨️0

(Replying to PARENT post)

First and foremost, do not assume that everyone who ever worked on the code before is a bumbling idiot. Assume the opposite.

If it's code that has been running successfully in production for years, be humble.

Bugfixes, shortcuts, constraints: all are real life and prevent perfect code and documentation under pressure.

The team at Salesforce.com is doing a massive re-platforming right now with their switch to Lightning. It should provide a few good stories: switching over millions of paying users, not fucking up billions in revenue.

👤pinaceae🕑8y🔼0🗨️0

(Replying to PARENT post)

Do the refactoring you should have known to do at the time, not the brand-new-fangled way of doing things; that way each new way fades into the other.
👤jlebrech🕑8y🔼0🗨️0