Hey there,

I have been a hobbyist programmer for quite some years and have a few smaller projects under my belt: mostly small GUI applications that have a few classes at most, use one or two external libraries, and are very thoroughly documented and commented.

Since I love the free software movement and philosophy, I wanted to start contributing to projects I like and help them out.

The thing is, the jump from “hobbyist” to “being able to understand super-efficient compact established repos”… seems to be very hard?

Like, looking into some of these projects, I see dozens upon dozens of classes and header files, most of them totally opaque to me. They use syntactic constructs I cannot decipher very well because they have been optimized to irrecognizability, and sometimes I cannot even find the starting point of a program. The code bases are decades old, use half of the obscure compiler and language features out there, and the maintainers seem to be so intimately familiar with everything that I don’t even know what’s what or where to start. My projects were usually like four source files or so, not massive repositories with hundreds of scattered files, external configurations, edge cases, factories of factories, and so on.

If I want to change a simple thing like the placement of a button or - god knows! - introduce a new feature, I would not even remotely know where to start.

Is it just an extreme difficulty spike at this point that I have to trial-and-error through, or am I doing anything wrong?

56 points

I’ve been a dev for 20+ years and yeah, learning a new repo is hard. Here’s some stuff I’ve learned:

Before digging into the code:

  • get the thing running and get familiar with exercising it: test happy path, edge cases, and corner cases. We’re not even looking at code yet; we’re just getting a feel for how it behaves.
  • next up, see if there’s existing documentation. That’s not an end-all solution, but it’s good to see what the people that wrote the thing say about it.

Digging into the code:

  • grep is your very best friend. Pick a behavior or feature you want to try and search for it in the codebase. User-facing strings and log statements are a good place to start. If you’re very lucky, you can trace it down to a line of code and search up and down from there. If you’re unlucky, they’ll take you to a localization package and you’ll have to search based on that ID.
  • git blame is also your very best friend. Once you’ve got an idea where you’re working, use the blame feature on GitHub to tie commits to PRs. This will give you a good idea of what contributing to the project looks like, and what changes you’ll have to make for an acceptable PR.
  • unit tests are also a good method of stealth documentation. You can see what different areas of the code look like in isolation, what they require, and how they behave.
  • keep your own documentation file with your findings. The act of writing things down reinforces those things in your mind. They’ll be easier to recall and work with.
  • if there’s an official channel for questions / support, make use of it. Try to strike a balance here: you don’t want to blow them up every five minutes, but you also don’t want to churn on a thing for days if there’s an easy answer. This is a good skill to develop in general: knowing when to ask for help, knowing when an answer will actually be helpful, and knowing when to dig for a few minutes first.
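
If grep isn’t handy on your machine, the same idea can be sketched in a few lines of Python. This is only a rough illustration: the function name `find_string`, the default extension list, and the example search string mentioned afterwards are made up for the example, not taken from any particular project.

```python
import os
from typing import Iterator, Tuple


def find_string(
    root: str,
    needle: str,
    extensions: Tuple[str, ...] = (".c", ".h", ".cpp", ".py"),
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line under root that contains needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            # Only look at files with one of the assumed source extensions.
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the search going on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")
```

Looping over something like `find_string(".", "Could not open file")` and printing each result gives you a file and line number to start reading from, which is roughly what a recursive grep with line numbers would show.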

There’s no silver bullet. Just keep acquiring information until you’re comfortable.

4 points

This is excellent advice and makes me feel less crazy…

4 points

> grep is your very best friend.

This. And also, in many cases, an ‘adjacent’ grep may help. Say you want to move the “OK” button on one screen. Searching for the string “OK” would be overwhelming as that would be all over the shop.

But you notice there’s a “Setup…” button next to it. Searching for that could potentially cut down your search results by orders of magnitude. The more obscure the text, the better.
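
As a rough sketch of that idea in Python: count how often each candidate string shows up and start with the rarest one. The helper names `count_hits` and `rarest_first` are invented for this example, and it naively reads every file under the directory as text.

```python
import os
from typing import Dict, Iterable, List, Tuple


def count_hits(root: str, terms: Iterable[str]) -> Dict[str, int]:
    """Count occurrences of each term in all files under root."""
    counts = {term: 0 for term in terms}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    text = handle.read()
            except OSError:
                # Unreadable file: ignore it and keep going.
                continue
            for term in counts:
                counts[term] += text.count(term)
    return counts


def rarest_first(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (term, count) pairs that occur at least once, fewest hits first."""
    found = [(term, n) for term, n in counts.items() if n > 0]
    return sorted(found, key=lambda pair: pair[1])
```

The first entry of `rarest_first(count_hits(".", ["OK", "Setup"]))` is the string worth searching for first, since it has the fewest places to check.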

2 points

Yep! Good point.

42 points

I’m a software engineer by profession and passion and have been writing programs for well over 20 years now. I believe your experience is totally natural - at least I share the same feelings:

  1. Large code bases take time getting to know and understand: most definitely true. It takes time and effort and is an investment you need to make before being able to feel confident. You don’t need to fully comprehend every aspect of the project before you can contribute but you sure need to have a decent enough idea of how to build, test, run and deploy a particular feature. See point (2).

  2. Don’t let the size of the project intimidate you. Start small and expand your knowledge base as you go. Usually one good starting point is simply building the project, running tests and deploying it (if applicable). Then try to take on simple tasks (e.g. from the project’s issue tracker) and deliver on those (even things like fixing the installation docs, typos, …). That’ll have the additional benefit of making you feel good about the work that you’re doing and what you’re learning. I’m sure at this stage you will “know” when you’re confident enough to work on tasks which are a bit bigger.

  3. During (1) and (2), please please do NOT be tempted to just blindly copy-paste stuff at the first sign of trouble. Instead invest some time and try to understand things, what is failing and why it is so. Once you do, it’s totally fine to copy-paste.

After all, there’s no clear cut formula. Each project is a living and breathing creature and “not one of them is like another.” The only general guideline is patience, curiosity and incremental work.

14 points

Point 3 is so important, not just for large open source projects but also for any project, from small to big, as a hobby or for your job.

Understanding the project will help a lot when fixing issues. You’ll more easily find the root cause of an issue instead of just fixing the symptoms.

21 points

I’m a programmer for work and honestly feel the same. Everyone says I’m doing a good job but I feel like it takes me forever to understand a new code base. I can’t just read a program and understand it. I need to copy and paste bits and try to copy a feature to get my head around it. Like if there’s a button on the GUI then I follow it right the way through, creating my own button. But I don’t know if there’s a better way to learn than that.

-4 points

When I come across pieces of code I don’t understand just by reading them, I like to run them through ChatGPT and ask it what they do.

It does a really good job at explaining them and you can even ask follow up questions and it will go into more detail.

It’s essentially StackOverflow but nobody calls you an idiot for asking stupid questions.

16 points

I would be careful with this advice. If you are asking AI for an explanation of code, you may not have the experience to tell when it is correct and when it is confidently incorrect.

9 points

Also be wary about sharing confidential code. At work I don’t use ChatGPT unless it’s for extremely general questions.

1 point

The good thing about code is that explanations can easily be followed up with a quick search in the documentation once you know the terms to look up.

But you are correct, as with everything related to ChatGPT, don’t let it bullshit you.

17 points

> Is it just an extreme difficulty spike at this point that I have to trial-and-error through, or am I doing anything wrong?

I would say this is the biggest ‘aha’ moment for pretty much any developer - the first time you go from “I built this myself” to “A team built this and has supported it for 10+ years”. Not only can a team of three or four write a lot of code in ten years - they’ll optimize the Hell out of it. It’s ten years’ worth of edge case bugs, attempts to go faster, new features, etc. And it’s ‘bumpy’ because some of it was done by Dev A in their own style, some of it by Dev B, and so on. So you’ll find the most beautiful implementation for problems that you haven’t even considered before next to a “Hello World” level implementation of something else.

The biggest thing you can do to help yourself out is make sure you’re clear on their branching strategy. When you’re the only one working on your code, it’s cool to push to main and occasionally break things and no harm no foul. But for a mature code base, a butterfly flapping its wings on that obscure constructor can have a blast radius of ‘okay, we have to rebase to the last stable commit’. When in doubt, ‘feature/(what you’re working on)’; but there might be more requirements than that, and it’s okay to ask. Some teams have feature requests tracked by number, on a kanban board, some put it in their username, etc.

Get the code pulled down, get it running on your machine (no small task), git checkout -b from wherever you’re pulling a branch off of (hopefully main or master, but again, it’s okay to ask) and then, figure out what the team’s requirements are for PRs. Do they have any testing environments, besides building it locally? Do they use linting or some other process to enforce style on PR reviews?
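
If it helps, here is a small Python sketch of the ‘feature/(what you’re working on)’ convention plus the git checkout -b step. The helper names `branch_name` and `create_branch`, the "feature" prefix and the default base of "main" are assumptions for illustration; the project you’re contributing to may well use a different scheme, so check first.

```python
import re
import subprocess


def branch_name(description: str, prefix: str = "feature") -> str:
    """Turn a short task description into a branch name like 'feature/move-button'."""
    # Lowercase, then collapse every run of non-alphanumeric characters into one dash.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    return f"{prefix}/{slug}"


def create_branch(description: str, base: str = "main") -> str:
    """Create and switch to a new branch starting from base (assumed to exist locally)."""
    name = branch_name(description)
    # Equivalent to running: git checkout -b <name> <base>
    subprocess.run(["git", "checkout", "-b", name, base], check=True)
    return name
```

For example, `branch_name("Move the Setup button")` gives `feature/move-the-setup-button`, and `create_branch` only does something useful when run inside a clone of the repository.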

And then…don’t move a button. (Unless that button actually needs moved!) But try to mimic something that already exists. Create a second button in the new location. Steal from the codebase - implement something small in a way that has been done before. After the new button works - then remove the old button and see what happens.

The longer you deal with a codebase (and the attendant issues and feedback) the more you’ll feel yourself drawn to certain parts of the code that you’re familiar with.

Anyway, hope that advice helps! But most of all, don’t be scared. You will break things unintentionally. Your code will break things. If there’s not a process in place to catch it before it happens, that’s not your fault; that’s the senior dev’s or owner’s fault. But do try to limit the damage by using good branching strategies, only PRing after linting/testing, and otherwise following the rules.

15 points

I’ve been working with software for 15 years and still feel like this when faced with a new codebase - it simply doesn’t want to make sense to me. As others have stated, codebases are living things, and are as much a map of previous developers’ minds as they are about being functional. The older a project is, the more convoluted and obscure the structure becomes due to changes, adaptations, new features and changing contributors.

Some developers seem to enjoy making their code obscenely difficult to understand, either because it actually makes sense to them that way, or because it makes them feel smarter. These projects are better left alone for the sake of your own sanity. If you encounter dozens of header files, walk away. C and C++ are high-performance languages, and projects use them for a reason. If you have no experience with them, the result is very unlikely to make any sense to you.

I’ve also found it quite difficult to find any project small enough to help on. The large projects have many contributors, and any manageable bugs are quickly fixed, leaving only the stuff that no one wants to touch.

Is there some sort of hobby you enjoy, where an open source tool is (or could be) used? The more obscure the better! Having some prior understanding of the subject usually makes understanding the codebase a little easier.

13 points

> Some developers seem to enjoy making their code obscenely difficult to understand, either because it actually makes sense to them that way, or because it makes them feel smarter.

Be wary about this mindset. This type of explanation sets you up for conflicts with existing developers. Several times I’ve seen developers come into a team and complain about the code, creating conflicts that can last the entire working relationship for no good reason.

Much of the time the people who constantly work with the code are already aware of the problems and may not be happy with it, but there’s no time or big benefit in improving working code. Or it’s complicated for good reasons which may not be immediately apparent (i.e. inherent complexity).

Here are a couple of benign reasons which probably will serve you much better.

  1. It’s much more difficult and time-consuming to make code that is easy to understand. Even in open source, there’s a limited amount of time to spend on any particular thing. This explanation is like a variation of Twain’s “I didn’t have time to write a short letter, so I wrote a long one instead.”, or, more abrasively, a variation of Hanlon’s razor: “Never attribute to malice that which is adequately explained by time pressure” (the original says “stupidity”).

  2. When writing the code, the developer has the entire context of his thought process available. You don’t have that, and that’s also the reason why your own code can make no sense a while later. Also it’s just much harder to read code than to write it.

2 points

While I agree with all of the above in principle (and even I have trouble reading my own code at times), this part was specifically in response to the section about ‘code optimized to irrecognizability’ and should not be taken as a general statement on finding other people’s code incomprehensible. Deliberately using non-descriptive naming is unfortunately a thing, although thankfully I rarely seem to encounter it anymore.

2 points

And sometimes coding habits are opaque to people with different coding habits. These habits aren’t bad per se, but can be difficult to grok.
