Hey there!

I’m a chemical physicist who has been using python (as well as matlab and R) for a lot of different tasks over the last ~10 years, mostly for data analysis but also to automate certain tasks. I am almost completely self-taught, and though I have gotten help and tips from professors throughout the completion of my degrees, I have never really been educated in best practices when it comes to coding.

I have some friends who work as developers but have a similar academic background as I do, and through them I have become painfully aware of how bad my code is. When I write code, it simply needs to do the thing, conventions be damned. I do try to read up on the “right” way to do things, but the holes in my knowledge become pretty apparent pretty quickly.

For example, I have never written a class and I wouldn’t know why or where to start (something to do with the init method, right?). I mostly just write functions and scripts that perform the tasks that I need, plus some work with jupyter notebooks from time to time. I only recently got started with git and uploading my projects to github, just as a way to try to teach myself the workflow.

So, I would like to learn to be better. Can anyone recommend good resources for learning programming, but perhaps that are aimed at people who already know a language? It’d be nice to find a guide that assumes you already know more than a beginner. Any help would be appreciated.

54 points

Approach programming with the same seriousness that you’d expect a programmer to approach your field with. You say yourself you just want it to “do the thing, conventions be damned”.

Well how would you feel if someone entered your lab or whatever and treated the tools of your trade that way?

permalink
report
reply
14 points

I would agree if OP was trying to get a job as a developer, however I don’t think they are.

It’s more like you used a beaker for something and shook it to mix water and salt, it’s not the recommended way, but it’s fine.

permalink
report
parent
reply
5 points

At some point, they’re gonna have to debug it.

permalink
report
parent
reply
1 point

This isn’t a good assumption for researchers.

permalink
report
parent
reply
30 points

Read your own code that you wrote a month ago. For every wtf moment, try to rewrite it in a clearer way. With time you will internalize what is or is not a good idea. Usually this means naming your constants, moving code inside function to have a friendly name that explain what this code does, or moving code out of a function because the abstraction you choose was not a good one. Since you have 10 years of experience it’s highly possible that you already do that, so just continue :)

If you are motivated I would advice to take a look to Rust. The goal is not really to be able to use it (even if it’s nice to be able able to write fast code to speed up your python), but the Rust compiler is like a very exigeant teacher that will not forgive any mistakes while explaining why it’s not a good idea to do that and what you should do instead. The quality of the errors are crutial, this is what will help you to undertand and improve over time. So consider Rust as an exercice to become a better python programmer. So whatever you try to do in Rust, try to understand how it applies to python. There are many tutorials online. The official book is a good start. And in general learning new languages with a very different paradigm is the best way to improve since it will help you to see stuff from a new angle.

permalink
report
reply
19 points
*

Most of the “conventions” (which are normally just “good practices”) are there to make the software easier to maintain, to make teamwork more efficient, to manage complexity in large code-bases, to reduce the chance of mistakes and to give a little boost in productivity.

For example, using descriptive names for variables (i.e. “sampleDataPoints” rather than “x”) reduces the chances of mistakes due to confusing variables (especially in long stretches of code) and allows others (and yourself if you don’t look at that code for many months) to pick up much faster what’s going on there in order to change it. Dividing your code into functions, on the other hand, promotes reusability of the same code in many places without the downsides of copy & paste of the same code all over the place, such as growing the code base (which makes it costlier to maintain) and, worse, unwittingly copying and pasting bugs so now you have to fix the same stuff in several places (and might even forget one or two) rather than just fixing it in that one function.

Stuff at a higher, software design level, such as classes, are mean to help structure the code into self-contained blocks with clear well controlled ways of interaction between them, thus reducing overall complexity (everything potentially connecting to everything else is the most complex web of connection you could have) increasing productivity (less stuff to consider at any one point whilst doing some code, as it can’t access everything), reduce bugs (less possibility of mistakes when certain things can only be changed by only a certain part of the code) and make it easier for others to use your stuff (they don’t need to know how your classes works, only to to talk to them, like a mini library). That said, it’s perfectly feasible to achieve a similar result as classes without using classes and using scope only, though more advance features of classes such as inheritance won’t be possible to easilly emulate like that.

That said, if your programs are small, pretty much one use (i.e. you don’t have to keep on using them for years) and you’re not having to work on the code as a team, you can get away with not using most “conventions” (certainly the design level stuff) with only the downside of some loss in productivity (you lose code clarity and simplification, which increases the likelihood of bugs and makes it slower to transverse and spot stuff in the code when you have to go back and forth to change things).

I’ve worked with people who weren’t programmers but did code (namelly with Quants in Finance) and they’re simply not very good at doing what is but a secondary job for them (Quants mainly do Mathematical modelling) which is absolutelly normal because unlike with actual Developers, doing code well and efficiently is not what their focus has been in for years.

permalink
report
reply
16 points

The O’Reilly “In a Nutshell” and “Pocket Guide to” books are great for folks who can already code, and want to pick up a related tool or a new language.

The Pocket Guide to Git is an obvious choice in your situation, if you don’t already have it.

As others have mentioned, you’re allowed to ignore the team stuff. In git this means you have my permission to commit directly to the ‘main’ branch, particularly while you’re learning.

Lessons that I’ve learned the hard way, that apply for someone scripting alone:

  • git will save your ass. Get in the habit of using if for everything ASAP, and it’ll be there when you need it
  • find that one friend who waxes poetic about git, and keep them close. Usually listening politely to them wax poetically about git will do the trick. Five minutes of their time can be a real life saver later. As that friend, I know when you’re using me for my git-fu, and I don’t mind. It’s hard for me to make friends, perhaps because I constantly wax poetically about git.
  • every code swan starts as an ugly duck that got the job done.
  • print(f"debug: {what_the_fuck_is_this}") is a valid pattern that seasoned professionals still turn to. If you’re in a code environment that doesn’t support it, then it’s a bad code environment.
  • one peer who reads your code regularly will make you a minimum of 5x more effective. It’s awkward as hell to get started, but incredibly worth it. Obviously, you traditionally should return the favor, even though you won’t feel qualified. They don’t really feel qualified either, so it works out. (Soure: I advise real scientists about their code all the time. It’s still wild to me that they, as actual scientists, listen to me - even after I see how much benefit I provide.)
permalink
report
reply
3 points

Along a similar vain to making a git friend, buy your sysadmins/ops people a box of doughnuts once in a while. They (generally) all code and will have some knowledge of what you are working on.

permalink
report
parent
reply
1 point

That is great advice that has served me well, as well!

permalink
report
parent
reply
3 points

print(f"debug: {what_the_fuck_is_this}") is a valid pattern that seasoned professionals still turn to. If you’re in a code environment that doesn’t support it, then it’s a bad code environment.

I’ve been known to print things to the console during development, but it’s like eating junk food. It’s better to get in the habit of using a logging framework. Insufficient logging has been in the OWASP Top 10 for a while so you should be logging anyway. Why not logger.debug("{what_the_fuck_is_this}") or get fancy with some different frameworks and logger.log(SUPER_LOW_LVL, "{really_what_the_fuck_is_this}")

You also get the bonus of not going back and cleaning up all the print statements afterward. All you have to do is set the running log level to INFO or something to turn all that off. There was a reason you needed to see that stuff in the first place. If you ever need to see all that stuff again the change the log level to whatever grain you need it.

permalink
report
parent
reply
3 points
*

Absolutely true.

And you make a great point that: print(f"debug: {what_the_fuck_is_this}") should absolutely be maturing into logger.log(SUPER_LOW_LVL, "{really_what_the_fuck_is_this}")

Unfortunately I have found that when print(“debug”) isn’t working, usually logging isn’t setup correctly either.

In a solidly built system, a garbage print line will hit the logs and raise several alerts because it’s poorly formatted - making it easy for the developer to find.

Sadly, I often see the logging setup so that poorly formatted logs go nowhere, rather than raising alerts until they’re fixed. This inevitably leads to both debug logs being lost and critical but slightly misformatted logs being lost.

Your point is particularly valuable when it’s time to get the system fixed, because it’s easier to say “logging needs to work” than “fix my stupid printf”, even though they’re roughly equivalent.

Edit: And getting back to the scripting scientist context, scripting scientists still have my formal official permission to just say “just make my print(‘debug’) work”.

permalink
report
parent
reply
15 points

As one physicist to another, the most important thing in the code are long variable names (descriptive) and comments.

We usually do not do multi-people multi year projects, so all other comments in this page especially the ones coming from programmers are not that relevant. Classes are cool, but they are not needed and often obscure clarity of algorithmic/functional programming.

S. Wolfram (creator of Mathematica) said something along these lines (paraphrasing) if you are writing real code in Mathematica - you are doing something wrong.

permalink
report
reply
4 points
*

We usually do not do multi-people multi year projects

Seriously - why not?

Say you’re doing an experiment, wouldn’t it be nice if someone else could repeat that experiment? Maybe in 3 years? in 30 years? in 3,000 years time? And maybe they could use your code instead of writing it themselves and possibly getting it wrong?

If something is worth doing, then it is worth doing properly.

Classes are cool, but they are not needed and often obscure clarity

I write code all day professionally. A lot of my code doesn’t use classes. I agree they often “obscure clarity”.

But sometimes they do the opposite - they make things crystal clear. It’s important to know how to use classes and even more important to know when to use them. I guarantee some of the work you do could benefit from a few simple classes. They don’t need to be complex - I wrote a class the earlier today that is only four lines of code. And yes, a class was apropriate.

permalink
report
parent
reply
2 points

The reason they don’t do multi people and multi year coding projects has nothing to do with repeatability of the experiments, most science coding is done through simple-ish code that uses existing libraries, doesn’t code them. That code is usually stored in notebooks (jupyter, zeppelin) or simple scripts.

For science code, it usually falls in the realm of data analysis, and as a data engineer, let me tell you that the analysis part of the job is usually very ad-hoc modifications of the script and live coding though notebooks and such.

The part where whatever conclusion of the research is then transformed into a functioning application, taking care of naming conventions, the architecture of the system where the input data, the transformations, the postprocessing and such is done, is usually done by another team of dedicated data engineers or software developers.

I guess that it would be helpful for the analysis part to have standardized templates for data extraction and such, but usually the tools used in the research portion of the process and the implementation portion are completely different (python with tensorflow vs C++ with openvino or whatever cloud based) so it’s not really fair to load the architecture design since the beginning.

permalink
report
parent
reply
1 point

You know how changing requirements is the bane of real™ production grade™ software?

In science requirements change all the time. You write some 50-100 lines to plot your results. You realize that the effect is not visible, so you go back to the lab, change 5 variables and run the test again. Some quick code changes and you see the effect. Perfect. Now you do the measurement as a function of temperature. You adapt your script, you indent the data processing code to turn your list of files into a list of characteristics parameters and adapt the plotting. You run the experiment for the three samples you have prepared and compare their plots. Some more experiments tuning and corresponding script tuning is required. You take the characteristic parameters that your code (grew to 200 lines now, but whatever) calculated and write a new script to take that array and plot it nicely.

Now someone wants to repeat the experiment 4 years later. The measurement equipment changed and the data format is slightly different. It’s impossible to document the exact state of the hardware in your code anyway. They are actually interested in a different effect and want to plot that as well, but they need the effect/characteristics parameters that are shown by your code as a sanity check. They need to rewrite 125 of the 200 lines.

There never is a finished product that is worth maintaining long term. Everyone using the script has to understand the domain precisely anyway. Is it worth it to reuse the old code when you need to rewrite more than half of it anyway?

Don’t get me wrong, code reuse does happen. But it’s much more “oh, I wrote that three months ago somewhere else” ctrl-c ctrl-v. “Ok, now I need to change five lines in this function to adapt to the new thing I’m trying to do.” It makes absolutely no sense to write that function in a abstracted way. Every time you use it the requirements changed and the abstraction is no longer valid anyway.

permalink
report
parent
reply
1 point

They need to rewrite 125 of the 200 lines.

And I guarantee, it is much easier to write 200 new lines than change 125 out of 200 lines in somebodies code. No matter how nicely that code is written.

permalink
report
parent
reply
1 point

Great potatoes… This is not very good advice. Ok for prototypes that are intended to be discarded shortly after writing. Nothing more.

permalink
report
parent
reply
2 points

Yes, those prototypes are the goal here.

permalink
report
parent
reply
1 point

Cool! Have fun! I wouldn’t worry about a lot of code quality opinions then. Especially if somebody is looking at prototypes and thinking they are not prototypes haha

permalink
report
parent
reply

Programming

!programming@programming.dev

Create post

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person’s post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Rules

  • Follow the programming.dev instance rules
  • Keep content related to programming in some way
  • If you’re posting long videos try to add in some form of tldr for those who don’t want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev



Community stats

  • 3.1K

    Monthly active users

  • 1.8K

    Posts

  • 30K

    Comments