
houseofleft

houseofleft@slrpnk.net
1 posts • 24 comments

I’m a data engineer, use Parquet all the time, and absolutely love love love it as a format!

Arrow (an in-memory columnar data format) + Parquet is particularly powerful, and lets you:

  • Only read the columns you need (with a CSV, your computer has to parse all the data even if you then discard all but one column)

  • Use metadata to only read relevant files. This is particularly cool and probably needs some unpacking. Say you’re reading 10 files, but only want data where “column-a” is greater than 5. A Parquet reader can look at each file’s metadata (per-column min/max statistics, stored in the footer) at run time and figure out that a file doesn’t have any column-a values over 5, and therefore never has to read its data!

  • Have data in an unambiguous format that can be read by multiple programming languages. Since CSV is text, anything reading it will look at a value like “2022-04-05” and say “oh, this text looks like a date, let’s see what happens if I read it as a date”. Parquet contains actual data type information, so it will always be read consistently.

If you’re handling a lot of data, this kind of stuff can wind up making a huge difference.
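To unpack the metadata trick a bit more, here is a toy sketch in plain Python. It is not the Parquet API: `FileStats`, `files_to_read`, and the file names are made up for illustration, and the min/max values are hard-coded where a real reader would pull them from each file's metadata. The point is just that the decision to skip a file can be made from statistics alone.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical stand-in for the per-column min/max a Parquet file stores."""
    name: str
    col_min: int
    col_max: int


def files_to_read(stats, threshold):
    """Return the names of files that could contain values greater than threshold."""
    # If a file's maximum is not above the threshold, no row in it can match,
    # so the file can be skipped without reading any of its data.
    return [s.name for s in stats if s.col_max > threshold]


# Made-up statistics for "column-a" in three files.
stats = [
    FileStats("part-0.parquet", col_min=0, col_max=4),
    FileStats("part-1.parquet", col_min=3, col_max=9),
    FileStats("part-2.parquet", col_min=6, col_max=12),
]

selected = files_to_read(stats, threshold=5)
print(selected)
```

With pyarrow, the equivalent is roughly `pyarrow.parquet.read_table(path, columns=[...], filters=[("column-a", ">", 5)])`, which handles both the column selection and the skipping for you.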


This is so true!

I think people are so in love with the idea of “innovation” because secretly we all just know that it means “easy-fix” and that sounds a lot better than “hard work”.


Oh nice! I didn’t know about it- thanks for the link


Well, not for the people taking them, but you can make heaps of cash doling them out! (sarcasm)


Yeah, I should be clear that I don’t think choosing not to have children makes you in any way a climate fascist.

I totally hear you on thinking those things won’t have an effect. But I would say this: the only people who benefit from climate change activism being a lost cause, are the people looking to exploit our planet. Will you or me or a big group of us stop climate change in its tracks? Sadly no. But the future isn’t written, and we can still do a lot to mitigate the worst impacts and hold corporations to account.


I’m a data engineer, and have seen an ungodly amount of 200-but-actually-no-stuff-is-broken errors, and it’s the bane of my life!

We have generic code to handle pulling in API data and transforming it. It obviously checks the status code, but any time an API implements this we have to choose between:

  • having code fail weirdly further down the line because it can’t parse the status
  • adding in some kind of insane if not response.ok or "actually no there's an error really" in response.text logic
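For anyone who hasn't had the pleasure, the second option ends up looking something like this sketch. The `"error"` field is a made-up convention (every API that does this invents its own), and `is_really_ok` is a hypothetical helper, not part of any library.

```python
import json


def is_really_ok(status_code, body):
    """Return True only if the status is 2xx AND the body carries no error marker."""
    # The part HTTP already gives you for free.
    if not 200 <= status_code < 300:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all: treat it as a failure here rather than
        # crashing somewhere further down the pipeline.
        return False
    # Hypothetical convention: this API reports failures in an "error" field
    # while still returning a 200.
    if isinstance(payload, dict) and payload.get("error"):
        return False
    return True


print(is_really_ok(200, '{"data": [1, 2]}'))
print(is_really_ok(200, '{"error": "actually no"}'))
```

The annoying part is that the body check has to be rewritten per API, which is exactly what generic ingestion code is supposed to avoid.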

Every time you ignore protocols and invent your own, you are making everyone sad.

Will take recommendations of support groups I can join for victims of terrible APIs.


I’ve heard this argument a lot, and honestly it scares me for a bunch of reasons. It feels like flirting with climate fascism, but more than that, it feels like giving up on the world as a whole, and I don’t think that helps.

If you care about climate change, get involved in activism, vote for policies that will make a difference, do whatever you can to make the future a place that isn’t a burden to inhabit.


Take a look at RetroPie, which is more or less what you’re talking about!

Depending on what you want to get out of the project:

  • You might be happy just using RetroPie
  • You might be happy working on top of RetroPie
  • You might want to build something from scratch and just use RetroPie as a reference

Anyway, I’ll stop being a shill now and just give you the link: https://retropie.org.uk/


I like this a lot. It reminds me of KQL (a Microsoft query language that’s used for a bunch of Azure logging).

I use a lot of Python pandas/dask, and I’ve definitely got used to viewing a table as a series of operations to perform rather than the kind of declarative queries you get in SQL.

At what point is it no longer SQL? If we’re changing fundamental stuff, I’d love a way of writing loops or if statements that isn’t painful too.


I thought this would be some kind of sci-fi future Venice type thing, and was pretty stoked. Even more exciting that it’s a real project!

I surf and it’s amazing just how many beaches aren’t always safe to swim at, let alone city rivers and lakes. I think we forget how surreal it is how little lives in those waters.
