ultimate storage hack

einfach_orangensaft@sh.itjust.works · 2 months ago

ultimate storage hack

MTK@lemmy.world · 2 months ago

If you have a tub full of water and a take a sip, you still have a tub full of water. Therefore only drink in small sips and you will have infinite water.

Water shortage is a scam.

Aux@feddit.uk · 2 months ago

There is a water shortage?

MTK@lemmy.world · 2 months ago

Exactly!

Feyter@programming.dev · 2 months ago

Out of context, but this video showing the amount of freshwater on the planet in perspective was eye opening for me… I see water availability different since.

https://youtu.be/b3_Abb2Vqnc

Ricky Rigatoni@lemm.ee · 2 months ago

Don’t worry, global warming is desalinating the water so it will all be fresh in time 🙏

Kiuyn@lemmy.ml · 2 months ago

If you have a water bottle and only drink half of it each time, you will also have infinite 💦

wizzim@infosec.pub · edit-2 2 months ago

Awesome idea. In base 64 to deal with all the funky characters.

It will be really nice to browse this filesystem…

Aurenkin@sh.itjust.works · 2 months ago

The design is very human

CorvidCawder@sh.itjust.works · 2 months ago

Or use yEnc: https://en.m.wikipedia.org/wiki/YEnc

stochastictrebuchet@sh.itjust.works · 2 months ago

Broke: file names have a max character length.

Woke: split b64-encoded data into numbered parts and add .part-1…n suffix to each file name.

tetris11@lemmy.ml · 2 months ago

each file is minimum 4kb

(base64.length/max_character) * min_filesize < actual_file_size

For this to pay off

Venator@lemmy.nz · 1 month ago

Just use folders instead 😏

The Ramen Dutchman@ttrpg.network · 1 month ago

each file is minimum 4kb

$ touch empty_file
$ ls -l
total 8
-rw-rw-r-- 1 user group 0 may 14 20:13 empty_file
$ wc -c empty_file 
0 empty_file

Huh?

tetris11@lemmy.ml · 1 month ago

Oh, I’m thinking folders aren’t I. Doy…

The Ramen Dutchman@ttrpg.network · 20 days ago

It seems those are 4 KiB on Linux, interesting to know.

mmddmm@lemm.ee · 2 months ago

I’d go with a prefix, so it’s ls-friendly.

psud@aussie.zone · edit-2 2 months ago

Browse your own machine as if it’s under alt.film.binaries but more so

Thorry84@feddit.nl · 2 months ago

It’s all fun and games until your computer turns into a black hole because there is too much information in too little of a volume.

proti@lemmy.world · 2 months ago

Even better! According to no hiding theorem, you can’t destroy information. With black holes you maybe possibly could be able to recover the data as it leaks through the Hawking radiation.
Perfect for long term storage

Klear@lemmy.world · 2 months ago

Can’t wait to hear news about a major site leaking user passwords through hawking radiation.

einfach_orangensaft@sh.itjust.works · 2 months ago

i love this comment

mmddmm@lemm.ee · 2 months ago

Really-long term storage :)

limerod@reddthat.com · 2 months ago

Longer than your lifespan, too.

mmddmm@lemm.ee · 2 months ago

Longer than the life span of the most long-lived star. By orders of magnitude.

kryptonianCodeMonkey@lemmy.world · 2 months ago

I had a manager once tell me during a casual conversation with complete sincerity that one day with advancements in compression algorithms we could get any file down to a single bit. I really didn’t know what to say to that level of absurdity. I just nodded.

friendlymessage@feddit.org · edit-2 2 months ago

That’s the kind of manager that also tells you that you just lack creativity and vision if you tell them that it’s not possible. They also post regularly on LinkedIn

DefederateLemmyMl@feddit.nl · edit-2 2 months ago

You can give me any file, and I can create a compression algorithm that reduces it to 1 bit. (*)

spoiler

(*) No guarantees about the size of the decompression algorithm or its efficacy on other files

The Ramen Dutchman@ttrpg.network · 1 month ago

Here’s a simple command to turn any file into a single b!

echo a > $file_name

einfach_orangensaft@sh.itjust.works · 2 months ago

u can have everthing in a single bit, if the decompressor includes the whole universe

schnurrito@discuss.tchncs.de · 2 months ago

https://xkcd.com/1381/

pressanykeynow@lemmy.world · 2 months ago

Well he’s not wrong. The decompression would be a problem though.

groet@feddit.org · 1 month ago

Yeah with lossy compression the future is today!

Valmond@lemmy.world · 2 months ago

Send him your work: 1 (or 0 ofc)

bluemellophone@lemmy.world · 2 months ago

That’s precisely when you bet on it.

calcopiritus@lemmy.world · 2 months ago

Just make a file system that maps each file name to 2 files. The 0 file and the 1 file.

Now with just a filename and 1 bit, you can have any file! The file is just 1 bit. It’s the filesystems that needs more than that.

VineGram@programming.dev · 1 month ago

Maybe they also believe themselves to be father of computing

Randelung@lemmy.world · edit-2 2 months ago

It’s an interesting question, though. How far CAN you compress? At some point you’ve extracted every information contained and increased the density to a maximum amount - but what is that density?

Couldbealeotard@lemmy.world · 2 months ago

I think by the time we reach some future extreme of data density, it will be in a method of storage beyond our current understanding. It will be measured in coordinates or atoms or fractions of a dimension that we nullify.

Max@lemmy.world · edit-2 2 months ago

This is a really good question!

I believe the general answer is, until the compressed file is indistinguishable from randomness. At that point there is no more redundant information left to compress. Like you said, the ‘information content’ of a message can be measured.

(Note that there are ways to get a file to look like randomness that don’t compress it)

burlemarx@lemmygrad.ml · 2 months ago

How to tell someone you don’t know how compression algorithms work, without telling them directly.

JamonBear@sh.itjust.works · 2 months ago

You want real infinite storage space? Here you go: https://github.com/philipl/pifs

nibbler@discuss.tchncs.de · edit-2 2 months ago

that’s awesome! I’m just migrating all my data to πfs. finally mathematics is put to a proper use!

needanke@feddit.org · 2 months ago

Finally someone uses the fact that compute time is so much cheaper than storage!

And009@lemmynsfw.com · 2 months ago

Breakthrough vibes

qnvx@lemmy.world · edit-2 2 months ago

deleted by creator

groet@feddit.org · 1 month ago

Easy, just replace each byte of data with multiple bytes of metadata. I see no problem here

iknowitwheniseeit@lemmynsfw.com · 2 months ago

Reality is stranger than fiction:

https://www.patrickcraig.co.uk/other/compression.htm

4grams@awful.systems · 2 months ago

That story is immediately what came to mind.

QuazarOmega@lemy.lol · 2 months ago

This was too damn funny for what I expected it to be

ulterno@programming.dev · 2 months ago

Nice stuff.

I got sold on the :

EOF does not consume less space than “5”

because, even though the space taken by the filesystem is the fault of the filesystem, one needs to consider the minimum information requirements of stating starts and ends of files, specially when stuff is split into multiple files.

I would have actually considered the file size information as part of the file size instead (for both the input and the output) because, for a binary file, which can include a string of bits which might match an EOF, causing a falsely ended file, would be a problem. And as such, the contestant didn’t go checking for character == EOF, but used the function that truly tells whether the end of file is reached, which would, then be using the file system’s file size information.

Since the input file was a 3145728 bytes and the output files would have been smaller than that, I would go with 22 bits to store the file size information. This would be in favour of the contestant as:

That would be the minimum (hyh) number of bits required to store the file size, making it as easy as possible for the contestant to make more files
You could actually go with 2 bits, if you predefine MiB to be the unit, but that would make it harder for the contestant, because they will be unable to present file sizes less than 1 MiB, and would have to increase the file size information bits

On the other hand, had the contestant decided to break the file between bits (instead at byte ends), instead of bytes (which, from the code, I think they didn’t) the file size information would require an additional 3 bits.

Now, using this logic, if I check the result:

From the result claimed by the contestant, there were 44 extra bytes (352 bits) remaining.

+ 22 bits for the input file size information - 22*219 bits for the output file size information because 219 files

so the contestant succeeds by 352 + 22 − (22 × 219) = −4444 bits. In other words, fails by 4444 bits.

Now of course, the output file size information might be representable in a smaller number of bits, but to calculate that, I would require downloading the file (which I am not in the mood for.
And in that case, you would require additional information to tell the file size bits. So;

5 bits for the number 22 in the input
5 bits for the size of the file size information (I am feeling this won’t give significant gains) and rest of the bits as stated in the first 5 bits, as the file size bits
- you waste bits for every file size requiring more than 16 bits to store the file size information
- it is possible to get a net gain with this, as qalc says, log(3145728 / 219, 2) = (ln(1048576) − ln(73)) / ln(2) ≈ 13.81017544

But even then, you have 352 + 5 + 22 − (5 + (14 × 219)) = −2692 for the best case scenario in which all output file sizes manage to be under 14 bits of file size informations. More realistically, it would be something around 352 + 5 + 22 − ((5 + 14) × 219) = −3782 because you will the the 5 bits for every file, separately, with the 14 in this case, be a changing value for every file, giving a possibly smaller number.

If instead going with the naive 8 bit EOF that the offerer desired, well, going with 2 consecutive characters instead of a single one, seems doable. As long as you are able to find enough of said 2 characters.
After going on a little google search, I seem to think that in a 3MiB file, there would be either 47 or 383 (depending upon which of my formulae was correct) possible occurrences of the same 2 character combination. Well, you’d need to find the correct combination.

But of course, that’s not exactly compression for a binary file, as I said before, as an EOF is not good enough.

skisnow@lemmy.ca · 2 months ago

I was sort of on Mike Goldman (the challenge giver)'s side until I saw the great point made at the end that the entire challenge was akin to a bar room bet; Goldman had always set it up as a kind of scam from the start and was clearly more than happy to take $100 from anyone who fell for it, and so should have taken responsibility when someone managed to meet the wording of his challenge.

Cratermaker@discuss.tchncs.de · 2 months ago

Yeah, he was bamboozled as soon as he agreed to allow multiple separate files. The challenge was bs from the start, but he could have at least nailed it down with more explicit language and by forbidding any exceptions. I think it’s kind of ironic that the instructions for a challenge related to different representations of information failed themselves to actually convey the intended information.

MBM@lemmings.world · 2 months ago

Both people sound obnoxious lol

Valmond@lemmy.world · 2 months ago

Nice read, thanks!

Typewar@infosec.pub · 2 months ago

Reminds me of a project i stumbled upon the other day using various services like Google drive, Dropbox, cloudflare, discord for simultaneous remote storage. The goal was to use whatever service that has data to upload to, to store content there as a Filesystem.

I only remember discord being one of the weird ones where they would use base512 (or higher, I couldn’t find the library) to encode the data. The thing with discord, is that you’re limited by characters, and so the best way to store data in a compact way is to take advantage of whatever characters that are supported

astrsk@fedia.io · 2 months ago

What about a hard drive made of network pings?

https://m.youtube.com/watch?v=JcJSW7Rprio

2 months ago

I remember a project where someone booted Linux off of Google Drive. Cursed on many levels.

jjagaimo@sh.itjust.works · edit-2 2 months ago

“Harder Drive”

Store the data in pings that constantly get resent to keep the data in the internet

fnrir@lemmy.blahaj.zone · 2 months ago

[email protected]

bitfucker@programming.dev · 2 months ago

I was looking for this comment. I knew someone from Lemmy would have seen that

psud@aussie.zone · 2 months ago

GmailFS was a thing

kate@lemmy.uhhoh.com · 2 months ago

I don’t know if it was this one but it might’ve been this one https://github.com/qntm/base65536

Typewar@infosec.pub · 1 month ago

Yep, that looks like it!

Little8Lost@lemmy.world · edit-2 2 months ago

Stupid BUT: making the font in LibreOffice bigger saves space. so having 11 is readible but by changing the font size to like 500 it can save some mb per page
I dont know how it works, i just noticed it at some point

Edit: i think it was kb, not mb

SkaveRat@discuss.tchncs.de · 2 months ago

per page

I mean, yes. obviously.

If you had 1000 bytes of text on 1 page before, you now have 1byte per page on 1000 pages afterwards

Jankatarch@lemmy.world · 2 months ago

Have a macro that decreases all font size on opening and then increases all again before closing.

Follow me irl for more compression techniques.

InFerNo@lemmy.ml · 2 months ago

You could always diff the XML before and after to see what’s causing it.

Honytawk@feddit.nl · 2 months ago

Good luck with your 256 characters.

DefederateLemmyMl@feddit.nl · 2 months ago

When you run out of characters, you simply create another 0 byte file to encode the rest.

Check mate, storage manufacturers.

PieMePlenty@lemmy.world · edit-2 2 months ago

File name file system! Looks like we broke the universe! Wait, why is my MFT so large?!

barsoap@lemm.ee · edit-2 2 months ago

255, generally, because null termination. ZFS does 1023, the argument not being “people should have long filenames” but “unicode exists”, ReiserFS 4032, Reiser4 3976. Not that anyone uses Reiser, any more. Also Linux’ PATH_MAX of 4096 still applies. Though that’s in the end just a POSIX define, I’m not sure whether that limit is actually enforced by open(2)… man page speaks of ENAMETOOLONG but doesn’t give a maximum.

It’s not like filesystems couldn’t support it it’s that FS people consider it pointless. ZFS does, in principle, support gigantic file metadata but using it would break use cases like having a separate vdev for your volume’s metadata. What’s the point of having (effectively) separate index drives when your data drives are empty.

Brahvim Bhaktvatsal@lemmy.kde.social · edit-2 2 months ago

…Just asking, just asking: Why is the default FILENAME_MAX on Linux/glibc 4096?

barsoap@lemm.ee · 2 months ago

Because PATH_MAX is? Also because it’s a 4k page.

FILENAME_MAX is not safe to use for buffer allocations btw it could be INT_MAX.

Brahvim Bhaktvatsal@lemmy.kde.social · 1 month ago

Thanks! Got an answer and not 200 downvotes. This is why I love Lemm-Lemm.

Anh Kagi@jlai.lu · 2 months ago

this is actually a joke compression algorithm that compresses your data by one byte by appending it to the filename. (and you can execute it as many time as you want)

Too bad I can’t remember the name.

badcommandorfilename@lemmy.world · 2 months ago

gnutrino@programming.dev · 2 months ago

Obligatory “pi hasn’t been proved to be normal”

Sonotsugipaa@lemmy.dbzer0.com · 2 months ago

Relatable tbh

SkaveRat@discuss.tchncs.de · 2 months ago

Pi being irrational and not normal makes it the most relatable number

palordrolap@fedia.io · 2 months ago

Oh, I guarantee that pi is 100% normal. Just not necessarily in the base you want it to be normal in.

gnutrino@programming.dev · 2 months ago

I don’t know of a proof that pi is normal in any base (even non-integer bases) so I’d be interested to see on what basis you can guarantee it.

AwesomeLowlander@sh.itjust.works · 2 months ago

It’s somewhere in pi. Wait a moment while I look it up.

lime!@feddit.nu · 2 months ago

we could also use it as a file system

bstix@feddit.dk · 2 months ago

It’s like that chip tune webpage where the entire track is encoded in the url.

LemmyFeed@lemmy.dbzer0.com · 2 months ago

Link?

PoolloverNathan@programming.dev · 2 months ago

https://beepbox.co/ for example

skisnow@lemmy.ca · 2 months ago

Are you trying to get rickrolled?

hades@lemm.ee · 2 months ago

I remember the first time I ran out of inodes: it was very confusing. You just start getting ENOSPC, but du still says you have half the disk space available.

grrgyle@slrpnk.net · 2 months ago

Ah memories. That was an interesting lesson.

tatterdemalion@programming.dev · 2 months ago

https://github.com/philipl/pifs

WorldsDumbestMan@lemmy.today · 2 months ago

Let me guess, over 30 years old.