I work at an open science NGO that has been approached by a developer who is interested in making a public portal [email protected] BOINC project. The basic idea is that scientific researchers could submit workunits for a number of popular science apps (molecular modeling, etc.) to be distributed and crunched by volunteers using BOINC. Essentially, it would be a way for them to harness the petaflops of free processing power offered by the BOINC network without having to run their own BOINC server.
In order to make the proof-of-concept, we are looking for a science app which:
- Is open source and runs on Linux
- Has a large userbase
- Has no free or cheap source of compute power available to its users, short of building your own HPC cluster or other work distribution system
- Has a long compute time: at least a couple of hours per run on a standard consumer machine
- Can have tasks split into smaller sub-tasks to be run on several machines
Bonus points for:
- Tasks have deterministic output, meaning the results are the same for a given task no matter what kind of computer it is run on. This makes it easy to verify the work was done correctly by simply checking that two copies of the same task produce the same output.
- App that is cross-discipline (applicable to multiple areas of science) or relevant to biology/health research
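To make the "two copies agree" check above concrete, here is a minimal sketch in Python. It assumes outputs are byte-for-byte identical across machines, and the names `output_digest`, `validate`, and `quorum` are made up for illustration. This is not BOINC's own validator API, just the idea behind it.

```python
import hashlib
from collections import Counter
from typing import List, Optional


def output_digest(data: bytes) -> str:
    """Fingerprint one result file's contents with SHA-256."""
    return hashlib.sha256(data).hexdigest()


def validate(results: List[bytes], quorum: int = 2) -> Optional[bytes]:
    """Return the agreed output if at least `quorum` results are
    byte-identical, otherwise None (hypothetical helper)."""
    if not results:
        return None
    counts = Counter(output_digest(r) for r in results)
    digest, votes = counts.most_common(1)[0]
    if votes < quorum:
        return None  # no agreement yet; send the task to another volunteer
    return next(r for r in results if output_digest(r) == digest)


if __name__ == "__main__":
    print(validate([b"energy=-42.1", b"energy=-42.1"]) is not None)  # True
    print(validate([b"energy=-42.1", b"energy=-42.3"]) is None)      # True
```

If an app is not bit-for-bit deterministic (floating-point differences between CPUs, for example), an exact hash comparison like this would reject valid work, which is why deterministic output is listed as a bonus.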
Any suggestions?
If you would be interested in helping with development for this project (Python/React, Remix (Vercel or Netlify)), let us know too.
Isn’t this the same model as Folding@home? How did they accomplish it? Their model is great: here’s a Docker image, a Linux install, a Windows install, whatever you have, and it’ll just spin up.
BOINC and Folding@home are similar in how they work, in that they are both volunteer computing platforms. Folding@home is closed source, proprietary, and run by a single research group. BOINC is open source and used by many different research groups working on various topics from health to astrophysics.
The idea here is to build a BOINC server where researchers can submit work to the server and get results without having to run their own BOINC server in the process.
Bioinformatics perhaps? But I think a lot of those are just specific analyses done in notebooks. Being able to submit a notebook and have it computed would be pretty handy though, I imagine.
Bioinformatics
This is an excellent field for our proof of concept. We're just looking for a specific app to start with.
I’m a bioinformatician. The problem with using bioinformatics software here is that the input or output data size is huge for most tasks, which makes submitting jobs off site much more difficult.
Bacterial genome assembly isn’t too bad though. I use Nanopore sequencing data and the input is usually on the order of a few gigabytes per task for an output file of a few megabytes. (pulling numbers outta my butt, but shouldn’t be too far off) But then multiply this by 48 or 96, which is the number of samples our machine can run at the same time, and you’re getting into hundreds of gigabytes of input data. It’s just tough to manage this with cloud services.
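A back-of-the-envelope sketch of that multiplication, in Python. The 3 GB per sample and the 100 Mbit/s uplink are assumed figures picked only to illustrate the scale, not measurements.

```python
def batch_input_gb(samples: int, gb_per_sample: float) -> float:
    """Total input size for one sequencing run, in gigabytes."""
    return samples * gb_per_sample


def upload_hours(total_gb: float, mbit_per_s: float) -> float:
    """Hours needed to upload `total_gb` at a sustained `mbit_per_s`."""
    megabits = total_gb * 8 * 1000  # 1 GB = 8000 megabits (decimal units)
    return megabits / mbit_per_s / 3600


if __name__ == "__main__":
    GB_PER_SAMPLE = 3.0  # assumption: "a few gigabytes per task"
    UPLINK_MBIT = 100.0  # assumption: a decent institutional uplink
    for samples in (48, 96):
        total = batch_input_gb(samples, GB_PER_SAMPLE)
        hours = upload_hours(total, UPLINK_MBIT)
        print(f"{samples} samples: {total:.0f} GB, about {hours:.1f} h to upload")
```

Under those assumptions a 96-sample run is 288 GB and takes over six hours just to upload, before any volunteer has downloaded a single task.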
But if you go simpler, you could offer a BLAST server. You just need to host your own database and accept queries. Not sure if you can split it into smaller tasks though. If you segment the main database your p-value results will change.
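On the database-splitting worry: the significance score BLAST reports is the E-value, which in Karlin-Altschul statistics is E = K·m·n·exp(−λS), so it scales linearly with the database length n. That suggests hits from a shard could be rescaled to the full database size before merging. The Python sketch below assumes that simple linear scaling holds and ignores the effective-length corrections real BLAST applies. The function names are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (subject id, E-value reported against one shard)


def rescale_evalue(shard_evalue: float, shard_db_len: int, full_db_len: int) -> float:
    """Rescale an E-value computed against a shard to the full database.

    Relies on E being proportional to database length (E = K*m*n*exp(-lambda*S)).
    """
    return shard_evalue * full_db_len / shard_db_len


def merge_shard_hits(shards: List[Tuple[int, List[Hit]]], full_db_len: int) -> List[Hit]:
    """Combine per-shard hits into one list sorted by rescaled E-value."""
    merged = []
    for shard_db_len, hits in shards:
        for subject_id, evalue in hits:
            merged.append((subject_id, rescale_evalue(evalue, shard_db_len, full_db_len)))
    return sorted(merged, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Two shards of a 4000-residue toy database, with made-up hits.
    shards = [(1000, [("seqA", 1e-6)]), (3000, [("seqB", 1e-7)])]
    for subject_id, evalue in merge_shard_hits(shards, full_db_len=4000):
        print(subject_id, f"{evalue:.2e}")
```

The BLAST+ command-line tools also have a `-dbsize` option for setting the effective database length, which may be a cleaner way to get consistent E-values from each shard. Worth checking the BLAST+ manual before relying on either approach.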
Is that data compressible? A few GB for an input or output file isn’t entirely unmanageable from our perspective. Not ideal, but workable. What are some popular OSS tools used in your field?
Snakemake is a popular tool for defining analysis workflows in bioinformatics (Nextflow is an equivalent). There is also KBase, which is a web UI for running different jobs; not sure if that is open source.
DM me. This is essentially what the company I work for does for the media industry. We can chat.