Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'

Stopthatgirl7@lemmy.world · 4 months ago

gcheliotis@lemmy.world · 3 months ago

The real question here is why the researcher “librarian” didn’t even attempt to anonymize the dataset before making it available. Full anonymization isn’t a trivial task, but at least removing unique identifiers or replacing them with randomly generated ones would be good practice.
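
As a rough illustration of the "replace them with randomly generated ones" idea, here is a minimal Python sketch. It assumes each post is a dict with an `author` field holding the unique identifier; that field name, the `pseudonymize` helper, and the sample values are all hypothetical and are not taken from the actual dataset. This is pseudonymization only: the post text itself can still identify someone, which is why full anonymization is the harder problem.

```python
import secrets

def pseudonymize(posts, id_field="author"):
    """Return copies of posts with each distinct identifier replaced by a random token.

    The same original identifier always maps to the same token within one call,
    so per-author grouping still works. The mapping is never returned or stored.
    """
    mapping = {}  # original id -> random pseudonym (kept in memory only)
    cleaned_posts = []
    for post in posts:
        original = post[id_field]
        if original not in mapping:
            # 8 random bytes -> 16 hex characters, not derived from the original id
            mapping[original] = "user_" + secrets.token_hex(8)
        cleaned = dict(post)  # shallow copy so the input is left untouched
        cleaned[id_field] = mapping[original]
        cleaned_posts.append(cleaned)
    return cleaned_posts

# Hypothetical sample records, made up for illustration.
posts = [
    {"author": "did:plc:aaa", "text": "hello"},
    {"author": "did:plc:bbb", "text": "hi"},
    {"author": "did:plc:aaa", "text": "again"},
]
anonymized = pseudonymize(posts)
```

Random tokens are used here rather than hashes of the identifier, since a plain hash of a known ID can be recomputed by anyone and matched back to the account.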