Reddit said in a filing to the Securities and Exchange Commission that its users’ posts are “a valuable source of conversation data and knowledge” that has been and will continue to be an important mechanism for training AI and large language models. The filing also states that the company believes “we are in the early stages of monetizing our user base,” and goes on to say that it will continue to sell users’ content to companies that want to train LLMs, and that it will also begin “increased use of artificial intelligence in our advertising solutions.”

The long-awaited S-1 filing reveals much of what Reddit users knew and feared: that many of the changes the company has made over the last year in the lead-up to an IPO are focused on exerting control over the site, sanitizing parts of the platform, and monetizing user data.

Posting here because of the privacy implications of all this, but I wonder if at some point there should be an “Enshittification” community :-)

  • TheOneCurly@lemm.ee
    9 months ago

    I wonder what the risks are of including deleted and pre-edit content in training data. Most edits are going to be typos and formatting fixes; do you want 2-3 copies of the same message, typos included, in your training data? Similarly, deleted comments are mostly nonsense, unhelpful, duplicates, or highly controversial.

    If someone wants to dig through and find individual users to restore, that’s one thing, but I don’t think I’d choose to train on that other data unless I had to.

    • nutomic@lemmy.ml
      9 months ago

      It should be very easy to distinguish edits and deletes made within a few minutes or hours of writing a comment from those made months or years later, right around the Reddit blackout, as in the sketch below.
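
      A minimal sketch of that heuristic in Python, assuming each comment carries a creation and a last-edit timestamp; the function name, the 24-hour threshold, and the blackout window are illustrative assumptions, not anything taken from Reddit’s actual data:

      ```python
      from datetime import datetime, timedelta

      # Assumed thresholds; real comment dumps and cutoffs will differ.
      TYPO_WINDOW = timedelta(hours=24)        # edits soon after posting: likely typo/formatting fixes
      BLACKOUT_START = datetime(2023, 6, 12)   # rough window around the June 2023 Reddit blackout
      BLACKOUT_END = datetime(2023, 7, 1)

      def classify_edit(created_at: datetime, edited_at: datetime) -> str:
          """Label an edit as a quick fix, a blackout-era overwrite, or some other late edit."""
          if edited_at - created_at <= TYPO_WINDOW:
              return "quick_fix"        # keep only the final version for training
          if BLACKOUT_START <= edited_at <= BLACKOUT_END:
              return "blackout_edit"    # likely a deliberate protest overwrite or deletion
          return "late_edit"

      # Example usage with made-up timestamps
      print(classify_edit(datetime(2022, 3, 1, 10, 0), datetime(2022, 3, 1, 10, 5)))  # quick_fix
      print(classify_edit(datetime(2021, 8, 15), datetime(2023, 6, 14)))              # blackout_edit
      ```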