>>33882971 here. I hauled ass last month and hacked together a daily tally of VOD views at 24 hours after stream end. It uses the same time period as the daily CCV tally, where the stream has to start between 00:00:00 and 23:59:59 JST. However, I am also automatically disqualifying VODs that are longer than 12 hours and those that were privated within 24 hours after stream end, such as unarchived streams. This is because the views from streams like these are unreliable due to unusual processing time and missing data. The only hard cutoff is 36 hours after midnight JST at the start of the day, because that's the latest a stream can end and still qualify for my tally (i.e. a stream that starts right before the following midnight and goes for 12 hours).
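For anyone who wants the rules spelled out, here is a rough sketch of the qualification check. It is not the actual tracker code: the `Stream` record, its field names, and both function names are made up for illustration, and it assumes every timestamp is timezone-aware.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional

JST = timezone(timedelta(hours=9))   # Japan Standard Time, no DST
MAX_LENGTH = timedelta(hours=12)     # longer VODs are disqualified
VOD_WINDOW = timedelta(hours=24)     # views are read 24 hours after stream end


@dataclass
class Stream:
    """Hypothetical record; all datetimes must be timezone-aware."""
    start: datetime
    end: datetime
    privated_at: Optional[datetime] = None  # None = still public


def qualifies(stream: Stream, tally_day: date) -> bool:
    """Return True if the stream counts toward the tally for tally_day (JST)."""
    # Rule 1: the stream has to start between 00:00:00 and 23:59:59 JST.
    if stream.start.astimezone(JST).date() != tally_day:
        return False
    # Rule 2: VODs longer than 12 hours are out.
    if stream.end - stream.start > MAX_LENGTH:
        return False
    # Rule 3: privated within 24 hours after stream end (e.g. unarchived) is out.
    if stream.privated_at is not None and stream.privated_at - stream.end < VOD_WINDOW:
        return False
    return True


def latest_possible_end(tally_day: date) -> datetime:
    """Hard cutoff: 24 hours of possible start times plus 12 hours of length."""
    midnight = datetime.combine(tally_day, time.min, tzinfo=JST)
    return midnight + timedelta(hours=36)
```

A stream that passes rule 1 and rule 2 can never end later than `latest_possible_end(tally_day)`, which is where the 36 hour figure comes from.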
I am also recording views every 3 hours throughout the 24 hours to check growth rate. This interval was chosen because most VODs don't grow enough to justify being tracked more often than this, and also so I don't blow all my API quota. I'm going to use this to tack on an additional stat at the end of every line on the tally to show how much each VOD grew from 3 to 24 hours after stream end. Given the average length of a stream that starts at time, you can probably expect a tally for a certain day at about 5PM or 6PM EST the next day.
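A minimal sketch of the sampling schedule and the growth stat, assuming the first sample is taken 3 hours after stream end and the last at 24 hours. The function names and the hour-keyed dict are hypothetical, not what the tracker actually stores.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

INTERVAL_HOURS = 3   # one sample every 3 hours
WINDOW_HOURS = 24    # stop sampling 24 hours after stream end


def sample_times(stream_end: datetime) -> List[datetime]:
    """Times at which to record the view count: end+3h, end+6h, ..., end+24h."""
    return [
        stream_end + timedelta(hours=h)
        for h in range(INTERVAL_HOURS, WINDOW_HOURS + 1, INTERVAL_HOURS)
    ]


def growth_3h_to_24h(views_by_hour: Dict[int, int]) -> Tuple[int, float]:
    """Return (views gained, percent growth) between the 3h and 24h samples.

    views_by_hour maps hours-after-stream-end to the recorded view count.
    """
    first = views_by_hour[3]
    last = views_by_hour[24]
    gained = last - first
    # Guard against a zero baseline so the percentage never divides by zero.
    percent = gained / first * 100 if first else 0.0
    return gained, percent
```

That works out to 8 samples per VOD, so the quota cost scales with the number of VODs still inside their 24 hour window.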
I have a few questions though:
1. What should the cutoff be for views? I was thinking either 100k, 150k, or at most 200k.
2. Should premieres on the podium be removed or not? This would require manual intervention because the YouTube API sucks and doesn't distinguish between streams and premieres. I could implement something to automatically flag or filter abnormally short streams, like those below 20 minutes. There would be some false positives though, like streams that have an ending condition that gets reached quickly, meme gimmick streams, or streams that were started by accident.
3. I scraped vrabi and I'm tracking the same vtubers that they are. However, I want to include as many vtubers as I can. Is there an organized list of notable indies and small corpos outside of vrabi and vstats?
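For question 2, the automatic flag could be as simple as the sketch below. The 20 minute threshold is the one mentioned above; the function name and the (title, duration) input shape are made up for illustration. It only flags entries for manual review rather than removing them, because of the false positives.

```python
from datetime import timedelta
from typing import Iterable, List, Tuple

MIN_LENGTH = timedelta(minutes=20)  # anything shorter is suspicious


def flag_short_streams(
    entries: Iterable[Tuple[str, timedelta]],
) -> List[str]:
    """Return titles of entries shorter than MIN_LENGTH (possible premieres).

    A flagged entry is only a candidate: quick-ending streams, gimmick
    streams, and accidental starts land here too.
    """
    return [title for title, duration in entries if duration < MIN_LENGTH]
```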
Pic is the result of a database query for a portion of the data I'm tracking.