Let’s assume an average of 300 classifiable Naughty-Nice interactions per day. That may sound like a lot, but they add up quickly once you start paying attention. Each event includes a description, categorization, detailed metrics used for scoring, a timestamp, a source, and the identifier for the media file; in addition, a link to the individual is stored in an associative table. Let’s assume that all of that requires 300 bytes for a single interaction (again, probably an overestimate). That works out to roughly 32 MB per person per year. Double that, because events from both the current and previous year are stored, then multiply by a billion individuals, and it comes to about 65 petabytes. Now we’re getting into some pretty big numbers. Of course, the data is distributed, so not all of that space is needed in one place. Furthermore, nearly all of the data elements are highly compressible, so reducing the size by a factor of five seems doable, bringing the total down to about 13 petabytes.
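For anyone who wants to check the arithmetic, here’s a quick Python sketch. The inputs are the assumptions above (300 interactions a day, 300 bytes each, a billion individuals, 5:1 compression), not measured values:

```python
# Back-of-envelope storage estimate, using the assumed figures from the text.
INTERACTIONS_PER_DAY = 300        # classifiable Naughty-Nice events per person per day
BYTES_PER_EVENT = 300             # description, category, metrics, timestamp, source, media id
INDIVIDUALS = 1_000_000_000       # one billion people on the list
COMPRESSION_FACTOR = 5            # assumed achievable on highly compressible event data

per_person_year = INTERACTIONS_PER_DAY * BYTES_PER_EVENT * 365    # bytes per person per year
two_year_total = per_person_year * 2 * INDIVIDUALS                # current + previous year, everyone
compressed = two_year_total / COMPRESSION_FACTOR

print(f"per person per year: {per_person_year / 1e6:.1f} MB")            # ~32.9 MB
print(f"two years, a billion people: {two_year_total / 1e15:.1f} PB")    # ~65.7 PB
print(f"after 5:1 compression: {compressed / 1e15:.1f} PB")              # ~13.1 PB
```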
This data is stored for as long as the individual is in the active database, or for 15 years, whichever is longer. That’s nearly 200 petabytes. I’ll probably need to go back and do some compression studies to get that number down some more; deduplication might help as well.
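Continuing the sketch: treating the ~13 PB compressed figure as the volume accrued per retained window reproduces the “nearly 200 petabytes” above.

```python
# Retention: data is kept while an individual is active, or 15 years, whichever is longer.
COMPRESSED_PB = 13.1        # compressed volume from the sketch above
RETENTION_YEARS = 15

retained = COMPRESSED_PB * RETENTION_YEARS
print(f"retained under the 15-year policy: {retained:.0f} PB")   # ~197 PB
```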
Locations:
The ISO 19160-1 address specification does not mandate field lengths. Using the maximum possible length for every address component would require more than 1,000 bytes per address, but with variable-length character strings we won’t store anywhere near that maximum. Nearly all addresses are under 100 characters, so even a billion addresses would require only about 93 gigabytes.
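The same quick check works for the address table, assuming 100 bytes per variable-length address (ISO 19160-1 itself fixes no field lengths):

```python
# Address table estimate: variable-length strings, not the spec's worst case.
BYTES_PER_ADDRESS = 100           # nearly all real addresses fit under 100 characters
ADDRESSES = 1_000_000_000         # one address row per individual

total_bytes = BYTES_PER_ADDRESS * ADDRESSES
print(f"address storage: {total_bytes / 2**30:.0f} GiB")   # ~93 GiB
```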