The title of the paper is “Cost-Effective Outbreak Detection in Networks.”
Here is their question:
Blog rankings
Rankings are based on the following question: Which blogs should one read to be most up to date, i.e., to quickly know about important stories that propagate over the blogosphere? [emphasis added]
Budget=100 blogs: If we can read 100 blogs, which should I read to be most up to date? Unit cost (each blog costs 1 unit), optimizing the information captured (we want to be the first to know about something with many people blogging about the story after us)
Budget=5000 posts: If we can read the total of 5000 posts, which blogs should one read? Cost of reading a blog is the number of posts it has, we optimize the information captured
Multicriterion solution: We want to read both a small number of blogs and a small number of posts. These results are from the experiment on figure 4(a) from the paper. We find the right budget where value of objective function is 40%. Cost of a blog is a combination of a number of posts (NP) a blog has plus a constant (UC).
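To make the two budget cases above concrete, here is a minimal sketch of greedy selection under a budget. It is not the authors' algorithm or code: the blog names, story IDs, and costs are invented for illustration, and the objective is simplified to "number of distinct stories caught" rather than the paper's time-sensitive score.

```python
def greedy_select(detects, cost, budget, by_ratio=False):
    """Greedily pick blogs until nothing affordable adds new stories.

    detects: dict blog -> set of story ids it reports (hypothetical toy data)
    cost:    dict blog -> cost of reading that blog
    by_ratio: if True, rank candidates by gain per unit of cost
    """
    chosen, covered, spent = [], set(), 0
    remaining = set(detects)
    while remaining:
        best, best_score = None, 0.0
        for blog in sorted(remaining):
            if spent + cost[blog] > budget:
                continue  # cannot afford this blog
            gain = len(detects[blog] - covered)  # new stories only
            score = gain / cost[blog] if by_ratio else gain
            if score > best_score:
                best, best_score = blog, score
        if best is None:
            break  # nothing affordable adds anything new
        chosen.append(best)
        covered |= detects[best]
        spent += cost[best]
        remaining.remove(best)
    return chosen, covered, spent


# Hypothetical toy data: which stories each blog picks up.
detects = {
    "blog_a": {1, 2, 3, 4},
    "blog_b": {3, 4, 5},
    "blog_c": {6, 7},
    "blog_d": {1, 2},
}
unit_cost = {blog: 1 for blog in detects}  # every blog costs 1
post_cost = {"blog_a": 40, "blog_b": 10, "blog_c": 4, "blog_d": 10}  # cost = posts

unit_pick, unit_cov, _ = greedy_select(detects, unit_cost, budget=2)
post_pick, post_cov, post_spent = greedy_select(
    detects, post_cost, budget=25, by_ratio=True
)
print(unit_pick, sorted(unit_cov))
print(post_pick, sorted(post_cov), post_spent)
```

In this toy case the unit-cost run takes the prolific blog_a first, while the post-count run skips it as too expensive and catches every story by reading three smaller blogs instead.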
Here is their real-life comparison:
The spread of information in the blogosphere: First blog writes a post and then other blogs refer to it. The behavior (information) spreads (cascades) through the network of blogs.
Water distribution networks
[The] same techniques and algorithms as used for blogs also apply to detecting disease outbreaks in water distribution networks. Consider a city water distribution network, delivering water to households via pipes and junctions. Intrusions can cause contaminants to spread over the network, and we want to select a few locations (pipe junctions) to install sensors, in order to detect these contaminations as quickly as possible.
The sensor placements obtained by our algorithm are provably near optimal, providing a constant fraction of the optimal solution. Our approach scales, achieving speedups and savings in storage of several orders of magnitude.
This same link also provides their algorithm and some illustrations, plus links to more detailed information. Don't know that I care for being compared to contaminated water, however. Couldn't they have done something with, say, ice cream?
This is the .pdf of their paper, with illustrations of how the cascades work.
But what is surprising is the list they came up with. Of course, #1 is no surprise at all: Instapundit. But after that, it’s up for grabs:
Here’s some data regarding the parameters of their table:
Top 100 blogs for unit cost case and PA objective function
- PA score : score for the solution of length k
- NP : number of posts of a blog in 2006
- IL : number of inlinks that a blog got from other blogs inside the dataset in 2006
- OLO : number of outlinks to other blogs in the dataset
- OLA : number of all outlinks (also counting links to other resources on the web)
The table is below the fold. You’re going to be surprised at some of the blogs that made the list, and some that are noticeably absent.
- - - - - - - - -
k | PA score | Blog | NP | IL | OLO | OLA |
1 | 0.1283 | instapundit.com | 4593 | 4636 | 1890 | 5255 |
2 | 0.1822 | donsurber.blogspot.com | 1534 | 1206 | 679 | 3495 |
3 | 0.2224 | sciencepolitics.blogspot.com | 924 | 576 | 888 | 2701 |
4 | 0.2592 | watcherofweasels.com | 261 | 941 | 1733 | 3630 |
5 | 0.2923 | michellemalkin.com | 1839 | 12642 | 1179 | 6323 |
6 | 0.3152 | blogometer.nationaljournal.com | 189 | 2313 | 3669 | 9272 |
7 | 0.3353 | themodulator.org | 475 | 717 | 1844 | 4944 |
8 | 0.3508 | bloggersblog.com | 895 | 247 | 1244 | 10201 |
9 | 0.3654 | boingboing.net | 5776 | 6337 | 1024 | 6183 |
10 | 0.3778 | atrios.blogspot.com | 4682 | 3205 | 795 | 3102 |
11 | 0.3885 | lawhawk.blogspot.com | 1862 | 463 | 1668 | 6597 |
12 | 0.3984 | gothamist.com | 6223 | 3324 | 1891 | 17172 |
13 | 0.4078 | mparent7777.livejournal.com | 25925 | 199 | 4027 | 47933 |
14 | 0.4163 | wheelgun.blogspot.com | 1174 | 128 | 262 | 939 |
15 | 0.4245 | gevkaffeegal.typepad.com/the_alliance | 302 | 428 | 333 | 2481 |
16 | 0.4318 | anglican.tk | 66 | 66 | 1377 | 3482 |
17 | 0.4384 | micropersuasion.com | 1503 | 2880 | 506 | 5666 |
18 | 0.4444 | pajamasmedia.com | 5007 | 141 | 2920 | 26881 |
19 | 0.4500 | blogher.org | 3302 | 412 | 1587 | 14222 |
20 | 0.4556 | mypetjawa.mu.nu | 1108 | 1733 | 757 | 3609 |
21 | 0.4611 | reddit.com | 2618 | 1940 | 201 | 1117 |
22 | 0.4661 | soccerdad.baltiblogs.com | 814 | 451 | 1137 | 4307 |
23 | 0.4711 | thenoseonyourface.com/the_nose_on_your_face | 400 | 394 | 349 | 1645 |
24 | 0.4759 | ahistoricality.blogspot.com | 441 | 87 | 293 | 805 |
25 | 0.4803 | theanchoressonline.com | 989 | 430 | 1597 | 6358 |
26 | 0.4848 | americablog.blogspot.com | 5786 | 3351 | 331 | 3950 |
27 | 0.4890 | sfist.com | 3068 | 1461 | 1891 | 13203 |
28 | 0.4931 | tbogg.blogspot.com | 1412 | 864 | 5567 | 19396 |
29 | 0.4971 | horsepigcow.com | 516 | 498 | 203 | 1220 |
30 | 0.5009 | whyhomeschool.blogspot.com | 513 | 211 | 205 | 1030 |
31 | 0.5046 | daoureport.salon.com | 2012 | 5255 | 177 | 768 |
32 | 0.5083 | sisu.typepad.com/sisu | 331 | 304 | 293 | 1968 |
33 | 0.5119 | metafilter.com | 5866 | 1277 | 607 | 13374 |
34 | 0.5151 | megite.com | 535 | 33 | 378 | 2422 |
35 | 0.5183 | laist.com | 2651 | 1259 | 1389 | 7680 |
36 | 0.5214 | captainsquartersblog.com/mt | 2623 | 6495 | 517 | 6187 |
37 | 0.5243 | shakespearessister.blogspot.com | 4580 | 2116 | 1386 | 5839 |
38 | 0.5271 | blog.guykawasaki.com | 218 | 1470 | 24 | 311 |
39 | 0.5299 | tryinotocomeundone.blogstream.com | 76 | 183 | 343 | 973 |
40 | 0.5326 | bluestarchronicles.blogspot.com | 180 | 144 | 283 | 1082 |
41 | 0.5352 | googleblog.blogspot.com | 294 | 2815 | 3 | 84 |
42 | 0.5377 | theglitteringeye.com | 924 | 377 | 1088 | 3927 |
43 | 0.5402 | asterisco.paradigma.pt | 2419 | 145 | 521 | 14280 |
44 | 0.5425 | readwriteweb.com | 543 | 1236 | 275 | 1937 |
45 | 0.5448 | digbysblog.blogspot.com | 1784 | 3553 | 574 | 3153 |
46 | 0.5470 | conservativecat.com | 682 | 284 | 916 | 3551 |
47 | 0.5491 | phillyist.com | 1633 | 800 | 1797 | 6328 |
48 | 0.5511 | socialcustomer.com | 279 | 119 | 122 | 889 |
49 | 0.5530 | business2.blogs.com/business2blog | 635 | 343 | 132 | 1801 |
50 | 0.5549 | gatewaypundit.blogspot.com | 2677 | 3172 | 1146 | 6829 |
51 | 0.5567 | crooksandliars.com | 2426 | 2578 | 1275 | 6147 |
52 | 0.5584 | rightwingnews.com | 1975 | 1700 | 891 | 8478 |
53 | 0.5600 | 10000birds.com | 160 | 72 | 46 | 217 |
54 | 0.5617 | radar.oreilly.com | 647 | 1219 | 160 | 2699 |
55 | 0.5632 | cowboyblob.blogspot.com | 1208 | 173 | 145 | 379 |
56 | 0.5648 | business-opportunities.biz | 1419 | 450 | 224 | 4773 |
57 | 0.5663 | dcist.com | 2873 | 1995 | 1346 | 8049 |
58 | 0.5678 | headrush.typepad.com/creating_passionate_users | 159 | 1149 | 45 | 313 |
59 | 0.5693 | legitgov.org | 2810 | 10835 | 473 | 562 |
60 | 0.5707 | whataboutclients.com | 518 | 80 | 220 | 1252 |
61 | 0.5722 | roughtype.com | 365 | 1074 | 101 | 455 |
62 | 0.5736 | tuaw.com | 3656 | 0 | 368 | 34518 |
63 | 0.5750 | aude91.canalblog.com | 375 | 81 | 67 | 208 |
64 | 0.5764 | thelondonfog.blogspot.com | 953 | 117 | 192 | 861 |
65 | 0.5777 | bostonist.com | 1080 | 944 | 1402 | 5001 |
66 | 0.5791 | seattlest.com | 2562 | 1326 | 1367 | 8063 |
67 | 0.5805 | austinist.com | 3113 | 1086 | 1199 | 7531 |
68 | 0.5818 | indianwriting.blogspot.com | 419 | 49 | 48 | 451 |
69 | 0.5831 | powerlineblog.com | 2081 | 2362 | 179 | 1487 |
70 | 0.5844 | firedoglake.blogspot.com | 655 | 1163 | 232 | 1496 |
71 | 0.5857 | elisson1.blogspot.com | 736 | 257 | 200 | 737 |
72 | 0.5869 | rhymeswithright.mu.nu | 1325 | 329 | 1050 | 5583 |
73 | 0.5882 | ragnell.blogspot.com | 403 | 170 | 121 | 689 |
74 | 0.5894 | pulverblog.pulver.com | 934 | 445 | 313 | 5653 |
75 | 0.5906 | mry.blogs.com/les_instants_emery | 558 | 49 | 91 | 1347 |
76 | 0.5918 | gapingvoid.com | 1156 | 905 | 235 | 1752 |
77 | 0.5929 | catymology.blogspot.com | 114 | 56 | 41 | 169 |
78 | 0.5941 | hughhewitt.com | 1330 | 1234 | 500 | 2468 |
79 | 0.5953 | lifehacker.com | 4436 | 2420 | 927 | 16658 |
80 | 0.5964 | jordoncooper.com | 619 | 264 | 229 | 2189 |
81 | 0.5976 | econbrowser.com | 263 | 349 | 210 | 1647 |
82 | 0.5987 | socialitelife.com | 4455 | 1677 | 1400 | 10616 |
83 | 0.5998 | gatesofvienna.blogspot.com | 894 | 1090 | 404 | 1892 |
84 | 0.6009 | nevillehobson.com | 578 | 0 | 384 | 4142 |
85 | 0.6019 | waxy.org/links | 836 | 2093 | 97 | 289 |
86 | 0.6030 | aliferestarted.blogspot.com | 77 | 52 | 95 | 387 |
87 | 0.6040 | volokh.com | 2400 | 1150 | 489 | 2047 |
88 | 0.6051 | library.coloradocollege.edu/steve | 154 | 33 | 85 | 459 |
89 | 0.6061 | drsanity.blogspot.com | 963 | 1419 | 807 | 2269 |
90 | 0.6071 | mudvillegazette.com | 770 | 1351 | 579 | 2902 |
91 | 0.6081 | saysuncle.com | 1992 | 1 | 552 | 4025 |
92 | 0.6091 | privacydigest.com | 1819 | 683 | 543 | 14208 |
93 | 0.6100 | londonist.com | 2624 | 844 | 868 | 6308 |
94 | 0.6110 | shanghaiist.com | 1359 | 1656 | 1292 | 5442 |
95 | 0.6120 | markshea.blogspot.com | 3109 | 551 | 413 | 1750 |
96 | 0.6129 | singleservecoffee.com | 442 | 325 | 237 | 885 |
97 | 0.6139 | jeremy.zawodny.com/blog | 279 | 617 | 84 | 550 |
98 | 0.6148 | scienceblogs.com | 4261 | 1614 | 3168 | 15324 |
99 | 0.6157 | basicthinking.de/blog | 2084 | 410 | 432 | 15046 |
100 | 0.6166 | scobleizer.wordpress.com | 1144 | 757 | 406 | 2487 |
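Since the PA score in each row is the score for the whole solution of length k, subtracting consecutive rows shows how much each added blog contributes. A small sketch using the first five rows of the table (the variable names are mine, not the authors'):

```python
# Cumulative PA scores for k = 1..5, copied from the table above.
blogs = [
    "instapundit.com",
    "donsurber.blogspot.com",
    "sciencepolitics.blogspot.com",
    "watcherofweasels.com",
    "michellemalkin.com",
]
pa = [0.1283, 0.1822, 0.2224, 0.2592, 0.2923]

# Marginal gain of the k-th blog = PA(k) - PA(k-1), with PA(0) = 0.
gains = [round(pa[0], 4)] + [round(b - a, 4) for a, b in zip(pa, pa[1:])]

for blog, gain in zip(blogs, gains):
    print(f"{blog:32s} +{gain:.4f}")

# Each added blog contributes less than the one before it.
diminishing = all(x > y for x, y in zip(gains, gains[1:]))
print("diminishing returns:", diminishing)
```

The first blog accounts for 0.1283 of the score on its own, and the fifth adds only 0.0331. By the bottom of the table, each new blog adds about 0.001.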
A commenter, Zman Biur, at Soccer Dad said:
“if there’s a best day to read blogs to maximize the information you’re getting, it’s Friday.”
Who has time to read blogs on Friday? Must be an anti-Semitic algorithm!
“if you only have time to read 100 blogs”
Who on earth has time to read 100 blogs?
Why, bloggers have the time, Mr. Zman Biur.
And commenters also, who like to hang around and share their thoughts but don’t want to deal with the upkeep of a blog. It’s kind of like letting your neighbors’ kids in to play occasionally because they like your neat “stuff”, but you can send them home when you feel like it.
What I did notice, however, was that the study said the best time to read blogs is on Friday. It’s been my experience that our traffic drops off then. First, lots of people skip work on Friday. Second, we must have more Jewish readers getting ready for Shabbat than I realized.
Cool!
NOTE: I recognize that being on this list does not mean we're actually in the top 100 in virtual reality. What these students were establishing was the most efficient way to use your blog-reading time. That's what this list signifies.
12 comments:
LGF didn't make the list.
Gratified to see The Anchoress made it.
The Daily Kos isn't there, and neither is the Drudge Report...
Blue Star Chronicles??? Wow! Way to go Beth!
The smart students should have checked the addresses before publishing their report. #3 and #88 are not there anymore.
However, I'm still at the same old address, writing the same old boring stuff.
I come to the Gate because it's good writing and you all have lots of stuff to disseminate from parts of the world most think don't matter.
For whatever the list is worth, I am pleased that LGF and HotAir didn't make it. They are heading full tilt down the Danrather Holier-than-thou path.
And commenters also, who like to hang around and share their thoughts but don’t want to deal with the upkeep of a blog. It’s kind of like letting your neighbors’ kids in to play occasionally because they like your neat “stuff”, but you can send them home when you feel like it.
I no longer maintain my blog because the usefulness/danger ratio is too low in my profession, and because I'm so much less talented than so many others.
That said, this will be the last of my infrequent comments. I wasn't aware that non-bloggers' comments were an annoyance.
Mr. Pasha--
Reading over what you excerpted--
And commenters also, who like to hang around and share their thoughts but don’t want to deal with the upkeep of a blog. It’s kind of like letting your neighbors’ kids in to play occasionally because they like your neat “stuff”, but you can send them home when you feel like it. --
I can see how it sounds. Infelicitous to say the least. It was *supposed to be* funny. In no way did I mean it to be an exclusion of anyone.
I guess a better way to put it would be that the neighbor's children come in to play because they like your stuff and then they leave when they get bored.
We only started blogging because our comments at Belmont Club were too long and I thought we were hogging the thread sometimes.
I checked and your profile is no longer available so there is no way to apologize or explain. Since you won't be back, there's simply no way to let you know it came out wrong.
However, if you took my comment that way, others obviously might do the same. As Peter Drucker said, "communication is the act of the recipient," so it doesn't matter what I *meant*. What matters is what you took from my comment.
I sure don't want this faux pas to spread...
Indigo red (interesting nic...I'm trying to imagine it)--
Are you kidding? Go thru all one hundred links??? Heck, we're way overdue to houseclean our own blogroll. Anyway, who knows how long it took them to put that table together...
Right now, we do it on a catch-as-catch-can basis -- i.e., if I click on someone and the link brings me to an advertising page, then that's a clue the blog is no longer registered, so we delete it.
Hey, now that I think of it...did *you* go through all one hundred clicks?
Hmmm... would you like to go through our blogroll? I'll be your friend...
No, seriously, I could send you a gift certificate from Amazon. We have their credit card, which I use for all our expenses. When you get enough points, they send out gift certificates. Now that the future Baron is staying here until grad school and eating us out of house and home I get more certificates than I used to...
(before any of y'all scold me about my irresponsible parenting...yes, he *does* pay room and board, in the form of one week's paycheck a month).
So anyway, are you game for this job?
Conservative Swede and aethling2 --
The fact is that both those blogs are way bigger than we are.
The students weren't looking at size, they were looking at nodes and examined the blogosphere as a cascade, which is a clever way of doing it. This method was premised on the "cost-efficiency" of having x time to read x blogs.
IOW, more bang for your buck.
The students didn't make a mistake. These were rankings for 2006.
My guess about LGF and OTB (another big one that didn't make the cut) is that there was too much overlap between those and others mentioned. I don't know why that would be, as both seem to be agenda setters.
Still I wonder, if you changed one or two blogs in the list how much would it affect the others? Would dropping two blogs from the list mean that you'd then have to drop, say, another 3 and then replace those with 5 different blogs? (I would assume that such a dynamic would occur, but can't prove it.) Or would you be able just to remove two and replace those two without any further loss of efficiency?
Soccer Dad--
You were too modest in this comment: you failed to mention that *you* made the list twice.
The first time in the Watcher of Weasels Council, and the second time for your own blog.
That is pretty cool.
This is because LGF and Drudge are both accumulators, whereas other blogs on this list generate a lot of stuff.
Not that LGF doesn't generate stuff, just that they tend to be more accumulators.
Almost everything I read here, I haven't seen elsewhere, or it's in some obscure Danish newspaper, etc.
Well gosh, I'm not there! *gasp* oh well...
LOL oh, it's just for 2006 and I hadn't started yet? yeah... that must be it...
ok, I know hardly anyone reads my blog anyway... boohoo... Maybe if I get the job in Iraq and actually have enough time to write things down...
Who cares about LGF? It is an echo chamber over there anyway. And not that good a blog. I like a reading room with, you know, real content. Not some place going, "Nyah! Nyah! I'm smarter than all the rest of you plebes!" like LGF seems to be stuck on doing.