Sunday, October 28, 2007

So Who Are the Top 100 Blogs? Not Who I’d Have Thought

This Carnegie Mellon Computer Science study won the prize for “Best Student Paper.”

The title of the paper is “Cost-Effective Outbreak Detection in Networks.”

Here is their question:

Blog rankings

Rankings are based on the following question: Which blogs should one read to be most up to date, i.e., to quickly know about important stories that propagate over the blogosphere? [emphasis added]

Budget=100 blogs: If we can read 100 blogs, which should I read to be most up to date? Unit cost (each blog costs 1 unit), optimizing the information captured (we want to be the first to know about something with many people blogging about the story after us)

Budget=5000 posts: If we can read the total of 5000 posts, which blogs should one read? Cost of reading a blog is the number of posts it has, we optimize the information captured

Multicriterion solution: We want to read both a small number of blogs and a small number of posts. These results are from the experiment on figure 4(a) from the paper. We find the right budget where value of objective function is 40%. Cost of a blog is a combination of a number of posts (NP) a blog has plus a constant (UC).
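To make the three budget setups above concrete, here is a minimal sketch of a greedy "most new stories per unit of cost" selection. It is not the students' code or their objective function: the blog names, the story sets, the `greedy_select` helper and the `UC` value are all made up for illustration, and "information captured" is simplified to a count of distinct stories.

```python
def greedy_select(coverage, cost, budget):
    """Greedily pick blogs by new stories gained per unit of cost.

    coverage: dict blog -> set of story ids the blog reports (toy data)
    cost:     dict blog -> cost of reading that blog
    budget:   total cost we are willing to spend
    """
    chosen, covered, spent = [], set(), 0
    remaining = set(coverage)
    while remaining:
        best, best_ratio = None, 0.0
        for blog in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + cost[blog] > budget:
                continue  # cannot afford this blog any more
            gain = len(coverage[blog] - covered)  # stories we have not seen yet
            ratio = gain / cost[blog]
            if ratio > best_ratio:
                best, best_ratio = blog, ratio
        if best is None:  # nothing affordable adds anything new
            break
        chosen.append(best)
        covered |= coverage[best]
        spent += cost[best]
        remaining.remove(best)
    return chosen, covered, spent


# Hypothetical toy data: four blogs, six stories.
coverage = {
    "blog_a": {"s1", "s2", "s3", "s4"},
    "blog_b": {"s3", "s4", "s5"},
    "blog_c": {"s5", "s6"},
    "blog_d": {"s1"},
}
posts = {"blog_a": 40, "blog_b": 10, "blog_c": 5, "blog_d": 1}  # stand-in for NP

# Case 1: unit cost, every blog costs 1.
unit_cost = {blog: 1 for blog in coverage}
by_blogs = greedy_select(coverage, unit_cost, budget=2)

# Case 2: the cost of a blog is its number of posts.
by_posts = greedy_select(coverage, posts, budget=20)

# Case 3: combined cost, number of posts plus a constant (UC is an assumed value).
UC = 10
combined = {blog: posts[blog] + UC for blog in coverage}
by_both = greedy_select(coverage, combined, budget=40)

print(by_blogs[0], by_posts[0], by_both[0])
```

Notice how the picks change with the cost model: under unit cost the prolific blog is worth reading, but once every post costs something, small blogs that break a story or two become the better buy.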

Here is their real-life comparison:

The spread of information in the blogosphere: First blog writes a post and then other blogs refer to it. The behavior (information) spreads (cascades) through the network of blogs.
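A toy sketch of that cascade idea, under my own assumptions: a cascade is just a list of (time, blog) posts about one story, and a reader "catches" the story at the first post by a blog on their reading list. The function name and the data are hypothetical; the point is only to show what "being first with many people blogging after us" could mean as a number.

```python
def blogs_after_detection(cascade, reading):
    """Count how many posts in the cascade come after we first see the story.

    cascade: list of (time, blog) pairs for one story (toy data)
    reading: set of blogs we read
    Returns 0 if none of our blogs ever picks the story up.
    """
    ordered = sorted(cascade)  # order posts by time
    for position, (_, blog) in enumerate(ordered):
        if blog in reading:
            # everything after this position was posted later than our first sighting
            return len(ordered) - position - 1
    return 0


# Hypothetical cascade: blog_a breaks the story, three others follow.
cascade = [(0, "blog_a"), (2, "blog_b"), (3, "blog_c"), (7, "blog_d")]

early = blogs_after_detection(cascade, {"blog_a"})
late = blogs_after_detection(cascade, {"blog_b", "blog_d"})
missed = blogs_after_detection(cascade, {"blog_z"})
print(early, late, missed)
```

Reading the blog that starts the cascade scores highest here, reading only the followers scores lower, and a reading list that never touches the story scores nothing.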

Water distribution networks

[The] same techniques and algorithms as used for blogs also apply to detecting disease outbreaks in water distribution networks. Consider a city water distribution network, delivering water to households via pipes and junctions. Intrusions can cause contaminants to spread over the network, and we want to select a few locations (pipe junctions) to install sensors, in order to detect these contaminations as quickly as possible.

The sensor placements obtained by our algorithm are provably near optimal, providing a constant fraction of the optimal solution. Our approach scales, achieving speedups and savings in storage of several orders of magnitude.
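The quoted passage doesn't say where the speedups come from, so take this as a general illustration rather than a description of their method. One standard trick for this kind of selection problem is "lazy" greedy evaluation: keep the old gain of each candidate as an upper bound and recompute it only when that candidate reaches the top of the queue. The junction names, the contamination scenarios and the `lazy_greedy` helper below are all invented.

```python
import heapq


def lazy_greedy(coverage, k):
    """Pick up to k locations, recomputing marginal gains only when needed.

    coverage: dict location -> set of scenario ids a sensor there would detect
    Returns (chosen locations, detected scenarios, number of re-evaluations).
    """
    chosen, covered = [], set()
    # Max-heap via negated gains; the third field records the round in which
    # the gain was computed, so stale entries can be told from fresh ones.
    heap = [(-len(scenarios), name, 0) for name, scenarios in coverage.items()]
    heapq.heapify(heap)
    evaluations = 0
    while heap and len(chosen) < k:
        neg_gain, name, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # Gain is up to date and beats every (upper-bound) entry left in the heap.
            if neg_gain == 0:
                break  # nothing left to gain
            chosen.append(name)
            covered |= coverage[name]
        else:
            # Stale entry: recompute its gain against what is already covered.
            evaluations += 1
            gain = len(coverage[name] - covered)
            heapq.heappush(heap, (-gain, name, len(chosen)))
    return chosen, covered, evaluations


# Hypothetical pipe junctions and the contamination scenarios each would detect.
junctions = {
    "j1": {1, 2, 3, 4},
    "j2": {3, 4, 5},
    "j3": {5, 6},
    "j4": {1},
}

picked, detected, evaluations = lazy_greedy(junctions, k=2)
print(picked, sorted(detected), evaluations)
```

In this tiny example the second pick needs only two re-evaluations instead of three, because the junction whose old gain was already the smallest never gets looked at again. On a network with thousands of junctions that kind of skipping adds up.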

This same link also provides their algorithm and some illustrations, plus links to more detailed information. Don't know that I care for being compared to contaminated water, however. Couldn't they have done something with, say, ice cream?

This is the .pdf of their paper, with illustrations of how the cascades work.

But what is surprising is the list they came up with. Of course, #1 is no surprise at all: Instapundit. But after that, it’s up for grabs:

Here’s some data regarding the parameters of their table:

Top 100 blogs for unit cost case and PA objective function

  • PA score : score for the solution of length k
  • NP : number of posts of a blog in 2006
  • IL : number of inlinks that a blog got from other blogs inside the dataset in 2006
  • OLO : number of outlinks to other blogs in the dataset
  • OLA : number of all outlinks (also counting links to other resources on the web)

The table is below the fold. You’re going to be surprised at some of the blogs that made the list, and some that are noticeably absent.
- - - - - - - - -

k PA score Blog NP IL OLO OLA 
1 0.1283 instapundit.com 4593 4636 1890 5255 
2 0.1822 donsurber.blogspot.com 1534 1206 679 3495 
3 0.2224 sciencepolitics.blogspot.com 924 576 888 2701 
4 0.2592 watcherofweasels.com 261 941 1733 3630 
5 0.2923 michellemalkin.com 1839 12642 1179 6323 
6 0.3152 blogometer.nationaljournal.com 189 2313 3669 9272 
7 0.3353 themodulator.org 475 717 1844 4944 
8 0.3508 bloggersblog.com 895 247 1244 10201 
9 0.3654 boingboing.net 5776 6337 1024 6183 
10 0.3778 atrios.blogspot.com 4682 3205 795 3102 
11 0.3885 lawhawk.blogspot.com 1862 463 1668 6597 
12 0.3984 gothamist.com 6223 3324 1891 17172 
13 0.4078 mparent7777.livejournal.com 25925 199 4027 47933 
14 0.4163 wheelgun.blogspot.com 1174 128 262 939 
15 0.4245 gevkaffeegal.typepad.com/the_alliance 302 428 333 2481 
16 0.4318 anglican.tk 66 66 1377 3482 
17 0.4384 micropersuasion.com 1503 2880 506 5666 
18 0.4444 pajamasmedia.com 5007 141 2920 26881 
19 0.4500 blogher.org 3302 412 1587 14222 
20 0.4556 mypetjawa.mu.nu 1108 1733 757 3609 
21 0.4611 reddit.com 2618 1940 201 1117 
22 0.4661 soccerdad.baltiblogs.com 814 451 1137 4307 
23 0.4711 thenoseonyourface.com/the_nose_on_your_face 400 394 349 1645 
24 0.4759 ahistoricality.blogspot.com 441 87 293 805 
25 0.4803 theanchoressonline.com 989 430 1597 6358 
26 0.4848 americablog.blogspot.com 5786 3351 331 3950 
27 0.4890 sfist.com 3068 1461 1891 13203 
28 0.4931 tbogg.blogspot.com 1412 864 5567 19396 
29 0.4971 horsepigcow.com 516 498 203 1220 
30 0.5009 whyhomeschool.blogspot.com 513 211 205 1030 
31 0.5046 daoureport.salon.com 2012 5255 177 768 
32 0.5083 sisu.typepad.com/sisu 331 304 293 1968 
33 0.5119 metafilter.com 5866 1277 607 13374 
34 0.5151 megite.com 535 33 378 2422 
35 0.5183 laist.com 2651 1259 1389 7680 
36 0.5214 captainsquartersblog.com/mt 2623 6495 517 6187 
37 0.5243 shakespearessister.blogspot.com 4580 2116 1386 5839 
38 0.5271 blog.guykawasaki.com 218 1470 24 311 
39 0.5299 tryinotocomeundone.blogstream.com 76 183 343 973 
40 0.5326 bluestarchronicles.blogspot.com 180 144 283 1082 
41 0.5352 googleblog.blogspot.com 294 2815 84 
42 0.5377 theglitteringeye.com 924 377 1088 3927 
43 0.5402 asterisco.paradigma.pt 2419 145 521 14280 
44 0.5425 readwriteweb.com 543 1236 275 1937 
45 0.5448 digbysblog.blogspot.com 1784 3553 574 3153 
46 0.5470 conservativecat.com 682 284 916 3551 
47 0.5491 phillyist.com 1633 800 1797 6328 
48 0.5511 socialcustomer.com 279 119 122 889 
49 0.5530 business2.blogs.com/business2blog 635 343 132 1801 
50 0.5549 gatewaypundit.blogspot.com 2677 3172 1146 6829 
51 0.5567 crooksandliars.com 2426 2578 1275 6147 
52 0.5584 rightwingnews.com 1975 1700 891 8478 
53 0.5600 10000birds.com 160 72 46 217 
54 0.5617 radar.oreilly.com 647 1219 160 2699 
55 0.5632 cowboyblob.blogspot.com 1208 173 145 379 
56 0.5648 business-opportunities.biz 1419 450 224 4773 
57 0.5663 dcist.com 2873 1995 1346 8049 
58 0.5678 headrush.typepad.com/creating_passionate_users 159 1149 45 313 
59 0.5693 legitgov.org 2810 10835 473 562 
60 0.5707 whataboutclients.com 518 80 220 1252 
61 0.5722 roughtype.com 365 1074 101 455 
62 0.5736 tuaw.com 3656 368 34518 
63 0.5750 aude91.canalblog.com 375 81 67 208 
64 0.5764 thelondonfog.blogspot.com 953 117 192 861 
65 0.5777 bostonist.com 1080 944 1402 5001 
66 0.5791 seattlest.com 2562 1326 1367 8063 
67 0.5805 austinist.com 3113 1086 1199 7531 
68 0.5818 indianwriting.blogspot.com 419 49 48 451 
69 0.5831 powerlineblog.com 2081 2362 179 1487 
70 0.5844 firedoglake.blogspot.com 655 1163 232 1496 
71 0.5857 elisson1.blogspot.com 736 257 200 737 
72 0.5869 rhymeswithright.mu.nu 1325 329 1050 5583 
73 0.5882 ragnell.blogspot.com 403 170 121 689 
74 0.5894 pulverblog.pulver.com 934 445 313 5653 
75 0.5906 mry.blogs.com/les_instants_emery 558 49 91 1347 
76 0.5918 gapingvoid.com 1156 905 235 1752 
77 0.5929 catymology.blogspot.com 114 56 41 169 
78 0.5941 hughhewitt.com 1330 1234 500 2468 
79 0.5953 lifehacker.com 4436 2420 927 16658 
80 0.5964 jordoncooper.com 619 264 229 2189 
81 0.5976 econbrowser.com 263 349 210 1647 
82 0.5987 socialitelife.com 4455 1677 1400 10616 
83 0.5998 gatesofvienna.blogspot.com 894 1090 404 1892 
84 0.6009 nevillehobson.com 578 384 4142 
85 0.6019 waxy.org/links 836 2093 97 289 
86 0.6030 aliferestarted.blogspot.com 77 52 95 387 
87 0.6040 volokh.com 2400 1150 489 2047 
88 0.6051 library.coloradocollege.edu/steve 154 33 85 459 
89 0.6061 drsanity.blogspot.com 963 1419 807 2269 
90 0.6071 mudvillegazette.com 770 1351 579 2902 
91 0.6081 saysuncle.com 1992 552 4025 
92 0.6091 privacydigest.com 1819 683 543 14208 
93 0.6100 londonist.com 2624 844 868 6308 
94 0.6110 shanghaiist.com 1359 1656 1292 5442 
95 0.6120 markshea.blogspot.com 3109 551 413 1750 
96 0.6129 singleservecoffee.com 442 325 237 885 
97 0.6139 jeremy.zawodny.com/blog 279 617 84 550 
98 0.6148 scienceblogs.com 4261 1614 3168 15324 
99 0.6157 basicthinking.de/blog 2084 410 432 15046 
100 0.6166 scobleizer.wordpress.com 1144 757 406 2487 

A commenter, Zman Biur, at Soccer Dad said:

“if there’s a best day to read blogs to maximize the information you’re getting, it’s Friday.”

Who has time to read blogs on Friday? Must be an anti-Semitic algorithm!

“if you only have time to read 100 blogs”

Who on earth has time to read 100 blogs?

Why, bloggers have the time, Mr. Zman Biur.

And commenters also, who like to hang around and share their thoughts but don’t want to deal with the upkeep of a blog. It’s kind of like letting your neighbors’ kids in to play occasionally because they like your neat “stuff”, but you can send them home when you feel like it.

What I did notice, however, was that the study said the best time to read blogs is on Friday. It’s been my experience that our traffic drops off then. First, lots of people skip work on Friday. Second, we must have more Jewish readers getting ready for Shabbat than I realized.

Cool!

NOTE: I recognize that being on this list does not mean we're actually in the top 100 in virtual reality. What these students were establishing was the most efficient way to use your blog-reading time. That's what this list signifies.

12 comments:

Conservative Swede said...

LGF didn't make the list.

atheling2 said...

Gratified to see The Anchoress made it.

The Daily Kos isn't there, and neither is the Drudge Report...

Blue Star Chronicles??? Wow! Way to go Beth!

Indigo Red said...

The smart students should have checked the addresses before publishing their report. #3 and #88 are not there anymore.

However, I'm still at the same old address, writing the same old boring stuff.

I come to the Gate because it's good writing and you all have lots of stuff to disseminate from parts of the world most think don't matter.

Gordon Pasha said...

For whatever the list is worth, I am pleased that LGF and HotAir didn't make it. They are heading full tilt down the Danrather Holier-than-thou path.

And commenters also, who like to hang around and share their thoughts but don’t want to deal with the upkeep of a blog. It’s kind of like letting your neighbors’ kids in to play occasionally because they like your neat “stuff”, but you can send them home when you feel like it.

I no longer maintain my blog because the usefulness/danger ratio is too low in my profession, and because I'm so much less talented than so many others.

That said, this will be the last of my infrequent comments. I wasn't aware that non-bloggers' comments were an annoyance.

Dymphna said...

Mr. Pasha--

Reading over what you excerpted--

And commenters also, who like to hang around and share their thoughts but don’t want to deal with the upkeep of a blog. It’s kind of like letting your neighbors’ kids in to play occasionally because they like your neat “stuff”, but you can send them home when you feel like it. --

I can see how it sounds. Infelicitous to say the least. It was *supposed to be* funny. In no way did I mean it to be an exclusion of anyone.

I guess a better way to put it would be that the neighbor's children come in to play because they like your stuff and then they leave when they get bored.

We only started blogging because our comments at Belmont Club were too long and I thought we were hogging the thread sometimes.

I checked and your profile is no longer available so there is no way to apologize or explain. Since you won't be back, there's simply no way to let you know it came out wrong.

However, if you took my comment that way, others obviously might do the same (as Peter Drucker said, "communication is the act of the recipient" so it doesn't matter what I *meant*. What matters is what you took from my comment).

I sure don't want this faux pas to spread...

Dymphna said...

Indigo red (interesting nic...I'm trying to imagine it)--

Are you kidding? Go thru all one hundred links??? Heck, we're way overdue to houseclean our own blogroll. Anyway, who knows how long it took them to put that table together...


Right now, we do it on a catch-as-catch-can basis -- i.e., if I click on someone and the link brings me to an advertising page then that's a clue the blog is no longer registered, so we delete it.

Hey, now that I think of it...did *you* go through all one hundred clicks?

Hmmm... would you like to go through our blogroll? I'll be your friend...

No, seriously, I could send you a gift certificate from Amazon. We have their credit card, which I use for all our expenses. When you get enough points, they send out gift certificates. Now that the future Baron is staying here until grad school and eating us out of house and home I get more certificates than I used to...

(before any of y'all scold me about my irresponsible parenting...yes, he *does* pay room and board, in the form of one week's paycheck a month).

So anyway, are you game for this job?

Dymphna said...

Conservative Swede and atheling2 --

The fact is that both those blogs are way bigger than we are.

The students weren't looking at size, they were looking at nodes and examined the blogosphere as a cascade, which is a clever way of doing it. This method was premised on the "cost-efficiency" of having x time to read x blogs.

IOW, more bang for your buck.

Soccer Dad said...

The students didn't make a mistake. These were rankings for 2006.

My guess about LGF and OTB (another big one that didn't make the cut) is that there was too much overlap between those and others mentioned. I don't know why that would be, as both seem to be agenda setters.

Still I wonder, if you changed one or two blogs in the list how much would it affect the others? Would dropping two blogs from the list mean that you'd then have to drop, say, another 3 and then replace those with 5 different blogs? (I would assume that such a dynamic would occur, but can't prove it.) Or would you be able just to remove two and replace those two without any further loss of efficiency?

Dymphna said...

Soccer Dad--

You were too modest in this comment: you failed to mention that *you* made the list twice.

The first time in the Watcher of Weasels Council, and the second time for your own blog.

That is pretty cool.

Doc Merlin said...

This is because LGF and Drudge are both accumulators, whereas other blogs on this list generate a lot of stuff.
Not that LGF doesn't generate stuff, just that they tend to be more accumulators.
Almost everything I read here, I haven't seen elsewhere, or it's in some obscure Danish newspaper, etc.

falcon_01 said...

Well gosh, I'm not there! *gasp* oh well...
LOL oh, it's just for 2006 and I hadn't started yet? yeah... that must be it...

ok, I know hardly anyone reads my blog anyway... boohoo... Maybe if I get the job in Iraq and actually have enough time to write things down...

Robohobo said...

Who cares about LGF? It is an echo chamber over there anyway. And not that good a blog. I like a reading room with, you know, real content. Not some place going, "nyah! Nyah! I'm smarter than all the rest of you plebes!" like LGF seems to be stuck on doing.