Saturday 21 December 2013

Top 5 favourite blogs of chemistry bloggers

I'm always looking for ways to free up time on the webs, and my New Year's resolution is to avoid reading all Top N lists. These ubiquitous lists somehow compel me to read the contents no matter how trivial. No more. Well, no more after this, my own top N list.

A list of all chemistry blogs is maintained by Peter Maas over at Chemical Blogspace (CB). You can subscribe to one of the feeds over there to keep on top of all of them at the same time (or a relevant subset). If you have a chemistry blog and it's not on there, you should submit it to the site and your readership will balloon overnight.

Keeping on top of new chemistry blogs is tricky though. One way I thought to find new blogs was to collate the blog rolls of all of the existing blogs on CB; to a first approximation one can do this by collating all of the links on each blog's front page. This gives some idea of which blogs other bloggers read.
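
As a rough illustration of that idea, here is a minimal Python 3 sketch. The `front_pages` dictionary and its `blog-*.example` addresses are made-up stand-ins for downloaded front pages, and unlike the full script at the end of the post it does no URL normalisation or caching.

```python
import re
from collections import Counter

# Hypothetical front pages keyed by blog URL; stand-ins for downloaded HTML.
front_pages = {
    "http://blog-a.example/": '<a href="http://pipeline.corante.com/">Pipeline</a> '
                              '<a href="http://pipeline.corante.com/">Pipeline again</a>',
    "http://blog-b.example/": '<a href="http://pipeline.corante.com/">Pipeline</a> '
                              '<a href="http://blog.chembark.com/">ChemBark</a>',
    "http://blog-c.example/": '<a href="http://blog.chembark.com/">ChemBark</a> '
                              '<a href="http://pipeline.corante.com/">Pipeline</a>',
}

def count_links(pages):
    """Count, for each quoted http(s) link, how many pages contain it."""
    counts = Counter()
    for html in pages.values():
        # A set, so a link repeated on one page counts only once for that blog.
        links = set(re.findall(r'"(https?://[^"]*)"', html))
        counts.update(links)
    return counts

counts = count_links(front_pages)
for link, n in counts.most_common():
    print(n, link)
```

Counting each link once per blog means the totals read as "number of blogs linking here" rather than "number of links".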

The Top 5 are Derek Lowe's "In the Pipeline", Egon Willighagen's "Chem-bla-ics", Paul Bracher's "ChemBark", Milkshake's "Org Prep Daily" and Nature Chemistry's "The Sceptical Chymist".
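
Note that these five are not simply the five most frequent links in the raw results: links to the WordPress platform itself rank among them and have to be skipped. A minimal Python 3 sketch of that filtering follows. The `PLATFORM_HOSTS` set and the `top_blogs` helper are assumptions for illustration, not part of the script at the end of the post; the counts are the first eight rows of the raw results.

```python
from urllib.parse import urlparse

# First eight rows of the raw results, as (count, url).
raw = [
    (34, "http://pipeline.corante.com/"),
    (31, "https://subscribe.wordpress.com/"),
    (27, "http://wordpress.org/"),
    (19, "http://chem-bla-ics.blogspot.com/"),
    (18, "http://blog.chembark.com/"),
    (17, "http://wordpress.com/"),
    (16, "http://orgprepdaily.wordpress.com/"),
    (16, "http://blogs.nature.com/thescepticalchymist/"),
]

# Assumption: hosts that are blogging infrastructure rather than blogs.
PLATFORM_HOSTS = {"subscribe.wordpress.com", "wordpress.org", "wordpress.com"}

def top_blogs(results, n=5):
    """Return the first n rows whose host is not a known platform host."""
    blogs = [(count, url) for count, url in results
             if urlparse(url).netloc not in PLATFORM_HOSTS]
    return blogs[:n]

top5 = top_blogs(raw)
for count, url in top5:
    print(count, url)
```

Matching on the exact host keeps blogs hosted at wordpress.com subdomains, such as orgprepdaily.wordpress.com, while dropping the platform's own pages.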

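And here are the raw results (code at the end of post), sorted by frequency of occurrence. Each number is how many blog front pages link to that URL; links found on only one front page are left out.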
34 http://pipeline.corante.com/
31 https://subscribe.wordpress.com/
27 http://wordpress.org/
19 http://chem-bla-ics.blogspot.com/
18 http://blog.chembark.com/
17 http://wordpress.com/
16 http://orgprepdaily.wordpress.com/
16 http://blogs.nature.com/thescepticalchymist/
15 http://www.chemspider.com/blog/
12 http://gaussling.wordpress.com/
12 http://ashutoshchemist.blogspot.com/
12 http://www.blogger.com/
11 http://depth-first.com/
11 http://wwmm.ch.cam.ac.uk/blogs/murrayrust/
11 http://www.chemistry-blog.com/
10 http://www.thechemblog.com/
10 http://curlyarrow.blogspot.com/
10 http://usefulchem.blogspot.com/
10 http://www.statcounter.com/
9 http://miningdrugs.blogspot.com/
9 http://cultureofchemistry.blogspot.com/
9 http://chemjobber.blogspot.com/
9 http://www.sciencebase.com/science-blog/
9 http://scienceblogs.com/moleculeoftheday/
9 http://www.chemspider.com/
8 http://kinasepro.wordpress.com/
8 http://cenblog.org/
8 http://liquidcarbon.livejournal.com/
8 http://chemistrylabnotebook.blogspot.com/
7 http://www.chemicalforums.com/
7 http://omicsomics.blogspot.com/
7 http://transitionstate.wordpress.com/
7 http://blog.chemicalforums.com/
7 http://homebrewandchemistry.blogspot.com/
6 http://www.fieldofscience.com/
6 http://www.coronene.com/blog/
6 http://blog.everydayscientist.com/
6 http://totallymechanistic.wordpress.com/
6 http://scienceblogs.com/insolence/
6 http://www.bccms.uni-bremen.de/cms/
6 http://www.researchblogging.org/
6 http://www.nature.com/
6 http://outonalims.wordpress.com/
6 http://www.totallysynthetic.com/blog/
6 http://www.tiberlab.com/
6 http://www.rscweb.org/blogs/cw/
5 http://wiki.cubic.uni-koeln.de/cb/
5 http://organometallics.blogspot.com/
5 http://baoilleach.blogspot.com/
5 http://wavefunction.fieldofscience.com/
5 http://atompusher.blogspot.com/
5 http://feedburner.google.com/
5 http://chemical-quantum-images.blogspot.com/
5 http://propterdoc.blogspot.com/
5 http://allthingsmetathesis.com/
5 http://cb.openmolecules.net/
5 http://the-half-decent-pharmaceutical-chemistry-blog.chemblogs.org/
5 http://www.ch.ic.ac.uk/rzepa/blog/
5 http://usefulchem.wikispaces.com/
4 http://www.rsc.org/chemistryworld/
4 http://justlikecooking.blogspot.com/
4 http://synthreferee.wordpress.com/
4 http://mndoci.com/blog/
4 http://scienceblogs.com/pharyngula/
4 http://jmgs.wordpress.com/
4 http://pubchem.ncbi.nlm.nih.gov/
4 http://walkerma.wordpress.com/
4 http://gmc2007.blogspot.com/
4 http://purl.org/dc/elements/1.1/
4 http://coronene.blogspot.com/
4 http://naturalproductman.wordpress.com/
4 http://blog.rguha.net/
4 http://syntheticnature.wordpress.com/
4 http://www.organic-chemistry.org/
4 http://www.emolecules.com/
4 http://www.livejournal.com/
4 http://carbontet.blogspot.com/
4 http://therealmoforganicsynthesis.blogspot.com/
4 http://syntheticenvironment.blogspot.com/
4 http://totallymedicinal.wordpress.com/
4 http://comporgchem.com/blog/
4 http://www.jungfreudlich.de/
4 http://graphiteworks.wordpress.com/
4 http://chemicalcrystallinity.blogspot.com/
4 http://www.ebi.ac.uk/
4 http://scienceblogs.com/principles/
4 http://sanjayat.wordpress.com/
4 http://greenchemtech.blogspot.com/
4 http://www.feedburner.com/
4 https://www.ebi.ac.uk/chembl/
3 http://blogs.nature.com/
3 http://chiraljones.wordpress.com/
3 http://pubs.acs.org/journals/joceah/
3 http://cen07.wordpress.com/
3 http://www.natureasia.com/
3 http://masterorganicchemistry.com/
3 http://chem-eng.blogspot.com/
3 http://pubs.acs.org/journals/orlef7/
3 http://molecularmodelingbasics.blogspot.com/
3 http://chemistswithoutborders.blogspot.com/
3 http://www.typepad.com/
3 http://wordpress.org/extend/ideas/
3 http://www.sciencebase.com/
3 http://codex.wordpress.org/
3 http://chemicalmusings.blogspot.com/
3 http://chemicalblogspace.blogspot.com/
3 http://wordpress.org/extend/themes/
3 http://drexel-coas-elearning.blogspot.com/
3 http://cenblog.org/terra-sigillata/
3 http://planet.wordpress.org/
3 http://l-stat.livejournal.com/
3 http://wordpress.org/extend/plugins/
3 http://jmol.sourceforge.net/
3 http://chembl.blogspot.com/
3 http://theme.wordpress.com/themes/regulus/
3 http://scientopia.org/blogs/ethicsandscience/
3 http://www.google.com/
3 http://youngfemalescientist.blogspot.com/
3 http://www.badscience.net/
3 http://paulingblog.wordpress.com/
3 http://l-api.livejournal.com/
3 http://waterinbiology.blogspot.com/
3 http://gmpg.org/xfn/
3 http://totallysynthetic.com/blog/
3 http://chemicalmusings.wordpress.com/
3 http://liberalchemistry.blogspot.com/
3 http://www.hdreioplus.de/wordpress/
3 http://pubs.acs.org/
3 http://wordpress.org/news/
3 http://verpa.wordpress.com/
3 http://www.opentox.org/
3 http://pubs.acs.org/journals/jacsat/
3 http://www.chemical-chimera.blogspot.com/
3 http://www.kilomentor.com/
3 http://invivoblog.blogspot.com/
3 http://blog.khymos.org/
3 http://www.livejournal.com/search/
3 http://chemicalsabbatical.blogspot.com/
3 http://wordpress.org/support/
3 http://scienceblogs.com/clock/
3 http://profmaster.blogspot.com/
3 http://www.orgsyn.org/
3 http://www.surechembl.org/
3 http://www.ebi.ac.uk/chebi/
3 http://scienceblogs.com/aetiology/
3 http://www.mazepath.com/uncleal/
3 http://syntheticremarks.com/
3 http://www.nature.com/nchem/
2 http://www.syntheticpages.org/
2 http://www.eyeonfda.com/
2 http://genchemist.wordpress.com/
2 http://cenblog.org/newscripts/2013/12/amusing-news-aliquots-128/
2 http://blogs.scientificamerican.com/the-curious-wavefunction/2013/10/09/computational-chemistry-wins-2013-nobel-prize-in-chemistry/
2 http://cenblog.org/just-another-electron-pusher/
2 http://www.uu.se/
2 http://bugs.bioclipse.net/
2 http://www.ch.cam.ac.uk/magnus/
2 http://archive.tenderbutton.com/
2 http://cenblog.org/the-safety-zone/2013/12/csb-report-on-chevron-refinery-fire-urges-new-regulatory-approach/
2 http://pubs.acs.org/cen/
2 http://interfacialscience.blogspot.com/
2 http://www.blogtopsites.com/
2 http://www.amazingcounters.com/
2 http://www.chemheritage.org/
2 http://drexelisland.wikispaces.com/
2 http://intermolecular.wordpress.com/
2 http://www.aldaily.com/
2 http://www.plos.org/
2 http://kashthealien.wordpress.com/
2 http://web.expasy.org/groups/swissprot/
2 http://theme.wordpress.com/themes/enterprise/
2 http://cenboston.wordpress.com/
2 http://infiniflux.blogspot.com/
2 http://laserjock.wordpress.com/
2 http://bkchem.zirael.org/
2 http://www.qdinformation.com/qdisblog/
2 http://syntheticorganic.blogspot.com/
2 http://www.sciencebasedmedicine.org/
2 http://scienceblogs.com/pontiff/
2 http://www.chemistryguide.org/
2 http://pubs.acs.org/journals/jmcmar/
2 http://blog.openwetware.org/scienceintheopen/
2 http://cenblog.org/transition-states/
2 http://theme.wordpress.com/themes/contempt/
2 http://www.fiercebiotech.com/
2 http://www.paulbracher.com/blog/
2 http://scienceblogs.com/ethicsandscience/
2 http://tripod.nih.gov/
2 http://sciencegeist.net/
2 http://spectroscope.blogspot.com/
2 http://kilomentor.chemicalblogs.com/
2 http://pharmagossip.blogspot.com/
2 http://chembioinfo.com/
2 http://philipball.blogspot.com/
2 http://browsehappy.com/
2 http://www.realclimate.org/
2 http://chem.vander-lingen.nl/
2 http://cenblog.org/the-safety-zone/2013/12/lab-safety-is-critical-in-high-school-too/
2 http://joaquinbarroso.com/
2 http://eristocracy.co.uk/brsm/
2 http://daneelariantho.wordpress.com/
2 http://blogs.discovermagazine.com/cosmicvariance/
2 http://johnirwin.docking.org/
2 http://www.scienceblog.com/cms/
2 http://www.chemistry-blog.com/2013/12/11/number-11-hydrogen/
2 http://thebioenergyblog.blogspot.com/
2 http://www.eclipse.org/
2 http://www.ebyte.it/stan/
2 http://scienceblogs.com/
2 http://boscoh.com/
2 http://www.nature.com/nchem/journal/v6/n1/
2 http://cdavies.wordpress.com/
2 http://chemistandcook.blogspot.com/
2 http://www.rheothing.blogspot.com/
2 http://openbabel.org/
2 http://www.metabolomics2012.org/
2 http://cenblog.org/grand-central/
2 http://www.scilogs.es/
2 http://openflask.blogspot.com/
2 http://d3js.org/
2 http://stuartcantrill.com/
2 http://blogs.discovermagazine.com/gnxp/
2 http://www.bioclipse.net/
2 http://rajcalab.wordpress.com/
2 http://bacspublish.blogspot.com/
2 https://cszamudio.wordpress.com/
2 http://scienceblogs.com/sciencewoman/
2 http://theorganicsolution.wordpress.com/2013/12/12/my-top-10-chemistry-papers-of-2013/
2 http://chem242.wikispaces.com/
2 http://icpmassspectrometry.blogspot.com/
2 http://theme.wordpress.com/themes/ocean-mist/
2 http://theme.wordpress.com/themes/andreas09/
2 http://impactstory.org/
2 http://blog.metamolecular.com/
2 http://lamsonproject.org/
2 http://agilemolecule.wordpress.com/
2 http://proteinsandwavefunctions.blogspot.com/
2 http://cenblog.org/newscripts/2013/12/heirloom-chemistry-set/
2 http://beautifulphotochemistry.wordpress.com/
2 http://www.etracker.com/
2 http://chem241.wikispaces.com/
2 http://practicalfragments.blogspot.com/
2 http://www.nobelprize.org/nobel_prizes/chemistry/laureates/2013/
2 http://researchblogging.org/
2 http://retractionwatch.wordpress.com/
2 http://u-of-o-nmr-facility.blogspot.com/
2 http://www3.interscience.wiley.com/cgi-bin/jhome/26293/
2 http://scienceblogs.com/goodmath/
2 http://creativecommons.org/licenses/by-nc-sa/3.0/
2 http://cenblog.org/the-haystack/
2 http://www.simbiosys.com/
2 http://www.steinbeck-molecular.de/steinblog/
2 http://chemistry.about.com/
2 http://cen.acs.org/
2 http://cniehaus.livejournal.com/
2 http://chem.chem.rochester.edu/~nvd/
2 http://www.chemtube3d.com/
2 http://news.google.com/
2 http://theme.wordpress.com/themes/digg3/
2 http://theme.wordpress.com/themes/mistylook/
2 http://www.wordpress.org/
2 http://luysii.wordpress.com/
2 http://disqus.com/
2 http://openbabel.sourceforge.net/
2 http://networkedblogs.com/
2 http://www.chemspy.com/
2 http://www.openphacts.org/
2 http://cic-fachgruppe.blogspot.com/
2 http://weconsent.us/
2 http://cdktaverna.wordpress.com/
2 http://gilleain.blogspot.com/
2 http://scienceblogs.com/scientificactivist/
2 http://synchemist.blogspot.com/
2 http://www.compchemhighlights.org/
2 http://acdlabs.typepad.com/elucidation/
2 http://cdk.sf.net/
2 http://www.phds.org/
2 http://zinc.docking.org/
2 http://www.ch.imperial.ac.uk/rzepa/blog/
2 http://www.cas.org/
2 http://brsmblog.com/
2 http://www.sciencetext.com/
2 http://altchemcareers.wordpress.com/
2 http://theeccentricchemist.blogspot.com/
2 http://www.sciscoop.com/
2 http://www.agile2robust.com/
2 http://mwclarkson.blogspot.com/
2 http://www.jcheminf.com/
2 http://www.tns-counter.ru/V13a****sup_ru/ru/UTF-8/tmsec=lj_noncyr/
2 http://www.slideshare.net/
2 http://scienceblogs.com/eruptions/
2 http://www.scilogs.com/
2 http://scienceblogs.com/seejanecompute/
2 http://blog.tenderbutton.com/
2 http://www.amazingcounter.com/
2 http://creativecommons.org/licenses/by/3.0/
2 http://scienceblogs.com/greengabbro/
2 http://madskills.com/public/xml/rss/module/trackback/
2 http://edheckarts.wordpress.com/
2 http://www.scilogs.fr/
2 http://feed.informer.com/
2 http://scienceblogs.com/chaoticutopia/
2 http://www.chemaxon.com/
2 http://www.rsc.org/mpemba-competition/
2 http://neksa.blogspot.com/
2 http://orgchem.livejournal.com/
2 http://scienceblogs.com/transcript/
2 http://drexel-coas-elearning-transcripts.blogspot.com/
2 http://pmgb.wordpress.com/
2 http://www.acs.org/
2 http://www.scienceblogs.com/
2 http://www.milomuses.com/chemicalmusings/
2 http://www.the-scientist.com/
2 http://calvinus.wordpress.com/
2 http://www.pandasthumb.org/

And here's the code (Python 2; it needs BeautifulSoup 4):

import re
import os
import urllib
from collections import defaultdict

from bs4 import BeautifulSoup

def seedFromCB():
    if not os.path.isdir("CB"):
        os.mkdir("CB")
        url = "http://cb.openmolecules.net/blogs.php?skip=%d"
        N = 0
        while True:
            page = urllib.urlopen(url % N)
            # Need to remove apostrophes from tag URLs or BeautifulSoup will choke
            html = page.read().replace("you'll", "youll").replace("what's", "whats")
            if html.find("0 total") >= 0: break
            print >> open(os.path.join("CB", "CB_%d.html" % N), "w"), html
            N += 10
    N = 0
    allurls = []
    while True:
        filename = os.path.join("CB", "CB_%d.html" % N)
        if not os.path.isfile(filename): break

        soup = BeautifulSoup(open(filename).read())
        divs = soup.find_all("div", class_="blogbox_byline")
        urls = []
        for div in divs:
            children = div.find_all("a")
            anchor = children[2]
            urls.append(anchor['href'])

        print filename, len(urls)
        allurls.extend(urls)
        N += 10

    notfound = ["http://imagingchemistry.com", "http://www.caspersteinmann.dk"]
    allurls = [url for url in allurls if url not in notfound]
    return allurls

def url2file(url):
    for x in "/:?":
        url = url.replace(x, "_")
    return url

def norm(url):
    for x in [".org", ".com", ".es", ".fr", ".net"]:
        if url.endswith(x):
            return url + "/"
    for txt in ["index.html", "index.php", "index.htm", "blog.html"]:
        if url.endswith(txt):
            return url[:-len(txt)]
    if url == "http://www.corante.com/pipeline/":
        return "http://pipeline.corante.com/"
    return url

def getAllLinks(urls):
    if not os.path.isdir("Blogs"):
        os.mkdir("Blogs")
    for url in urls:
        filename = os.path.join("Blogs", url2file(url))
        print filename
        if not os.path.isfile(filename):
            html = urllib.urlopen(url).read()
            print >> open(filename, "w"), html

    countlinks = defaultdict(int)
    for url in urls:
        filename = os.path.join("Blogs", url2file(url))
        links = re.findall('"((http|ftp)s?://.*?)"', open(filename).read())
        links = set([x[0] for x in links if x[1]=='http'])
        for link in links:
            countlinks[norm(link)] += 1
    return countlinks

if __name__ == "__main__":
    blogURLs = seedFromCB()

    countlinks = getAllLinks(blogURLs)

    tmp = countlinks.items()
    tmp.sort(key=lambda x:x[1], reverse=True)
    err = open("err.txt", "w")
    for x, y in tmp:
        if y > 1:
            if x.endswith("/") and not (y==6 and "accelrys" in x) and not (y==3 and "fieldofscience" in x):
                print y, x
            else:
                print >> err, y, x
