DNA Patent Database


Delphion Search Algorithm

The DNA Patent Database collection is generated as follows:

  1. Run the algorithm below on the Delphion Patent Database. This algorithm is explained in plain English below.
  2. Review patent titles and claims, and reject patents in which the mention of a nucleic acid term is merely incidental (for example, as one of many examples in a subordinate claim).

Delphion Search Algorithm

((047???* OR 119* OR 260???* OR 426* OR 435* OR 514* OR 536022* OR 5360231 OR 536024* OR 536025* OR 800*) <in> NC)

AND

((antisense OR <case><wildcard>cDNA* OR centromere OR deoxyoligonucleotide OR deoxyribonucleic OR deoxyribonucleotide OR <case><wildcard>DNA* OR exon OR "gene" OR "genes" OR genetic OR genome OR genomic OR genotype OR haplotype OR intron OR <case><wildcard>mtDNA* OR nucleic OR nucleotide OR oligonucleotide OR oligodeoxynucleotide OR oligoribonucleotide OR plasmid OR polymorphism OR polynucleotide OR polyribonucleotide OR ribonucleotide OR ribonucleic OR "recombinant DNA" OR <case><wildcard>RNA* OR <case><wildcard>mRNA* OR <case><wildcard>rRNA* OR <case><wildcard>siRNA* OR <case><wildcard>snRNA* OR <case><wildcard>tRNA* OR ribonucleoprotein OR <case><wildcard>hnRNP* OR <case><wildcard>snRNP* OR <case><wildcard>SNP*) <in> CLAIMS)
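The class-code half of the query can be sketched in Python as follows. This is only an illustration, not Delphion's implementation: it assumes that "?" stands for exactly one character and "*" for zero or more characters (which is how Python's fnmatch treats them), and that class codes are plain strings such as "435006". The function name class_matches and the sample codes are hypothetical.

```python
from fnmatch import fnmatchcase

# Class-code patterns copied from the query above.
CLASS_PATTERNS = [
    "047???*", "119*", "260???*", "426*", "435*", "514*",
    "536022*", "5360231", "536024*", "536025*", "800*",
]


def class_matches(code: str) -> bool:
    """Return True if a class code matches any pattern in the query.

    Assumption: '?' matches exactly one character and '*' matches zero
    or more characters, as in shell-style wildcards.
    """
    return any(fnmatchcase(code, pattern) for pattern in CLASS_PATTERNS)


if __name__ == "__main__":
    # Hypothetical class codes, used only to exercise the patterns.
    for code in ["435006", "5360231", "530350", "047058"]:
        print(code, class_matches(code))
```

Under these assumptions, "5360231" has no wildcard and so matches only that exact code, while "536022*" matches any code beginning with 536022.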

Translation of Delphion Search Algorithm

  1. Search US Patent classes 047 (plant husbandry), 119 (animal husbandry), 260 (organic chemistry), 426 (food), 435 (molecular biology and microbiology), 514 (drug, bio-affecting and body treating compositions), 536/subclasses 22 through 23.1 (nucleic acids, genes, etc., but not peptides or proteins), subclasses 24 and 25 (various nucleic acids, variants, and related methods), and class 800 (multicellular organisms).
  2. Select patents from that group that include one or more of the following terms in their claims:
  • antisense
  • cDNA (exact case match only, with or without following letters such as cDNAs)
  • centromere
  • deoxyoligonucleotide
  • deoxyribonucleic
  • deoxyribonucleotide
  • DNA (exact case match only, with or without following letters such as DNAs)
  • exon
  • gene or genes (exact match only)
  • genetic
  • genome
  • genomic
  • genotype
  • haplotype
  • intron
  • mtDNA (exact case match only, with or without following letters such as mtDNAs)
  • nucleic
  • nucleotide
  • oligonucleotide
  • oligodeoxynucleotide
  • oligoribonucleotide
  • plasmid
  • polymorphism
  • polynucleotide
  • polyribonucleotide
  • ribonucleotide
  • ribonucleic
  • recombinant DNA (exact match for case and words only)
  • RNA (all upper case only, with or without following letters such as RNAs)
  • mRNA (exact case match only, with or without following letters such as mRNAs)
  • rRNA (exact case match only, with or without following letters such as rRNAs)
  • siRNA (exact case match only, with or without following letters such as siRNAs)
  • snRNA (exact case match only, with or without following letters such as snRNAs)
  • tRNA (exact case match only, with or without following letters such as tRNAs)
  • ribonucleoprotein
  • hnRNP (exact case match only, with or without following letters such as hnRNPs)
  • snRNP (exact case match only, with or without following letters such as snRNPs)
  • SNP (exact case match only, with or without following letters such as SNPs)
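
The term list above can be approximated with two regular expressions. This is a rough sketch rather than a reproduction of Delphion's matching: it assumes that unmarked terms match as whole words regardless of case, and that the case-marked terms match with exact case plus any trailing letters. The function name claims_match is hypothetical.

```python
import re

# Terms assumed to match as whole words, ignoring case.
PLAIN_TERMS = [
    "antisense", "centromere", "deoxyoligonucleotide", "deoxyribonucleic",
    "deoxyribonucleotide", "exon", "gene", "genes", "genetic", "genome",
    "genomic", "genotype", "haplotype", "intron", "nucleic", "nucleotide",
    "oligonucleotide", "oligodeoxynucleotide", "oligoribonucleotide",
    "plasmid", "polymorphism", "polynucleotide", "polyribonucleotide",
    "ribonucleotide", "ribonucleic", "ribonucleoprotein",
]

# Terms matched with exact case, with or without following letters.
# "recombinant DNA" is not listed separately because any text that
# contains it also contains the case-sensitive word "DNA".
CASE_TERMS = [
    "cDNA", "DNA", "mtDNA", "RNA", "mRNA", "rRNA", "siRNA", "snRNA",
    "tRNA", "hnRNP", "snRNP", "SNP",
]

PLAIN_RE = re.compile(r"\b(?:" + "|".join(PLAIN_TERMS) + r")\b", re.IGNORECASE)
CASE_RE = re.compile(r"\b(?:" + "|".join(CASE_TERMS) + r")\w*")


def claims_match(claims_text: str) -> bool:
    """Return True if the claims text contains at least one listed term."""
    return bool(PLAIN_RE.search(claims_text) or CASE_RE.search(claims_text))


if __name__ == "__main__":
    print(claims_match("An isolated polynucleotide encoding a protein"))
    print(claims_match("A method of cooking rice"))
```

A patent selected by this step would still go through the manual review of titles and claims described at the top of the page.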

Original Martinell algorithm:

This algorithm was based on an original algorithm developed by USPTO Senior Examiner James Martinell in response to a 1993 request from the Office of Technology Assessment, U.S. Congress.

((435* OR 800* OR 530* OR 536/23*) <in> NC)

AND

((sequenc* OR (atga* OR atgc* OR atgg* OR atgt*) OR cDNA? OR deoxyribo* OR deoxynuclei* OR deoxynucle* OR dna? OR gene? OR nucle* OR nucleotide OR oligonucle* OR oligodeoxy*) <in> CLAIMS)

Translation of original Martinell algorithm:

This algorithm searches classes 435, 800, 530, or 536/23 and selects patents whose claims contain any of the following:

  • any word starting with "sequenc" (such as sequence or sequences)
  • any word starting with "atga", "atgc", "atgg", or "atgt" (to capture DNA sequences starting with an "ATG" sequence, which includes many complementary DNA [gene] patents)
  • cDNA with just one more letter (such as cDNAs)
  • any word starting with "deoxyribo", "deoxynuclei", or "deoxynucle"
  • DNA with just one more letter (such as DNAs)
  • any five-letter word starting with "gene" (including "genes")
  • any word starting with "nucle"
  • nucleotide
  • any word starting with "oligonucle"
  • any word starting with "oligodeoxy"
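
The wildcard behavior described in this list can be illustrated with a short Python sketch. It assumes that "?" means exactly one more character and "*" means any number of trailing characters, consistent with the translation above, and it ignores case, which is an assumption since the original query mixes "cDNA?" and "dna?". The function name martinell_hits is hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Claim-term patterns copied from the original Martinell query.
MARTINELL_PATTERNS = [
    "sequenc*", "atga*", "atgc*", "atgg*", "atgt*", "cDNA?", "deoxyribo*",
    "deoxynuclei*", "deoxynucle*", "dna?", "gene?", "nucle*", "nucleotide",
    "oligonucle*", "oligodeoxy*",
]


def martinell_hits(claims_text: str) -> list[str]:
    """Return the sorted, lower-cased words that match any pattern.

    Assumption: matching is case-insensitive and applied word by word.
    """
    words = re.findall(r"[A-Za-z]+", claims_text.lower())
    patterns = [p.lower() for p in MARTINELL_PATTERNS]
    return sorted({w for w in words if any(fnmatchcase(w, p) for p in patterns)})


if __name__ == "__main__":
    print(martinell_hits("The genes and a gene in DNAs of known sequence"))
```

Under these assumptions the bare word "gene" is not a hit, because "gene?" requires exactly one additional character.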

Process for modifying the algorithm

The algorithm was systematically modified from the original Martinell algorithm to the current algorithm ("Ade/Cook-Deegan" algorithm) as follows: Individual terms were tested for "sensitivity" (whether a word identified all the patents we believed it should), and "specificity" (whether it selected only those patents and not patents lacking DNA- or RNA-based claims). The starting point for testing sensitivity and specificity was a set of patents previously read and coded by hand.
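
As an illustration of how a term could be scored against a hand-coded set, the sketch below uses the standard definitions of sensitivity (true positives over all relevant patents) and specificity (true negatives over all non-relevant patents). The page describes these measures only informally, so the formulas, the function name, and the patent identifiers are assumptions made for the example.

```python
def sensitivity_specificity(retrieved: set, relevant: set, universe: set) -> tuple[float, float]:
    """Score one search term against a hand-coded reference set.

    retrieved: patents the term selected
    relevant:  patents hand-coded as having DNA- or RNA-based claims
    universe:  all hand-coded patents
    """
    true_pos = len(retrieved & relevant)
    false_neg = len(relevant - retrieved)
    false_pos = len(retrieved - relevant)
    true_neg = len(universe - retrieved - relevant)
    sensitivity = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    specificity = true_neg / (true_neg + false_pos) if (true_neg + false_pos) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical patent identifiers 1..10; four are hand-coded as relevant.
    universe = set(range(1, 11))
    relevant = {1, 2, 3, 4}
    retrieved = {1, 2, 3, 5}
    print(sensitivity_specificity(retrieved, relevant, universe))
```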

To expand the algorithm, we gathered a set of all patents assigned to companies known to do primarily genomic research (Human Genome Sciences and Incyte). Those patents were read and coded by hand, rejecting patents not based on DNA or RNA (e.g., each company had some protein and peptide patents). Patents that contained DNA-based claims but were not captured by the Martinell algorithm were then reviewed to identify nucleic-acid-specific terms that would identify them. Those terms were added to the list (e.g., "polynucleotide" was added this way). Finally, all USPTO patents were searched for terms specific to nucleic acids, and all or a sample of those patents were read to verify that they were based on DNA or RNA.

We eliminated terms that did not improve either sensitivity or specificity (the "ATG" terms, for example, did not identify any patents that were not already identified). In particular, we rejected class 530 (proteins and peptides) and subclasses 536/23.2-23.74, and added classes that contained some DNA-based patents but were not included in the Martinell algorithm. We added many new terms specific to nucleic acids, and retained terms that retrieved more than four previously unidentified patents (all years), after verifying that the newly identified patents included at least one DNA- or RNA-based claim. The term that introduced the most spurious (non-DNA-based) patents was "sequenc*" (words starting with "sequenc").
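
The retention rule for new terms can be sketched as a simple set comparison. This is a simplified illustration: it assumes each candidate term is evaluated independently against the same set of already identified patents, and it leaves out the manual verification step. The function name, the threshold parameter, and the patent identifiers are hypothetical.

```python
def terms_to_retain(term_hits: dict, already_identified: set, threshold: int = 4) -> dict:
    """Keep terms that retrieve more than `threshold` previously unidentified patents.

    term_hits maps each candidate term to the set of patents it retrieves.
    Returns a mapping from each retained term to its newly identified patents.
    """
    retained = {}
    for term, hits in term_hits.items():
        newly_identified = hits - already_identified
        if len(newly_identified) > threshold:
            retained[term] = newly_identified
    return retained


if __name__ == "__main__":
    # Hypothetical patent identifiers, for illustration only.
    already = {1, 2}
    candidates = {
        "polynucleotide": {1, 2, 3, 4, 5, 6, 7},  # 5 new patents: retained
        "haplotype": {3, 4, 5, 6},                # 4 new patents: not retained
        "atga*": {1, 2},                          # no new patents: not retained
    }
    print(terms_to_retain(candidates, already))
```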

The results of searches on the USPTO's EAST and WEST software (searches performed on site at the USPTO in Crystal City, Virginia) were compared to Delphion search results for replicability before shifting to the Delphion search system.

Last updated on March 29, 2007



Kennedy Institute of Ethics, Georgetown University

For comments, suggestions, information, or questions you may contact:

Bob Cook-Deegan or Mara Snyder


Duke Institute for Genome Science and Policy