Delphion Search Algorithm

The DNA Patent Database collection is generated as follows:
Delphion Search Algorithm

((047???* OR 119* OR 260???* OR 426* OR 435* OR 514* OR 536022* OR 5360231 OR 536024* OR 536025* OR 800*) <in> NC) AND ((antisense OR <case><wildcard>cDNA* OR centromere OR deoxyoligonucleotide OR deoxyribonucleic OR deoxyribonucleotide OR <case><wildcard>DNA* OR exon OR "gene" OR "genes" OR genetic OR genome OR genomic OR genotype OR haplotype OR intron OR <case><wildcard>mtDNA* OR nucleic OR nucleotide OR oligonucleotide OR oligodeoxynucleotide OR oligoribonucleotide OR plasmid OR polymorphism OR polynucleotide OR polyribonucleotide OR ribonucleotide OR ribonucleic OR "recombinant DNA" OR <case><wildcard>RNA* OR <case><wildcard>mRNA* OR <case><wildcard>rRNA* OR <case><wildcard>siRNA* OR <case><wildcard>snRNA* OR <case><wildcard>tRNA* OR ribonucleoprotein OR <case><wildcard>hnRNP* OR <case><wildcard>snRNP* OR <case><wildcard>SNP*) <in> CLAIMS)

Translation of Delphion Search Algorithm
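Read structurally, the query above combines two conditions: a patent's classification code must match one of the listed class patterns, and its claims must contain at least one of the listed nucleic-acid terms. The following is a minimal Python sketch of that two-part test, not the Delphion implementation. It assumes that `<in> NC` restricts the match to the classification field, that `<case><wildcard>` marks a case-sensitive prefix match, that unmarked terms match whole words regardless of case, and that class codes are stored as concatenated digit strings. The function names and the shortened term lists are hypothetical and chosen only for illustration.

```python
import re

# Illustrative subset of the query above; the full lists are longer.
CLASS_PREFIXES = ("119", "426", "435", "514", "536022", "536024", "536025", "800")
EXACT_CLASSES = {"5360231"}  # listed without a trailing wildcard in the query
PLAIN_TERMS = {"antisense", "exon", "gene", "genes", "genome",
               "nucleotide", "plasmid", "polynucleotide"}
CASE_SENSITIVE_PREFIXES = ("cDNA", "DNA", "mtDNA", "RNA", "mRNA", "SNP")


def class_matches(class_codes):
    """True if any classification code satisfies the NC clause (assumed format)."""
    return any(
        code.startswith(CLASS_PREFIXES) or code in EXACT_CLASSES
        for code in class_codes
    )


def claims_match(claims_text):
    """True if any word in the claims satisfies the CLAIMS clause."""
    for word in re.findall(r"[A-Za-z]+", claims_text):
        # Unmarked terms: whole-word match, case-insensitive (assumption).
        if word.lower() in PLAIN_TERMS:
            return True
        # <case><wildcard> terms: case-sensitive prefix match (assumption).
        if word.startswith(CASE_SENSITIVE_PREFIXES):
            return True
    return False


def is_candidate(class_codes, claims_text):
    """Both clauses must hold, mirroring the AND in the query."""
    return class_matches(class_codes) and claims_match(claims_text)
```

The single-character `?` wildcards (as in `047???*`) and the quoted phrase "recombinant DNA" are left out of this sketch to keep it short.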
Original Martinell algorithm

The current algorithm is based on an original algorithm developed by USPTO Senior Examiner James Martinell in response to a 1993 request from the Office of Technology Assessment, U.S. Congress.

((435* OR 800* OR 530* OR 536/23*) <in> NC) AND ((sequenc* OR (atga* OR atgc* OR atgg* OR atgt*) OR cDNA? OR deoxyribo* OR deoxynuclei* OR deoxynucle* OR dna? OR gene? OR nucle* OR nucleotide OR oligonucle* OR oligodeoxy*) <in> CLAIMS)

Translation of original Martinell algorithm

This algorithm searches classes 435, 800, 530, or 536/23 and
Process for modifying the algorithm

The algorithm was systematically modified from the original Martinell algorithm to the current algorithm (the "Ade/Cook-Deegan" algorithm) as follows.

Individual terms were tested for "sensitivity" (whether a term identified all the patents we believed it should) and "specificity" (whether it selected only those patents and not patents lacking DNA- or RNA-based claims). The starting point for testing sensitivity and specificity was a set of patents previously read and coded by hand.

To expand the algorithm, we gathered a set of all patents assigned to companies known to do primarily genomic research (Human Genome Sciences and Incyte). Those patents were read and coded by hand, rejecting patents not based on DNA or RNA (e.g., each company had some protein and peptide patents). Patents that contained DNA-based claims but were not captured by the Martinell algorithm were then reviewed to identify nucleic-acid-specific terms that would identify them. Those terms were added to the list (e.g., "polynucleotide" was added this way). Finally, all USPTO patents were searched for terms specific to nucleic acids, and all or a sample of those patents were read to verify that they were based on DNA or RNA.

We eliminated terms that did not improve either sensitivity or specificity (using the "ATG." terms, for example, did not identify any patents not already identified). In particular, we rejected class 530 (protein and peptides) and 536/23.2-23.74, and added classes that contained some DNA-based patents that were not included in the Martinell algorithm. We added many new terms specific to nucleic acids, and retained terms that retrieved more than 4 previously unidentified patents (all years), after verifying that the newly identified patents included at least one DNA- or RNA-based claim. The term that introduced the most spurious (non-DNA-based) patents was "sequenc*" (words starting with "sequenc").
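The term-testing steps described above can be illustrated with a short Python sketch. It uses the standard definitions of sensitivity and specificity, which may be stricter than the informal usage in the text, and it assumes the hand-coded set is available as a mapping from patent identifier to a true/false label. The function names, the data layout, and the patent identifiers are hypothetical; only the "more than 4" retention threshold comes from the text.

```python
def sensitivity_specificity(hand_coded, retrieved):
    """Score one search term against a hand-coded reference set.

    hand_coded: dict mapping patent id -> True if the patent has a
                DNA- or RNA-based claim (assumed data layout).
    retrieved:  set of patent ids returned by the term.
    """
    tp = fn = fp = tn = 0
    for patent_id, has_dna_claim in hand_coded.items():
        hit = patent_id in retrieved
        if has_dna_claim and hit:
            tp += 1  # DNA patent found by the term
        elif has_dna_claim:
            fn += 1  # DNA patent missed by the term
        elif hit:
            fp += 1  # spurious (non-DNA) patent retrieved
        else:
            tn += 1  # non-DNA patent correctly left out
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


def retain_term(term_hits, already_identified, verified_dna, threshold=4):
    """Keep a term only if it adds more than `threshold` verified new patents."""
    new_hits = (term_hits - already_identified) & verified_dna
    return len(new_hits) > threshold
```

For example, with a reference set of two DNA patents and two non-DNA patents, a term that retrieves one of each scores 0.5 on both measures. A term such as "sequenc*" that pulls in many non-DNA patents would show up as a low specificity.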
The results of searches on the USPTO's EAST and WEST software (searches performed on site at the USPTO in Crystal City, Virginia) were compared with Delphion search results to check replicability before shifting to the Delphion search system.

Last updated on March 29, 2007