Cluster-Based Patent Retrieval for Scientists and Technologists
Dr. Rajendra Prasad
25 October 2011
Introduction
Patents are a rich resource of scientific knowledge, as each invention patented or sought to be patented
embodies several technological concepts besides the key concept on which it is based. As in scientific literature, it
is the practice to cite previous patents and other sources of information while describing the invention in a patent application.
Retrieval of past patents relevant to a target technology, whether or not they have been cited, is a sine qua non
in the filing and prosecution of patent applications. The main purpose of retrieving patent documents is to validate the
genuineness of the technology in a patent application, a practice assiduously followed by large companies in the corporate
sector that maintain sizable patent portfolios, and by their Patent Agents and Patent Attorneys. Competitive intelligence,
that is, keeping track of competitors' activities, is usually their main objective in continuously conducting patent searches.
Patent Examiners in patent offices around the world also conduct extensive patent searches to establish novelty and to check
for any infringement in the context of invalidity searches (i.e., finding prior patents that contain some conflicting claims), which
are often critical for newly filed patent applications.
For academic researchers, the purpose of a patent search is usually to gain insight and new leads for
research. Most often such searches are keyword searches in free patent databases, followed by assiduously building a
set of patents by hand-picking from the maze of results thus obtained. These are fairly painstaking exercises and fail
to inspire many scientists to conduct patent searches in their areas of interest on a regular basis. As we shall see in the following
sections, patent retrieval, especially the emerging mode of cluster-based patent search, is a potentially new tool that puts
relevant existing knowledge in the hands of technology developers.
On-line Search of Patents
With the growing harmonization of patent laws across countries, the taxonomy of patent documents and
the respective fields of their content have also become quite comparable, with the adoption since the 1970s of 'Internationally agreed
Numbers for the Identification of (bibliographic) Data (INID)' codes to identify bibliographic data on the
front page of patent documents. These INID codes allow patent documents collected from different sources to be collated
in a common database for bibliographic analysis on a global basis. At present, there are approximately 60 INID codes
representing distinct bibliographic data. They are widely used on the first page of patent documents and in Patent Gazettes.
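How INID codes make front-page data easy to collate can be illustrated with a minimal Python sketch that turns INID-tagged lines into named fields. The mapping below covers only a handful of codes from WIPO Standard ST.9; the field names, the function name and the sample front-page text are all invented for illustration and do not come from any particular patent office or database.

```python
import re

# A few INID codes (WIPO Standard ST.9) mapped to field names chosen for this sketch.
INID_FIELDS = {
    "11": "document_number",
    "21": "application_number",
    "22": "filing_date",
    "51": "ipc",
    "54": "title",
    "71": "applicant",
    "72": "inventor",
}

# Matches a line such as "(54) Title of the invention".
INID_LINE = re.compile(r"^\((\d{2})\)\s*(.+)$")


def parse_front_page(text):
    """Return a dict of named fields for the INID-tagged lines found in text."""
    record = {}
    for line in text.splitlines():
        match = INID_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that carry no INID tag
        code, value = match.groups()
        field = INID_FIELDS.get(code)
        if field is not None:  # ignore codes outside the small mapping above
            record[field] = value.strip()
    return record


# Hypothetical front-page extract, not a real patent.
SAMPLE = """
(11) XX 1234567
(22) 2009-03-15
(51) B09C 1/08
(54) Method for remediating contaminated soil
(71) Example Corp.
"""

print(parse_front_page(SAMPLE))
```

Because every office tags the same kind of data with the same code, records parsed this way from different sources end up with the same field names and can be pooled directly.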
Some key INID codes pertain to patent classification codes, which indicate the nature of the technical information
contained in the patent documents. Various classification systems are in use, mostly evolved by different
countries as per their own convenience and requirements. The International Patent Classification (IPC) was
introduced by the World Intellectual Property Organization (WIPO) and is updated regularly.
Fortunately, the IPC is universally accepted, and most Patent Offices assign IPC codes besides their
own codes to all patents granted by them. Another redeeming feature of the IPC is that when old codes
are changed as a result of its expansion policy, old patents residing in various online databases are also
updated with the new codes, so that a patent search based on a valid IPC code never becomes dysfunctional.
The IPC system divides all fields of technology into a hierarchy of sections, classes, subclasses and
groups. Technical areas, which may cover any technical matter, e.g., a process, product, technique or apparatus,
are defined and suitably differentiated at the level of a class or a subclass of the classification.
The total number of codes in this classification now exceeds 70,000. It is an indispensable tool
for industrial property offices the world over in conducting searches to establish the novelty of an invention or to determine
the state of the art in a particular area of technology.
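The hierarchy of sections, classes, subclasses and groups can be read directly off an IPC symbol. The following is a minimal sketch, assuming the usual written form of a complete symbol (a section letter A to H, a two-digit class, a subclass letter, then main group and subgroup separated by a slash). The function and type names are invented here, the pattern is deliberately lenient about digit counts, and the sample symbol, taken from the subclass covering reclamation of contaminated soil, is only an illustration.

```python
import re
from typing import NamedTuple

# Section letter, two-digit class, subclass letter, main group "/" subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


class IpcSymbol(NamedTuple):
    section: str     # e.g. "B"
    ipc_class: str   # e.g. "B09"
    subclass: str    # e.g. "B09C"
    main_group: str  # e.g. "B09C 1/00"
    full: str        # e.g. "B09C 1/08"


def parse_ipc(symbol):
    """Split a complete IPC symbol into its hierarchical levels."""
    match = IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a complete IPC symbol: {symbol!r}")
    section, cls, sub, main, subgroup = match.groups()
    subclass = f"{section}{cls}{sub}"
    return IpcSymbol(
        section=section,
        ipc_class=f"{section}{cls}",
        subclass=subclass,
        main_group=f"{subclass} {main}/00",
        full=f"{subclass} {main}/{subgroup}",
    )


print(parse_ipc("B09C 1/08"))
```

Truncating a symbol at any of these levels gives a broader or narrower search key, which is what makes IPC codes usable as ready-made topic clusters.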
Several patent classification schemes are in use by different patent authorities, but the International
Patent Classification (IPC) is by far the most popular and the most widely applied. For example, the UK Patent Office
discontinued its own classification system (UKC) in favour of the IPC in July 2007. More than 70 patent-granting
authorities are believed to be using IPC codes, and this number is growing. Moreover, there are now several
comprehensive online patent databases offering free search facilities for patent data from a large number of countries.
For example, 'esp@cenet', the worldwide patent database of the European Patent Office, provides a free search
facility over a database pooling patent documents from more than 90 countries.
This presupposes, however, that a scientist or technologist fully understands the IPC system or, more precisely,
knows the IPC codes covering the areas of his / her interest, in order to benefit optimally from such a comprehensive
source through code-based searches and to build a meaningful patent inventory.
Cluster Based Patent Retrieval & Visualisation
Searches based on IPC codes are essentially cluster-based, grouping documents by 'topic' rather than by 'search terms', by the
very nature of IPC codes. During the prosecution stage, each document is hand-assigned its appropriate IPC codes by the
Patent Examiners, so that the document becomes part of several pre-defined clusters. Many automated software tools for patent
searching and mapping are based on IPC clusters, with further segmentation and clustering on specific terms before
visualization of the resulting trends and profiles from the refined data. Many commercial mapping and analytical software vendors
maintain their own databases of processed patents, which are intuitively categorized and classified into different scientific
fields as per their own schemes instead of using IPC clusters of raw patents. Clearly, such a facility provides significant
value addition, but at a cost that may be out of reach for many scientists.
Prior IP: Patent Cluster Visualization
Prior IP (www.prior-ip.com) is one of the latest to join the bandwagon of patent search engines on the
internet, with a novel concept of clustering based on patent citations and a visual display of technology areas with patent data.
Although the methodology is not clearly explained in the available information, the site claims to have evolved over 50,000
clusters, and a patent or application may belong to more than one cluster. Three distinct modes of visualization of related
clusters are provided: cluster maps, cluster landscapes and cluster neighborhoods.
The first step for a user is to enter relevant term(s) in the search box(es) for a 'technology',
'organization' (i.e., assignee or applicant), 'inventor', 'document number', etc., which returns a relevancy-ranked
list of patent documents (granted patents and applications) available within its database. Clicking on a listed patent or
application displays the selected document in a familiar format, with the text of the document along with a thumbnail
of the front page. Alongside the list of patents / applications, a small window with a thumbnail of a network of clusters
is provided on the right side under the label "Visualize IP Search Results". Clicking on this thumbnail opens up
a 'Cluster Landscape' with a series of 'Cluster Maps', each bearing a set of top terms for its cluster and indicating
the number of patents and applications in that cluster. How these clusters bring forth relevant patents and applications
through the maze of patent clusters, and allow users to hand-pick those most relevant to them by surfing through a visually
friendly and spatially oriented landscape of familiar technological areas, can be demonstrated through an example of an actual search.
Since contamination of the soil environment is a hot topic, and a variety of technologies based on different approaches have
been developed and are still being developed in many countries, we chose the term 'soil contamination'
for our initial search, which returned a simple list of 1529 patents and applications. As anticipated, a thumbnail for
"Visualize IP Search Results" appears on the right side. Running through the list, we can see that there are patents
from various countries, namely USA, Germany, Taiwan, Korea, Japan, etc. Clicking on a document title gives access to
bibliographic data with a facsimile of the first page of the document. Also provided are a link to the source
database for the full text of the document and a link for downloading a PDF version of it. The running list is provided
along with a 'relevancy score', presumably based on the frequency of the search term within the document. Please note that we
ran a simple search with the given terms; it is also possible to search within specific sections of the
document, viz., title, abstract, description, etc., and such searches may return a different number of documents with different scores.
The most interesting part of this search exercise, however, is the cluster maps rather than the running list.
When we click on "Visualize IP Search Results", we are presented with a cluster landscape showing a network
of thumbnail views of the cluster maps. In this case, we get as many as 43 cluster maps (even though
elsewhere the site indicates 114 clusters!). Each of these cluster maps has a unique name (with top terms in parentheses)
referring to a specific area of research and technology. Many of the cluster names may appear only remotely connected with
the key area of interest, but some are very close to it. Table 1 below shows some of these cluster titles,
e.g., i) ddt contaminated soil, ii) removing soil contaminants, iii) remediating contaminated soil and so on.
The patents and applications in any of the clusters can be viewed by clicking on the links provided. Similarly, clicking on a cluster image or link brings up the spread of the network of clusters. Thus, clicking on the link for Cluster 3 in Table 1 above, we can see the desired cluster map, as in Figure 1.
Once again, not all cluster titles may appear relevant, but some could attract the attention of the user. Some of the other clusters, accessed at a successive stage through the previous cluster, are shown in Table 1. It may be noted that the successive clusters are not sub-groups of the previous cluster; all are independent clusters networked with each other through a pre-determined relationship (citations!). This can be understood from the fact that the numbers of patents / applications in these clusters are highly variable and bear no hierarchical relationship of any kind. The details of patents and applications in any of these clusters can be seen in pop-up windows by clicking on the relevant link. It is interesting to note that a separate link is also provided to access details of some patents / applications which, for some reason, do not form part of the cluster.
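The idea of independent clusters networked through citations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, since the site does not document its method: the patent identifiers, cluster memberships, citation pairs and the function name are all invented, and counting cross-cluster citations is merely one plausible way of defining a link between two clusters.

```python
from collections import Counter
from itertools import product


def cluster_links(memberships, citations):
    """Count citations that connect patents sitting in different clusters.

    memberships: dict mapping patent id -> set of cluster names
                 (a patent may belong to more than one cluster).
    citations:   iterable of (citing_id, cited_id) pairs.
    Returns a Counter keyed by an alphabetically sorted pair of cluster names.
    """
    links = Counter()
    for citing, cited in citations:
        citing_clusters = memberships.get(citing, set())
        cited_clusters = memberships.get(cited, set())
        for a, b in product(citing_clusters, cited_clusters):
            if a != b:  # citations inside one cluster do not link clusters
                links[tuple(sorted((a, b)))] += 1
    return links


# Invented data: four patents spread over three clusters.
MEMBERSHIPS = {
    "P1": {"remediating contaminated soil"},
    "P2": {"removing soil contaminants"},
    "P3": {"remediating contaminated soil", "ddt contaminated soil"},
    "P4": {"ddt contaminated soil"},
}
CITATIONS = [("P1", "P2"), ("P3", "P2"), ("P4", "P3"), ("P1", "P3")]

for pair, count in sorted(cluster_links(MEMBERSHIPS, CITATIONS).items()):
    print(pair, count)
```

Under such a scheme the clusters form a network rather than a tree, which is consistent with the observation that their sizes bear no hierarchical relationship to one another.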
The pop-up of cluster details appears in an interactive template, as shown in Figure 2,
and is quite informative. This figure shows details of the cluster 'remediating contaminated soil
(water, removing, verfahren)', which we reached through Cluster 3 shown in Table 1 above.
It has under its fold 1780 patents and 1689 applications. Besides a patent time-line, which
essentially shows the growth profile of patents applied for / granted over time for the
patents covered in the cluster, it shows in separate windows the top assignees, top inventors,
top patents and top applications. It is understood that while top patents and top
applications are both ranked with reference to the citations of the patents / applications listed
in the relevant cluster, the top assignees and top inventors should be based on their
absolute numbers in each cluster. Attempts to view top inventors, however, did not
return any results. Nonetheless, the window for top assignees presents
very interesting information. The active links for the top assignees point to a
comprehensive profile of each assignee, with details of the patents, applications,
technologies and clusters attributed to it. The profile also provides
the assignee's most recent patents, applications and licensable technologies,
which are not restricted to the field of the cluster. Thus one can
easily comprehend the overall technological strength and backing of the
assignee beyond one's narrow area of interest.
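A ranking of top assignees by absolute numbers, as presumed above, amounts to a simple count of documents per assignee within one cluster. The sketch below assumes each document is available as a record with an "assignee" field; the records, field names and function name are invented for illustration and say nothing about how the site itself computes its lists.

```python
from collections import Counter


def top_assignees(records, n=3):
    """Rank assignees by the number of documents they hold in one cluster."""
    counts = Counter(
        rec["assignee"] for rec in records if rec.get("assignee")
    )
    return counts.most_common(n)


# Invented records standing in for the documents of a single cluster.
CLUSTER_RECORDS = [
    {"document_number": "XX1", "assignee": "Alpha Corp"},
    {"document_number": "XX2", "assignee": "Beta Ltd"},
    {"document_number": "XX3", "assignee": "Alpha Corp"},
    {"document_number": "XX4", "assignee": "Gamma GmbH"},
    {"document_number": "XX5", "assignee": "Alpha Corp"},
    {"document_number": "XX6", "assignee": "Beta Ltd"},
    {"document_number": "XX7", "assignee": ""},  # assignee not recorded
]

print(top_assignees(CLUSTER_RECORDS, n=2))
# prints [('Alpha Corp', 3), ('Beta Ltd', 2)]
```

The same count keyed on an inventor field would give the top inventors.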
Conclusion
While experts in information science are still vigorously trying to develop various
algorithms for cluster-based patent search, there is already on the horizon a very exciting tool,
albeit one with a few bugs yet. The cluster-based patent search facility offered by Prior IP is
certainly very useful for everyone, and above all for scientists and technologists, who are likely
to make the most use of it. Scientists with a keen interest in building a patent inventory in
their narrow subject of specialization should feel at home searching patents through
the available clusters without bothering to learn the basics of patent systems. An added attraction
of the facility is that the user can download all patents of interest to his / her desktop in
CSV format to build a personal inventory of relevant patents in Excel.
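For readers who prefer a script to a spreadsheet, downloads from several clusters can be combined into one inventory along the following lines. This is a minimal sketch: the column names (document_number, title, assignee), the sample rows and the function names are assumptions made for illustration, as the actual layout of the exported files is not described here.

```python
import csv
import io


def merge_exports(csv_texts, key="document_number"):
    """Merge several CSV exports, keeping the first row seen for each key."""
    inventory = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            doc_id = (row.get(key) or "").strip()
            if doc_id and doc_id not in inventory:
                inventory[doc_id] = row
    return list(inventory.values())


def to_csv(rows):
    """Write the merged rows back out as CSV text that Excel can open."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),  # columns taken from the first row
        extrasaction="ignore",            # drop columns the first row lacks
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two invented exports that share one document (XX2).
EXPORT_A = (
    "document_number,title,assignee\n"
    "XX1,Soil washing process,Alpha Corp\n"
    "XX2,Thermal desorption unit,Beta Ltd\n"
)
EXPORT_B = (
    "document_number,title,assignee\n"
    "XX2,Thermal desorption unit,Beta Ltd\n"
    "XX3,Bioremediation method,Alpha Corp\n"
)

print(to_csv(merge_exports([EXPORT_A, EXPORT_B])))
```

Keying the inventory on the document number means that a patent appearing in more than one cluster is stored only once.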