Have you ever run an internet search on Google or Yahoo hoping to find a few results on your topic, only to be overwhelmed by how many came back? Information overload is a common and persistent challenge, and it applies equally to innovators, researchers and anyone who works with patent databases and needs to analyze the information in them. On the plus side, online databases and other technical publications offer virtually unlimited access to a vast amount of patent and IP data. That abundance is also the downside: finding specific information across a very large group of patents can be very challenging.
Most sources of IP data and patent information provide search capabilities. While search is an important component of any database, it is not necessarily the most efficient way of finding what you are looking for in patent data. It’s important to understand the differences between “search” and “text clustering”, and the benefits of each. Searching is necessary. However, because a search matches keywords and returns every record that contains a hit, it can return too many records, many of which fall outside the context of what you’re looking for.
Text clustering technology, however, identifies meaningful clusters of text, or segments of information, within the patent data, which is closer to how a researcher would look through the data. It scans the data, identifies relevant topics or concepts, and ranks them, which helps the researcher interpret the information. By looking through the set of topics generated from the search results, the user can quickly identify those to set aside for deeper review and those to ignore or mark as irrelevant. That’s because clusters can represent both topics you want and topics you don’t want. In either case, you are rapidly narrowing your search down to the relevant few.
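To make the scan, identify and rank idea concrete, here is a minimal sketch in Python. It is a deliberately naive stand-in, not the method used by Patent iNSIGHT Pro or any other product: each record is filed under its highest-scoring TF-IDF term, and the resulting groups are ranked by size. The function name `cluster_by_top_term`, the `ignore` parameter and the five sample titles are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "with", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cluster_by_top_term(docs, ignore=frozenset()):
    """Group each document under its highest-scoring TF-IDF term.

    docs   : {doc_id: text}
    ignore : terms to skip, e.g. the search keyword every hit contains
    """
    tokens = {d: [t for t in tokenize(text) if t not in ignore]
              for d, text in docs.items()}
    n = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))

    clusters = defaultdict(list)
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        # A label must occur in at least two documents to group anything.
        scored = {t: c * math.log(n / df[t]) for t, c in tf.items() if df[t] >= 2}
        # Ties are broken alphabetically so the result is deterministic.
        label = max(sorted(scored), key=scored.get) if scored else "(unclustered)"
        clusters[label].append(doc_id)
    return dict(clusters)


# Made-up titles standing in for the hits of a "skateboard" search.
SAMPLE = {
    "P1": "Skateboard truck with improved axle",
    "P2": "Skateboard truck having a kingpin",
    "P3": "Skateboard deck made of laminated maple",
    "P4": "Laminated skateboard deck with concave shape",
    "P5": "Skateboard wheel bearing assembly",
}

clusters = cluster_by_top_term(SAMPLE, ignore={"skateboard"})

# Rank topics: largest cluster first, then alphabetically.
ranked = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
for label, members in ranked:
    print(label, members)
```

On this sample the records fall into a "deck" group, a "truck" group and one unclustered record. A reviewer interested only in trucks could set the "deck" group aside in one step instead of reading every hit, which is the narrowing-down effect described above.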
One must, however, set the right expectations, since there isn’t (and never can be) ‘the one set of right clusters’ for a set of patent records. Most solutions that provide clustering do not let the user tune the way clustering is done, which keeps the process a black box and allows no refinement of the generated clusters. That’s because it is assumed that the generated topics can only be used for trend analysis and not for exploring (a.k.a. “digging through”) or narrowing down a large patent result set. However, with the right set of tuning parameters, a user can quickly instruct the clustering engine to focus on the “broader topics”, or just the “finer topics”, or to keep broader topics at the first level and finer topics under them, or to give more weight to topics or concepts containing a particular set of words. With such flexibility, a user can run the clustering engine more than once, each time with a different setting, to rapidly dissect a large patent set and comprehend its various facets. This flexibility has been the cornerstone of the text clustering capabilities provided in Patent iNSIGHT Pro, where a wide range of parameters can be tuned to influence the clustering process at each step.
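The effect of such tuning can be illustrated with a toy example in Python. The two settings below, `min_docs_per_topic` and `boost`, are hypothetical names invented for this sketch and are not the actual parameters of Patent iNSIGHT Pro; the topic finder is a simple term counter, not a real clustering engine. The point is only that the same records yield different topic lists under different settings.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Tiny illustrative stopword list (an assumption).
STOPWORDS = {"a", "of", "for", "with"}


@dataclass
class ClusterSettings:
    """Hypothetical tuning knobs for this sketch only."""
    min_docs_per_topic: int = 2                 # higher value -> only broader topics
    boost: dict = field(default_factory=dict)   # term -> weight multiplier


def find_topics(docs, settings):
    """Return {term: [doc ids]}, ordered by weighted document count."""
    members = defaultdict(list)
    for doc_id, text in docs.items():
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        for term in terms:
            members[term].append(doc_id)

    # Granularity: drop terms that cover too few documents.
    kept = {t: ids for t, ids in members.items()
            if len(ids) >= settings.min_docs_per_topic}

    def weight(term):
        # Boosted terms count for more when topics are ranked.
        return len(kept[term]) * settings.boost.get(term, 1.0)

    return {t: kept[t] for t in sorted(kept, key=lambda t: (-weight(t), t))}


# Made-up record snippets for illustration.
DOCS = {
    "P1": "truck with steel axle",
    "P2": "truck with kingpin bushing",
    "P3": "deck of laminated maple",
    "P4": "laminated deck with concave shape",
    "P5": "deck with grip tape",
    "P6": "wheel bearing for truck axle",
}

# Run the same data three times, each time with a different setting.
broad = find_topics(DOCS, ClusterSettings(min_docs_per_topic=3))
fine = find_topics(DOCS, ClusterSettings(min_docs_per_topic=2))
boosted = find_topics(DOCS, ClusterSettings(boost={"axle": 2.0}))

print(list(broad))    # broader topics only
print(list(fine))     # broader and finer topics
print(list(boosted))  # "axle" promoted by its weight
```

With the stricter setting only "deck" and "truck" survive; relaxing it adds the finer topics "axle" and "laminated"; boosting "axle" moves it to the top of the list. Each run gives a different view of the same six records.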
Below is an example of the use of this technology, taken from our White Paper on Text Clustering.
In this sample set, we did a simple search for the word “skateboard” in Title, Abstract and Claims of patents across key countries and then de‐duplicated the results to only unique families. This resulted in 552 unique inventions.
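The search-then-de-duplicate step can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout, the field names (`title`, `abstract`, `claims`, `family_id`) and the four records are invented here, and a real patent database would expose its own query interface and family identifiers.

```python
FIELDS = ("title", "abstract", "claims")


def search(records, keyword, fields=FIELDS):
    """Keep records where the keyword occurs in any of the given fields."""
    kw = keyword.lower()
    return [r for r in records
            if any(kw in r.get(f, "").lower() for f in fields)]


def dedupe_by_family(records):
    """Keep the first record seen for each patent family."""
    first_seen = {}
    for r in records:
        first_seen.setdefault(r["family_id"], r)
    return list(first_seen.values())


# Made-up records; REC-1 and REC-2 belong to the same family.
RECORDS = [
    {"number": "REC-1", "family_id": "F1", "title": "Skateboard truck",
     "abstract": "", "claims": ""},
    {"number": "REC-2", "family_id": "F1", "title": "Truck assembly",
     "abstract": "A truck for a skateboard.", "claims": ""},
    {"number": "REC-3", "family_id": "F2", "title": "Wheeled board",
     "abstract": "", "claims": "1. A skateboard comprising a deck."},
    {"number": "REC-4", "family_id": "F3", "title": "Bicycle frame",
     "abstract": "", "claims": ""},
]

hits = search(RECORDS, "skateboard")
unique = dedupe_by_family(hits)
print(len(hits), "hits,", len(unique), "unique families")
```

Here three records match the keyword, and collapsing them by family leaves two unique inventions. The same reduction, at a much larger scale, produced the 552 inventions in the sample set.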
Text clustering was then performed using Patent iNSIGHT Pro over the Title, Abstract and Claims sections of these patents, and the results obtained are illustrated below. We used the Skateboard sub‐topics listed on Wikipedia as a sample for cross‐reference.
For more, download the white paper on Text Clustering.
The results are automatically categorized, making it easier to narrow down to a category or set of patents, and the data retrieved for analysis is far more refined. In effect, the automation that text clustering provides is the route to managing larger amounts of patent data more efficiently and analyzing the information more quickly.
Searching through an IP database, reading the text of hundreds of results and then analyzing the information manually would be not only slow but, in most cases, very tedious. The benefits of smarter patent data analysis software go beyond this, but for helping you find the information you need and presenting patent data in a clearer light, it’s an invaluable investment with visible returns. So while you build on the large sources of IP data you have access to and gather more data, also explore the right software tools that will help you quickly narrow down and get the most from your data. With the right patent data analysis software, even a set of 30,000 search results can be managed efficiently without your being overwhelmed by the volume of data.