The basic idea behind TAS Tagger was to create software that can retrieve and determine key phrases and topics from texts. The identification of these expressions and of named entities (person names, locations, organizations, dates, …) is implemented with computational linguistic and machine learning methods and tools. We apply some or all of these methods and tools, depending on the needs of the particular customer.
The digital texts can be of various types: web-hosted texts (articles and other documents), scientific content (essays, dissertations, published research), business documents (contracts, notes) or even emails.
Our tagger solution can provide input data for automatic (machine learning-based) classification of texts.
Currently tagging is available in Hungarian and English, but tagging of texts in other languages is also possible if necessary.
Making effective use of your own or external text documents and content, collected by TAS Data Collector, is essential for any business that works with very large text datasets. TAS Tagger provides various advantages. Tagging large text bodies improves the usage efficiency of such documents:
Using the Tagger service makes processes more convenient and facilitates an effective workflow, which brings both business and organizational advantages.
Because the expenses are predictable and fixed (a monthly service fee in case of continuous use), it is easy to assess to what extent the Tagger is beneficial.
The tagging process
If the content to be tagged is not available as a structured database, the textual content can be collected by the TAS Data Collector service. TAS Tagger then analyses the text body and defines tags automatically; alternatively, the set of possible tags can be defined by the customer in advance. In the latter case we build a professional tag database in partnership with the user. This database contains the predefined tags. The machine learning model uses this database and can be retrained every time the tag database changes. Retraining can be carried out by the user through the TAS user interface. The tagging process can also be tracked in the same GUI. Once a tag is accepted, the software stores it. The system also stores the previously tagged text content.
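As an illustration of tagging against a predefined tag database, here is a minimal Python sketch. It is not the TAS Tagger implementation: the `TAG_DATABASE` contents and the `suggest_tags` function are hypothetical, and simple whole-word matching stands in for the computational linguistic and machine learning methods described above.

```python
import re

# Hypothetical tag database: each predefined tag maps to the surface
# forms that should trigger it. Real projects would build this together
# with the customer.
TAG_DATABASE = {
    "vehicle": ["car", "bus", "truck"],
    "finance": ["bank", "loan", "interest rate"],
}


def suggest_tags(text, tag_database):
    """Return the sorted tags whose surface forms occur in the text as whole words."""
    lowered = text.lower()
    found = set()
    for tag, forms in tag_database.items():
        for form in forms:
            # \b keeps "car" from matching inside "carpet".
            pattern = r"\b" + re.escape(form.lower()) + r"\b"
            if re.search(pattern, lowered):
                found.add(tag)
                break  # one matching form is enough for this tag
    return sorted(found)


if __name__ == "__main__":
    print(suggest_tags("The bank approved a loan for a new truck.", TAG_DATABASE))
```

In this sketch, changing the tag database only means editing the dictionary; a machine learning model, as described above, would instead need to be retrained after such a change.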
The software also makes it possible to define relationships between the tags (broader and narrower terms, synonyms, co-occurrences). For example, detected expressions like car, bus, motorcycle, coach and truck could be connected with the word “vehicle”. The Tagger can also recognise synonyms such as bicycle and bike.
The more connections and relations are defined, the more specific the tagging results will be. Therefore, it is always important to build the tag database carefully.
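The effect of tag relationships can be sketched in a few lines of Python. The relation tables and the `expand_tags` function below are assumptions made for illustration only, reusing the vehicle and bicycle examples from the text; they do not describe how TAS Tagger stores or applies relations internally.

```python
# Hypothetical relation tables for illustration.
BROADER = {
    "car": "vehicle",
    "bus": "vehicle",
    "motorcycle": "vehicle",
    "coach": "vehicle",
    "truck": "vehicle",
}
SYNONYMS = {"bike": "bicycle"}  # variant -> preferred term


def expand_tags(detected, broader=BROADER, synonyms=SYNONYMS):
    """Map synonyms to their preferred term, then add all broader terms."""
    result = set()
    for term in detected:
        current = synonyms.get(term, term)
        result.add(current)
        seen = {current}
        # Walk up the broader-term chain; 'seen' guards against cycles.
        while current in broader:
            parent = broader[current]
            if parent in seen:
                break
            seen.add(parent)
            result.add(parent)
            current = parent
    return sorted(result)


if __name__ == "__main__":
    print(expand_tags(["bike", "truck"]))
```

With these tables, a text in which only "truck" is detected can still be found under the broader tag "vehicle", which is why a carefully built set of relations tends to give more useful tagging results.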
The GUI lets the user review the connections between the tags, including in historized and visualised form.
Graph of the connections between the defined tags
The Tagger GUI can be provided within TAS Platform (TAS Cloud service) or On Premise (locally installed). The appearance of Tagger is consistent with the corporate identity of TAS Platform. The visualization and the other parts of the user interface are also configurable. The particular solution depends on the customer’s needs.
The look of Tagger GUI
During our previous projects we gained significant experience in tagging various text datasets. Because the aims and forms of application of tagging vary widely, we provide various (even machine learning-based) solutions. Whether tagging articles, documents, emails or other text bodies, our experience, software development background and flexibility are key to the satisfaction of our customers.
We have delivered solutions for media companies, web journals, enterprises and banks.
When ordering the Tagger solution, the customer receives the following:
Tagger was developed to generate tags for text content automatically, but it can also address special needs, so the more we know about your business needs, the higher the quality of the solution and the more sophisticated the approach we can provide.
Depending on the difficulty and complexity of the task, we apply several methods to offer the best solution matching the customer’s requirements.
These methods are
Have a special need for tagging texts? Wondering if there is a solution? Let us know!
Realization of a project
After assessing your needs, aims and requirements, we use the collected information to set up the project roll-out and prepare the quotation.
Factors affecting the quotation
Tagger is a part of the TAS platform. We have also developed other software services in the TAS framework.
TAS Data Collector is the basis of the TAS Platform. Data Collector is able to collect web-based data content in a structured format so as to make this content available for information systems or for further processing and analysis.
TAS Enterprise Search is an Elastic-based enterprise search engine with massive data searching capability (access rights to your data). TAS Enterprise Search enables the user to run searches in the data collected by TAS Data Collector. It is a perfect combination when you not only need the data but also want your dataset to be effectively searchable. TAS Enterprise Search is also capable of finding named entities (e.g. company names or dates) in various formats.
TAS Thesaurus Manager is a thesaurus-building module that enables more optimal and sophisticated operation of the TAS Enterprise Search engine.
TAS Search Log Analyzer is a perfect solution if you have a structured database that is searchable by TAS Enterprise Search and you are keen on getting information about the searches that were run. TAS Search Log Analyzer lets you know, for example, which keywords are used frequently or return no match. This and similar information can be used to continuously improve your search system.
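The kind of information a search log analysis yields can be sketched as follows. This is a simplified Python illustration, not the TAS Search Log Analyzer itself: the log format (a list of query and hit-count pairs), the sample data and the `summarize_searches` function are all assumptions.

```python
from collections import Counter

# Hypothetical search log: (query, number of hits returned).
SEARCH_LOG = [
    ("invoice", 12),
    ("contract", 5),
    ("invoice", 9),
    ("holliday", 0),
    ("contract", 0),
    ("Invoice", 7),
    ("holliday", 0),
]


def summarize_searches(log, top_n=3):
    """Return the most frequent queries and the queries that had a zero-hit search."""
    frequency = Counter(query.lower() for query, _ in log)
    no_match = {query.lower() for query, hits in log if hits == 0}
    return frequency.most_common(top_n), sorted(no_match)


if __name__ == "__main__":
    frequent, unmatched = summarize_searches(SEARCH_LOG, top_n=1)
    print("Most frequent:", frequent)
    print("Searches without a match:", unmatched)
```

Queries that repeatedly return no match, such as the misspelled "holliday" in the sample data, are natural candidates for new synonyms or thesaurus entries.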
Besides the TAS products, the use of machine learning may also be required. A machine learning-based tagging procedure uses very detailed processes and is therefore able to offer very sophisticated solutions.
Initial system requirements (On Premise)
x86_64 CPU with at least 4 cores
at least 16 GB RAM
35 GB disk (usage may grow as the amount of logs increases)
64-bit Linux, Windows, or macOS – 64-bit JDK 1.8 or above
Availability and platform support
On Premise API
Java SDK is available
Integration with other products
Precognox TAS platform
How long does it take to implement a TAS Tagger solution? Every TAS tagging project has different requirements; a project can be completed in as little as 15-30 days. A machine learning-based tagging solution needs more time, especially if you don’t have annotated data yet.
Can you handle special requirements? Sure, no problem. We are not only the owners of the TAS Platform solutions but also a software development enterprise, so we are capable of developing your custom solution.
Are you prepared to do business with enterprises outside of Hungary? We have several partners in Europe and we also have overseas customers. We all speak English, and some of us speak German.
What kind of document formats can be tagged? Basically any document that contains textual information. If you have special requests, please contact us.