The basic idea behind TAS Tagger was to create software that can retrieve and determine key phrases and topics from texts. The identification of these expressions and named entities (person names, locations, organizations, dates, …) is implemented with computational linguistics and machine learning methods and tools. We apply some or all of these methods and tools, depending on the needs of the particular customer.
The digital text bodies can be of various types: web-hosted texts (articles and other documents), scientific content (essays, dissertations, published research), business documents (contracts, notes) or even emails.
Our tagger solution can provide input data for automatic (Machine Learning based) classification of texts.
Currently tagging is available in Hungarian and English, but tagging texts in other languages is also possible if necessary.
The effective use of your own or external text documents and content (collected by TAS Data Collector) is essential for any business working with enormous quantities of text data. TAS Tagger provides various advantages. Tagging large text bodies improves how efficiently such documents can be used:
- enriching their data (tags are metadata)
- making them more easily searchable (documentation or even emails)
- improving their data quality
Using the Tagger service makes these processes more convenient and facilitates an effective workflow, which brings both business and organizational advantages.
Because the expenses are predictable and constant (a monthly service fee in case of continuous use), it is easy to assess to what extent the Tagger is beneficial.
Details of the tagging process
The tagging process consists of the following steps:
- definition of the text body to be tagged
- specification of tags
- checking how precise the tags are
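As an illustration of the last step, tag precision can be checked by comparing the automatically generated tags with a manually reviewed set. The following Python snippet is only a minimal sketch: the function name and the example tags are hypothetical and are not part of the TAS Tagger API.

```python
def tag_precision_recall(predicted, reference):
    """Return (precision, recall) for the tags of one document."""
    predicted, reference = set(predicted), set(reference)
    correct = predicted & reference
    # Precision: share of generated tags that the reviewer accepted.
    precision = len(correct) / len(predicted) if predicted else 0.0
    # Recall: share of the reviewer's tags that were generated.
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example data, for illustration only.
predicted_tags = ["bank", "loan", "budapest", "weather"]
reviewed_tags = ["bank", "loan", "budapest", "interest rate"]
precision, recall = tag_precision_recall(predicted_tags, reviewed_tags)
print(precision, recall)
```

Tracking both values shows whether the tagger produces too many wrong tags (low precision) or misses expected ones (low recall).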
If the content to be tagged is not available as a structured database, the textual content can be collected by the TAS Data Collector service. TAS Tagger then analyses the text body and defines tags automatically; alternatively, the set of possible tags can be defined by the customer in advance. In the latter case we build a professional tag database in partnership with the user. This database contains the predefined tags. The machine learning model uses this database and can be retrained every time the tag database changes. The user can carry out this retraining through the TAS user interface. The tagging process can also be tracked on the same GUI. Once a tag is accepted, the software stores it. The system also stores the previously tagged text content.
It is also possible to define relationships between the tags (broader and narrower terms, synonyms, co-occurrences) in the software. For example, detected expressions like car, bus, motorcycle, coach and truck could be connected with the word “vehicle”. The Tagger can also recognise synonyms such as bicycle – bike.
The more connections and relations are defined, the more specific the tagging results will be. Therefore, it is always important to build the tag database carefully.
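To make the idea of tag relationships more concrete, here is a small Python sketch of how broader terms and synonyms could be stored and resolved. It is an assumption-based illustration: the class `TagRelations` and its methods are hypothetical names and do not describe the internal data model of TAS Tagger.

```python
class TagRelations:
    """Hypothetical in-memory store for synonym and broader-term relations."""

    def __init__(self):
        self.broader = {}    # narrower term -> broader term
        self.preferred = {}  # synonym -> preferred term

    def add_broader(self, narrower, broader):
        self.broader[narrower] = broader

    def add_synonym(self, synonym, preferred):
        self.preferred[synonym] = preferred

    def expand(self, term):
        """Return the preferred form of a term followed by its broader terms."""
        term = self.preferred.get(term, term)
        result = [term]
        while term in self.broader:
            term = self.broader[term]
            if term in result:  # guard against accidental cycles
                break
            result.append(term)
        return result


relations = TagRelations()
for narrower in ["car", "bus", "motorcycle", "coach", "truck"]:
    relations.add_broader(narrower, "vehicle")
relations.add_synonym("bike", "bicycle")

print(relations.expand("truck"))  # the detected term plus its broader term
print(relations.expand("bike"))   # the synonym resolved to its preferred form
```

With such a structure, a document that mentions “truck” could also be found under the broader tag “vehicle”.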
The GUI lets the user review the connections between the tags, including in historical and visualised form.
The look of TAS Tagger UI
The Tagger GUI can be provided within TAS Platform (TAS Cloud service) or On Premise (locally installed). The appearance of Tagger is consistent with the corporate identity of TAS Platform. The visualization and the other parts of the user interface are also configurable. The particular solution depends on the customer’s needs.
Fields of application of TAS Tagger
In our previous projects we gained significant experience in tagging various text datasets. Tagging has a wide range of applications, and we provide various solutions, including machine learning-based ones. Whether you are tagging articles, documents, emails or other text bodies, our experience, software development background and flexibility are the keys to the satisfaction of our customers.
We have delivered solutions for media companies, web journals, enterprises and banks.
What do you really get? What are the related services?
When ordering the Tagger solution, the customer receives the following:
- Tagger GUI
- Tags generated by TAS Tagger (automatic or predefined tags)
- API to integrate Tagger functionality in your own software
- Adjustable access rights (admin, editor, read only)
- Customer service / permanent communication
- Follow-up of the project
How does TAS Tagger work?
Tagger was developed to generate tags for text content automatically, but it can also address special needs. The more we know about your business needs, the higher the quality of the solution and the more sophisticated the approach we can provide.
Depending on the difficulty and complexity of the task, we apply several methods to offer the solution that best matches the customer’s requirements.
These methods are:
- named entity recognition (NER)
- key phrase extraction
- machine learning by annotating your data manually or using your already annotated text
- a combination of the above-mentioned methods
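To give a feel for what key phrase extraction means in practice, the Python sketch below ranks phrases by simple frequency counting. This is a deliberately naive technique chosen for illustration and is not the method used by TAS Tagger; the stopword list, the function name and the sample text are all assumptions.

```python
import re
from collections import Counter

# Small illustrative English stopword list (an assumption, not TAS Tagger's list).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "can", "each", "for",
    "from", "in", "is", "it", "of", "on", "or", "the", "to", "with",
}


def extract_key_phrases(text, max_words=3, top_n=5):
    """Rank candidate phrases (runs of non-stopwords) by how often they occur."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    counts = Counter()
    run = []
    # A trailing "." guarantees that the last run is flushed.
    for token in tokens + ["."]:
        if token.isalnum() and token not in STOPWORDS:
            run.append(token)
            continue
        # A stopword or punctuation mark ends the run: count every n-gram in it.
        for n in range(1, max_words + 1):
            for i in range(len(run) - n + 1):
                counts[" ".join(run[i:i + n])] += 1
        run = []
    # Most frequent first; longer phrases win ties, then alphabetical order.
    ranked = sorted(counts, key=lambda p: (-counts[p], -len(p.split()), p))
    return ranked[:top_n]


sample = (
    "The tag database stores tags. The tag database is reviewed by the "
    "customer. Each tag database can be retrained."
)
top_phrases = extract_key_phrases(sample, top_n=3)
print(top_phrases)
```

A production solution would typically combine such candidates with named entity recognition and a trained model, as listed above.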
Do you have a special need when tagging texts? Wondering if there is a solution? Let us know!
Realization of a project
After assessing your needs, aims and requirements, we use the collected information to set up the project and prepare the quotation.
Factors affecting the quotation
- the complexity of the task
- the quality and the quantity of the text dataset
- optional development needs
- special requirements
Other products of the TAS platform
Tagger is a part of the TAS platform. We have also developed other software services in the TAS framework.
TAS Data Collector is the basis of the TAS Platform. Data Collector can collect web-based data content in a structured format, making this content available to information systems or for further processing and analysis. Find out more by reading the Data Collector use case.
TAS Enterprise Search is an Elastic-based enterprise search engine with massive data searching capability (with access rights to your data). The TAS Enterprise Search engine enables the user to run searches in the data collected by TAS Data Collector. It is a perfect combination when you not only need the data but also want your dataset to be effectively searchable. The TAS Enterprise Search engine is also capable of finding named entities (e.g. company names or dates) in various formats. Find out more by reading the Enterprise Search use case.
TAS Thesaurus Manager is a thesaurus-building module that enables more effective and sophisticated operation of the TAS Enterprise Search engine. Find out more by reading the Thesaurus Manager use case.
TAS Search Log Analyzer is a perfect solution if you have a structured database that is searchable by TAS Enterprise Search and you are keen on getting information about the searches that were run. For example, TAS Search Log Analyzer lets you know which keywords are used frequently or return no match. This and similar information can be used to continuously improve your search system. Find out more by reading the Search Log Analyzer use case.
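The kind of analysis described above can be pictured with a short Python sketch. The log format used here, a list of (keyword, number of hits) pairs, is purely an assumption for illustration, as is the function name; the real TAS Search Log Analyzer may work differently.

```python
from collections import Counter


def summarize_search_log(entries, top_n=3):
    """Summarize (keyword, hit_count) pairs from a hypothetical search log.

    Returns the most frequent keywords and the keywords that had no match.
    """
    frequency = Counter()
    no_match = set()
    for keyword, hits in entries:
        key = keyword.strip().lower()  # treat "Loan" and "loan " as the same keyword
        frequency[key] += 1
        if hits == 0:
            no_match.add(key)
    return frequency.most_common(top_n), sorted(no_match)


# Hypothetical log entries, for illustration only.
log = [
    ("loan", 12), ("Loan", 9), ("mortgage", 0),
    ("loan ", 4), ("morgage", 0), ("mortgage", 0),
]
frequent, unmatched = summarize_search_log(log, top_n=2)
print(frequent)
print(unmatched)
```

Keywords that repeatedly return no match, including misspellings, are natural candidates for new synonyms or thesaurus entries.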
Besides the TAS products, the use of machine learning may also be required. A machine learning-based tagging procedure uses very detailed processes and is therefore able to offer very sophisticated solutions.
Initial system requirements (On Premise)
- x86_64 CPU with at least 4 cores
- at least 16 GB RAM
- 35 GB of disk space (usage may grow as the volume of logs increases)
- 64-bit Linux, Windows, or macOS with 64-bit JDK 1.8 or above
Availability and platform support
On Premise API
Java SDK is available
Integration with other products
Questions and Answers
How long does it take to implement a TAS Tagger solution? Every TAS tagging project has different requirements, but a solution can be delivered in as little as 15-30 days. A machine learning-based tagging solution needs more time, especially if you do not have annotated data yet.
Can you handle special requirements? Sure, no problem. We are not only the owners of the TAS Platform solutions but also a software development company, so we can develop a custom solution for you.
Are you prepared to do business with enterprises outside Hungary? We have several partners in Europe, and we also have overseas customers. We all speak English, and some of us speak German.
What kinds of document formats can be tagged? Basically any document that contains textual information. If you have special requests, please contact us.