Use cases

TAS Data Collector for gathering procurement data

TAS Data Collector has been specifically designed to collect content available on the World Wide Web. This content can serve as a basis for data provision, business decision making, scientific and research work, and the compilation of open data.

Online data collection plays an extremely important role in a number of areas, including significant projects such as DIGIWHIST, which aims to collect public procurement data by country.

In action

Learn more from our use cases about the business pain points that TAS Platform solutions can solve.

Further use cases:
TAS Enterprise Search,
TAS Tagger,
TAS Thesaurus Manager,
TAS Search Log Analyzer.

About DIGIWHIST

The original aim of DIGIWHIST was to build a platform that can analyze procurement data from 35 European countries in various languages. Over the years, however, the list has been expanded to include further countries from Africa and Latin America. In addition, a whistleblower platform has also been implemented during the project.

As part of the project, Precognox has collected public procurement data and processed it using the DIGIWHIST framework. So far, data has been collected from Tanzania, Romania, Poland, Mexico, Kenya and Uruguay, among other countries. Brazilian public procurement data is currently being collected.

In connection with the project, we perform the following tasks:

  • collecting raw data from websites
  • data cleaning and normalization, necessary software development
  • data export in the DIGIWHIST structure, in JSON and CSV formats
  • custom software development: development of tools to support the process (e.g. a “reverse-flatten tool” for converting CSV content to JSON format)
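
As an illustration of the CSV-to-JSON conversion mentioned in the last item, here is a minimal sketch of what a "reverse-flatten" step might look like. It is not the actual DIGIWHIST tool: the dotted column names (such as buyer.name), the sample rows and the function names are assumptions made for this example only.

```python
import csv
import io
import json


def unflatten(row, sep="."):
    """Rebuild a nested dict from flat keys, e.g. {"buyer.name": "X"} -> {"buyer": {"name": "X"}}."""
    nested = {}
    for key, value in row.items():
        if value == "":
            continue  # assumption: empty CSV cells are omitted from the JSON output
        parts = key.split(sep)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


def csv_to_json(csv_text):
    """Parse CSV text and return one nested record per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [unflatten(row) for row in reader]


# Hypothetical sample data; real exports will have different columns.
sample = (
    "id,buyer.name,buyer.country,price.amount\n"
    "1,City Hall,HU,1000\n"
    "2,Ministry,,250\n"
)
records = csv_to_json(sample)
print(json.dumps(records, indent=2))
```

The sketch keeps every value as a string and assumes that no column is both a leaf and a parent (for example buyer and buyer.name together); a production tool would need type conversion and conflict handling.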

Dr Mihaly Fazekas talks about the DIGIWHIST project

Making public tenders more transparent

As a result of the project, anyone can use the Opentender page to view the data on individual national portals (by selecting a country) or to explore all available data.
This is how collecting procurement data makes public tenders more transparent.

Be as successful as the DIGIWHIST project

DIGIWHIST is a good example of how useful and successful a data collection project can be. In addition, the tasks described for DIGIWHIST can be carried out for any TAS Data Collector project.

Learn more about how TAS Data Collector provides transparency for public information.

If you think that the data collected from the World Wide Web by TAS Data Collector could also be useful for you and you want to take advantage of its potential, please contact us using the form.

Important and relevant content can exist not only within the company but also on the World Wide Web. Is it essential for you to track external information generated online and extract insights from it? Collect it with TAS Data Collector and use its potential.

Always stay up to date by exploiting the power of online news and information.

Basic information

The basic idea behind developing TAS Data Collector was to create advanced solutions for the problems and customer needs caused by Big Data. Owning or using Big Data (large amounts of data, or data from different sources and in different formats) is a great challenge for every enterprise because this data is mainly unstructured, hard to access, unsuitable, invalid, inconsistently formatted, and difficult or impossible to reuse and integrate into a specific platform. TAS Data Collector was created with the 5Vs of Big Data in mind (Volume, Velocity, Variety, Value and Veracity); therefore, it offers advanced and up-to-date solutions.

Useful tip

Collect text content from the World Wide Web every day that is relevant to your daily work. Exploit it with TAS Insight Engine.

Retrieve the values properly with TAS Tagger and discover the answers with TAS Enterprise Search.

TAS Data Collector, as part of the TAS Platform, can collect Internet content in a structured format so that this content is available to information systems or for further processing and analysis.

We have developed TAS Data Collector to provide advanced and flexible solutions for working with large datasets, as a way to ensure advantages for your business.

Data cleaning and organization

Data is the new oil. We all know this phrase. With the increase in the amount of business data both internally and externally, it is essential to realize the future risks and possibilities of using such data in order to maintain our competitiveness.

Additionally, the processing of Big Data is extremely time-consuming. It has been measured that the largest part of the job of a data scientist is data cleaning and data collection.

“Data scientists spend 60% of their time on cleaning and organizing data. Collecting data sets comes second at 19% of their time, meaning data scientists spend around 80% of their time on preparing and managing data for analysis.

76% of data scientists view data preparation as the least enjoyable part of their work.” (Forbes.com, Gil Press)

Features

TAS Data Collector targets exactly these two phases, so it makes your workflow not just more effective but also more pleasant.

We save you both time and effort by collecting all the unstructured and structured data available on the Internet from a given domain, while the process runs conveniently in the background.

By using the TAS Data Collector service (one-off or recurring), you gain not just business benefits but organizational ones as well. Accessing and applying the collected data (e.g. market data, information about competitors, etc.) can support the whole organization very effectively.

How can you use TAS Data Collector within your company?

In detail: how TAS Data Collector benefits the company’s departments

The flow from data to revenue

The following chart shows the efficiency of TAS Data Collector compared to manual work by a person, as a function of data quantity. The larger the amount of data, the larger the difference in effectiveness; it follows that the bigger the dataset, the larger the business advantage that TAS Data Collector provides.

The effectiveness of Data Collector

What do you really get? What are the related services?

The collected dataset (validated and enriched) from TAS Data Collector can be imported into other products (see the “Integration with other products” section), or you can obtain it in JSON format.

Our offer contains the following:

  • API
  • Data Validation Index
  • Monitored and transparent process with continuous progress indication
  • 3rd level customer service / permanent communication
  • Satisfaction measurement – effectiveness monitoring by query
  • Re-examination of the project
  • Follow-up
  • Maintenance

Example of maintenance (monitoring) by recurring data collection

You can check the daily trend of your data on a dashboard that provides you with the opportunity to constantly monitor the data flow.

The collected data can be used in its raw form or can be converted and enriched further by other TAS products.

Because the expenses are predictable (a monthly service fee), it is easy to assess to what extent TAS Data Collector is beneficial.

The cost of TAS Data Collector depends on the customer’s needs and requests, which are clarified in the first step of the project.

Details of the data collection process

The data contained in the specified sites or documents is collected with respect to the following details:

  • TAS Data Collector can extract visible data, metadata (tags, image descriptions) and pagination from a website.
  • Sites, subpages, login-required pages, hierarchical sites, pages with a slideshow component and pages with multilingual content cause no problem for TAS Data Collector.
  • When data is recognized as hidden, we offer a screenshot solution (capturing the exact original appearance of the data).
  • In some cases robots.txt forbids collecting data. We respect this, although collecting such data is technically possible.
  • We can extract text from many different document and image formats (PDF, spreadsheet, diagram or image file formats).
  • We are prepared to produce and deliver any required output format, even ones that require software development.
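
To illustrate the robots.txt point above, the following is a minimal sketch of how a crawler can check whether a URL may be collected. It uses Python's standard urllib.robotparser module and is not a description of how TAS Data Collector is implemented; the robots.txt content, the user agent name "example-collector" and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that the given robots.txt permits for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


urls = [
    "https://example.com/tenders/2020",
    "https://example.com/private/drafts",
]
permitted = allowed_urls(ROBOTS_TXT, "example-collector", urls)
print(permitted)
```

Filtering the URL list before any request is made keeps the check in one place, so pages disallowed by the site owner are never fetched.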

Important! Please consider that we are not responsible for the further utilization of the collected data.

 

How it works

TAS Data Collector was developed to meet individual needs, so the more we know about your business needs, the better and more sophisticated the approach we can provide. The exact solution depends on the following parameters:

  • the required output formats
  • data transformation and enrichment needs
  • the intended use (e.g. integration with a CRM system)
  • individual needs (e.g. a specific file type as an output format)
  • the source, amount and quality of data

Constantly changing business expectations and IT tools, along with every new order, drive the evolution of TAS Data Collector. We constantly look for new areas of application so as to provide up-to-date solutions.

Have a special requirement for data? Wondering if there is a solution?

When implementing a project, we first assess your needs, aims and requirements. We use the collected information to prepare the project and quotation plan, identify the key points, and then prepare the specifications and the quotation.

Use case

Do you want to learn more about how the solution is used?

Read the TAS Data Collector use case.

Factors affecting the quotation

  • the complexity of the task
  • special requirements
  • the quality and the quantity of data
  • possible custom development needs
  • the expected deadline

Examples of data collector projects

Through our previous projects, we have gained significant experience in collecting and processing various kinds of web-hosted data. Downloading unstructured public data (e.g. public spending), monitoring and analyzing competitors, downloading publications for research purposes and collecting job advertisements are all real use cases for TAS Data Collector. Another area of use is data journalism, because of its enormous need for data. Collecting open data is also a frequent application. We are also the operator of the Opendata.hu portal.


Other products of TAS Platform

We have also developed other software services in TAS Platform.

TAS Enterprise Search is an Elastic-based enterprise search engine with massive data searching capability and support for access rights to your data. TAS Enterprise Search Engine enables the user to run searches in the data collected by TAS Data Collector. It is a perfect combination when you don’t just need the data, but also want your dataset to be effectively searchable. TAS Enterprise Search Engine is also capable of finding named entities (e.g. company names or dates) in various formats. Find out more by reading the TAS Enterprise Search use case.

TAS Enterprise Bulk Search is a supplementary service for TAS Enterprise Search, designed to simplify and shorten complex and time-consuming search processes that previously could only be done one by one.

TAS Thesaurus Manager is a synonym-builder module that enables the building of more intelligent search engines with the TAS Enterprise Search Engine. Searches launched with the combination of TAS Enterprise Search Engine and Thesaurus Manager lead to more meaningful and relevant matches. Find out more by reading the TAS Thesaurus Manager use case.

TAS Search Log Analyzer is a perfect solution if you already have a structured, searchable database. In this case you may be keen to learn more about the searches that are run. As an example, TAS Search Log Analyzer lets you know which keywords are used frequently and which return no match. This and similar information can be used to continuously improve your search system. Find out more by reading the TAS Search Log Analyzer use case.

TAS Tagger is a tagging solution that is able to retrieve and determine key phrases and topics from texts. It tags your collected datasets automatically. The tags make it possible to find connections between different bodies of text. One of its key applications is tagging articles on news websites, large amounts of scientific content, or various business documents. Find out more by reading the TAS Tagger use case.
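
The kind of search log analysis described above (frequent keywords and keywords without any match) can be sketched in a few lines. This is only an illustration under assumed inputs, not the product's implementation: the log format (query, number of results) and the function name summarize are hypothetical.

```python
from collections import Counter

# Hypothetical log entries: (search query, number of results returned).
search_log = [
    ("procurement", 42),
    ("tender", 17),
    ("procurement", 40),
    ("tendr", 0),
    ("whistleblower", 0),
    ("tendr", 0),
]


def summarize(log, top_n=2):
    """Return the most frequent queries and the queries that returned no results."""
    frequency = Counter(query for query, _ in log)
    no_match = Counter(query for query, hits in log if hits == 0)
    return frequency.most_common(top_n), no_match.most_common()


top_keywords, unmatched = summarize(search_log)
print("Most frequent:", top_keywords)
print("Without any match:", unmatched)
```

In this sample, the repeated zero-result query "tendr" suggests a common misspelling that a synonym or spelling rule could catch.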


Technical description

Initial system requirements (On Premise)

x86_64 CPU with at least 4 cores

at least 16 GB RAM

35 GB disk (this may grow as the amount of logs increases)

64-bit Linux, Windows or macOS

JDK 1.8 or above

Availability and platform support

Cloud API – On Premise API – Java SDK is available

Output format
Default: JSON

Integration with other products

TAS Platform

Tableau

RapidMiner

Power BI

Google Data Studio

IBM SPSS

Questions and Answers

What is the duration of implementing a TAS solution? Every TAS project has different requirements, although previous projects have been implemented in 1-3 months.

What expenses can I expect? There may be considerable differences, but in simple cases the TAS Data Collector service may start from a few tens of USD per month.

Can you handle special requirements? Sure, no problem. We are not only the owners of TAS Platform solutions but also a software development enterprise, so we are capable of developing your custom solution.

Are you prepared to do business with enterprises outside Hungary? We have several partners in Europe and we also have overseas customers. We all speak English, and some of us speak German as well.

Do you have other questions about the product or the quotation? Send your message.