Opinnäytetyöt Tieto- ja viestintätekniikka, metadata (Theses in Information and Communication Technology, metadata)

Thesis in Finnish metadata 2017-2022

This dataset was gathered from the 478 bachelor's theses written for the Finnish-taught ICT (Information and Communication Technology) degree program and published between 2017 and 2022. Most of the theses can be found in Theseus (https://www.theseus.fi), the joint database of the Finnish Universities of Applied Sciences. Students could choose to write their thesis in either Finnish or English; 135 theses were written in English.

The data was extracted from the theses' PDF files by a Python script into two Excel files, one for the theses written in Finnish and one for those written in English, which were then cleaned and converted into CSV and JSON format. During extraction, about 100 theses whose metadata could not be read in full were discarded, most of them for not following the given template, others for providing Finnish metadata for an English thesis or for not mentioning the author's name. Rows marked as “restricted” (8 in Fi, 2 in En), “does not follow the template” (2 in En) or “not in theseus” (17 in Fi, 1 in En) were removed by hand, as were a few rows with obvious logical errors (e.g. more keyword appearances than words in the thesis). Dots at the end of keywords and spaces in the middle of words were removed, and minor typos were corrected.
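Two of the cleaning rules described above can be sketched in Python. This is only an illustration of the rules, not the script that was actually used; the function names clean_keyword and is_plausible are hypothetical, and removing spaces inside words is left out because it generally needs manual review.

```python
def clean_keyword(keyword: str) -> str:
    """Remove surrounding whitespace and any trailing dots from a keyword."""
    return keyword.strip().rstrip(". ")


def is_plausible(total_occurrences: int, total_word_count: int) -> bool:
    """Flag rows with an obvious logical error.

    A thesis cannot contain more keyword occurrences than it has words,
    so such rows are treated as extraction errors.
    """
    return 0 <= total_occurrences <= total_word_count
```

Rows for which is_plausible returns False would be candidates for manual removal, as described above.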

The word count covers only the body of the thesis, excluding the abstract and the appendices. Ten pages are provided by the given template. The supervisor IDs match those in the dataset for the English-taught ICT degree program.

The dataset contains the following fields:

Total References – Total number of references

Printed References – Number of printed references

Internet References – Number of references from the internet

Weak References – Number of weak references (Wikipedia, Reddit, blogs, YouTube)

Pages – Number of pages

Total Word Count – Number of words

Study Credits – Number of study credits at the time of graduation

Study Entitlement Days – Length of study entitlement measured in days

Grade – Thesis grade (1-5, 1 is the lowest passing grade and 5 the highest)

Supervisor ID – Identifier of the thesis supervisor

Keywords – Keywords and the number of times they occur in the thesis

Total Occurrences – Total number of keyword occurrences
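As a minimal sketch of working with these fields in Python, the example below assumes that the JSON resource is an array of objects keyed by the field names listed above; the actual layout of the file may differ. The sample values are invented placeholders, not rows from the dataset, and the helper weak_reference_share is hypothetical.

```python
import json

# Invented placeholder rows that only mimic the assumed layout.
SAMPLE_JSON = """
[
  {"Total References": 20, "Printed References": 4,
   "Internet References": 16, "Weak References": 5, "Grade": 3},
  {"Total References": 0, "Printed References": 0,
   "Internet References": 0, "Weak References": 0, "Grade": 4}
]
"""


def weak_reference_share(record: dict) -> float:
    """Return the fraction of references classed as weak (0.0 if there are none)."""
    total = record["Total References"]
    return record["Weak References"] / total if total else 0.0


records = json.loads(SAMPLE_JSON)
shares = [weak_reference_share(record) for record in records]
```

For the real file, json.load on an opened file handle would replace the json.loads call on the sample string.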

Theses produced as group work are evaluated separately and the metadata can appear multiple times in the dataset.
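Because of this, analyses that count theses rather than students may want to group repeated rows first. The sketch below is a heuristic under stated assumptions: records are dictionaries keyed by the field names above, and rows belonging to the same thesis agree on all thesis-level fields while Study Credits, Study Entitlement Days and Grade may differ per student. The names THESIS_FIELDS and group_by_thesis are hypothetical.

```python
import json
from collections import defaultdict

# Fields assumed to describe the thesis itself rather than the individual student.
THESIS_FIELDS = (
    "Total References", "Printed References", "Internet References",
    "Weak References", "Pages", "Total Word Count",
    "Supervisor ID", "Keywords", "Total Occurrences",
)


def group_by_thesis(records):
    """Group rows whose thesis-level fields are identical.

    Each returned group is a list of rows that probably belong to one thesis;
    groups with more than one row are likely group work.
    """
    groups = defaultdict(list)
    for record in records:
        # Serialise the values so that nested keyword data can be used as a key.
        key = json.dumps([record.get(field) for field in THESIS_FIELDS], sort_keys=True)
        groups[key].append(record)
    return list(groups.values())
```

Two unrelated theses could in principle share all of these values, so matches are best checked by hand before rows are merged.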

Additional information

Format JSON
File size 151593
Temporal Coverage -
Data last updated 27 February 2024
Metadata last updated 22 March 2024
Created 27 February 2024
SHA256 7d84785f4fa810e32b2bf58bf71822214b46800a0fa777576158756d977f5320
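
A downloaded copy of the file can be checked against the SHA256 value above. The following is a minimal sketch using Python's standard hashlib module; the function names and the file path passed to them are placeholders for wherever the file was saved.

```python
import hashlib

# Checksum as published in the resource information above.
EXPECTED_SHA256 = "7d84785f4fa810e32b2bf58bf71822214b46800a0fa777576158756d977f5320"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """Return True if the file at the given path has the published checksum."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

A mismatch may simply mean that the resource has been updated since the checksum was recorded.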