In the big data era, programming skills are required to handle datasets efficiently. Scientists in the natural sciences routinely work with large datasets, and analyzing them is a demanding task. SLU PhD students often have to analyze datasets that call for skills beyond traditional tools like Excel. Moreover, datasets commonly have to be converted between formats before they can be analyzed by specialized software. For big data, such conversions can only be done programmatically. Python is currently among the most popular programming languages for data science. This popularity owes much to the Pandas library, which greatly facilitates a wide range of operations needed for data analysis, such as transforming data formats, combining data stored in different files, and producing summaries that support data quality checks and interpretation. In addition, Python's extensive graphics ecosystem, including the Seaborn library, offers many possibilities for constructing informative graphs, both to aid data interpretation and for publication.
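As a rough illustration of the operations mentioned above (combining data from different sources, changing the data format, and summarizing data quality), the following minimal sketch uses Pandas. The tables, column names, and values are hypothetical and invented purely for illustration; in practice the two tables would typically be read from separate files.

```python
import pandas as pd

# Hypothetical data standing in for two separate files.
animals = pd.DataFrame({
    "animal_id": [1, 2, 3],
    "breed": ["A", "B", "A"],
})
weights = pd.DataFrame({
    "animal_id": [1, 2, 3],
    "weight_2020": [410.0, 385.0, None],  # one missing measurement
    "weight_2021": [432.0, 401.0, 398.0],
})

# Combine the two tables on their shared key column.
combined = animals.merge(weights, on="animal_id", how="left")

# Transform from wide format (one column per year) to long format
# (one row per animal and year).
long_form = combined.melt(
    id_vars=["animal_id", "breed"],
    value_vars=["weight_2020", "weight_2021"],
    var_name="year",
    value_name="weight_kg",
)
long_form["year"] = long_form["year"].str.replace("weight_", "").astype(int)

# Simple data quality summary: number of missing values per column.
missing_per_column = long_form.isna().sum()
print(long_form)
print(missing_per_column)
```

The long format produced here is the layout that many analysis and plotting tools expect, which is one reason reshaping is such a common step.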
The course consists of morning lectures followed by practical exercises. The Jupyter interactive development environment (www.jupyter.org) will be used throughout the course. Basic Python syntax will be introduced first, after which students will gradually build core data science skills. In particular, students will be introduced to the Pandas library and will practice data manipulation and aggregation techniques on large datasets. Finally, students will gain experience in producing informative graphs using Seaborn or a similar library.
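To give a flavour of the aggregation techniques mentioned above, here is a minimal sketch of a grouped summary in Pandas. The dataset, column names, and values are hypothetical and are not taken from the course material.

```python
import pandas as pd

# Hypothetical measurements; real course datasets would be much larger.
measurements = pd.DataFrame({
    "breed": ["A", "A", "B", "B", "B"],
    "weight_kg": [410.0, 430.0, 380.0, 400.0, 390.0],
})

# Aggregate: one row per breed with several summary statistics.
summary = (
    measurements.groupby("breed")["weight_kg"]
    .agg(["count", "mean", "min", "max"])
    .reset_index()
)
print(summary)
```

A summary table of this shape can then be passed to a plotting function, for example `seaborn.barplot(data=summary, x="breed", y="mean")`, assuming Seaborn is installed.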
Expected study time
Total: 54 hours
Own study prior to course: 10 hours
Lectures: 14 hours
Computer assignments: 30 hours
Distance course: Yes
Responsible department: Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences
Date: 2021-09-06 – 2021-09-10