
Day 5: Handling data

Version from 02/10/2014

25.08.2016: The .pdf file of the script has been updated. Please use the new version.

Command reference: Matlab syntax

PDF version of the script: [Tag05.pdf]

Word version of the script: [Tag05.docx]

Downloads:

T5A3: [fuetterungen.mat]

*T5A5: [search_demo.m], [VP_data.mat]

T5B4: [algae.mat]

T5B7: [meisenshow.m] [bmeise.jpg]

T5C2: [aliasing_effect.m]

T5C3: [stimulus.mat] [spikes.mat]

T5H1: [picturematrix.mat]

T5H2: [phValues.mat]

T5H3: [spikedata_short.mat]

*T5H5: [bird_table_insa.m]

*T5H6: [wunschkatze_jutta.m]

Topics:

A) SEARCH AND SORT

In scientific data analysis, data often has to be searched according to certain criteria so that only the part of the data that meets certain conditions is used for the analysis. In principle, data can be searched very well (in any programming language) using logical operators, case distinctions and loops.
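A minimal sketch of this "by hand" approach, assuming an illustrative vector data (the values are made up): a loop runs through all elements, and an if statement decides which ones to keep. Here all elements greater than 0 are collected.

```matlab
% Search "by hand": collect all positive elements with a loop and an if statement
data = [3 -1 8 0 5 -7];          % illustrative measurement values
selected = [];                   % starts empty and grows with every hit
for k = 1:numel(data)
    if data(k) > 0               % condition that an element must fulfil
        selected(end+1) = data(k);
    end
end
selected                         % selected = [3 8 5]
```

This works in any programming language, but it needs several lines for a very simple condition; logical indexing and find do the same much more compactly.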

Matlab also provides a special concept for this: logical indexing. A matrix (or vector) of type logical is used as the index into a matrix (or vector) of the same size. The result is a vector containing only those elements of the original matrix at whose positions the index matrix is true (1). For example:

  • a=[1 9 6; -9 7 0];
  • indi=logical([0 1 0; 1 0 1]);
  • b=a(indi)

returns b=[-9;9;0]. (The elements are taken column by column, because Matlab runs through matrices column by column.)

Very often the logical index matrix is generated by applying a comparison operator to the original matrix, e.g. ag0=a>0 returns a logical matrix of the same size as a that is true wherever a is greater than 0.
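A short sketch reusing the matrix a from the example above: the comparison a>0 produces the logical index directly, and several conditions can be combined with & (and) or | (or). The names b_pos and b_mid are only illustrative.

```matlab
% Logical indexing with comparison operators
a = [1 9 6; -9 7 0];
ag0 = a > 0                 % logical matrix of the same size: [1 1 1; 0 1 0]
b_pos = a(ag0)              % -> [1; 9; 7; 6] (read column by column)
b_mid = a(a > 0 & a < 8)    % -> [1; 7; 6]: positive and smaller than 8
```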

Matlab provides the special find command for searching matrices according to certain criteria. find returns the indices of those elements of a vector or matrix that are not equal to 0. If find is combined with logical operators, data can be searched very efficiently even for complicated patterns (at least with a little practice...). Syntax of find:

ind=find(v)

Returns all indices of the vector v that are not equal to 0.

ind=find(v,k)

Returns the first k indices of the vector v that are not equal to 0.

ind=find(v,k,'last')

Returns the last k indices of the vector v that are not equal to 0.

[row,col]=find(M,...)

Returns the row and column indices of the elements of the matrix M that are not equal to 0.

[row,col,val]=find(M,...)

Returns row and column indices and values of the elements of the matrix M that are not equal to 0.
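A short sketch of the find variants listed above, using an illustrative vector v and matrix M (values chosen arbitrarily):

```matlab
% Different ways of calling find
v = [0 4 0 -2 5 0 7];
ind_all   = find(v)             % -> [2 4 5 7]
ind_first = find(v, 2)          % -> [2 4]   (first 2 hits)
ind_last  = find(v, 1, 'last')  % -> 7       (last hit)
ind_big   = find(v > 3)         % -> [2 5 7] (find combined with a comparison)

M = [0 3; 8 0; 0 6];
[row, col, val] = find(M)       % row = [2;1;3], col = [1;2;2], val = [8;3;6]
```

As with logical indexing, find runs through a matrix column by column, which is why the 8 in the first column is found before the 3 in the second column.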

Often you do not want to keep data in its original order but sort it according to certain criteria. In principle, this can also be done "by hand" with case distinctions and loops, but this quickly becomes a lot of work. This is why Matlab has two practical commands, sort and sortrows.

sort sorts each column (or, if requested, each row) of a matrix independently of the others. sortrows, in contrast, sorts the matrix according to one column and keeps each row intact.

vs1=sort(v1)

sorts the elements of a vector v1 in ascending order

ms1=sort(m,1)

sorts the elements of each column of the matrix m in ascending order (independent of each other)

ms2=sort(m,2)

sorts the elements of each row of the matrix m in ascending order (independent of each other)

msd=sort(m,1,'descend')

sorts the elements of each column of the matrix m in descending order (independent of each other)

[ms1,index]=sort(m,1)

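returns, in addition to the sorted matrix, the matrix index with the original row numbers of the sorted elements (i.e. ms1(:,j) equals m(index(:,j),j) for every column j)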

mr1=sortrows(m1,n)

sorts the rows of the matrix m1 according to their entries in the nth column in ascending order.

mdesc1=sortrows(m1,-n)

sorts the rows of the matrix m1 according to their entries in the nth column in descending order.

mr_n1=sortrows(m1,[n,1])

sorts the rows of the matrix m1 according to their entries in the n-th column in ascending order. If several rows have the same value in the n-th column, these rows are sorted according to the 1st column.

[mr1,index]=sortrows(m1,n)

also returns the index vector index, so that mr1 equals m1(index,:).
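A small sketch contrasting sort and sortrows on an arbitrary example matrix m: sort breaks up the rows, sortrows keeps them together. The second matrix t (also arbitrary) has equal values in column 2 to show how [2,1] breaks ties.

```matlab
% sort: every column is sorted on its own, rows are broken up
m = [4 3; 1 7; 3 2; 2 9];
[ms1, index] = sort(m, 1)       % ms1 = [1 2; 2 3; 3 7; 4 9]
                                % index = [2 3; 4 1; 3 2; 1 4] (original row numbers)

% sortrows: whole rows are moved, sorted by one column
[mr1, idx_r] = sortrows(m, 2)   % mr1 = [3 2; 4 3; 1 7; 2 9], idx_r = [3; 1; 2; 4]
same = isequal(mr1, m(idx_r, :))% -> 1 (true): idx_r reproduces the sorting
mdesc = sortrows(m, -1)         % by column 1, descending: [4 3; 3 2; 2 9; 1 7]

% ties in column 2 are broken by column 1
t = [2 5; 1 5; 3 4];
tr = sortrows(t, [2, 1])        % -> [3 4; 1 5; 2 5]
```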

Tasks:

T5A1) Try out logical indexing:
m=[ 0 1; 5 9; 3 0]
ind_m=logical([0 1; 0 0; 1 0])
m_teil=m(ind_m)
Use logical indexing to generate a vector that only contains those elements of m that are greater than 2.

T5A2) Try out the find command:
v=[1 7 5 0 8 0 3 1]
ind_unequal_0=find(v) % Search for indices of elements not equal to 0
v_unequal_0=v(ind_unequal_0) % Write values of elements not equal to 0 in a vector
ind_equal_1=find(v==1) % Search for indices of elements equal to 1

m=[ 0 1; 5 9; 3 0]
[row_not0,col_not0]=find(m) % Search returning row and column indices
lin_ind_not0=find(m) % Search with linear indexing
m_not0=m(lin_ind_not0) % Output the elements not equal to 0 as a vector
[row_1,col_1]=find(m==1)

Use find to create a vector that only contains those elements of m that are greater than 2.

T5A3) Load the following series of observations: [fuetterungen.mat]
This matrix contains, for each of the 31 days of a month, the daily precipitation in mm (1st column) and the number of times a female tit flew to her nest to feed (2nd column). Find out:

  • On which days of this month did no rain fall?
  • On which days did the tit feed more than 50 times?
  • On which days without precipitation did the tit feed more than 50 times?
  • How often did the tit feed more than 100 times on two consecutive days?

T5A4) Sort the matrix feedings once according to the number of feedings and once according to the amount of precipitation.

*T5A5) If you want to see an example of searching in measurement data: the script [search_demo.m] from the Advanced Maths course searches the same data set, [VP_data.mat], in three different ways.

B) GRAPHICAL REPRESENTATION OF DATA

Displaying data graphically is an extremely important task in scientific data analysis. The most important command for this is the plot command, which you already know. In this section, we will look at some additional options for graphical visualisation.

An important prerequisite for understanding graphics formats is colour coding, which can be used to define precisely the colours used in a graphic. Colours are represented in the so-called RGB coding, i.e. they are defined by a vector colour=[red green blue]. In Matlab, values between 0 and 1 are specified for red, green and blue, e.g. [1 0 0] is red, [0 0 0] is black, [1 1 0] is yellow, [0.6 0 0.6] is dark violet. Any colour can be mixed in this way. (See task T5B6)
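A minimal sketch of how such RGB vectors can be passed to plot through the 'Color' line property. The figure is created invisibly here ('Visible','off') so that the snippet also runs in scripts without a display; leave this option out to see the plot. Variable names are illustrative.

```matlab
% Colours as RGB vectors [red green blue] with values between 0 and 1
dark_violet = [0.6 0 0.6];       % red and blue at 60 %, no green
yellow      = [1 1 0];           % full red + full green
x = 0:0.5:5;
fig = figure('Visible', 'off');  % invisible figure; omit 'Visible','off' to display it
h1 = plot(x, x.^2, 'Color', dark_violet, 'LineWidth', 2);
hold on
h2 = plot(x, 3*x, 'Color', yellow);
hold off
```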

An important graphics format is jpg. jpg files can be loaded into Matlab (e.g. with imread) and edited there. Each pixel is coded by three colour values (although this coding is slightly different from the usual format in Matlab). For two-dimensional images, this results in a three-dimensional matrix of size MxNx3 (M and N are the numbers of pixels in the vertical and horizontal directions). (See exercise T5B7)
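To make the MxNx3 structure concrete, here is a small sketch that builds a 2x3-pixel colour image by hand, using Matlab's usual coding with values between 0 and 1 (the variable names are illustrative). image(img) would display such an array as a colour picture.

```matlab
% A tiny 2x3-pixel RGB image, built by hand (doubles between 0 and 1)
img = zeros(2, 3, 3);        % M=2 rows, N=3 columns, 3 colour layers (R, G, B)
img(1, :, 1) = 1;            % top row: red layer at maximum -> red pixels
img(2, 2, :) = 1;            % pixel (2,2): all three layers at maximum -> white
img(2, 3, [1 2]) = 1;        % pixel (2,3): red + green -> yellow
size(img)                    % -> 2 3 3
red_layer = img(:, :, 1)     % 2x3 matrix with the red values: [1 1 1; 0 1 1]
```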

You can save Matlab figures and open them again in Matlab. The easiest way to do this is to use the menu item File > Save As. By default, the figure is saved with the extension .fig. This format can only be processed properly by Matlab, but it retains all the properties of the figure when it is reloaded.

If you want to use the figures with another programme, e.g. to include them in a report written with Word, you must use a file format other than .fig. To do this, you can save as described above, but you must select the desired file format and adjust the file extension accordingly.
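Besides the menu, a figure can also be saved from a script with saveas, which chooses the file format from the file extension. A minimal sketch; the file name parabola.png is only an example, and the figure is again created invisibly.

```matlab
% Save a figure from a script; saveas chooses the format from the file extension
x = -5:0.1:5;
fig = figure('Visible', 'off');     % invisible figure, nothing pops up
plot(x, x.^2)
xlabel('x'); ylabel('x^2')
saveas(fig, 'parabola.png')         % PNG file, e.g. for a Word report
% saveas(fig, 'parabola.fig')       % Matlab's own .fig format (reopen with openfig)
close(fig)
```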

T5B1) You often process more data than can be meaningfully displayed in a single figure. You can use the subplot command to divide a figure into several smaller plots. subplot(2,3,5) creates a grid of small plots with 2 rows and 3 columns and makes the 5th of them active. The plot command (and all associated formatting and labelling) can be used as usual in this window. To activate another of the subplots, call subplot again with the corresponding position number (e.g. subplot(2,3,1)). Create a figure consisting of 4 subplots in which the curves x.^1 to x.^4 are displayed for x=-5:0.1:5.

T5B2) Matlab can also use several graphics windows at the same time. The command figure opens a new window. With figure(1) and figure(2) you can switch between the windows. close(1) closes figure 1, close all closes all graphics windows, and clf clears the contents of the current window.
Open two graphics windows and plot the square root of the x-vector 0.1:0.1:10 in the first window and the square of x in the second window. Try out the above commands.

*T5B3) Pie charts are a popular way of displaying percentages. To show the distribution of votes in the 2006 Oldenburg city council election, for example, you would write
percentages=[32.74 25.99 21.24 6.33 7.24 5.44 0.99];
pie(percentages)

If you want to make it a little prettier, you can also use e.g.:

pie3(percentages)
pie(percentages,{'SPD','CDU','Gruene','FDP','Linke','BFO','NPD'})
or pie3(percentages,[0 0 1 0 0 0 0]).

The Oldenburg city council has 50 seats. Calculate the distribution of seats (in this case there were no deviations from the distribution of seats expected from the percentages) and also display this as a pie chart.

T5B4) Earlier in the course, we used the imagesc command to display the contents of a matrix in colour-coded form. Use it to look at the distribution of unicellular algae in a Petri dish, which is stored as the matrix population in [algae.mat].
What do the two axes mean in this illustration?
The vectors x and y contain the distances between the measurement areas in the x and y directions of the Petri dish. With imagesc(x,y,population) you get meaningful axis labelling.
The colorbar command generates a legend for the colour coding of the values.

T5B5) The same algae distribution can also be displayed three-dimensionally as a "mountain range". The command for this is surf(population). In order to obtain meaningful axes, each point of the matrix must be assigned an x and a y value (the values of the matrix population are plotted on the z axis). Use the matrices xmatrix and ymatrix saved in the file for this purpose. The colorbar command also works here. Try the alternative mesh command. What is the difference from surf?

T5B6) Display the vector 0:1:10 with a solid dark red line using the RGB coding.

*) If you fancy sophisticated graphics: mark the individual points on the line with turquoise-filled circles (you need line properties for this, see T1C7) or choose a particularly pretty colour combination for lines and markers.

T5B7) Take a look at how the script [meisenshow.m] works. You will also need the file [bmeise.jpg].

  • How does the colour coding of this jpg file differ from that used in Matlab?
  • Divide the matrix into three parts to look at the red, green and blue parts of the image individually.
  • *) Expand the show to include a tit with two heads.

C) DATA ACQUISITION AND SPARSE MATRICES

Most biological quantities do not change in steps but continuously (e.g. pH value, body temperature, membrane potential of a nerve cell, etc.). When recording such data, one faces the problem that truly continuous recording is fundamentally impossible. At the latest when a computer is used to store the data, the data must be discretised: values can only be recorded at certain time steps, and because numbers can only be represented with limited (albeit very high) precision, the measured values themselves cannot be stored with arbitrary accuracy either. Basically, the higher the sampling rate (i.e. the more measurements are taken per unit of time), the more accurate the measurement, but also the more "expensive" it is: more data is generated that has to be stored and processed (and the measurement hardware is often considerably more expensive if you want to achieve high sampling rates).
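A rough back-of-the-envelope sketch of this "cost", assuming a recording length of 60 s and 8 bytes per stored value (one double); both numbers are illustrative assumptions. The amount of data grows in direct proportion to the sampling rate.

```matlab
% Amount of data for one recording at different sampling rates
duration  = 60;                   % assumed recording length in seconds
rates     = [100 1000 10000];     % sampling rates in Hz (100 Hz, 1 kHz, 10 kHz)
n_values  = rates * duration      % -> 6000  60000  600000 values
megabytes = n_values * 8 / 1e6    % 8 bytes per double -> 0.048  0.48  4.8 MB
```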

For large two-dimensional matrices in which a high percentage of the elements are zeros, Matlab offers a special data type, the sparse matrix. For matrices filled mainly with zeros, sparse matrices require considerably less memory than normal matrices of the data type double, where every 0 occupies 64 bits (8 bytes) of memory. In a sparse matrix, only the elements that are not equal to 0 are stored, together with their positions (row and column) in the matrix.
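A sketch for comparing, with whos, the memory needed by a full and a sparse version of a mostly-zero matrix (matrix size and values are arbitrary). The exact number of bytes for the sparse matrix depends on the Matlab version and platform, so only the comparison matters here.

```matlab
% Compare the memory of a full and a sparse version of a mostly-zero matrix
A = zeros(1000);            % 1000x1000 full matrix of doubles
A(5, 7) = 3;                % only two elements are not 0
A(800, 20) = -1;
S = sparse(A);              % stores only the non-zeros and their positions
info_full   = whos('A');    % whos returns a struct with a field bytes
info_sparse = whos('S');
fprintf('full: %d bytes, sparse: %d bytes\n', info_full.bytes, info_sparse.bytes)
n_nonzero = nnz(S)          % number of stored non-zeros -> 2
```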

A sparse matrix is created with the sparse command, e.g.
A=[0 0 0; 0 9.5 0; 1 0 0];
S1=sparse(A)
results in:
S1 =
(3,1) 1.0000
(2,2) 9.5000

The command full translates a sparse matrix into a normal two-dimensional matrix of the type double: F=full(S1) results in the same matrix that was defined above as A.

With sparse matrices, all matrix manipulations (e.g. indexing, adding rows or columns, etc.) and most operations can be carried out with the same syntax as with normal matrices (e.g. e=S1(2,2); S1=[S1; 0 0 7]; S2=2*S1).

Useful functions for dealing with sparse matrices:

  • issparse(S): tests whether a matrix is sparse
  • find: finds the elements not equal to 0 (as usual)
  • nonzeros: returns the values of the elements not equal to 0
  • speye: generates a sparse identity matrix
  • spfun: applies a function to the elements not equal to 0
  • sprand(S): generates a sparse matrix with the same structure as matrix S but with uniformly distributed random elements
  • sprandn(S): generates a sparse matrix with the same structure as matrix S but with normally distributed random elements
  • spones: replaces the elements not equal to 0 with ones
  • spy: visualises the sparsity pattern
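A short sketch trying out some of these functions on a small sparse matrix (values chosen arbitrarily). spy(S) would additionally draw the pattern of non-zero elements.

```matlab
% Try out some sparse helper functions on a small example
S = sparse([0 0 4; 0 9.5 0; 1 0 0]);
is_sp  = issparse(S)             % -> 1 (true)
vals   = nonzeros(S)             % non-zero values column by column -> [1; 9.5; 4]
S_sq   = spfun(@(x) x.^2, S);    % square only the non-zero elements
full(S_sq)                       % -> [0 0 16; 0 90.25 0; 1 0 0]
S_ones = spones(S);              % same pattern, but ones instead of the values
F      = full(S_ones)            % -> [0 0 1; 0 1 0; 1 0 0]
I3     = speye(3);               % sparse 3x3 identity matrix
```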

T5C1) Draw a parabola: x=-5:1:5, y=x.^2, plot(x,y)
What do you notice about this parabola? Visualise the individual points.
Make the steps smaller and plot the parabola again.
Create a single figure in which the values generated above are drawn as individual points and the values for x_fein=-5:0.01:5 are drawn as a solid line.

T5C2) If a continuous variable is measured, the choice of sampling rate is important - sampling rates that are too low can lead to fundamentally incorrect results. This effect is illustrated with the following script:
[aliasing_effekt.m]

T5C3) Now look at the effect in real data: in an electrophysiology practical, the membrane potential of a leech nerve cell is measured intracellularly. The cell is stimulated with an electric current defined with a temporal resolution of 10,000 points per second (10 kHz); the time course of this current (in nA) is stored in [stimulus.mat]. The voltage response (in mV), also recorded at 10 kHz, is stored in [spikes.mat]. The neuron responds to this current pulse with a depolarisation of the membrane potential (i.e. with more positive voltages), on which so-called "spikes" are superimposed. (Nerve cells communicate with each other by means of these voltage spikes, which always look similar.)

  • Load the files and plot them on top of each other in two subplots of one figure.
  • Adjust the time axes in the plots so that seconds are displayed and label the axes.
  • Create a vector to which only every 10th point of the vector spikes is transferred (this corresponds to a recording at 1 kHz).
  • Create another vector to which only every 100th point of the vector spikes is transferred (this corresponds to a recording at 100 Hz).
  • Open a new graphics window and create 3 subplot windows one below the other.
  • Draw one of the three vectors with the recordings at 10kHz, 1kHz and 100Hz in each of the three windows.
  • How do the time courses differ?
  • Use the zoom tool and display the individual data points on the graph to look at the time course of a single action potential for the three vectors.
  • Also look at a time section in which "nothing" happens, i.e. the measured value fluctuates around a fixed value.
  • Calculate the mean value and standard deviation of the first 200ms of the measurement for all three vectors.

T5C4) Create a small, easy-to-read matrix that contains many zeros, but also some other elements.

  • From your original matrix, create a sparse matrix under a different name.
  • Check whether you can use the full command to get back the original matrix.
  • Perform some matrix operations on both the sparse and the full version of the matrix.
  • Check whether both matrices have the same content at the end.

T5C5) With sparse matrices, space can be saved in working memory and when saving to files. However, Matlab compresses data when saving to files anyway, so the difference between sparse and normal matrices is less noticeable there:

  • Create a really large matrix (at least 1000x1000) consisting of zeros.
  • Replace 1% of the zeros with random numbers in random places.
  • Create a sparse matrix with a different name from this matrix.
  • Save both matrices individually in a file.
  • Compare the file sizes of the two files.
  • Create a matrix of the same size that is completely filled with random numbers, and also create a sparse variant of it.
  • Save both random matrices individually in files.
  • What do you notice about the four file sizes?

D) MAIN TASK

Command reference: Matlab syntax

T5H1) A short exercise on working with matrices and graphics: download the matrix picturematrix.mat.

  • Use imagesc to display the matrix graphically.
  • Copy only the part containing the house into a new matrix and display it in a new graphics window.
  • Create a third matrix containing only one of the stars and display it graphically in a new window.
  • *) Why does the star change colour? How could this be prevented?

T5H2) Another small graphics exercise: let's assume you have a few particularly sensitive fish in your aquarium, in which the series of measurements [phWerte.mat] was taken.

  • These fish tolerate the range from pH 6.5 to pH 7.5 well, but are at risk above and below this range.
  • If you look at the data, you will see that there are some obvious measurement errors in this data set. Think of a good way to find these automatically.
  • Plot the pH values so that they show the measured values within and outside the tolerance range as well as the measurement errors with symbols in three different colours.

T5H3) As above, [spikedata_short.mat] contains an intracellular measurement of the membrane potential of a leech neurone. Ten responses of this cell to the same stimulus were recorded. The stimulus is a current pulse whose time course (in nA) is stored in the vector stimulus, which is saved in the same file.

  • Write a script to view all the response traces together with the time course of the stimulus (this can also be one after the other).
  • Set a threshold "by eye" to find the action potentials.
  • Caution: there is an artefact in the response trace at the end of each current pulse. We do not want to recognise this as an action potential.

**T5H4) Write a function that receives the matrix with the measurement data from the last task and a threshold value as input arguments and returns a vector of how many spikes were triggered in the individual runs as output. (The difficulty with this task is that leech spikes have a temporal length of several milliseconds but should only be counted once).

*T5H5) Use your own solution to task T4H7 (or alternatively the sample solution vogeltabelle_insa.m) to create a matrix of captured birds. Sort them so that first the blackbirds, then the robins and finally the tits are listed in the matrix, with the males and then the females within each of these groups. The animals of one species and one sex should be sorted by weight.

*T5H6) Practise searching in data by extending your solution (or alternatively the sample solution wunschkatze_jutta.m) for the desired cat program from day 3 (T3H4) so that it outputs the data (sex, age, colours) of all the desired cats found. Sort this data according to age.

*T5H7) For friends of graphics: display the pie charts from T5B3 in the colours of the respective parties. (You will have to look in the help for this...)

To the 6th course day

(Changed: 11 Feb 2026) Shortlink: https://uol.de/p36845en

This page contains automatically translated content.