function [aggticks, spacing] = aggreg (series, interval, span, tickmin)
%AGGREG Standing-quote aggregation of unevenly spaced ticks into regular x-min. spacing
%
% [AGGTICKS, SPACING] = AGGREG (SERIES, INTERVAL, SPAN, TICKMIN)
% aggregates vector SERIES of unevenly spaced ticks in MATLAB DATENUM format
% into evenly spaced periods of length INTERVAL in minutes.
%
% * AGGTICKS is a vector of indices corresponding to those ticks of SERIES onto which
%   the SERIES was aggregated;
% * SPACING is a vector of evenly spaced ticks in DATENUM format corresponding to the
%   aggregated values.
%
% * SPAN is a two-element vector defining the starting point (default = SERIES(1)) and
%   the closing point (default = SERIES(END)) of the aggregated series in DATENUM format.
% * TICKMIN defines the starting point for the aggregation within SERIES, e.g. assuming
%   the first tick in SERIES to be DATENUM('01-Oct-1992 00:00:14') and INTERVAL to be 5
%   (default), then for TICKMIN=1, SPACING starts with DATENUM('01-Oct-1992 00:01:00'),
%   for 2 -> DATENUM('01-Oct-1992 00:02:00'), etc.
%   TICKMIN must be greater than 0 and no larger than INTERVAL (default = INTERVAL).
%
% By varying TICKMIN, different aggregated samples can be drawn from the same data set,
% thus allowing maximum use of the information contained in the data.
%
% This function can be used not only for the purpose of deriving an aggregated time
% series, but also to compute the frequency of observations in any interval by
% differencing AGGTICKS (see also the author's OBSFREQ.M file).
%
% Another distinguishing feature of this algorithm is that its main part is free of loops
% and hence very fast even on large data sets. However, a large number of gaps in the data
% (i.e. intervals in which the quote does not change from the previous one) will prolong
% the runtime significantly.
%
% EXAMPLES on a PII-233 192MB 60ns RAM Windows NT 4.0 system with O&A's HFDF-I dataset:
%
%    Aggregation    1,472,241 DM/$ quotes    567,759 ¥/$ quotes
%    60 min.        46 sec                   19 sec
%    30 min.        52 sec                   22 sec
%    10 min.        77 sec                   44 sec
%     5 min.        2.5 min                  2.0 min
%     2 min.        10.7 min                 9.8 min
%     1 min.        36.9 min                 38.4 min
%    Memory         115 MB RAM               65 MB RAM
%
% The author assumes no responsibility for errors or damage resulting from usage. All
% rights reserved. Usage of the programme in applications and alterations of the code
% should be referenced. This script may be redistributed if nothing has been added or
% removed and nothing is charged. Positive or negative feedback would be appreciated.

%                     Copyright (c) 11 April 1998 by Ludwig Kanzler
%                     Department of Economics, University of Oxford
%                     Postal: Christ Church, Oxford OX1 1DP, U.K.
%                     E-mail: ludwig.kanzler@economics.oxford.ac.uk
%                     Homepage: http://users.ox.ac.uk/~econlrk
%                     $ Revision: 1.22 $     $ Date: 3 October 1998 $

% Check function arguments and assign default values if necessary:
if nargin < 3
   span = [series(1), series(end)];
   if nargin < 2
      interval = 5;
   end
end
if ~exist('tickmin', 'var')
   tickmin = interval;
end
if tickmin > interval || tickmin <= 0
   % Abort with a proper error, so that the outputs are never left unassigned:
   error('Inadmissible value assigned to TICKMIN!')
end

% Generate the series of evenly spaced ticks. SPAN(1) is first rounded to the nearest
% minute (by adding 29 seconds and keeping only hours and minutes); the grid then starts
% TICKMIN minutes later and proceeds in steps of INTERVAL minutes up to SPAN(2):
spacing = datenum([datestr(span(1),1), ' ', datestr(span(1)+datenum('00:00:29'),15)]) + ...
          tickmin/1440 : interval/1440 : span(2);

% Concatenate the evenly spaced series to the unevenly spaced series and sort the total
% series; BTS (BothToSorted) is a vector of indices such that SORTED = BOTH(BTS), and
% STB (SortedToBoth) is a vector of indices such that BOTH = SORTED(STB).
% (If the same value occurs twice in BOTH (i.e. an original tick occurs on even tick
% time), these values are left in the same order by MATLAB, so the original tick stays
% ahead of the even tick.)
nticks = numel(series);
both = [series(:)' spacing];
[sorted, bts] = sort(both);
stb(bts) = 1:length(bts);

% So, STB(nticks+1:end) are the positions in SORTED of all the evenly spaced observations,
% and the positions immediately preceding each of them in SORTED hold the observations
% which are the ticks immediately preceding or falling onto the evenly spaced ticks.
% Mapping these positions back through BTS gives the corresponding indices in BOTH:
aggticks = bts(stb(nticks+1 : end) - 1);

% If there is at least one unique tick for each interval, the above AGGTICKS is complete.
% But if there are "holes" in the data, i.e. if there are instances of ticks which are not
% replaced for subsequent intervals, then some of the indices given by AGGTICKS will point
% to the evenly spaced series SPACING in BOTH, so they need to be replaced by the tick
% indices preceding them until no such instances are left:
repeated = find(aggticks > nticks);
while ~isempty(repeated)
   aggticks(repeated) = aggticks(repeated-1);
   repeated = find(aggticks > nticks);
end

% End of file.
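%
% ILLUSTRATION: a minimal sketch of the sort-and-index step above, traced on a toy input.
% It assumes plain numbers in place of DATENUM values, and the names TICKS, EVEN and IDX
% are hypothetical, used only here. The lines are commented out so that they do not run
% as part of AGGREG; they can be pasted at the MATLAB prompt instead.
%
%    ticks = [1 2.5 3 7];                 % unevenly spaced "ticks"
%    even  = [2 4 6 8];                   % evenly spaced "ticks"; no tick falls in (4, 6]
%    [sorted, bts] = sort([ticks even]);  % bts = [1 5 2 3 6 7 4 8]
%    stb(bts) = 1:length(bts);            % stb = [1 3 4 7 2 5 6 8]
%    idx = bts(stb(5:8) - 1);             % idx = [1 3 6 4]; 6 > 4 points into EVEN (a hole)
%    idx(3) = idx(2);                     % hole filled as in the loop above: idx = [1 3 3 4]
%
% In this toy case the quotes standing at even ticks 2, 4, 6 and 8 are therefore ticks
% 1, 3, 3 and 4 of TICKS, and differencing IDX gives the number of new ticks per interval.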