Strings of characters play a central role in input/output, and the
operations provided for strings to some extent reflect this. However,
a more general set of operations is available if the
string is first converted into a sequence. We will give some examples
of this below.
Magma provides two kinds of strings: normal character strings and binary
strings. Character strings are an inappropriate choice for manipulating data
that includes non-printable characters. If this is required, a better choice
is the binary string type. This type is semantically similar to a sequence of
integers, in which each character is represented by its byte value (the ASCII
code for standard characters) between 0 and 255. The difference between a
binary string and a sequence of integers is that a binary string is stored
internally as an array of bytes, which is a more space-efficient
representation.
Character strings may consist of all ordinary characters appearing on your
keyboard, including the blank (space). Two symbols have a special meaning: the
double-quote " and the backslash \. The double-quote delimits a character
string, and hence cannot be used directly inside a string. The backslash is an
escape character: it indicates that the next symbol is to be taken literally.
Thus \" inside a string denotes a literal " and is not interpreted as the
end-of-string delimiter:
> "\"Print this line in quotes\"";
"Print this line in quotes"
To obtain a literal backslash, one simply types two backslashes;
for characters other than double-quotes and backslash
it does not make a difference when a backslash precedes them inside a string,
with the exception of n, r and t. Any occurrence of
\n or \r inside a string is converted into a <new-line>
while \t is converted into a <tab>. For example:
> "The first line,\nthe second line, and then\ran\tindented line";
The first line,
the second line, and then
an      indented line
Note that a backslash followed by a return allows one to conveniently
continue the current construction on the next line; so \<return>
inside a string will be ignored, except that input will continue on a new
line on your screen.
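The escape rules above can be checked by measuring string lengths with #. The following is a minimal sketch; the particular strings are chosen only for illustration.

```
> "a\\b";
a\b
> // Two backslashes in the input give one character in the string:
> #"a\\b";
3
> // An escaped double-quote followed by a new-line: two characters.
> #"\"\n";
2
```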
Binary strings, on the other hand, can consist of any character, whether printable or
non-printable. Binary strings cannot be constructed using literals, but must
be constructed either from a character string, or during a read operation from
a file.
"abc"
Create a string from a succession of keyboard characters (a, b, c) enclosed
in double quotes " ".
BString(s) : MonStgElt -> BStgElt
Create a binary string from the character string s.
s cat t : BStgElt, BStgElt -> BStgElt
s * t : MonStgElt, MonStgElt -> MonStgElt
Concatenate the strings s and t.
s cat:= t : BStgElt, BStgElt -> BStgElt
s *:= t : MonStgElt, MonStgElt -> MonStgElt
Modification-concatenation of the string s with t: concatenate
s and t and put the result in s.
&cat s : [ BStgElt ] -> BStgElt
&* s : [ MonStgElt ] -> MonStgElt
Given an enumerated sequence s of strings, return the concatenation
of these strings.
s ^ n : MonStgElt, RngIntElt -> MonStgElt
Form the n-fold concatenation of the string s, for n≥0. If n=0 this is
the empty string, if n=1 it equals s, etc.
s[i] : MonStgElt, RngIntElt -> MonStgElt
Returns the substring of s consisting of the i-th character.
s[i] : BStgElt, RngIntElt -> RngIntElt
Returns the numeric value representing the i-th character of s.
Eltseq(s) : MonStgElt -> [ MonStgElt ]
Returns the sequence of characters of s (as length 1 strings).
Eltseq(s) : BStgElt -> [ BStgElt ]
Returns the sequence of numeric values representing the characters of s.
Substring(s, n, k) : BStgElt, RngIntElt, RngIntElt -> BStgElt
Return the substring of s of length k starting at position n.
# s : BStgElt -> RngIntElt
The length of the string s.
Position(s, t) : MonStgElt, MonStgElt -> RngIntElt
This function returns the position (an integer p with 0 < p≤#s) in
the string s where the beginning of a contiguous
substring t occurs. It returns 0
if t is not a substring of s. (If t is the empty string, position 1
will always be returned, even if s is empty as well.)
To perform more sophisticated operations, one may convert
the string into a sequence and use the extensive facilities for
sequences described in the next part of this manual; see the examples
at the end of this chapter for details.
StringToCode(s) : MonStgElt -> RngIntElt
Returns the code number of the first character of string s.
This code depends on the computer system that is used; it is ASCII
on most UNIX machines.
CodeToString(n) : RngIntElt -> MonStgElt
Returns a character (string of length 1) corresponding to the code number n,
where the code is system dependent (see previous entry).
StringToInteger(s) : MonStgElt -> RngIntElt
Returns the integer corresponding to the string of decimal digits s.
All non-space characters in the string s must be digits (0, 1, ..., 9),
except the first character, which is also allowed to be + or -.
An error results if any other combination of characters occurs.
Leading zeros are ignored.
StringToInteger(s, b) : MonStgElt, RngIntElt -> RngIntElt
Returns the integer corresponding to the string of digits s, all assumed to
be written in base b.
All non-space characters in the string s must be digits less than b (if
b is greater than 10, `A' is used for 10, `B' for 11, etc.),
except the first character, which is also allowed to be + or -.
An error results if any other combination of characters occurs.
StringToIntegerSequence(s) : MonStgElt -> [ RngIntElt ]
Returns the sequence of integers corresponding to the string s of
space-separated decimal numbers.
All non-space characters in the string s must be digits (0, 1, ..., 9),
except the first character of each number, which is also allowed to be
+ or - (with no space between the sign and the digits). An error results
if any other combination of characters occurs. Leading zeros are ignored.
IntegerToString(n) : RngIntElt -> MonStgElt
Convert the integer n into a string of decimal digits; if n is negative the
first character of the string will be -. (Note that leading zeros
and a + sign are ignored when Magma builds an integer, so the resulting
string will never begin with + or 0 characters.)
IntegerToString(n, b) : RngIntElt, RngIntElt -> MonStgElt
Convert the integer n into a string of digits with the given base b (which
must be in the range [2 ... 36]); if n is negative the
first character of the string will be -.
s eq t : BStgElt, BStgElt -> BoolElt
Returns true if and only if the strings s and t are identical.
Note that blanks are significant.
s ne t : BStgElt, BStgElt -> BoolElt
Returns true if and only if the strings s and t are distinct.
Note that blanks are significant.
s in t : MonStgElt, MonStgElt -> BoolElt
Returns true if and only if s appears as a contiguous substring of t.
Note that the empty string is contained in every string.
s notin t : MonStgElt, MonStgElt -> BoolElt
Returns true if and only if s does not appear as a contiguous substring of t.
Note that the empty string is contained in every string.
s lt t : BStgElt, BStgElt -> BoolElt
Returns true if s is lexicographically less than t, false otherwise. Here
the ordering on characters imposed by their ASCII code number is used.
s le t : BStgElt, BStgElt -> BoolElt
Returns true if s is lexicographically less than or equal to t, false otherwise. Here
the ordering on characters imposed by their ASCII code number is used.
s gt t : BStgElt, BStgElt -> BoolElt
Returns true if s is lexicographically greater than t, false otherwise. Here
the ordering on characters imposed by their ASCII code number is used.
s ge t : BStgElt, BStgElt -> BoolElt
Returns true if s is lexicographically greater than or equal to t, false otherwise. Here
the ordering on characters imposed by their ASCII code number is used.
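The conversion functions described above can be exercised as in the following short session. This is an illustrative sketch only; the input strings are arbitrary, and the character codes shown assume an ASCII system.

```
> StringToInteger("-0042");
-42
> StringToInteger("101", 2);
5
> StringToIntegerSequence("12 -7 +30");
[ 12, -7, 30 ]
> IntegerToString(255, 2);
11111111
> // Only the first character of the string is used:
> StringToCode("Magma");
77
> CodeToString(StringToCode("M"));
M
```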
> "Mag" cat "ma";
Magma
Misplacing the double-quotes usually has undesired effects:
> "Mag cat ma";
Mag cat ma
And note that there are two different equalities involved in the following!
> "73" * "9" * "42" eq "7" * "3942";
true
> 73 * 9 * 42 eq 7 * 3942;
true
The next line shows how strings can be concatenated quickly, and also
that strings of blanks can be used for formatting:
> s := ("Mag" cat "ma? ")^2;
> s, " "^30, s[4]^12, "!";
Magma? Magma?                                 mmmmmmmmmmmm !
Here is a way to list (in a sequence) the first occurrence
of each of the ten digits in the decimal expansion of π, using
IntegerToString and Position.
> pi := Pi(RealField(1001));
> dec1000 := Round(10^1000*(pi-3));
> I := IntegerToString(dec1000);
> [ Position(I, IntegerToString(i)) : i in [0..9] ];
[ 32, 1, 6, 9, 2, 4, 7, 13, 11, 5 ]
Using the length # and string indexing [ ]
it is also easy to count the number
of occurrences of each digit in the string containing the first 1000 digits.
> [ #[i : i in [1..#I] | I[i] eq IntegerToString(j)] : j in [0..9] ];
[ 93, 116, 103, 102, 93, 97, 94, 95, 101, 106 ]
We would like to test if the ASCII-encoding of the string `Magma' appears.
This could be done as follows, using StringToCode and in, or
alternatively, Position.
To reduce the typing, we first abbreviate IntegerToString to its
and StringToCode to sc.
> sc := StringToCode;
> its := IntegerToString;
> M := its(sc("M")) * its(sc("a")) * its(sc("g")) * its(sc("m")) * its(sc("a"));
> M;
779710310997
> M in I;
false
> Position(I, M);
0
So `Magma' does not appear this way. However, we could be satisfied if the letters
appear somewhere in the right order. To do more sophisticated operations (like this)
on strings, it is necessary to convert the string into a sequence, because
sequences constitute a more versatile data type, allowing many more advanced
operations than strings.
> Iseq := [ I[i] : i in [1..#I] ];
> Mseq := [ M[i] : i in [1..#M] ];
> IsSubsequence(Mseq, Iseq);
false
> IsSubsequence(Mseq, Iseq: Kind := "Sequential");
true
Finally, we find that the string `magma' lies in between
`Pi' and `pi':
> "Pi" le "magma";
true
> "magma" lt "pi";
true
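Substring, Position, Eltseq and # can be combined for simple slicing and searching. The session below is a small sketch on the string "Magma"; it also builds a binary string with BString and takes its length.

```
> Substring("Magma", 2, 3);
agm
> Position("Magma", "gm");
3
> // 0 signals that the substring does not occur:
> Position("Magma", "x");
0
> Eltseq("abc");
[ a, b, c ]
> b := BString("Magma");
> #b;
5
```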
Split(S, D) : MonStgElt, MonStgElt -> [ MonStgElt ]
Split(S) : MonStgElt -> [ MonStgElt ]
IncludeEmpty: BoolElt Default: false
Given a string S, together with a string D describing a list of separator
characters, return the sequence of strings obtained by splitting S at
any of the characters contained in D. That is, S is considered as
a sequence of fields, with any character in D taken to be a delimiter
separating the fields.
If D is omitted, it is taken to be the string consisting of the newline
character alone (so S is split into the lines found in it). If S
is desired to be split into space-separated words, the argument
" \t\n" should be given for D.
By default, empty fields are not returned. This may be changed by
setting the parameter IncludeEmpty to true.
We demonstrate elementary uses of Split.
> Split("a b c d", " ");
[ a, b, c, d ]
> // Note that adjacent separators do not produce
> // extra fields by default:
> Split("a||b|c", "|");
[ a, b, c ]
> // But they can be made to appear with IncludeEmpty:
> Split("a||b|c", "|" : IncludeEmpty := true);
[ a, , b, c ]
> Split("abxcdyefzab", "xyz");
[ ab, cd, ef, ab ]
> // Note that no splitting happens if the delimiter
> // is empty:
> Split("abcd", "");
[ abcd ]
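Two further cases mentioned in the description are splitting into words with the separator string " \t\n" and splitting into lines when D is omitted. The input strings below are made up for illustration.

```
> // Runs of separators give no empty fields by default:
> Split("the  quick\tbrown\nfox", " \t\n");
[ the, quick, brown, fox ]
> // With D omitted, S is split at new-lines only:
> Split("line one\nline two");
[ line one, line two ]
```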
Regexp(R, S) : MonStgElt, MonStgElt -> BoolElt, MonStgElt, [ MonStgElt ]
Given a string R specifying a regular expression, together with a string S,
return whether S matches R.
If so, return also the matched substring of S, together with the sequence of
matched substrings of S corresponding to the parenthesized expressions
of R.
This function is based on the freely distributable reimplementation of the V8
regexp package by Henry Spencer. The syntax and interpretation of the
characters |, *, +, ?, ^, $, [] and \ are the same as in the UNIX command
egrep. The parenthesized expressions are numbered in left-to-right order of
their opening parentheses. Note that the parentheses should not be preceded
by a backslash, as the UNIX commands grep and ed require.
We demonstrate some elementary uses of Regexp.
> Regexp("b.*d", "abcde");
true bcd []
> Regexp("b(.*)d", "abcde");
true bcd [ c ]
> Regexp("b.*d", "xyz");
false
> date := "Mon Jun 17 10:27:27 EST 1996";
> _, _, f := Regexp("([0-9][0-9]):([0-9][0-9]):([0-9][0-9])", date);
> f;
[ 10, 27, 27 ]
> h, m, s := Explode(f);
> h, m, s;
10 27 27
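The substrings returned by Regexp are strings, not integers. A possible way to compute with them is to convert each one with StringToInteger, as sketched below; the variable names ok and t are chosen here for illustration.

```
> date := "Mon Jun 17 10:27:27 EST 1996";
> ok, _, f := Regexp("([0-9][0-9]):([0-9][0-9]):([0-9][0-9])", date);
> ok;
true
> t := [ StringToInteger(x) : x in f ];
> t;
[ 10, 27, 27 ]
> // Seconds elapsed since midnight:
> 3600*t[1] + 60*t[2] + t[3];
37647
```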