Digging Into C++ String Stream Processing

Processing strings is a common operation in any programming language, and almost all programming languages provide library APIs for it. The difficulty is that a string is a sequence of characters rather than a primitive data type like int, float, or char, yet programs use strings so often that they are expected to behave like one. In C, for example, a character array cannot be assigned a string literal with =, and even concatenating two strings requires explicit code (or a library call such as strcat) that manages the buffers. A dedicated string library hides this complexity and lets a string be handled almost as if it were a primitive type, although it is not. This article describes the string processing facilities supplied by the C++ standard library.

String Processing

String processing begins with defining a string type, along with the various methods of manipulating it: searching, inserting, erasing, replacing, comparing, and concatenating strings. To begin with, let’s see how a string variable is defined, assigned, and logically represented in memory.

string greetings = "Hello String";

Figure 1: A string variable definition in memory

Compare this with a C string declared as a character pointer (char *str = "Hello String";). A C string is null terminated, so it is represented as:

'H','e','l','l','o',' ', 'S','t','r','i','n','g','\0'

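A C++ string, by contrast, does not depend on a terminating character ‘\0’ to mark its end. It stores its length separately, so size() counts only the characters themselves, and a string may even contain embedded ‘\0’ characters. (Since C++11, the internal buffer is also kept null terminated so that c_str() can hand it to C functions.) This is a basic and important difference between string handling in C and C++.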

In C++, the header <string> defines, in namespace std, the basic_string class template for manipulating variable-length sequences of characters. The library is general in the sense that this single template produces a family of string types, such as:

namespace std {

   typedef basic_string<char> string;
   ...
   typedef basic_string<wchar_t> wstring;
   ...
}

The wchar_t type is a wide character type commonly used to hold Unicode characters. Its size is not fixed by the standard: it is 16 bits on Windows and 32 bits on most Unix-like systems. The typedef wstring is the string type built on wchar_t.

A string object can be initialized through one of its constructors, as follows:

string s1 ("Hello String");    // direct initialization,
                               // created from const char *
string s2 = "Hello String";    // copy initialization
string s3 (10, 'z');           // filled with 10 z's
string s4 = string (10, 'z');  // copy initialization,
                               // with 10 z's
string s5 = s4;                // copy s4 into s5

The String Class

As stated earlier, the hallmark of C++ string handling is the string class. It overloads several operators for copying, concatenation, and comparison, and provides member functions for searching, erasing, inserting, and replacing. These operations allocate and resize memory automatically, so the programmer does not have to manage the internal details. A default-constructed string (one created without an initializer) always starts with size 0, and its size changes as characters are assigned, appended, or read into it. Let’s see this with a simple program.

#include <iostream>
#include <string>
using namespace std;

string global_str1("Enter a string: ");
int main(int argc, char **argv)
{
   string separator(60, '-');
   string s1;
   cout<<"uninitialized string size, s1 = "
       <<s1.size()<<endl;
   cout<<"initialized string size, global_str = "
       <<global_str1.size()<<endl;
   cout<<global_str1;
   getline(cin,s1);
   // the size() and length() are equivalent
   cout<<"Text entered is: '"<<s1<<"' of size = "
       <<s1.length()<<endl;

   // copy 10 characters of global_str1, starting at index 2
   string s2(global_str1, 2, 10);
   cout<<"substring of global_str copied to s2 is '"
       <<s2<<"', size = "<<s2.size()<<endl;

   // create more than one paragraph of text;
   // an empty line (or end of input) ends the loop
   string text,para;
   cout<<separator<<endl<<"Enter some text:"<<endl;
   while(getline(cin, para) && !para.empty()){
      // string concatenated with overloaded + and +=
      // operators
      text += "\n" + para;
   }
   cout<<separator<<endl<<"Text you entered is ..."<<endl;
   cout<<text<<endl;
   cout<<separator<<endl
       <<"Text you entered in reverse ...\n"<<endl;
   // valid subscripts run from size()-1 down to 0
   for(int i=static_cast<int>(text.size())-1;i>=0;i--){
      cout<<text[i];
   }
   cout<<endl;

   return 0;
}

Listing 1: Modifying the size of a string object

Output

Figure 2: Output of Listing 1

As with C-style strings, the characters of a C++ string are indexed from 0 to length()-1. In addition, many string member functions take a starting position and a number of characters, so they can operate on just part of a string. The standard library also overloads the stream extraction operator (>>) for string, which supports statements that read a string from cin.

string str1;
cin>>str1;

In this case, the input is delimited by white space: leading white space is skipped, and extraction stops at the next white-space character. This means that the input ‘Hello string’ is extracted as only ‘Hello’. To read a whole line, spaces included, use the getline function, which is overloaded for string.

getline(cin, str1);

This function reads characters from standard input (through the cin object) into str1 until it reaches a newline (‘\n’), which is extracted and discarded. Unlike the extraction operator, it does not stop at white space.

The string class also provides the overloaded member function assign, which can copy a whole string, or a specified number of characters from it, into a string object.

string str1, str2;
string str3="I saw a saw to saw a tree";
str1.assign(str3);
// source string, start index, no. of characters
str2.assign(str3, 2, 3);   // str2 is "saw"

String Concatenation and Comparison

The string class overloads the + and += operators for concatenation, and the ==, !=, <, >, <=, and >= operators for comparison. The overloaded operators keep the usual rules of precedence: + binds tighter than the comparison operators, which in turn bind tighter than the assignment operators = and +=.

string s1, s2("higher"), s3(" you"),
   s4(" go");
s1 = s2 + s3 + s4;

The member function append also concatenates, adding its argument to the end of the string.

s1.append(" the lighter you feel");
// the text is added at the end of s1,
// starting at index s1.size()

String comparison is done lexicographically, using either the relational operators or the compare member function.

string s1("ac"), s2("ab");
if(s1==s2)
   cout<<"s1==s2";
else if(s1>s2)
   cout<<"s1>s2";
else
   cout<<"s1<s2";

The compare member function reports its result as an int: s1.compare(s2) returns a positive value if s1 is lexicographically greater than s2, 0 if the two strings are equal, and a negative value otherwise.

int k = s1.compare(s2);
if(k==0)
   cout<<"s1==s2";
else if(k>0)
   cout<<"s1>s2";
else
   cout<<"s1<s2";

String comparison can be performed on a substring or part of a string. In such a case, we can use the overloaded version of the compare function.

string s1("synchronize"), s2("sync");
int k = s1.compare(0,4,s2); // k==0: "sync" equals s2

The first argument, 0, is the starting subscript in s1; the second argument, 4, is the number of characters of s1 to compare; and the third argument is the string to compare against.

Some More Common String Operations

Some of the other common operations performed by the member functions of string class are as follows.

  • The class string provides the swap member function for swapping strings.
    string s1("tick"),s2("tock");
    cout<<"Before swap "<<s1<<"-"<<s2<<endl;
    s1.swap(s2);
    cout<<"After  swap "<<s1<<"-"<<s2<<endl;
    
  • The member function substr retrieves a substring from a string. The first argument is the starting subscript and the second is the number of characters to extract.
    string s1("...tolerant as a tree");
    cout<<s1.substr(3, 8);
    
  • The following member functions provide information about the characteristics of a string (the values printed for capacity() and max_size() are implementation-defined):
    string s1;
    cout<<"Is empty string? "
       <<(s1.empty()?"yes":"no")<<endl;
    cout<<"Capacity: "<<s1.capacity()<<endl;
    cout<<"Maximum size: "<<s1.max_size()<<endl;
    cout<<"Length: "<<s1.length()<<endl;
    cout<<"------------------------"<<endl;
    s1="fiddler on the roof";
    cout<<"Is empty string? "
       <<(s1.empty()?"yes":"no")<<endl;
    cout<<"Capacity: "<<s1.capacity()<<endl;
    cout<<"Maximum size: "<<s1.max_size()<<endl;
    cout<<"Length: "<<s1.length()<<endl;
    
  • Searching for a substring in a string, and erasing, replacing, and inserting text.
    #include <iostream>
    #include <string>
    using namespace std;
    
    int main(int argc, char **argv)
    {
       string s1("They are allone and the same, "
       "The mind follows matter, and whatever "
       "it thinks of is also material");
       cout<<"--------------------"<<endl;
       cout<<"Original text"<<endl;
       cout<<"--------------------"<<endl;
       cout<<s1<<endl;
       cout<<"--------------------"<<endl;
       cout<<"Replaced ' '(space) with '&nbsp;'"
          <<endl;
       cout<<"--------------------"<<endl;
       // erased the extra word 'all' from
       // 'allone'
       // in the text
       s1.erase(9,3);
       size_t spc=s1.find(" ");
       while(spc!=string::npos){
          s1.replace(spc,1,"&nbsp;");
          spc=s1.find(" ",spc+1);
       }
       cout<<s1<<endl;
       cout<<"--------------------"<<endl;
       cout<<"Back to original text"<<endl;
       cout<<"--------------------"<<endl;
       spc=s1.find("&nbsp;");
       while(spc!=string::npos){
          s1.replace(spc,6," ");
          spc=s1.find("&nbsp;",spc+1);
       }
       cout<<s1<<endl;
       cout<<"--------------------"<<endl;
       cout<<"Inserting new text into existing text"
          <<endl;
       cout<<"--------------------"<<endl;
       s1.insert(0,"Matter or Transcendence? ");
       cout<<s1<<endl;
    
       return 0;
    }
    
Editor’s Note: In the preceding code listing, each group of 48 hyphens was reduced to a group of 20 hyphens to fit the available space without breaking the code lines unnecessarily.

C-style String Operations in a String Class

The string class in C++ also provides member functions to convert string objects to C-style pointer-based strings. The following example illustrates how it may be done.

#include <iostream>
#include <string>
using namespace std;

int main(int argc, char **argv)
{
   string s1("ABC");

   // copying characters into allocated memory
   size_t len=s1.length();
   char *pstr1=new char[len+1];
   s1.copy(pstr1,len,0);   // copy() does not append '\0'
   pstr1[len]='\0';
   cout<<pstr1<<endl;
   delete[] pstr1;          // release the manually allocated buffer

   // string converted to C-style string
   cout<<s1.c_str()<<endl;

   // function data() returns const char *;
   // keeping the pointer is risky because pstr3
   // becomes invalid once s1 is modified
   const char *pstr3=s1.data();
   cout<<pstr3<<endl;

   return 0;
}

Using Iterators with Strings

We can use iterators with string objects in the following manner.

string s1("a rolling stone gathers no moss");
for(string::iterator iter=s1.begin();iter!=s1.end();iter++)
   cout<<*iter;

Strings and I/O Streams

C++ stream I/O can also operate directly on a string in memory. The header <sstream> provides two classes for this: istringstream for input and ostringstream for output. They are typedefs of the class templates basic_istringstream and basic_ostringstream, respectively.

typedef basic_istringstream<char> istringstream;
typedef basic_ostringstream<char> ostringstream;

Because these classes derive from istream and ostream, they provide the same functionality, in addition to their own member functions for in-memory formatting.

An ostringstream object stores its output in a string object, and its member function str() returns a copy of that string. Strings and numeric values are written to the object with the stream insertion operator (<<), and each insertion appends to the in-memory string.

An istringstream object reads data from an in-memory string into program variables. The data is stored as characters and converted as it is extracted. Input from an istringstream works much like input from a file: the end of the string is treated as the end-of-file marker.

To get an idea of what this class object can do, let’s implement it with a simple example.

#include <iostream>
#include <string>
#include <sstream>
using namespace std;

int main(int argc, char **argv)
{
   ostringstream out;
   out<<"Float value = "<<1.3<<endl
      <<"and int value = "<<123<<"\t"<<"tabbed"<<endl;
   cout<<out.str();

   string s1("0 1 2 3 4 5 6 7 8 9");
   istringstream in(s1);
   int ival;
   // the loop stops as soon as an extraction fails,
   // for example at the end of the string
   while(in>>ival){
      cout<<ival<<endl;
   }
   return 0;
}

Listing 2: Observing the istringstream objects

Output

Figure 3: Output of Listing 2

Conclusion

The standard C++ library class string provides everything that string processing commonly requires, along with a number of convenient functions out of the box. Although C++ supports both styles, it is better to stick to the object-oriented way of handling strings than to resort to C-style string handling. This rule of thumb not only improves the readability of the code but also makes it less prone to bugs such as buffer overruns and memory leaks.

Manoj Debnath
A teacher (professor) actively involved in publishing, research, and programming for almost two decades. He has authored several articles for reputed sites like CodeGuru, Developer, DevX, and Database Journal. His research interests include programming languages, databases, compilers, and web/enterprise development.
