3339

How do I iterate over the words of a string composed of words separated by whitespace?

Note that I'm not interested in C string functions or that kind of character manipulation/access. I prefer elegance over efficiency. My current solution:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}
Mateen Ulhaq
Ashwin Nanjappa
  • 686
    Dude... Elegance is just a fancy way to say "efficiency-that-looks-pretty" in my book. Don't shy away from using C functions and quick methods to accomplish anything just because it is not contained within a template ;) –  Oct 25 '08 at 09:04
  • 19
    `while (iss) { string subs; iss >> subs; cout << "Substring: " << sub << endl; }` – isekaijin Sep 29 '09 at 15:47
  • @nlaq, Except that you'd have to convert your string object using c_str(), and back to a string again if you still needed it to be a string, no? – Aaron H. Feb 15 '11 at 00:00
  • 28
    @Eduardo: that's wrong too... you need to test iss between trying to stream another value and using that value, i.e. `string sub; while (iss >> sub) cout << "Substring: " << sub << '\n';` – Tony Delroy Apr 11 '12 at 02:24
  • 14
    Various options in C++ to do this by default: http://www.cplusplus.com/faq/sequences/strings/split/ – hB0 Oct 31 '13 at 00:23
  • 27
    There's more to elegance than just pretty efficiency. Elegant attributes include low line count and high legibility. IMHO Elegance is not a proxy for efficiency but maintainability. – Matt Mar 31 '17 at 13:22
  • 3
    Most of the answers here are notably latin-centric. Many of the answers assume a single character can be used as 'whitespace' even though the question defines the delimiter to be whitespace. Unicode has at least 25 whitespace characters. But word-delimiting is not merely a whitespace issue. For instance, in syllabic writing, such as Tibetan, word delimitation is a semantic, rather than syntactic, problem. Therefore, using whitespace to extract words is not a suitable approach for many languages. – Konchog Oct 29 '18 at 12:08
  • Small addition to the above. You can add a locale facet that treats punctuation as space so you don't need to handle that separately. https://codereview.stackexchange.com/a/57467/507 – Martin York Feb 20 '19 at 21:26
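
As the comments above point out, the `do`/`while` loop in the question prints one extra, empty "Substring:" line, because the stream is only tested after the failed extraction has already been used. A minimal sketch of the corrected loop, which tests the extraction itself (the helper name `words` is made up for illustration):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the whitespace-separated words of `s`.
std::vector<std::string> words(const std::string& s) {
    std::istringstream iss(s);
    std::vector<std::string> out;
    std::string word;
    // `iss >> word` returns the stream, which converts to false once an
    // extraction fails, so the body only runs for words actually read.
    while (iss >> word) {
        out.push_back(word);
    }
    return out;
}
```

Iterating over `words("Somewhere down the road")` then visits each of the four words once, with no trailing empty entry.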

83 Answers

2577

I use this to split a string by a delimiter. The first function writes the results to an output iterator (for example, a std::back_inserter over an existing vector); the second returns a new vector.

#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        *result++ = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

std::vector<std::string> x = split("one:two::three", ':');
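
If empty tokens are not wanted, one option is to test each item before storing it. The following is only a sketch of that idea; the name `split_skip_empty` is an assumption for illustration and is not part of the answer's code:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of split() that drops empty tokens.
std::vector<std::string> split_skip_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        if (!item.empty()) {  // skips the empty token between "::"
            elems.push_back(item);
        }
    }
    return elems;
}
```

With this variant, `split_skip_empty("one:two::three", ':')` yields three items instead of four.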
Evan Teran
  • elegant solution, I always forget about this particular "getline", though I do not believe it is aware of quotes and escape sequences. – boskom May 27 '10 at 13:32
  • @stijn: are you saying that `split("one two three", ' ');` returns a vector with 4 elements? I'm not sure that is the case, but I'll test it. – Evan Teran Nov 09 '10 at 15:45
  • wait, it seems the formatting removed some spaces (or I forgot them): I'm talking about the string "one two three" with 2 spaces between "two" and "three" – stijn Nov 09 '10 at 18:54
  • 2
    I liked this solution, however, I wrapped the function in a template, changing the vectors std::string template parameter into a parameter. For me, I also used boost::lexical_cast on said template parameter in the push_back. – Kit10 Aug 09 '12 at 19:30
  • How can I modify it to work with std::wstring, std::getline won't work right? – キキジキ Nov 19 '12 at 09:09
  • 1
    `std::getline` is templated, so it may "just work", if not see http://en.cppreference.com/w/cpp/string/basic_string/getline to figure out how to tweak it. Passing a `wchar_t` character as the delim may be enough to trigger the right template. – Evan Teran Nov 19 '12 at 16:29
  • if you are enabling return value optimization, can't you make the function to return void? – Rozuur Jul 10 '13 at 14:52
  • 96
    To make it skip empty tokens, add an `empty()` check: `if (!item.empty()) elems.push_back(item)` – David G Nov 09 '13 at 22:33
  • 13
    How about the delim contains two chars as `->`? – herohuyongtao Dec 26 '13 at 08:15
  • 9
    @herohuyongtao, this solution only works for single char delimiters. – Evan Teran Dec 27 '13 at 06:11
  • 1
    @Copperpot How did you do it in a template? – loop Jan 12 '14 at 23:02
  • 2
    @EvanTeran This may be not regarding splitting the string but general doubt in your code, The elems you are passing as an reference argument and returning the reference again. I just wanted to know is there any reason for that? – duslabo Jan 25 '14 at 17:27
  • 4
    @JeshwanthKumarNK, it's not necessary, but it lets you do things like pass the result directly to a function like this: `f(split(s, d, v))` while still having the benefit of a pre-allocated `vector` if you like. – Evan Teran Jan 25 '14 at 17:50
  • 10
    Caveat: split("one:two::three", ':') and split("one:two::three:", ':') return the same value. – dshin Sep 09 '15 at 19:04
  • 3
    almost perfect: `split(":abc:def:", ':');` returns only 3 instead of 4 elements! – fmuecke Sep 09 '15 at 20:31
  • Being able to set max number of returned elements is crucial to me. – Jonny Oct 29 '15 at 01:25
  • 1
    @Jonny, should be trivial, just add an extra condition to the while loop comparing the `vector`'s size to the max. Something like this: `while (elems.size() < max_count && std::getline(ss, item, delim)) {` – Evan Teran Oct 29 '15 at 05:57
  • @Jonny, I see. Your answer looks a bit more complex than necessary. If you make the max default to something like `size_t(-1)`, that will effectively be "infinity" (it's the biggest size your system can represent, so you'll run out of RAM before you hit this). Then you can make the condition as simple as my comment above. No more need to double check the stream state and do a second read and such. Just a suggestion :-). – Evan Teran Oct 29 '15 at 06:02
  • Might be wrong but you might lose the end of the string with that. Well basically I mimic the explode function of php, or so I believe. – Jonny Oct 29 '15 at 06:08
  • Gotcha. My solution will **stop** at `max_count`, skipping the rest of the string (since it found the amount it wanted). I guess you are looking for something that will always make the last one the rest of the string. I have some functions like that too here: https://github.com/eteran/cpp-utilities/blob/master/string.h Some are specifically designed to match php's string manipulation functions as closely as possible :-) – Evan Teran Oct 29 '15 at 06:21
  • Why not `return split(s, delim, std::vector<std::string>());` ? – Gabriel Oct 29 '15 at 19:53
  • 2
    @Gabriel, you could. But I think when it was written (a few years ago), having a named variable encouraged NRVO more reliably. With C++11 move semantics, it may be a lot less of a difference. – Evan Teran Oct 30 '15 at 03:16
  • be aware that if you are using OpenCV, split can be confused with split from OpenCV that splits images. – Diedre Jun 20 '17 at 16:07
  • 3
    I really wish they'd add a standard method with this signature: `vector<string> std::string::split(char delimiter = ' ');` – doctorram Feb 02 '18 at 22:26
  • @loop See https://gitlab.com/tbeu/wcx_setfolderdate/blob/master/src/splitstring.h for a templated implementation. – tbeu Jul 07 '19 at 20:56
  • @tbeu fixing your link: https://gitlab.com/tbeu/wcx_setfolderdate/-/blob/master/src/splitstring.h – luizfls Mar 20 '20 at 04:17
  • As others noted, this does not correctly handle empty strings at the end. (This is not a matter of definition since "a,b," and "a,b" both give the same result.) This can be fixed by initializing iss with s + delim and explicitly handling the special case that an empty string should return an empty list. – Johannes Overmann Nov 11 '21 at 23:58
1504

For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));

... or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};
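
For readers unsure what each iterator does, here is an annotated sketch of the same iterator-pair construction wrapped in a function. The function name `tokenize` is an assumption made for illustration:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper wrapping the iterator-pair construction.
std::vector<std::string> tokenize(const std::string& sentence) {
    std::istringstream iss(sentence);
    // Reading iterator: each increment extracts one string with operator>>,
    // which skips leading whitespace and stops at the next whitespace.
    std::istream_iterator<std::string> first(iss);
    // A default-constructed istream_iterator is the end-of-stream sentinel.
    std::istream_iterator<std::string> last;
    // vector's iterator-range constructor pulls tokens until first == last.
    return std::vector<std::string>(first, last);
}
```

Because the splitting is done by `operator>>`, this only ever splits on whitespace; it is a restatement of the answer's code, not an extension of it.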
Rabbid76
Zunino
  • 177
    Is it possible to specify a delimiter for this? Like for instance splitting on commas? – l3dx Aug 06 '09 at 11:49
  • 7
    @l3dx: it seems that the parameter "\n" is the delimiter. This code is very nice, but I would like to know better about it. Maybe somebody could explain each line of that snippet? – Jonathan Dec 11 '09 at 17:30
  • 17
    @Jonathan: \n is not the input delimiter in this case; it's the delimiter written after each token output to cout. – huy Feb 03 '10 at 12:37
  • 1
    based on this: http://www.cplusplus.com/reference/algorithm/copy/ no. The whitespace behavior is a function of the `istream_iterator`. It would be more elegant to roll your own. – Wayne Werner Aug 04 '10 at 17:59
  • 5
    @graham.reeds, @l3dx: Please don't write another CSV parser which can't handle quoted fields: http://en.wikipedia.org/wiki/Comma-separated_values – Douglas Sep 01 '10 at 09:30
  • 801
    This is a poor solution as it doesn't take any other delimiter, therefore not scalable and not maintable. – ABCD Jan 10 '11 at 03:57
  • 12
    To people asking how this works: equivalent code using less of the STL would look like `string token; istringstream iss(sentence); while (iss >> token) { cout << token; }` or `{ tokens.push_back(token); }` – user470379 Feb 07 '11 at 05:11
  • Why do I get "error C2664: 'std::back_inserter' : cannot convert parameter 1 from 'std::vector<_Ty> (__cdecl *)(void)' to 'std::vector<_Ty> &'" in VS2008? – szx Apr 17 '11 at 10:22
  • The template argument to `back_inserter` should be `string`, not `vector`. That is, it should be `back_inserter(tokens)`, not `back_inserter>(tokens)`. – Nawaz May 27 '12 at 14:56
  • 2
    Take a look at ranges if you care about elegance in practical terms (i.e. do more with less code): http://www.slideshare.net/rawwell/iteratorsmustgo – Alexei Sholik Oct 17 '12 at 18:27
  • 44
    Actually, this *can* work just fine with other delimiters (though doing some is somewhat ugly). You create a ctype facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the stringstream with that locale before extracting strings. – Jerry Coffin Dec 19 '12 at 20:30
  • 1
    The main purpose of istream_iterator is it can parse int, float, double, etc from an istream: istream_iterator does a decent job reading doubles separated by space. With a front or especially back inserter it's a great combo! :) – Oleg Jan 11 '13 at 02:48
  • 5
    `vector` has a ctor that takes a begin and end iterator, so no need for the copy call to insert them into a container. – legends2k Jan 13 '13 at 18:41
  • 67
    @Kinderchocolate *"The string can be assumed to be composed of words separated by whitespace"* - Hmm, doesn't sound like a poor solution to the question's problem. *"not scalable and not maintable"* - Hah, nice one. – Christian Rau Feb 07 '13 at 15:08
  • @Nawaz [Why should it?](http://en.cppreference.com/w/cpp/iterator/back_inserter) You're inserting into a `std::vector` and not into a `std::string`. But then again, there shouldn't be an explicit template argument, anyway (well, there shouldn't even be a `back_inserter` or `copy`, but ok). – Christian Rau Feb 07 '13 at 15:12
  • @ChristianRau: Oh you're right; the first code-snippet probably confused me. Actually I should have said you don't need to mention the template argument in `std::back_inserter`; in fact, mentioning template argument defies the very purpose of `back_inserter`. – Nawaz Feb 07 '13 at 16:30
  • 2
    why do you need to use curly brackets in `vector<string> tokens{istream_iterator<string>{iss}, istream_iterator<string>{}};`? Is it because otherwise it looks like a function call? – stewart99 Jan 07 '14 at 05:06
  • Questions: 1. why would `istream_iterator` stop at white spaces? For me spaces are also part of the string; 2. why is it very inefficient? – Ziyuan Apr 22 '15 at 12:23
  • 14
    The elegance in needing 5 includes, 3 lines (not counting `using`) and quite cryptic code to... split a string? dear god. – Michahell Apr 22 '15 at 15:42
  • 1
    We could also have used STL to split a string. – Moiz Sajid Aug 30 '15 at 11:31
  • This is much faster than Evan Teran's answer if you only need to split on whitespace. – noɥʇʎԀʎzɐɹƆ Jul 07 '16 at 15:23
  • While the missing delimiter concern is correct one should take into account that the OPs solution couldn't handle that either. So this seems to be not a requirement. – exilit Jul 21 '16 at 20:40
  • @doorfly The only place where curly brackets are needed is `istream_iterator<string>{}`, because that would otherwise be regarded as a function. – Seppo Enarvi Feb 28 '17 at 20:31
  • If using `wstring` and your code breaks, check this answer for fixing the istream_iterator usage with `wchar_t`: https://stackoverflow.com/a/20959347/3543437 – kayleeFrye_onDeck Jul 03 '18 at 20:44
  • @l3dx Yes. You can add a specialized local to the stream that makes a , a space (and all other characters not a space). Then the code will work just the same. codereview.stackexchange.com/a/57467/507 – Martin York Feb 20 '19 at 21:30
  • 1
    This code could really use some comments to explain what the *purpose* of every item is. A typical person asking this question is only going to end up with more questions after reading this, e.g. what the purpose of the empty istream_iterator is, or why the "create the vector directly" solution has so many brackets. –  Oct 14 '19 at 21:17
  • 1
    I don't think there is any power or elegance in this, compared to just `std::string::split()`. Of course there is not such `split` in STL – tjysdsg May 14 '20 at 12:00
  • You can set the delimiter of istringstream https://stackoverflow.com/a/21814768/1943599 – Mellester Jul 02 '20 at 17:44
873

A possible solution using Boost might be:

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since this is a generic template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.

See the documentation for details.

LihO
ididak
  • 37
    Speed is irrelevant here, as both of these cases are much slower than a strtok-like function. – Tom Mar 01 '09 at 16:51
  • 4
    This is practical and quick enough if you know the line will contain just a few tokens, but if it contains many then you will burn a ton of memory (and time) growing the vector. So no, it's not faster than the stringstream solution -- at least not for large n, which is the only case where speed matters. – j_random_hacker Aug 24 '09 at 09:02
  • 55
    And for those who don't already have boost... bcp copies over 1,000 files for this :) – Roman Starkov Jun 09 '10 at 20:12
  • 14
    Warning, when given an empty string (""), this method return a vector containing the "" string. So add an "if (!string_to_split.empty())" before the split. – Offirmo Oct 11 '11 at 13:10
  • 29
    @Ian Embedded developers aren't all using boost. – ACK_stoverflow Jan 31 '12 at 18:23
  • 12
    @ACK_stoverflow are embedded developers using C++ anyway? – WDRust Feb 25 '12 at 08:10
  • 3
    `bcp`'ing this brings forth libraries such as the MPL, which I think is **really hardly needed** to split text. Man it is a PITA... – Luis Machuca Mar 20 '12 at 18:24
  • 3
    @j_random_hacker: "at least not for large n, which is the only case where speed matters" - also for smallish n in a large-n loop... – Tony Delroy Apr 11 '12 at 02:29
  • 6
    @tuxSlayer: various POSIX/XOPEN/UNIX standards also specify `strtok_r` – Tony Delroy Apr 11 '12 at 02:31
  • 1
    @TonyDelroy: Yeah, and it looks like in msvc it is called strtok_s (meaning safe?:)). Not too portable... – tuxSlayer Apr 11 '12 at 08:05
  • 1
    @tuxSlayer: if you'd prefer to write your own implementation instead of have a five line `#if`/`#else`/`#endif` then knock yourself out.... – Tony Delroy Jun 06 '12 at 15:43
  • 1
    Use std::string::find(..) and std::string::substr(..) no need to use boost. – Nils Jul 03 '12 at 12:49
  • 1
    actually in our company we are not allowed to use boost due to security, yeah i know but suits have decided. – AndersK Aug 27 '12 at 06:20
  • 33
    as an addendum: I use boost only when I must, normally I prefer to add to my own library of code which is standalone and portable so that I can achieve small precise specific code, which accomplishes a given aim. That way the code is non-public, performant, trivial and portable. Boost has its place but I would suggest that it's a bit of overkill for tokenising strings: you wouldn't have your whole house transported to an engineering firm to get a new nail hammered into the wall to hang a picture.... they may do it extremely well, but the pros are by far outweighed by the cons. – GMasucci May 22 '13 at 08:19
  • 1
    nice it even works for calling of boost framework in xcode (iOS project) in cpp class – user2083364 Aug 21 '13 at 09:48
  • 6
    My personal opinion is that C and C++ are languages not meant to be agile or to provide fast-to-market solutions. Using Boost is almost the same as choosing a higher-level language that offers more abstraction; for those we choose Java, C#, etc., because there we don't care exactly what it's doing beneath the hood. Using Boost would also mean that I would have to tell my client that I'm including a third-party library. Thanks anyway. :) – Tiago May 04 '15 at 10:55
  • Can the boost::split really work on the utf-8 string? Can you share any documentation for that? I am trying to split a utf-8 string at newlines. Will the boost::split work correctly if the string that I pass is using utf-8 encoding? – sajas Mar 02 '16 at 12:58
  • @Andrew: `any_of` has been part of the standard library since 2011: http://en.cppreference.com/w/cpp/algorithm/all_any_none_of – graham.reeds May 24 '17 at 10:56
412
#include <vector>
#include <string>
#include <sstream>

int main()
{
    std::string str("Split me by whitespaces");
    std::string buf;                 // Have a buffer string
    std::stringstream ss(str);       // Insert the string into a stream

    std::vector<std::string> tokens; // Create vector to hold our words

    while (ss >> buf)
        tokens.push_back(buf);

    return 0;
}
JeJo
kev
  • 25
    You can also split on other delimiters if you use `getline` in the `while` condition e.g. to split by commas, use `while(getline(ss, buff, ','))`. – Ali Oct 06 '18 at 20:20
  • I don't understand how this got 400 upvotes. This is basically the same as in OQ: use a stringstream and >> from it. Exactly what OP did even in revision 1 of the question history. – Thomas Weller Oct 17 '22 at 16:46
200

An efficient, small, and elegant solution using a template function:

template <class ContainerT>
void split(const std::string& str, ContainerT& tokens,
           const std::string& delimiters = " ", bool trimEmpty = false)
{
   std::string::size_type pos, lastPos = 0, length = str.length();
   
   using value_type = typename ContainerT::value_type;
   using size_type = typename ContainerT::size_type;
   
   while (lastPos < length + 1)
   {
      pos = str.find_first_of(delimiters, lastPos);
      if (pos == std::string::npos)
         pos = length;

      if (pos != lastPos || !trimEmpty)
         tokens.emplace_back(value_type(str.data() + lastPos,
               (size_type)pos - lastPos));

      lastPos = pos + 1;
   }
}

I usually choose to use std::vector<std::string> types as my second parameter (ContainerT)... but list<...> may sometimes be preferred over vector<...>.

It also allows you to specify whether to trim empty tokens from the results via a last optional parameter.

All it requires is std::string included via <string>. It does not use streams or the boost library explicitly but will be able to accept some of these types.
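
A usage sketch with the required includes may help here. The `split` template is reproduced from above so the block compiles on its own; the wrapper `split_to_list` is a hypothetical name added only to show the function filling a `std::list` with empty tokens trimmed:

```cpp
#include <list>
#include <string>
#include <vector>

// Reproduced from the answer above so that this sketch is self-contained.
template <class ContainerT>
void split(const std::string& str, ContainerT& tokens,
           const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();

    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;

    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
            pos = length;

        if (pos != lastPos || !trimEmpty)
            tokens.emplace_back(value_type(str.data() + lastPos,
                  (size_type)pos - lastPos));

        lastPos = pos + 1;
    }
}

// Hypothetical wrapper: fills a std::list and drops empty tokens.
std::list<std::string> split_to_list(const std::string& s,
                                     const std::string& delimiters)
{
    std::list<std::string> out;
    split(s, out, delimiters, true);  // trimEmpty = true
    return out;
}
```

Note that `delimiters` is a set of single-character delimiters, so `",;"` splits on either a comma or a semicolon, not on the two-character sequence.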

Also, since C++17 you can use std::vector<std::string_view>, which avoids copying each token and is therefore typically faster and more memory-efficient than using std::string. Here is a revised version which also supports the container as a return type:

#include <vector>
#include <string_view>
#include <utility>
    
template < typename StringT,
           typename DelimiterT = char,
           typename ContainerT = std::vector<std::string_view> >
ContainerT split(StringT const& str, DelimiterT const& delimiters = ' ', bool trimEmpty = true, ContainerT&& tokens = {})
{
    typename StringT::size_type pos, lastPos = 0, length = str.length();

    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == StringT::npos)
            pos = length;

        if (pos != lastPos || !trimEmpty)
            tokens.emplace_back(str.data() + lastPos, pos - lastPos);

        lastPos = pos + 1;
    }

    return std::forward<ContainerT>(tokens);
}

Care has been taken not to make any unneeded copies.

This will allow for either:

for (auto const& line : split(str, '\n'))

Or:

auto lines = split(str, '\n');

Both returning the default template container type of std::vector<std::string_view>.

To get a specific container type back, or to pass an existing container, use the tokens input parameter with either a typed initial container or an existing container variable:

auto lines = split(str, '\n', false, std::vector<std::string>());

Or:

std::vector<std::string> lines;
split(str, '\n', false, lines);
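
One point worth keeping in mind with `std::string_view` tokens is that they only point into the original string, so that string must outlive them. The following simplified single-delimiter sketch illustrates this; `split_views` is a hypothetical name and not the function from this answer:

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified splitter that returns non-owning views.
// Every returned view points into the buffer behind `str`.
std::vector<std::string_view> split_views(std::string_view str, char delim) {
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    while (true) {
        std::size_t end = str.find(delim, start);
        if (end == std::string_view::npos) {
            tokens.push_back(str.substr(start));  // last (or only) token
            break;
        }
        tokens.push_back(str.substr(start, end - start));
        start = end + 1;
    }
    return tokens;
}
```

Calling this on a named `std::string` that stays alive is fine. Calling it on a temporary, such as `split_views(std::string("a b"), ' ')`, would leave the returned views dangling once the temporary is destroyed.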
Marius
  • 5
    I'm quite a fan of this, but for g++ (and probably good practice) anyone using this will want typedefs and typenames: `typedef ContainerT Base; typedef typename Base::value_type ValueType; typedef typename ValueType::size_type SizeType;` Then to substitute out the value_type and size_types accordingly. –  Nov 28 '11 at 21:41
  • 13
    For those of us for whom the template stuff and the first comment are completely foreign, a usage example cmplete with required includes would be lovely. – Wes Miller Aug 17 '12 at 11:51
  • 3
    Ahh well, I figured it out. I put the C++ lines from aws' comment inside the function body of tokenize(), then edited the tokens.push_back() lines to change the ContainerT::value_type to just ValueType and changed (ContainerT::value_type::size_type) to (SizeType). Fixed the bits g++ had been whining about. Just invoke it as tokenize( some_string, some_vector ); – Wes Miller Aug 17 '12 at 14:23
  • 1
    Could someone be so kind as to provide a "summary" as to why this code has much greater performance? – user997112 Oct 14 '12 at 16:57
  • 2
    Apart from running a few performance tests on sample data, primarily I've reduced it to as few as possible instructions and also as little as possible memory copies enabled by the use of a substring class that only references offsets/lengths in other strings. (I rolled my own, but there are some other implementations). Unfortunately there is not too much else one can do to improve on this, but incremental increases were possible. – Marius Nov 29 '12 at 14:50
  • Maybe there's a bug. Given "xxxabcyyyabczzzabc" and "abo", the split result is "xxx|cyyy|czzz|c". – Guosheng Aug 26 '15 at 09:23
  • 3
    That's the correct output for when `trimEmpty = true`. Keep in mind that `"abo"` is not a delimiter in this answer, but the list of delimiter characters. It would be simple to modify it to take a single delimiter string of characters (I think `str.find_first_of` should change to `str.find`, with `lastPos` advanced by the delimiter's length, but I could be wrong... can't test) – Marius Aug 28 '15 at 15:24
  • Thanks @thomas-perl for the revision, it does indeed make it more readable and compact. My original implementation avoided the additional comparison per loop as I was optimizing for a very low latency application. Your edit will be more applicable to most users visiting here however. – Marius Sep 27 '16 at 18:51
  • 1
    I had some issues initially, but this does in fact work with `wstring` / unicode if you update the template accordingly. Be careful though; i ran into some easy to cause runtime errors that the compiler didn't catch in a couple different places. – kayleeFrye_onDeck Jun 08 '18 at 00:24
  • 1
    Thanks @kayleeFrye_onDeck , I've not been using C++ at this level for a few years now and may be a bit rusty on the new specs, but if there is anything I should fix on this post, let me know and I'll check it out. – Marius Jul 27 '18 at 14:24
  • Your code is not working! Try string = "hih1ihi", substring = "hi". Your code is not giving the correct result. minus. – Optimus1 Nov 17 '21 at 14:57
  • 1
    @Optimus1 I think you assumed the `delimiters` parameter is not a character list of delimiters but rather a substring. Therein lies the rub. – Marius Mar 23 '22 at 18:30
177

Here's another solution. It's compact and reasonably efficient:

std::vector<std::string> split(const std::string &text, char sep) {
  std::vector<std::string> tokens;
  std::size_t start = 0, end = 0;
  while ((end = text.find(sep, start)) != std::string::npos) {
    tokens.push_back(text.substr(start, end - start));
    start = end + 1;
  }
  tokens.push_back(text.substr(start));
  return tokens;
}

It can easily be templatised to handle string separators, wide strings, etc.

Note that splitting "" results in a single empty string and splitting "," (i.e. the separator by itself) results in two empty strings.
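
As a rough sketch of the string-separator variant mentioned above: the main change is to advance past the whole separator rather than one character. The name `split_on` and the handling of an empty separator are assumptions made for illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical string-separator variant of the function above.
std::vector<std::string> split_on(const std::string& text,
                                  const std::string& sep) {
    std::vector<std::string> tokens;
    if (sep.empty()) {  // assumption: an empty separator means "no split"
        tokens.push_back(text);
        return tokens;
    }
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + sep.size();  // skip the whole separator
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
```

Like the single-character version, this keeps empty tokens, so `split_on("a->b->->c", "->")` produces four tokens, one of them empty.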

It can also be easily expanded to skip empty tokens:

std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
          tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    if (start < text.size()) {  // nothing is left after a trailing separator
        tokens.push_back(text.substr(start));
    }
    return tokens;
}

If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:

std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;

    while((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if(start != std::string::npos)
        tokens.push_back(text.substr(start));

    return tokens;
}
sigalor
Alec Thomas
  • 10
    The first version is simple and gets the job done perfectly. The only change I would made would be to return the result directly, instead of passing it as a parameter. – gregschlom Jan 19 '12 at 02:25
  • 3
    The output is passed as a parameter for efficiency. If the result were returned it would require either a copy of the vector, or a heap allocation which would then have to be freed. – Alec Thomas Feb 06 '12 at 18:56
  • My bad, I was wrongly assuming that that STL would use lazy copy, as Qt containers do. Too bad they don't. – gregschlom Feb 06 '12 at 22:58
  • I like this because it requires the minimum amount of extra headers. I might recommend an edit to make it follow best practice usage of namespaces (IE std:: in front of everything). – Peter M Mar 07 '13 at 20:04
  • 2
    A slight addendum to my comment above: this function could return the vector without penalty if using C++11 move semantics. – Alec Thomas Jun 27 '13 at 01:20
  • 7
    @AlecThomas: Even before C++11, wouldn't most compilers optimise away the return copy via NRVO? (+1 anyway; very succinct) – Marcelo Cantos Aug 17 '13 at 11:54
  • @Peter M I would rather have it be passed in by reference, just in case the `vector` got large. – Alex Spencer Nov 15 '13 at 19:43
  • 1
    @Veritas In what way does it not work if the delimiter is the last character? Also, outputting empty tokens is intentional, though it could obviously be easily modified to not do that if required. – Alec Thomas Apr 08 '14 at 15:36
  • 13
    Out of all the answers this appears to be one of the most appealing and flexible. Together with the getline with a delimiter, although its a less obvious solution. Does the c++11 standard not have anything for this? Does c++11 support punch cards these days? – Spacen Jasset Aug 11 '15 at 15:15
  • If you pass in an empty string, it returns a vector with 1 element (empty string). If you pass in a string that's the same as sep, then it returns a vector with 2 elements (both empty strings). Should have "if (end > 0) {" before the push_back in while loop and "if (start > 0) {" before push_back below while loop to fix this. – CodeSmile Sep 26 '15 at 17:12
  • 3
    @LearnCocos2D Please don't [alter the meaning](http://meta.stackexchange.com/questions/11474/what-is-the-etiquette-for-modifying-posts/11476#11476) of a post with an edit. This behaviour is by design. It is identical behaviour to Python's split operator. I'll add a note to make this clear. – Alec Thomas Sep 27 '15 at 21:50
  • 3
    Suggest using std::string::size_type instead of int, as some compilers might spit out signed/unsigned warnings otherwise. – Pascal Kesseli Nov 01 '15 at 20:45
  • 1
    the first function in this answer is the best solution - works perfectly with a reverse join function - `std::string strJoin(const std::vector<std::string> v, const char& delimiter) { if(!v.empty()) { std::stringstream ss; std::string str(1, delimiter); auto it = v.cbegin(); while(true) { ss << *it++; if(it != v.cend()) ss << delimiter; else return ss.str(); } } return ""; }` – Roman Shestakov Dec 16 '17 at 15:45
143

This is my favorite way to iterate through a string. You can do whatever you want per word.

string line = "a line of text to iterate through";
string word;

istringstream iss(line, istringstream::in);

while( iss >> word )     
{
    // Do something on `word` here...
}
Azeem
gnomed
  • 1
    Is it possible to declare `word` as a `char`? – abatishchev Jun 26 '10 at 17:23
  • 1
    Sorry abatishchev, C++ is not my strong point. But I imagine it would not be difficult to add an inner loop to loop through every character in each word. But right now I believe the current loop depends on spaces for word separation. Unless you know that there is only a single character between every space, in which case you can just cast "word" to a char... sorry I cant be of more help, ive been meaning to brush up on my C++ – gnomed Jun 30 '10 at 22:18
  • 12
    if you declare word as a char it will iterate over every non-whitespace character. It's simple enough to try: `stringstream ss("Hello World, this is*@#&$(@ a string"); char c; while(ss >> c) cout << c;` – Wayne Werner Aug 04 '10 at 18:03
  • I don't understand how this got 140 upvotes. This is basically the same as in OQ: use a stringstream and >> from it. Exactly what OP did even in revision 1 of the question history. – Thomas Weller Oct 17 '22 at 16:46
89

This is similar to the Stack Overflow question "How do I tokenize a string in C++?". It requires the external Boost library.

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    string text = "token  test\tstring";

    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}
Haseeb Mir
  • 928
  • 1
  • 13
  • 22
Ferruccio
  • 98,941
  • 38
  • 226
  • 299
  • Does this materialize a copy of all of the tokens, or does it only keep the start and end position of the current token? – einpoklum Apr 09 '18 at 19:47
71

I like the following because it puts the results into a vector, supports a string as the delimiter, and gives control over keeping empty values. The trade-off is that it doesn't look as elegant.

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}

int main() {
    const vector<string> words = split("So close no matter how far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}

Of course, Boost has a split() that works partially like that. And, if by 'white-space', you really do mean any type of white-space, using Boost's split with is_any_of() works great.

Jamal
  • 763
  • 7
  • 22
  • 32
Shadow2531
  • 11,980
  • 5
  • 35
  • 48
59

The STL does not have such a method available already.

However, you can either use C's strtok() function by using the std::string::c_str() member, or you can write your own. Here is a code sample I found after a quick Google search ("STL string split"):

void Tokenize(const string& str,
              vector<string>& tokens,
              const string& delimiters = " ")
{
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the end of the first token (the next delimiter).
    string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of"
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the end of the next token (the next delimiter).
        pos = str.find_first_of(delimiters, lastPos);
    }
}

Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html

If you have questions about the code sample, leave a comment and I will explain.
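
For the strtok() route, here is a minimal sketch that assumes you tokenize a private, writable copy of the string, since strtok() overwrites each delimiter with '\0' and c_str() only gives read-only access. The name strtok_split is hypothetical. Keep in mind that std::strtok keeps hidden state between calls, so this sketch is not safe to run from several threads at once.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper around std::strtok that leaves `input` untouched.
std::vector<std::string> strtok_split(const std::string& input,
                                      const char* delimiters = " ") {
    assert(delimiters != nullptr);
    // Copy the characters plus the terminating '\0' (hence size() + 1).
    std::vector<char> buffer(input.c_str(), input.c_str() + input.size() + 1);
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delimiters); tok != nullptr;
         tok = std::strtok(nullptr, delimiters)) {
        tokens.push_back(tok);  // each token is copied out of the buffer
    }
    return tokens;
}
```

Like Tokenize above, this treats a run of adjacent delimiters as a single separator, so it never produces empty tokens.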

And just because it does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use C functions quite frequently. For example, printf and scanf both are faster than std::cin and std::cout (significantly), the fopen syntax is a lot more friendly for binary types, and they also tend to produce smaller EXEs.

Don't get sold on this "Elegance over performance" deal.

Azeem
  • 11,148
  • 4
  • 27
  • 40
  • I'm aware of the C string functions and I'm aware of the performance issues too (both of which I've noted in my question). However, for this specific question, I'm looking for an elegant C++ solution. – Ashwin Nanjappa Oct 25 '08 at 09:16
  • ... and you dont want to just build a OO wrapper over the C functions why? –  Oct 25 '08 at 09:42
  • 11
    @Nelson LaQuet: Let me guess: Because strtok is not reentrant? – paercebal Oct 25 '08 at 09:52
  • Why not use the C++ features that are meant for this job? – graham.reeds Oct 25 '08 at 11:54
  • 44
    @Nelson don't *ever* pass string.c_str() to strtok! strtok trashes the input string (inserts '\0' chars to replace each found delimiter) and c_str() returns a non-modifiable string. – Evan Teran Oct 25 '08 at 18:19
  • char* ch = new char[str.size()]; strcpy(ch, str.c_str()); ... delete[] ch; // problem solved. –  Oct 26 '08 at 00:20
  • 3
    @Nelson: That array needs to be of size str.size() + 1 in your last comment. But I agree with your thesis that it's silly to avoid C functions for "aesthetic" reasons. – j_random_hacker Aug 24 '09 at 09:08
  • "For example, printf and scanf both are faster then cin and cout" only because synchronization is enabled by default – paulm May 12 '14 at 19:13
  • *"The STL does not have such a method available already"* - what's wrong with string's `find_first_of` and using iterators to remember positions? Then, use `substr` to extract. – jww Sep 26 '14 at 00:44
  • 2
    @paulm: No, the slowness of C++ streams is caused by facets. They're still slower than stdio.h functions even when synchronization is disabled (and on stringstreams, which can't synchronize). – Ben Voigt Apr 12 '15 at 23:55
  • Or you could use `strsep()` (though not as portable). If you don't care about more than one char as the delimiter another answer gives an idea (`getdelim()`) but you could also iterate over the string with `strchr()`. Or...there are many ways depending on what you are after and need. – Pryftan Jun 13 '20 at 18:22
45

Here is a split function that:

  • is generic
  • uses standard C++ (no boost)
  • accepts multiple delimiters
  • ignores empty tokens (can easily be changed)

    template<typename T>
    vector<T> 
    split(const T & str, const T & delimiters) {
        vector<T> v;
        typename T::size_type start = 0;
        auto pos = str.find_first_of(delimiters, start);
        while(pos != T::npos) {
            if(pos != start) // ignore empty tokens
                v.emplace_back(str, start, pos - start);
            start = pos + 1;
            pos = str.find_first_of(delimiters, start);
        }
        if(start < str.length()) // ignore trailing delimiter
            v.emplace_back(str, start, str.length() - start); // add what's left of the string
        return v;
    }
    

Example usage:

    vector<string> v = split<string>("Hello, there; World", ";,");
    vector<wstring> wv = split<wstring>(L"Hello, there; World", L";,");
Marco M.
  • 2,956
  • 2
  • 29
  • 22
  • You forgot to add to use list: "extremely inefficient" – Xander Tulip Mar 19 '12 at 00:20
  • 1
    @XanderTulip, can you be more constructive and explain how or why? – Marco M. Mar 21 '12 at 11:57
  • 3
    @XanderTulip: I assume you are referring to it returning the vector by value. The Return-Value-Optimization (RVO, google it) should take care of this. Also in C++11 you could return by move reference. – Joseph Garvin May 07 '12 at 13:56
  • 3
    This can actually be optimized further: instead of .push_back(str.substr(...)) one can use .emplace_back(str, start, pos - start). This way the string object is constructed in the container and thus we avoid a move operation + other shenanigans done by the .substr function. – Mihai Bişog Sep 05 '12 at 13:50
  • @zoopp yes. Good idea. VS10 didn't have emplace_back support when I wrote this. I will update my answer. Thanks – Marco M. Sep 12 '12 at 13:03
  • Can someone please make it return up to a max N elements? Any remaining characters should end up in the last element. – Jonny Oct 29 '15 at 01:28
  • Anyone else getting the error "missing 'typename' prior to dependent type name 'T::size_type'"? – Daniel Ryan May 23 '17 at 03:06
39

I have a two-line solution to this problem:

char sep = ' ';
std::string s="1 This is an example";

for(size_t p=0, q=0; p!=s.npos; p=q)
  std::cout << s.substr(p+(p!=0), (q=s.find(sep, p+1))-p-(p!=0)) << std::endl;

Then instead of printing you can put it in a vector.
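
A more readable sketch of the same find/substr walk that collects the pieces into a vector might look like this. The function name split_char is an assumption made for illustration, and edge cases such as a leading separator are not guaranteed to match the two-liner exactly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits `s` at every occurrence of `sep`.
// Adjacent separators produce empty pieces.
std::vector<std::string> split_char(const std::string& s, char sep) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = s.find(sep, start);
        // When no separator is left, end == npos and substr runs to the end of s.
        out.push_back(s.substr(start, end == std::string::npos ? end : end - start));
        if (end == std::string::npos) break;
        start = end + 1;
    }
    return out;
}

// Usage sketch with the sample string from above.
inline void split_char_example() {
    const std::vector<std::string> parts = split_char("1 This is an example", ' ');
    assert(parts.size() == 5);
    assert(parts.front() == "1" && parts.back() == "example");
}
```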

mjfroman
  • 3
  • 1
rhomu
  • 306
  • 5
  • 6
  • 2
    it's only a two-liner because one of those two lines is huge and cryptic... no one who actually has to read code ever, wants to read something like this, or would write it. contrived brevity is worse than tasteful verbosity. – underscore_d Nov 12 '21 at 23:23
  • You can even make it a one liner by putting everything on a single line! Isn't that wonderful? – rhomu Jan 11 '23 at 18:17
37

Here's a simple solution that uses only the standard regex library

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> Tokenize( const std::string str, const std::regex regex )
{
    using namespace std;

    std::vector<string> result;

    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;

    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) //token could be empty:check
            result.emplace_back( it->str() );
    }

    return result;
}

The regex argument allows splitting on multiple delimiters (spaces, commas, etc.).

I usually only check to split on spaces and commas, so I also have this default function:

std::vector<std::string> TokenizeDefault( const std::string str )
{
    using namespace std;

    regex re( "[\\s,]+" );

    return Tokenize( str, re );
}

The "[\\s,]+" checks for spaces (\\s) and commas (,).

Note, if you want to split wstring instead of string,

  • change all std::regex to std::wregex
  • change all sregex_token_iterator to wsregex_token_iterator

Note, you might also want to take the string argument by reference, depending on your compiler.
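
To make the wstring note concrete, here is a sketch of what the wide-character variant could look like. The name TokenizeW is hypothetical; the sketch also takes both arguments by const reference, which is one possible reading of the note above.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical wide-character counterpart of Tokenize.
std::vector<std::wstring> TokenizeW(const std::wstring& str, const std::wregex& re) {
    std::vector<std::wstring> result;
    // -1 selects the text between matches, i.e. the tokens themselves.
    std::wsregex_token_iterator it(str.begin(), str.end(), re, -1);
    const std::wsregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() != 0)  // a leading delimiter yields one empty token
            result.push_back(it->str());
    }
    return result;
}

// Usage sketch: split on runs of whitespace and commas.
inline void TokenizeW_example() {
    const std::wregex delimiters(L"[\\s,]+");
    const std::vector<std::wstring> tokens = TokenizeW(L"one, two  three", delimiters);
    assert(tokens.size() == 3 && tokens[1] == L"two");
}
```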

Cerbrus
  • 70,800
  • 18
  • 132
  • 147
dk123
  • 18,684
  • 20
  • 70
  • 77
  • This would have been my favourite answer, but std::regex is broken in GCC 4.8. They said that they implemented it correctly in GCC 4.9. I am still giving you my +1 – mchiasson Aug 19 '14 at 12:27
  • 1
    This is my favorite with minor changes: vector returned as reference as you said, and the arguments "str" and "regex" passed by references also. thx. – QuantumKarl Oct 16 '15 at 15:06
  • 1
    Raw strings are pretty useful while dealing with regex patterns. That way, you don't have to use the escape sequences... You can just use `R"([\s,]+)"`. – Sam Feb 17 '18 at 17:42
37

Yet another flexible and fast way

template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
  const char* s = input;
  const char* e = s;
  while (*e != 0) {
    e = s;
    while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
    if (e - s > 0) {
      op(s, e - s);
    }
    s = e + 1;
  }
}

To use it with a vector of strings (Edit: Since someone pointed out not to inherit STL classes... hrmf ;) ) :

template<class ContainerType>
class Appender {
public:
  Appender(ContainerType& container) : container_(container) {;}
  void operator() (const char* s, unsigned length) { 
    container_.push_back(std::string(s,length));
  }
private:
  ContainerType& container_;
};

std::vector<std::string> strVector;
Appender<std::vector<std::string> > v(strVector);
tokenize(v, "A number of words to be tokenized", " \t");

That's it! And that's just one way to use the tokenizer, like how to just count words:

class WordCounter {
public:
  WordCounter() : noOfWords(0) {}
  void operator() (const char*, unsigned) {
    ++noOfWords;
  }
  unsigned noOfWords;
};

WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t"); 
ASSERT( wc.noOfWords == 7 );

Limited by imagination ;)

Robert
  • 2,330
  • 29
  • 47
  • Nice. Regarding `Appender` note ["Why shouldn't we inherit a class from STL classes?"](http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c4143/Working-with-the-Final-Class-in-C.htm) – Andreas Spindler Sep 10 '13 at 12:07
33

Using std::stringstream as you have works perfectly fine and does exactly what you wanted. If you're just looking for a different way of doing things, though, you can use std::string::find()/std::string::find_first_of() and std::string::substr().

Here's an example:

#include <iostream>
#include <string>

int main()
{
    std::string s("Somewhere down the road");
    std::string::size_type prev_pos = 0, pos = 0;

    while( (pos = s.find(' ', pos)) != std::string::npos )
    {
        std::string substring( s.substr(prev_pos, pos-prev_pos) );

        std::cout << substring << '\n';

        prev_pos = ++pos;
    }

    std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word
    std::cout << substring << '\n';

    return 0;
}
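
The same loop can be generalized to a delimiter of more than one character. The following is only a sketch under that assumption; split_on and its parameter names are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical generalization of the loop above to a string delimiter.
std::vector<std::string> split_on(const std::string& s, const std::string& delimiter) {
    assert(!delimiter.empty());  // an empty delimiter would never advance
    std::vector<std::string> parts;
    std::string::size_type prev_pos = 0, pos = 0;
    while ((pos = s.find(delimiter, prev_pos)) != std::string::npos) {
        parts.push_back(s.substr(prev_pos, pos - prev_pos));
        prev_pos = pos + delimiter.length();  // skip the whole delimiter
    }
    parts.push_back(s.substr(prev_pos));  // last piece
    return parts;
}
```

Searching from prev_pos rather than from a separately incremented pos keeps the bookkeeping to a single moving index.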
Azeem
  • 11,148
  • 4
  • 27
  • 40
KTC
  • 8,967
  • 5
  • 33
  • 38
  • 1
    This only works for single character delimiters. A simple change lets it work with multicharacter: `prev_pos = pos += delimiter.length();` – David Doria Feb 05 '16 at 14:48
26

If you like to use Boost, but want to use a whole string as the delimiter (instead of single characters, as in most of the previously proposed solutions), you can use boost::algorithm::split_iterator.

Example code including convenient template:

#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

template<typename _OutputIterator>
inline void split(
    const std::string& str, 
    const std::string& delim, 
    _OutputIterator result)
{
    using namespace boost::algorithm;
    typedef split_iterator<std::string::const_iterator> It;

    for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));
            iter!=It();
            ++iter)
    {
        *(result++) = boost::copy_range<std::string>(*iter);
    }
}

int main(int argc, char* argv[])
{
    using namespace std;

    vector<string> splitted;
    split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));

    // or directly to console, for example
    split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
    return 0;
}
zerm
  • 2,812
  • 25
  • 17
23

Here's a regex solution that only uses the standard regex library:

#include <regex>
#include <string>
#include <vector>

using namespace std;

vector<string> split(const string& s) {
    regex r("\\w+"); // matches whole words (greedy, so no word fragments)
    sregex_iterator rit(s.begin(), s.end(), r);
    sregex_iterator rend; // a default-constructed iterator marks the end
    vector<string> result;
    for (; rit != rend; ++rit)
        result.push_back(rit->str()); // copy each match into the vector
    return result;
}
AJMansfield
  • 4,039
  • 3
  • 29
  • 50
  • Similar responses with maybe better regex approach: [here](http://stackoverflow.com/a/6321203/86967), and [here](http://stackoverflow.com/a/9437426/86967). – Brent Bradburn Dec 05 '14 at 23:25
22

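There is a function named strtok; the code below uses strtok_r, its reentrant POSIX variant.

#include <string.h> // strtok_r (POSIX)
#include <string>
#include <vector>
using namespace std;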

vector<string> split(char* str,const char* delim)
{
    char* saveptr;
    char* token = strtok_r(str,delim,&saveptr);

    vector<string> result;

    while(token != NULL)
    {
        result.push_back(token);
        token = strtok_r(NULL,delim,&saveptr);
    }
    return result;
}
Erik Aronesty
  • 11,620
  • 5
  • 64
  • 44
Pratik Deoghare
  • 35,497
  • 30
  • 100
  • 146
  • 4
    `strtok` is from the C standard library, not C++. It is not safe to use in multithreaded programs. It modifies the input string. – Kevin Panko Jun 14 '10 at 14:07
  • 14
    Because it stores the char pointer from the first call in a static variable, so that on the subsequent calls when NULL is passed, it remembers what pointer should be used. If a second thread calls `strtok` when another thread is still processing, this char pointer will be overwritten, and both threads will then have incorrect results. http://www.mkssoftware.com/docs/man3/strtok.3.asp – Kevin Panko Jun 14 '10 at 17:27
  • 1
    as mentioned before strtok is unsafe and even in C strtok_r is recommended for use – systemsfault Jul 06 '10 at 12:17
  • 4
    strtok_r can be used if you are in a section of code that may be accessed. this is the *only* solution of all of the above that isn't "line noise", and is a testament to what, exactly, is wrong with c++ – Erik Aronesty Oct 10 '11 at 18:04
  • Updated so there can be no objections on the grounds of thread safety from C++ wonks. – Erik Aronesty May 02 '14 at 14:50
  • 1
    strtok is evil. It treats two delimiters as a single delimiter if there is nothing between them. – EvilTeach Aug 10 '14 at 23:53
  • A for() loop looks better. Like this http://davekb.com/browse_programming_tips:strtok_r_example:txt – Yetti99 Jun 09 '15 at 08:12
22

C++20 finally blesses us with a split function. Or rather, a range adapter.

#include <iostream>
#include <ranges>
#include <string_view>

namespace ranges = std::ranges;
namespace views = std::views;

using str = std::string_view;

auto view =
    str{"Multiple words"}  // a string_view, so the literal's trailing '\0' is not split as data
    | views::split(' ')
    | views::transform([](auto &&r) -> str {
        return str(r.begin(), r.end());
    });

auto main() -> int {
    for (str &&sv : view) {
        std::cout << sv << '\n';
    }
}
Drew Dormann
  • 59,987
  • 13
  • 123
  • 180
J. Willus
  • 537
  • 1
  • 5
  • 15
20

Using std::string_view and Eric Niebler's range-v3 library:

https://wandbox.org/permlink/kW5lwRCL1pxjp2pW

#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"
#include "range/v3/algorithm.hpp"

int main() {
    std::string s = "Somewhere down the range v3 library";
    ranges::for_each(s  
        |   ranges::view::split(' ')
        |   ranges::view::transform([](auto &&sub) {
                return std::string_view(&*sub.begin(), ranges::distance(sub));
            }),
        [](auto s) {std::cout << "Substring: " << s << "\n";}
    );
}

Using a range-based for loop instead of the ranges::for_each algorithm:

#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"

int main()
{
    std::string str = "Somewhere down the range v3 library";
    for (auto s : str | ranges::view::split(' ')
                      | ranges::view::transform([](auto&& sub) { return std::string_view(&*sub.begin(), ranges::distance(sub)); }
                      ))
    {
        std::cout << "Substring: " << s << "\n";
    }
}
Robert Andrzejuk
  • 5,076
  • 2
  • 22
  • 31
Porsche9II
  • 629
  • 5
  • 17
19

The stringstream can be convenient if you need to parse the string by non-space symbols:

string s = "Name:JAck; Spouse:Susan; ...";
string dummy, name, spouse;

istringstream iss(s);
getline(iss, dummy, ':');
getline(iss, name, ';');
getline(iss, dummy, ':');
getline(iss, spouse, ';');
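
If the number of fields is not known in advance, the getline calls can be put in a loop. The following sketch assumes records of the form key:value separated by ';', as in the sample string; parse_fields is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads "key:value" records with std::getline,
// using ';' to end a record and ':' to separate key from value.
std::vector<std::pair<std::string, std::string>> parse_fields(const std::string& s) {
    std::vector<std::pair<std::string, std::string>> fields;
    std::istringstream records(s);
    std::string record;
    while (std::getline(records, record, ';')) {
        std::istringstream rec(record);
        std::string key, value;
        // std::ws drops the blank that follows each ';' in the sample input.
        if (std::getline(rec >> std::ws, key, ':') && std::getline(rec, value))
            fields.emplace_back(key, value);
    }
    return fields;
}

// Usage sketch with a shortened version of the string above.
inline void parse_fields_example() {
    const auto fields = parse_fields("Name:JAck; Spouse:Susan;");
    assert(fields.size() == 2);
    assert(fields[0].first == "Name" && fields[0].second == "JAck");
}
```

Records without a ':' or with an empty value are skipped in this sketch; whether that is the right policy depends on the data.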
Kimbluey
  • 1,199
  • 2
  • 12
  • 23
lukmac
  • 4,617
  • 8
  • 33
  • 34
15

Short and elegant

#include <vector>
#include <string>
using namespace std;

vector<string> split(string data, string token)
{
    vector<string> output;
    size_t pos = string::npos; // size_t to avoid improbable overflow
    do
    {
        pos = data.find(token);
        output.push_back(data.substr(0, pos));
        if (string::npos != pos)
            data = data.substr(pos + token.size());
    } while (string::npos != pos);
    return output;
}

It can use any string as the delimiter, and it can also be used with binary data (std::string supports binary data, including nulls).

using:

auto a = split("this!!is!!!example!string", "!!");

output:

this
is
!example!string
Sanfer
  • 414
  • 5
  • 16
user1438233
  • 1,153
  • 1
  • 14
  • 30
  • 1
    I like this solution because it allows the separator to be a string and not a char, however, it is modifying in place the string, so it is forcing the creation of a copy of the original string. – Alessandro Teruzzi Aug 01 '16 at 15:30
15

So far I have used the one in Boost, but I needed something that doesn't depend on it, so I came up with this:

static void Split(std::vector<std::string>& lst, const std::string& input, const std::string& separators, bool remove_empty = true)
{
    std::ostringstream word;
    for (size_t n = 0; n < input.size(); ++n)
    {
        if (std::string::npos == separators.find(input[n]))
            word << input[n];
        else
        {
            if (!word.str().empty() || !remove_empty)
                lst.push_back(word.str());
            word.str("");
        }
    }
    if (!word.str().empty() || !remove_empty)
        lst.push_back(word.str());
}

A nice property is that separators can contain more than one character.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Goran
  • 93
  • 1
  • 9
14

I've rolled my own using strtok and used boost to split a string. The best method I have found is the C++ String Toolkit Library. It is incredibly flexible and fast.

#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace  = " \t\r\n\f";
const char *whitespace_and_punctuation  = " \t\r\n\f;,=";

int main()
{
    {   // normal parsing of a string into a vector of strings
        std::string s("Somewhere down the road");
        std::vector<std::string> result;
        if( strtk::parse( s, whitespace, result ) )
        {
            for(size_t i = 0; i < result.size(); ++i )
                std::cout << result[i] << std::endl;
        }
    }

    {  // parsing a string into a vector of floats with other separators
        // besides spaces

        std::string s("3.0, 3.14; 4.0");
        std::vector<float> values;
        if( strtk::parse( s, whitespace_and_punctuation, values ) )
        {
            for(size_t i = 0; i < values.size(); ++i )
                std::cout << values[i] << std::endl;
        }
    }

    {  // parsing a string into specific variables

        std::string s("angle = 45; radius = 9.9");
        std::string w1, w2;
        float v1, v2;
        if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
        {
            std::cout << "word " << w1 << ", value " << v1 << std::endl;
            std::cout << "word " << w2 << ", value " << v2 << std::endl;
        }
    }

    return 0;
}

The toolkit has much more flexibility than this simple example shows but its utility in parsing a string into useful elements is incredible.

DannyK
  • 1,342
  • 16
  • 23
13

I made this because I needed an easy way to split strings and C-based strings. Hopefully someone else can find it useful as well. Also, it doesn't rely on tokens, and you can use fields as delimiters, which is another key feature I needed.

I'm sure there are improvements that can be made to even further improve its elegance, and please do by all means.

StringSplit.hpp:

#include <vector>
#include <iostream>
#include <string.h>

using namespace std;

class StringSplit
{
private:
    void copy_fragment(char*, char*, char*);
    void copy_fragment(char*, char*, char);
    bool match_fragment(char*, char*, int);
    int untilnextdelim(char*, char);
    int untilnextdelim(char*, char*);
    void assimilate(char*, char);
    void assimilate(char*, char*);
    bool string_contains(char*, char*);
    long calc_string_size(char*);
    void copy_string(char*, char*);

public:
    vector<char*> split_cstr(char);
    vector<char*> split_cstr(char*);
    vector<string> split_string(char);
    vector<string> split_string(char*);
    char* String;
    bool do_string;
    bool keep_empty;
    vector<char*> Container;
    vector<string> ContainerS;

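    // Both flags are initialized explicitly; empty entries are dropped unless keep_empty is set.
    StringSplit(char * in) : String(in), do_string(false), keep_empty(false)
    {
    }

    StringSplit(string in) : do_string(true), keep_empty(false)
    {
        size_t len = calc_string_size((char*)in.c_str());
        String = new char[len + 1];
        memset(String, 0, len + 1);
        copy_string(String, (char*)in.c_str());
    }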

    ~StringSplit()
    {
        for (int i = 0; i < Container.size(); i++)
        {
            if (Container[i] != NULL)
            {
                delete[] Container[i];
            }
        }
        if (do_string)
        {
            delete[] String;
        }
    }
};

StringSplit.cpp:

#include <string.h>
#include <iostream>
#include <vector>
#include "StringSplit.hpp"

using namespace std;

void StringSplit::assimilate(char*src, char delim)
{
    int until = untilnextdelim(src, delim);
    if (until > 0)
    {
        char * temp = new char[until + 1];
        memset(temp, 0, until + 1);
        copy_fragment(temp, src, delim);
        if (keep_empty || *temp != 0)
        {
            if (!do_string)
            {
                Container.push_back(temp);
            }
            else
            {
                string x = temp;
                ContainerS.push_back(x);
            }

        }
        else
        {
            delete[] temp;
        }
    }
}

void StringSplit::assimilate(char*src, char* delim)
{
    int until = untilnextdelim(src, delim);
    if (until > 0)
    {
        char * temp = new char[until + 1];
        memset(temp, 0, until + 1);
        copy_fragment(temp, src, delim);
        if (keep_empty || *temp != 0)
        {
            if (!do_string)
            {
                Container.push_back(temp);
            }
            else
            {
                string x = temp;
                ContainerS.push_back(x);
            }
        }
        else
        {
            delete[] temp;
        }
    }
}

long StringSplit::calc_string_size(char* _in)
{
    long i = 0;
    while (*_in++)
    {
        i++;
    }
    return i;
}

bool StringSplit::string_contains(char* haystack, char* needle)
{
    size_t len = calc_string_size(needle);
    size_t lenh = calc_string_size(haystack);
    while (lenh--)
    {
        if (match_fragment(haystack + lenh, needle, len))
        {
            return true;
        }
    }
    return false;
}

bool StringSplit::match_fragment(char* _src, char* cmp, int len)
{
    while (len--)
    {
        if (*(_src + len) != *(cmp + len))
        {
            return false;
        }
    }
    return true;
}

int StringSplit::untilnextdelim(char* _in, char delim)
{
    size_t len = calc_string_size(_in);
    if (*_in == delim)
    {
        _in += 1;
        return len - 1;
    }

    int c = 0;
    while (*(_in + c) != delim && c < len)
    {
        c++;
    }

    return c;
}

int StringSplit::untilnextdelim(char* _in, char* delim)
{
    int s = calc_string_size(delim);
    int c = 1 + s;

    if (!string_contains(_in, delim))
    {
        return calc_string_size(_in);
    }
    else if (match_fragment(_in, delim, s))
    {
        _in += s;
        return calc_string_size(_in);
    }

    while (!match_fragment(_in + c, delim, s))
    {
        c++;
    }

    return c;
}

void StringSplit::copy_fragment(char* dest, char* src, char delim)
{
    if (*src == delim)
    {
        src++;
    }
        
    int c = 0;
    while (*(src + c) != delim && *(src + c))
    {
        *(dest + c) = *(src + c);
        c++;
    }
    *(dest + c) = 0;
}

void StringSplit::copy_string(char* dest, char* src)
{
    int i = 0;
    while (*(src + i))
    {
        *(dest + i) = *(src + i);
        i++;
    }
}

void StringSplit::copy_fragment(char* dest, char* src, char* delim)
{
    size_t len = calc_string_size(delim);
    size_t lens = calc_string_size(src);
    
    if (match_fragment(src, delim, len))
    {
        src += len;
        lens -= len;
    }
    
    int c = 0;
    while (!match_fragment(src + c, delim, len) && (c < lens))
    {
        *(dest + c) = *(src + c);
        c++;
    }
    *(dest + c) = 0;
}

vector<char*> StringSplit::split_cstr(char Delimiter)
{
    int i = 0;
    while (*String)
    {
        if (*String != Delimiter && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (*String == Delimiter)
        {
            assimilate(String, Delimiter);
        }
        i++;
        String++;
    }

    String -= i; // rewind only; the destructor releases String when this object owns it

    return Container;
}

vector<string> StringSplit::split_string(char Delimiter)
{
    do_string = true;
    
    int i = 0;
    while (*String)
    {
        if (*String != Delimiter && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (*String == Delimiter)
        {
            assimilate(String, Delimiter);
        }
        i++;
        String++;
    }

    String -= i; // rewind only; the destructor releases String when this object owns it

    return ContainerS;
}

vector<char*> StringSplit::split_cstr(char* Delimiter)
{
    int i = 0;
    size_t LenDelim = calc_string_size(Delimiter);

    while(*String)
    {
        if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (match_fragment(String, Delimiter, LenDelim))
        {
            assimilate(String,Delimiter);
        }
        i++;
        String++;
    }

    String -= i; // rewind only; the destructor releases String when this object owns it

    return Container;
}

vector<string> StringSplit::split_string(char* Delimiter)
{
    do_string = true;
    int i = 0;
    size_t LenDelim = calc_string_size(Delimiter);

    while (*String)
    {
        if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (match_fragment(String, Delimiter, LenDelim))
        {
            assimilate(String, Delimiter);
        }
        i++;
        String++;
    }

    String -= i; // rewind only; the destructor releases String when this object owns it

    return ContainerS;
}

Examples:

int main(int argc, char*argv[])
{
    StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";
    vector<char*> Split = ss.split_cstr(":CUT:");

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}

Will output:

This
is
an
example
cstring

int main(int argc, char*argv[])
{
    StringSplit ss = "This:is:an:example:cstring";
    vector<char*> Split = ss.split_cstr(':');

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}

int main(int argc, char*argv[])
{
    string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";
    StringSplit ss = mystring;
    vector<string> Split = ss.split_string("[SPLIT]");

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}

int main(int argc, char*argv[])
{
    string mystring = "This|is|an|example|string";
    StringSplit ss = mystring;
    vector<string> Split = ss.split_string('|');

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}

To keep empty entries (by default empties will be excluded):

StringSplit ss = mystring;
ss.keep_empty = true;
vector<string> Split = ss.split_string(":DELIM:");

The goal was to make it similar to C#'s Split() method where splitting a string is as easy as:

String[] Split = 
    "Hey:cut:what's:cut:your:cut:name?".Split(new[]{":cut:"}, StringSplitOptions.None);

foreach(String X in Split)
{
    Console.Write(X);
}

I hope someone else can find this as useful as I do.

Laura White
  • 188
  • 2
  • 9
Steve Dell
  • 575
  • 1
  • 7
  • 23
13

This answer takes the string and puts it into a vector of strings. It uses the Boost library.

#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
NL628
  • 418
  • 6
  • 21
11

Here's another way of doing it..

void split_string(string text,vector<string>& words)
{
  int i=0;
  char ch;
  string word;

  while(ch=text[i++])
  {
    if (isspace(ch))
    {
      if (!word.empty())
      {
        words.push_back(word);
      }
      word = "";
    }
    else
    {
      word += ch;
    }
  }
  if (!word.empty())
  {
    words.push_back(word);
  }
}
  • I believe this could be optimized a bit by using `word.clear()` instead of `word = ""`. Calling the clear method will empty the string but keep the already allocated buffer, which will be reused upon further concatenations. Right now a new buffer is created for every word, resulting in extra allocations. – Teodor Maxim Apr 26 '21 at 20:44
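The suggestion in the comment above can be sketched roughly as follows. This is only an illustration, not the answer author's code: besides `word.clear()`, it assumes a range-based loop in place of indexing up to the null terminator, and it casts to `unsigned char` before calling `std::isspace`. Whether `clear()` keeps the allocated buffer is up to the implementation, although common standard libraries do keep it.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Sketch of the same algorithm with the tweaks described above.
void split_string(const std::string& text, std::vector<std::string>& words)
{
    std::string word;
    for (char ch : text) {
        // Cast first: std::isspace has undefined behaviour for negative values.
        if (std::isspace(static_cast<unsigned char>(ch))) {
            if (!word.empty()) {
                words.push_back(word);
                word.clear();   // empties the string; capacity is typically kept
            }
        } else {
            word += ch;
        }
    }
    if (!word.empty()) {
        words.push_back(word);  // last word, when the text does not end in whitespace
    }
}
```

Called with "  a  bc d " it appends "a", "bc" and "d" to the vector.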
11

What about this:

#include <string>
#include <vector>

using namespace std;

vector<string> split(string str, const char delim) {
    vector<string> v;
    string tmp;

    for(string::const_iterator i = str.begin(); i != str.end(); ++i) {
        if(*i != delim) {
            tmp += *i;
        } else {
            v.push_back(tmp);
            tmp.clear();
        }
    }
    v.push_back(tmp); // the last token, after the final delimiter

    return v;
}
Oktalist
  • 14,336
  • 3
  • 43
  • 63
gibbz
  • 11
  • 1
  • 2
  • This is the best answer here, if you only want to split on a single delimiter character. The original question wanted to split on whitespace though, meaning any combination of one or more consecutive spaces or tabs. You have actually answered http://stackoverflow.com/questions/53849 – Oktalist Dec 19 '12 at 22:09
11

Recently I had to split a camel-cased word into subwords. There are no delimiters, just upper characters.

#include <string>
#include <list>
#include <locale> // std::isupper

template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
    std::list<String> R;
    String w;

    for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {
        if (std::isupper(*i, std::locale())) {  // two-argument overload from <locale>
            if (w.length()) {
                R.push_back(w);
                w.clear();
            }
        }
        w += *i;
    }

    if (w.length())
        R.push_back(w);
    return R;
}

For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".

Note that std::isupper should really be passed as a function template argument. The more generalized form of this function could then split at delimiters like ",", ";" or " " too.
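A minimal sketch of that generalization, under the assumption that the predicate decides where a new piece starts. The names `split_before` and `split_camel_case` are made up for this illustration and are not part of the answer's code:

```cpp
#include <list>
#include <locale>
#include <string>

// Hypothetical generalisation: start a new piece before every character
// for which `starts_piece` returns true.
template <class String, class Pred>
std::list<String> split_before(const String& s, Pred starts_piece)
{
    std::list<String> pieces;
    String current;
    for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {
        if (starts_piece(*i) && !current.empty()) {
            pieces.push_back(current);
            current.clear();
        }
        current += *i;
    }
    if (!current.empty())
        pieces.push_back(current);
    return pieces;
}

// Convenience wrapper that reproduces the camel-case behaviour.
inline std::list<std::string> split_camel_case(const std::string& s)
{
    const std::locale loc;  // copy of the current global locale
    return split_before(s, [&loc](char c) { return std::isupper(c, loc); });
}
```

With a predicate such as `[](char c) { return c == ','; }` the delimiter stays attached to the front of each following piece ("a,b" becomes "a" and ",b"), so a real delimiter splitter would still have to drop that first character.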

Andreas Spindler
  • 7,568
  • 4
  • 43
  • 34
  • 2
    There have been 2 revs. That's nice. Seems as if my English had too much of a "German" in it. However, the revisionist did not fix two minor bugs, maybe because they were obvious anyway: `std::isupper` could be passed as argument, not `std::upper`. Second, put a `typename` before the `String::const_iterator`. – Andreas Spindler Apr 28 '15 at 07:20
  • The one-argument std::isupper is guaranteed to be defined only in the <cctype> header (the C++ version of the C <ctype.h> header), so you must include that. This is like relying on being able to use std::string by including the <iostream> header instead of the <string> header. – Adola Jun 13 '22 at 03:39
10

I like to use the boost/regex methods for this task since they provide maximum flexibility for specifying the splitting criteria.

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main() {
    std::string line("A:::line::to:split");
    const boost::regex re(":+"); // one or more colons

    // -1 means find inverse matches aka split
    boost::sregex_token_iterator tokens(line.begin(),line.end(),re,-1);
    boost::sregex_token_iterator end;

    for (; tokens != end; ++tokens)
        std::cout << *tokens << std::endl;
}
Marty B
  • 243
  • 3
  • 10
10

I cannot believe how overly complicated most of these answers are. Why didn't someone suggest something as simple as this?

#include <iostream>
#include <sstream>
#include <string>

std::string input = "This is a sentence to read";
std::istringstream ss(input);
std::string token;

while(std::getline(ss, token, ' ')) {
    std::cout << token << std::endl;
}
Sam B
  • 27,273
  • 15
  • 84
  • 121
9
#include<iostream>
#include<string>
#include<sstream>
#include<vector>
using namespace std;

    vector<string> split(const string &s, char delim) {
        vector<string> elems;
        stringstream ss(s);
        string item;
        while (getline(ss, item, delim)) {
            elems.push_back(item);
        }
        return elems;
    }

int main() {

        vector<string> x = split("thi is an sample test",' ');
        unsigned int i;
        for(i=0;i<x.size();i++)
            cout<<i<<":"<<x[i]<<endl;
        return 0;
}
enb081
  • 3,831
  • 11
  • 43
  • 66
san45
  • 459
  • 1
  • 6
  • 15
8

Get Boost ! : -)

#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string.hpp>
#include <iostream>
#include <vector>

using namespace std;
using namespace boost;

int main(int argc, char**argv) {
    typedef vector < string > list_type;

    list_type list;
    string line;

    line = "Somewhere down the road";
    split(list, line, is_any_of(" "));

    for(int i = 0; i < list.size(); i++)
    {
        cout << list[i] << endl;
    }

    return 0;
}

This example gives the output -

Somewhere
down
the
road
8
#include <iostream>
#include <regex>

using namespace std;

int main() {
   string s = "foo bar  baz";
   regex e("\\s+");
   regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
   regex_token_iterator<string::iterator> end;
   while (i != end)
      cout << " [" << *i++ << "]";
}

IMO, this is the closest thing to Python's re.split(). See cplusplus.com for more information about regex_token_iterator. The -1 (the fourth argument of the regex_token_iterator constructor) selects the parts of the sequence that are not matched, so each match acts as a separator.
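To get closer to the feel of re.split(), the iterator pair can be wrapped in a small function that returns the pieces. This is only a sketch; the name `regex_split` is made up here, and each sub-match is converted to a std::string when the vector is built:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper: returns the pieces of `text` that lie between
// matches of `pattern`.
std::vector<std::string> regex_split(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    // Submatch index -1 selects the text between matches instead of the matches.
    std::sregex_token_iterator first(text.begin(), text.end(), re, -1);
    std::sregex_token_iterator last;  // default-constructed: end of sequence
    return std::vector<std::string>(first, last);
}
```

For example, `regex_split("foo bar  baz", "\\s+")` yields the three strings "foo", "bar" and "baz".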

solstice333
  • 3,399
  • 1
  • 31
  • 28
8

The code below uses strtok() to split a string into tokens and stores the tokens in a vector.

#include <iostream>
#include <cstring>   // strtok
#include <vector>
#include <string>

using namespace std;


char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";
char seps[]   = " ,\t\n";
char *token;



int main()
{
   vector<string> vec_String_Lines;
   token = strtok( one_line_string, seps );

   cout << "Extracting and storing data in a vector..\n\n\n";

   while( token != NULL )
   {
      vec_String_Lines.push_back(token);
      token = strtok( NULL, seps );
   }
   cout << "Displaying end result in vector line storage..\n\n";

   for ( int i = 0; i < vec_String_Lines.size(); ++i)
      cout << vec_String_Lines[i] << "\n";
   cout << "\n\n\n";

   return 0;
}
enb081
  • 3,831
  • 11
  • 43
  • 66
Software_Designer
  • 8,490
  • 3
  • 24
  • 28
7

I use this simple function because our String class is "special" (i.e. not standard):

void splitString(const String &s, const String &delim, std::vector<String> &result) {
    const int l = delim.length();
    int f = 0;
    int i = s.indexOf(delim,f);
    while (i>=0) {
        String token( i-f > 0 ? s.substring(f,i-f) : "");
        result.push_back(token);
        f=i+l;
        i = s.indexOf(delim,f);
    }
    String token = s.substring(f);
    result.push_back(token);
}
Kevin Panko
  • 8,356
  • 19
  • 50
  • 61
Abe
  • 31
  • 1
  • 1
7

Although an earlier answer already provides a C++20 solution, several changes have since been made and applied to C++20 as Defect Reports. Because of that, the solution is now a little shorter and nicer:

#include <iostream>
#include <ranges>
#include <string_view>

namespace views = std::views;
using str = std::string_view;

constexpr str text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";

auto splitByWords(str input) {
    return input
    | views::split(' ')
    | views::transform([](auto &&r) -> str {
        return {r.begin(), r.end()};
    });
}

auto main() -> int {
    for (str &&word : splitByWords(text)) {
        std::cout << word << '\n';
    }
}

As of today it is still available only on the trunk branch of GCC (Godbolt link). It is based on two changes: P1391 iterator constructor for std::string_view and P2210 DR fixing std::views::split to preserve range type.

In C++23 there won't be any transform boilerplate needed, since P1989 adds a range constructor to std::string_view:

#include <iostream>
#include <ranges>
#include <string_view>

namespace views = std::views;

constexpr std::string_view text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";

auto main() -> int {
    for (std::string_view&& word : text | views::split(' ')) {
        std::cout << word << '\n';
    }
}

(Godbolt link)
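For toolchains that do not ship these library changes yet, a roughly similar copy-free result can be had with plain C++17. The following is only a sketch under that assumption; `split_views` is a made-up name, and the returned views point into the original character data, which therefore has to outlive them:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical C++17 fallback: every element is a view into `input`,
// so no characters are copied.
std::vector<std::string_view> split_views(std::string_view input, char delim = ' ')
{
    std::vector<std::string_view> words;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = input.find(delim, start);
        if (pos == std::string_view::npos) {
            words.push_back(input.substr(start));       // the last piece
            break;
        }
        words.push_back(input.substr(start, pos - start));
        start = pos + 1;                                 // continue after the delimiter
    }
    return words;
}
```

Unlike the ranges version this is eager: it builds the whole vector up front instead of producing the words lazily. Two adjacent delimiters produce an empty view between them.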

Kaznov
  • 1,035
  • 10
  • 17
6

The following is a much better way to do this. It can split on any character and doesn't split on line breaks unless you want it to. No special libraries are needed (well, besides std, but who really considers that an extra library?). No pointers or references are needed, and it's static. Just simple plain C++.

#pragma once
#include <vector>
#include <sstream>
#include <string>
using namespace std;
class Helpers
{
    public:
        static vector<string> split(string s, char delim)
        {
            stringstream temp (stringstream::in | stringstream::out);
            vector<string> elems(0);
            if (s.size() == 0 || delim == 0)
                return elems;
            for(char c : s)
            {
                if(c == delim)
                {
                    elems.push_back(temp.str());
                    temp = stringstream(stringstream::in | stringstream::out);
                }
                else
                    temp << c;
            }
            if (temp.str().size() > 0)
                elems.push_back(temp.str());
            return elems;
        }

        //Splits string s at any of the delimiters in delims (it's just a list: to split at
        //the letters a, b and c, we would make delims="abc").
        static vector<string> split(string s, string delims)
        {
            stringstream temp (stringstream::in | stringstream::out);
            vector<string> elems(0);
            bool found;
            if(s.size() == 0 || delims.size() == 0)
                return elems;
            for(char c : s)
            {
                found = false;
                for(char d : delims)
                {
                    if (c == d)
                    {
                        elems.push_back(temp.str());
                        temp = stringstream(stringstream::in | stringstream::out);
                        found = true;
                        break;
                    }
                }
                if(!found)
                    temp << c;
            }
            if(temp.str().size() > 0)
                elems.push_back(temp.str());
            return elems;
        }
};
Laura White
  • 188
  • 2
  • 9
Kelly Elton
  • 4,373
  • 10
  • 53
  • 97
6

Everyone answered for predefined string input. I think this answer will help someone for scanned input.

I used tokens vector for holding string tokens. It's optional.

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std ;
int main()
{
    string str, token ;
    getline(cin, str) ; // get the string as input
    istringstream ss(str); // insert the string into tokenizer

    vector<string> tokens; // vector tokens holds the tokens

    while (ss >> token) tokens.push_back(token); // splits the tokens
    for(auto x : tokens) cout << x << endl ; // prints the tokens

    return 0;
}


sample input:

port city international university

sample output:

port
city
international
university

Note that by default this splits on whitespace only. To use a custom delimiter, the code needs a small change. For example, if the delimiter is ',', use

char delimiter = ',' ;
while(getline(ss, token, delimiter)) tokens.push_back(token) ;

instead of

while (ss >> token) tokens.push_back(token);
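Wrapped into a function, the delimiter variant might look like the sketch below. The name `split_on_char` is made up for this illustration; the loop itself is the getline loop from above:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper wrapping the getline loop shown above.
std::vector<std::string> split_on_char(const std::string& line, char delimiter)
{
    std::istringstream ss(line);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(ss, token, delimiter)) {
        tokens.push_back(token);  // may be empty when two delimiters are adjacent
    }
    return tokens;
}
```

One difference is worth knowing: `ss >> token` skips whole runs of whitespace, whereas getline treats every single delimiter as a boundary, so "port,city,,university" yields an empty token between the two adjacent commas.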
mybrave
  • 1,662
  • 3
  • 20
  • 37
Nur Bijoy
  • 70
  • 2
  • 7
5

I wrote the following piece of code. You can specify the delimiter, which can be a string. The result is similar to Java's String.split, with empty strings kept in the result.

For example, if we call split("ABCPICKABCANYABCTWO:ABC", "ABC"), the result is as follows:

0  <len:0>
1 PICK <len:4>
2 ANY <len:3>
3 TWO: <len:4>
4  <len:0>

Code:

vector <string> split(const string& str, const string& delimiter = " ") {
    vector <string> tokens;

    string::size_type lastPos = 0;
    string::size_type pos = str.find(delimiter, lastPos);

    while (string::npos != pos) {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = pos + delimiter.size();
        pos = str.find(delimiter, lastPos);
    }

    tokens.push_back(str.substr(lastPos, str.size() - lastPos));
    return tokens;
}
Jim Huang
  • 401
  • 5
  • 9
5

Here is my solution using C++11 and the STL. It should be reasonably efficient:

#include <vector>
#include <string>
#include <cctype>    // std::isspace
#include <iostream>
#include <algorithm>
#include <functional>

std::vector<std::string> split(const std::string& s)
{
    std::vector<std::string> v;

    const auto end = s.end();
    auto to = s.begin();
    decltype(to) from;

    while((from = std::find_if(to, end,
        [](unsigned char c){ return !std::isspace(c); })) != end)
    {
        to = std::find_if(from, end, [](unsigned char c){ return std::isspace(c); });
        v.emplace_back(from, to);
    }

    return v;
}

int main()
{
    std::string s = "this is the string  to  split";

    auto v = split(s);

    for(auto&& s: v)
        std::cout << s << '\n';
}

Output:

this
is
the
string
to
split
Galik
  • 47,303
  • 4
  • 80
  • 117
  • This is quite nice. I feel like the code could be clearer though, e.g. `end` unexpectedly isn't `s.end()`. – Timmmm May 16 '17 at 10:12
  • @Timmmm Out of curiosity what would you suggest for `pos`, `end` and `done`? – Galik May 16 '17 at 10:25
  • Also you can make it a bit simpler with `find_first_of` and `find_first_not_of`. – Timmmm May 16 '17 at 10:27
  • @Timmmm Well I shouldn't be using `ptr_fun` but using `std::isspace` makes the code more easily modifiable to accommodate different locales. Having said that my current working version uses `find_first_of`. That makes it more efficient and able to split on any character not just whitespace. In fact I also have a version that splits on a supplied string too , that uses `std::search` (the possibilities for this function are multifold it seems). – Galik May 16 '17 at 10:37
  • Yeah, I rewrote it [like this](http://stackoverflow.com/a/43999194/265521). Thanks for the code! – Timmmm May 16 '17 at 10:41
  • @Timmmm looks good. I'm not going to post my current version(s) here because they are monstrously templated to accommodate different string and container types and look horrendous (I'm overdue revisiting that code). But I will get rid of that `std::ptr_fun` hehe – Galik May 16 '17 at 10:48
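The comments mention a variant that splits on a supplied string using `std::search`. That version is not shown in the thread, so the following is only a guess at what such a function could look like; `split_on_string` is a made-up name, and like the whitespace version it skips empty pieces:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical variant, not the answer author's code: splits on a whole
// delimiter string and skips empty pieces.
std::vector<std::string> split_on_string(const std::string& s, const std::string& delim)
{
    std::vector<std::string> v;
    if (delim.empty()) {            // avoid an endless loop on an empty delimiter
        if (!s.empty()) v.push_back(s);
        return v;
    }
    auto from = s.begin();
    const auto end = s.end();
    while (from != end) {
        // std::search returns `end` when the delimiter does not occur again.
        const auto to = std::search(from, end, delim.begin(), delim.end());
        if (to != from) v.emplace_back(from, to);
        if (to == end) break;
        from = to + delim.size();   // continue right after the delimiter
    }
    return v;
}
```

For instance, splitting "a--b----c" on "--" gives "a", "b" and "c".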
5

When dealing with whitespace as separator, the obvious answer of using std::istream_iterator<T> is already given and voted up a lot. Of course, elements may not be separated by whitespace but by some separator instead. I didn't spot any answer which just redefines the meaning of whitespace to be said separator and then uses the conventional approach.

To change what streams consider whitespace, you simply change the stream's std::locale (using std::istream::imbue()) to one containing a std::ctype<char> facet with its own definition of what whitespace means. (This can be done for std::ctype<wchar_t>, too, but the details are slightly different because std::ctype<char> is table-driven while std::ctype<wchar_t> is driven by virtual functions.)

#include <iostream>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <locale>
#include <string>

struct whitespace_mask {
    std::ctype_base::mask mask_table[std::ctype<char>::table_size];
    whitespace_mask(std::string const& spaces) {
        std::ctype_base::mask* table = this->mask_table;
        std::ctype_base::mask const* tab
            = std::use_facet<std::ctype<char>>(std::locale()).table();
        for (std::size_t i(0); i != std::ctype<char>::table_size; ++i) {
            table[i] = tab[i] & ~std::ctype_base::space;
        }
        std::for_each(spaces.begin(), spaces.end(), [=](unsigned char c) {
            table[c] |= std::ctype_base::space;
        });
    }
};
class whitespace_facet
    : private whitespace_mask
    , public std::ctype<char> {
public:
    whitespace_facet(std::string const& spaces)
        : whitespace_mask(spaces)
        , std::ctype<char>(this->mask_table) {
    }
};

struct whitespace {
    std::string spaces;
    whitespace(std::string const& spaces): spaces(spaces) {}
};
std::istream& operator>>(std::istream& in, whitespace const& ws) {
    std::locale loc(in.getloc(), new whitespace_facet(ws.spaces));
    in.imbue(loc);
    return in;
}
// everything above would probably go into a utility library...

int main() {
    std::istringstream in("a, b, c, d, e");
    std::copy(std::istream_iterator<std::string>(in >> whitespace(", ")),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));

    std::istringstream pipes("a b c|  d |e     e");
    std::copy(std::istream_iterator<std::string>(pipes >> whitespace("|")),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));   
}

Most of the code is for packaging up a general purpose tool providing soft delimiters: multiple delimiters in a row are merged. There is no way to produce an empty sequence. When different delimiters are needed within a stream, you'd probably use differently set up streams using a shared stream buffer:

void f(std::istream& in) {
    std::istream pipes(in.rdbuf());
    pipes >> whitespace("|");
    std::istream comma(in.rdbuf());
    comma >> whitespace(",");

    std::string s0, s1;
    if (pipes >> s0 >> std::ws   // read up to first pipe and ignore sequence of pipes
        && comma >> s1 >> std::ws) { // read up to first comma and ignore commas
        // ...
    }
}
enb081
  • 3,831
  • 11
  • 43
  • 66
Dietmar Kühl
  • 150,225
  • 13
  • 225
  • 380
5

As a hobbyist, this is the first solution that came to my mind. I'm kind of curious why I haven't seen a similar solution here yet, is there something fundamentally wrong with how I did it?

#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> split(const std::string &s, const std::string &delims)
{
    std::vector<std::string> result;
    std::string::size_type pos = 0;
    while (std::string::npos != (pos = s.find_first_not_of(delims, pos))) {
        auto pos2 = s.find_first_of(delims, pos);
        result.emplace_back(s.substr(pos, std::string::npos == pos2 ? pos2 : pos2 - pos));
        pos = pos2;
    }
    return result;
}

int main()
{
    std::string text{"And then I said: \"I don't get it, why would you even do that!?\""};
    std::string delims{" :;\".,?!"};
    auto words = split(text, delims);
    std::cout << "\nSentence:\n  " << text << "\n\nWords:";
    for (const auto &w : words) {
        std::cout << "\n  " << w;
    }
    return 0;
}

http://cpp.sh/7wmzy

Jehjoa
  • 551
  • 8
  • 23
4

This is my version, based on Kev's code:

#include <string>
#include <vector>

using namespace std;

void split(vector<string> &result, string str, char delim ) {
  string tmp;
  string::iterator i;
  result.clear();

  for(i = str.begin(); i != str.end(); ++i) {
    if(*i != delim) {
      tmp += *i;
    } else {
      result.push_back(tmp);
      tmp = "";
    }
  }
  result.push_back(tmp); // the last token, after the final delimiter
}

Afterwards, call the function and do something with the result:

vector<string> hosts;
split(hosts, "192.168.1.2,192.168.1.3", ',');
for( size_t i = 0; i < hosts.size(); i++){
  cout <<  "Connecting host : " << hosts.at(i) << "..." << endl;
}
3

I use the following code:

#include <list>
#include <stdexcept>
#include <string>

namespace Core
{
    typedef std::wstring String;

    void SplitString(const Core::String& input, const Core::String& splitter, std::list<Core::String>& output)
    {
        if (splitter.empty())
        {
            throw std::invalid_argument("splitter must not be empty"); // for example
        }

        std::list<Core::String> lines;

        Core::String::size_type offset = 0;

        for (;;)
        {
            Core::String::size_type splitterPos = input.find(splitter, offset);

            if (splitterPos != Core::String::npos)
            {
                lines.push_back(input.substr(offset, splitterPos - offset));
                offset = splitterPos + splitter.size();
            }
            else
            {
                lines.push_back(input.substr(offset));
                break;
            }
        }

        lines.swap(output);
    }
}

// gtest:

class SplitStringTest: public testing::Test
{
};

TEST_F(SplitStringTest, EmptyStringAndSplitter)
{
    std::list<Core::String> result;
    ASSERT_ANY_THROW(Core::SplitString(Core::String(), Core::String(), result));
}

TEST_F(SplitStringTest, NonEmptyStringAndEmptySplitter)
{
    std::list<Core::String> result;
    ASSERT_ANY_THROW(Core::SplitString(L"xy", Core::String(), result));
}

TEST_F(SplitStringTest, EmptyStringAndNonEmptySplitter)
{
    std::list<Core::String> result;
    Core::SplitString(Core::String(), Core::String(L","), result);
    ASSERT_EQ(1, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
}

TEST_F(SplitStringTest, OneCharSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L"x,y", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y", *result.rbegin());

    Core::SplitString(L",xy", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L"xy", *result.rbegin());

    Core::SplitString(L"xy,", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"xy", *result.begin());
    ASSERT_EQ(Core::String(), *result.rbegin());
}

TEST_F(SplitStringTest, TwoCharsSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L"x,.y,z", L",.", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y,z", *result.rbegin());

    Core::SplitString(L"x,,y,z", L",,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y,z", *result.rbegin());
}

TEST_F(SplitStringTest, RecursiveSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L",,,", L",,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L",", *result.rbegin());

    Core::SplitString(L",.,.,", L",.,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L".,", *result.rbegin());

    Core::SplitString(L"x,.,.,y", L",.,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L".,y", *result.rbegin());

    Core::SplitString(L",.,,.,", L",.,", result);
    ASSERT_EQ(3, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(Core::String(), *(++result.begin()));
    ASSERT_EQ(Core::String(), *result.rbegin());
}

TEST_F(SplitStringTest, NullTerminators)
{
    std::list<Core::String> result;

    Core::SplitString(L"xy", Core::String(L"\0", 1), result);
    ASSERT_EQ(1, result.size());
    ASSERT_EQ(L"xy", *result.begin());

    Core::SplitString(Core::String(L"x\0y", 3), Core::String(L"\0", 1), result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y", *result.rbegin());
}
Cerbrus
  • 70,800
  • 18
  • 132
  • 147
Dmitry
  • 1
  • 1
  • 2
3

We can use strtok in C++:

#include <iostream>
#include <cstring>
using namespace std;

int main()
{
    char str[]="Mickey M;12034;911416313;M;01a;9001;NULL;0;13;12;0;CPP,C;MSC,3D;FEND,BEND,SEC;";
    char *pch = strtok (str,";,");
    while (pch != NULL)
    {
        cout<<pch<<"\n";
        pch = strtok (NULL, ";,");
    }
    return 0;
}
Gluttton
  • 5,739
  • 3
  • 31
  • 58
Venkata Naidu M
  • 351
  • 1
  • 6
3

This is my solution to this problem:

vector<string> get_tokens(string str) {
    vector<string> dt;
    stringstream ss;
    string tmp; 
    ss << str;
    while (ss >> tmp) {  // stops cleanly at end of input, so no token is added twice
        dt.push_back(tmp);
    }
    return dt;
}

This function returns a vector of strings.

pz64_
  • 2,212
  • 2
  • 20
  • 43
3

Based on Galik's answer I made this. This is mostly here so I don't have to keep writing it again and again. It's crazy that C++ still doesn't have a native split function. Features:

  • Should be very fast.
  • Easy to understand (I think).
  • Merges empty sections.
  • Trivial to use several delimiters (e.g. "\r\n")
#include <string>
#include <vector>
#include <algorithm>

std::vector<std::string> split(const std::string& s, const std::string& delims)
{
    using namespace std;

    vector<string> v;

    // Start of an element.
    size_t elemStart = 0;

    // We start searching from the end of the previous element, which
    // initially is the start of the string.
    size_t elemEnd = 0;

    // Find the first non-delim, i.e. the start of an element, after the end of the previous element.
    while((elemStart = s.find_first_not_of(delims, elemEnd)) != string::npos)
    {
        // Find the first delim, i.e. the end of the element (or if this fails it is the end of the string).
        elemEnd = s.find_first_of(delims, elemStart);
        // Add it.
        v.emplace_back(s, elemStart, elemEnd == string::npos ? string::npos : elemEnd - elemStart);
    }
    // When there are no more non-delimiters, we are done.

    return v;
}
Community
  • 1
  • 1
Timmmm
  • 88,195
  • 71
  • 364
  • 509
2

This is a function I wrote that helps me a lot. It was useful when implementing a protocol over WebSockets.

using namespace std;
#include <iostream>
#include <vector>
#include <sstream>
#include <string>

vector<string> split ( string input , string split_id ) {
  vector<string> result;
  int i = 0;
  bool add;
  string temp;
  stringstream ss;
  size_t found;
  string real;
  int r = 0;
    while ( i != input.length() ) {
        add = false;
        ss << input.at(i);
        temp = ss.str();
        found = temp.find(split_id);
        if ( found != string::npos ) {
            add = true;
            real.append ( temp , 0 , found );
        } else if ( r > 0 &&  ( i+1 ) == input.length() ) {
            add = true;
            real.append ( temp , 0 , found );
        }
        if ( add ) {
            result.push_back(real);
            ss.str(string());
            ss.clear();
            temp.clear();
            real.clear();
            r = 0;
        }
        i++;
        r++;
    }
  return result;
}

int main() {
    string s = "S,o,m,e,w,h,e,r,e, down the road \n In a really big C++ house.  \n  Lives a little old lady.   \n   That no one ever knew.    \n    She comes outside.     \n     In the very hot sun.      \n\n\n\n\n\n\n\n   And throws C++ at us.    \n    The End.  FIN.";
    vector < string > Token;
    Token = split ( s , "," );
    for ( int i = 0 ; i < Token.size(); i++)    cout << Token.at(i) << endl;
    cout << endl << Token.size();
    int a;
    cin >> a;
    return a;
}
enb081
  • 3,831
  • 11
  • 43
  • 66
User
  • 659
  • 2
  • 12
  • 29
2

LazyStringSplitter:

#include <string>
#include <algorithm>
#include <unordered_set>

using namespace std;

class LazyStringSplitter
{
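    string text;  // owned copy, so the iterators below never dangle
    string::const_iterator start, finish;
    unordered_set<char> chop;

public:

    // Empty Constructor
    explicit LazyStringSplitter()
        : start(text.begin())
        , finish(text.end())
    {}

    explicit LazyStringSplitter (const string& cstr, const string& delims)
        : text(cstr)
        , start(text.begin())
        , finish(text.end())
        , chop(delims.begin(), delims.end())
    {}

    // Note: the iterators point into this object's own text, so a copied
    // splitter must be re-initialized through this operator before use.
    void operator () (const string& cstr, const string& delims)
    {
        chop.insert(delims.begin(), delims.end());
        text = cstr;
        start = text.begin();
        finish = text.end();
    }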

    bool empty() const { return (start >= finish); }

    string next()
    {
        // return empty string
        // if ran out of characters
        if (empty())
            return string("");

        auto runner = find_if(start, finish, [&](char c) {
            return chop.count(c) == 1;
        });

        // construct next string
        string ret(start, runner);
        start = (runner == finish) ? finish : runner + 1;  // don't step past the end

        // Never return empty string
        // + tail recursion makes this method efficient
        return !ret.empty() ? ret : next();
    }
};
  • I call this method the LazyStringSplitter because of one reason - It does not split the string in one go.
  • In essence it behaves like a python generator
  • It exposes a method called next which returns the next string that is split from the original
  • I made use of the unordered_set from c++11 STL, so that look up of delimiters is that much faster
  • And here is how it works

TEST PROGRAM

#include <iostream>
using namespace std;

int main()
{
    LazyStringSplitter splitter;

    // split at the characters ' ', '!', '.', ','
    splitter("This, is a string. And here is another string! Let's test and see how well this does.", " !.,");

    while (!splitter.empty())
        cout << splitter.next() << endl;
    return 0;
}

OUTPUT

This
is
a
string
And
here
is
another
string
Let's
test
and
see
how
well
this
does

Next plan to improve this is to implement begin and end methods so that one can do something like:

vector<string> split_string(splitter.begin(), splitter.end());
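A rough sketch of how such begin and end methods could work is shown below. It is not the LazyStringSplitter from this answer but a separate, hypothetical class (`LazySplitRange`) that keeps its own copy of the text and uses `find_first_of` / `find_first_not_of` rather than an unordered_set. Each increment of the iterator extracts exactly one token:

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lazy range: the iterator extracts one token per increment.
class LazySplitRange {
    std::string text_;
    std::string delims_;

public:
    LazySplitRange(std::string text, std::string delims)
        : text_(std::move(text)), delims_(std::move(delims)) {}

    class iterator {
        const LazySplitRange* owner_;  // nullptr marks the end iterator
        std::size_t pos_;              // where the search for the next token starts
        std::string token_;            // the current token

        void advance() {
            const std::string& text = owner_->text_;
            const std::size_t first = text.find_first_not_of(owner_->delims_, pos_);
            if (first == std::string::npos) {  // no tokens left: become the end iterator
                owner_ = nullptr;
                pos_ = 0;
                token_.clear();
                return;
            }
            const std::size_t last = text.find_first_of(owner_->delims_, first);
            pos_ = (last == std::string::npos) ? text.size() : last;
            token_ = text.substr(first, pos_ - first);
        }

    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = std::string;
        using difference_type = std::ptrdiff_t;
        using pointer = const std::string*;
        using reference = const std::string&;

        iterator() : owner_(nullptr), pos_(0) {}
        explicit iterator(const LazySplitRange& owner) : owner_(&owner), pos_(0) { advance(); }

        reference operator*() const { return token_; }
        pointer operator->() const { return &token_; }
        iterator& operator++() { advance(); return *this; }
        iterator operator++(int) { iterator old(*this); advance(); return old; }
        bool operator==(const iterator& o) const { return owner_ == o.owner_ && pos_ == o.pos_; }
        bool operator!=(const iterator& o) const { return !(*this == o); }
    };

    iterator begin() const { return iterator(*this); }
    iterator end() const { return iterator(); }
};
```

With this, `LazySplitRange r("This, is a string.", " .,");` followed by `std::vector<std::string> v(r.begin(), r.end());` collects "This", "is", "a" and "string", while a plain loop over the range can stop early without ever splitting the rest.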
smac89
  • 39,374
  • 15
  • 132
  • 179
  • 2
    Many questionable implementation details aside, this answer is the only one which does it lazily. I am really disappointed in C++ world here. Well, streamiterator kind of does it too, but then everyone puts result into vector killing all the benefits... – Slava Sep 26 '15 at 19:25
2

I've been searching for a way to split a string by a separator of any length, so I started writing it from scratch, as existing solutions didn't suit me.

Here is my little algorithm, using only STL:

//use like this
//std::vector<std::wstring> vec = Split<std::wstring> (L"Hello##world##!", L"##");

template <typename valueType>
static std::vector <valueType> Split (valueType text, const valueType& delimiter)
{
    std::vector <valueType> tokens;
    size_t pos = 0;
    valueType token;

    while ((pos = text.find(delimiter)) != valueType::npos) 
    {
        token = text.substr(0, pos);
        tokens.push_back (token);
        text.erase(0, pos + delimiter.length());
    }
    tokens.push_back (text);

    return tokens;
}

It can be used with separator of any length and form, as far as I've tested. Instantiate with either string or wstring type.

All the algorithm does is it searches for the delimiter, gets the part of the string that is up to the delimiter, deletes the delimiter and searches again until it finds it no more.

Of course, the delimiter can also consist of any number of whitespace characters.

I hope it helps.

enb081
  • 3,831
  • 11
  • 43
  • 66
robcsi
  • 264
  • 3
  • 17
  • that's actually quite nice. although I don't think erasing is the most efficient way and (2) what about keeping empty tokens? – fmuecke Sep 09 '15 at 20:20
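Both points from the comment can be addressed with a small change, sketched below under a few assumptions: `SplitNoErase` and its `keepEmpty` parameter are made-up names, the input is taken by const reference and never modified, and an offset tracks where the next search starts. An empty delimiter is handled separately, because `find` would otherwise keep matching at the same position.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of Split: same results by default, but without erase().
template <typename StringT>
std::vector<StringT> SplitNoErase(const StringT& text, const StringT& delimiter,
                                  bool keepEmpty = true)
{
    std::vector<StringT> tokens;
    if (delimiter.empty()) {        // nothing to split on: return the input as one token
        tokens.push_back(text);
        return tokens;
    }
    typename StringT::size_type offset = 0;
    typename StringT::size_type pos;
    while ((pos = text.find(delimiter, offset)) != StringT::npos) {
        if (keepEmpty || pos != offset)
            tokens.push_back(text.substr(offset, pos - offset));
        offset = pos + delimiter.length();   // continue after the delimiter
    }
    if (keepEmpty || offset != text.length())
        tokens.push_back(text.substr(offset));  // whatever follows the last delimiter
    return tokens;
}
```

It is used the same way as the original, for example `SplitNoErase<std::wstring>(L"Hello##world##!", L"##")`; passing `false` as the third argument drops empty tokens.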
2

No Boost, no string streams, just the standard C library cooperating together with std::string and std::list: C library functions for easy analysis, C++ data types for easy memory management.

Whitespace is considered to be any combination of newlines, tabs and spaces. The set of whitespace characters is established by the wschars variable.

#include <string>
#include <list>
#include <iostream>
#include <cstring>

using namespace std;

const char *wschars = "\t\n ";

list<string> split(const string &str)
{
  const char *cstr = str.c_str();
  list<string> out;

  while (*cstr) {                     // while remaining string not empty
    size_t toklen;
    cstr += strspn(cstr, wschars);    // skip leading whitespace
    toklen = strcspn(cstr, wschars);  // figure out token length
    if (toklen)                       // if we have a token, add to list
      out.push_back(string(cstr, toklen));
    cstr += toklen;                   // skip over token
  }

  // ran out of string; return list

  return out;
}

int main(int argc, char **argv)
{
  // guard against a missing argument: argv[1] would be a null pointer
  list<string> li = split(argc > 1 ? argv[1] : "");
  for (list<string>::iterator i = li.begin(); i != li.end(); i++)
    cout << "{" << *i << "}" << endl;
  return 0;
}

Run:

$ ./split ""
$ ./split "a"
{a}
$ ./split " a "
{a}
$ ./split " a b"
{a}
{b}
$ ./split " a b c"
{a}
{b}
{c}
$ ./split " a b c d  "
{a}
{b}
{c}
{d}

Tail-recursive version of split (itself split into two functions). All destructive manipulation of variables is gone, except for the pushing of strings into the list!

void split_rec(const char *cstr, list<string> &li)
{
  if (*cstr) {
    const size_t leadsp = strspn(cstr, wschars);
    const size_t toklen = strcspn(cstr + leadsp, wschars);

    if (toklen)
      li.push_back(string(cstr + leadsp, toklen));

    split_rec(cstr + leadsp + toklen, li);
  }
}

list<string> split(const string &str)
{
  list<string> out;
  split_rec(str.c_str(), out);
  return out;
}
Brad Larson
  • 170,088
  • 45
  • 397
  • 571
Kaz
  • 55,781
  • 9
  • 100
  • 149
  • 1
    please use std::vector instead of list – fmuecke Sep 09 '15 at 20:16
  • @fmuecke There is no requirement in the question for a specific representation to use for the pieces of the string, hence there is no need to incorporate your suggestion into the answer. – Kaz Sep 11 '15 at 18:59
2

Here is my version

#include <vector>

inline std::vector<std::string> Split(const std::string &str, const std::string &delim = " ")
{
    std::vector<std::string> tokens;
    if (str.size() > 0)
    {
        if (delim.size() > 0)
        {
            std::string::size_type currPos = 0, prevPos = 0;
            while ((currPos = str.find(delim, prevPos)) != std::string::npos)
            {
                std::string item = str.substr(prevPos, currPos - prevPos);
                if (item.size() > 0)
                {
                    tokens.push_back(item);
                }
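                prevPos = currPos + delim.size();  // skip the whole delimiter, not just one char
            }
            if (prevPos < str.size())              // do not add an empty trailing token
                tokens.push_back(str.substr(prevPos));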
        }
        else
        {
            tokens.push_back(str);
        }
    }
    return tokens;
}

It works with multi-character delimiters. It keeps empty tokens out of the results. It uses a single header. It returns the string as one single token when the delimiter is empty. It also returns an empty result if the string is empty. It is unfortunately inefficient because of the large std::vector copy on return UNLESS you are compiling as C++11, which should use move semantics. In C++11, this code should be fast.

mchiasson
  • 2,452
  • 25
  • 27
2

Here's my entry:

template <typename Container, typename InputIter, typename ForwardIter>
Container
split(InputIter first, InputIter last,
      ForwardIter s_first, ForwardIter s_last)
{
    Container output;

    while (true) {
        auto pos = std::find_first_of(first, last, s_first, s_last);
        output.emplace_back(first, pos);
        if (pos == last) {
            break;
        }

        first = ++pos;
    }

    return output;
}

template <typename Output = std::vector<std::string>,
          typename Input = std::string,
          typename Delims = std::string>
Output
split(const Input& input, const Delims& delims = " ")
{
    using std::cbegin;
    using std::cend;
    return split<Output>(cbegin(input), cend(input),
                         cbegin(delims), cend(delims));
}

auto vec = split("Mary had a little lamb");

The first definition is an STL-style generic function taking two pairs of iterators. The second is a convenience function to save you having to do all the begin()s and end()s yourself. You can also specify the output container type as a template parameter if you want to use a list, for example.

What makes it elegant (IMO) is that unlike most of the other answers, it's not restricted to strings but will work with any STL-compatible container. Without any change to the code above, you can say:

using vec_of_vecs_t = std::vector<std::vector<int>>;

std::vector<int> v{1, 2, 0, 3, 4, 5, 0, 7, 8, 0, 9};
auto r = split<vec_of_vecs_t>(v, std::initializer_list<int>{0, 2});

which will split the vector v into separate vectors every time a 0 or a 2 is encountered.

(There's also the added bonus that with strings, this implementation is faster than both strtok()- and getline()-based versions, at least on my system.)

Tristan Brindle
  • 16,281
  • 4
  • 39
  • 82
2

For those who need an alternative for splitting a string by a string delimiter, perhaps you can try the following solution.

std::vector<size_t> str_pos(const std::string &search, const std::string &target)
{
    std::vector<size_t> founds;

    if(!search.empty())
    {
        size_t start_pos = 0;

        while (true)
        {
            size_t found_pos = target.find(search, start_pos);

            if(found_pos != std::string::npos)
            {
                size_t found = found_pos;

                founds.push_back(found);

                start_pos = (found_pos + 1);
            }
            else
            {
                break;
            }
        }
    }

    return founds;
}

std::string str_sub_index(size_t begin_index, size_t end_index, const std::string &target)
{
    std::string sub;

    size_t size = target.length();

    const char* copy = target.c_str();

    for(size_t i = begin_index; i <= end_index; i++)
    {
        if(i >= size)
        {
            break;
        }
        else
        {
            char c = copy[i];

            sub += c;
        }
    }

    return sub;
}

std::vector<std::string> str_split(const std::string &delimiter, const std::string &target)
{
    std::vector<std::string> splits;

    if(!delimiter.empty())
    {
        std::vector<size_t> founds = str_pos(delimiter, target);

        size_t founds_size = founds.size();

        if(founds_size > 0)
        {
            size_t search_len = delimiter.length();

            size_t begin_index = 0;

            for(int i = 0; i <= founds_size; i++)
            {
                std::string sub;

                if(i != founds_size)
                {
                    size_t pos  = founds.at(i);

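                    // pos - 1 would wrap around when the delimiter sits at index 0
                    sub = (pos > begin_index) ? str_sub_index(begin_index, pos - 1, target) : std::string();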

                    begin_index = (pos + search_len);
                }
                else
                {
                    sub = str_sub_index(begin_index, (target.length() - 1), target);
                }

                splits.push_back(sub);
            }
        }
    }

    return splits;
}

These snippets consist of three functions. The bad news is that to use str_split you need the other two functions. Yes, it is a big chunk of code. The good news is that the two helper functions work independently and can sometimes be useful on their own. :)

Tested the function in main() block like this:

int main()
{
    std::string s = "Hello, world! We need to make the world a better place. Because your world is also my world, and our children's world.";

    std::vector<std::string> split = str_split("world", s);

    for(int i = 0; i < split.size(); i++)
    {
        std::cout << split[i] << std::endl;
    }
}

And it would produce:

Hello, 
! We need to make the 
 a better place. Because your 
 is also my 
, and our children's 
.

I believe that's not the most efficient code, but at least it works. Hope it helps.

yunhasnawa
  • 815
  • 1
  • 14
  • 30
2

Here's my take on this. I had to process the input string word by word, which could have been done by using space to count words but I felt it would be tedious and I should split the words into vectors.

#include<iostream>
#include<vector>
#include<string>
#include<stdio.h>
using namespace std;
int main()
{
    int x = getchar();                  // int, so EOF can be told apart from any char
    string s = "";
    vector<string> q;
    while(x != '\n' && x != EOF)        // also stop at end of input
    {
        if(x == ' ')
        {
            q.push_back(s);
            s = "";
            x = getchar();
            continue;
        }
        s = s + static_cast<char>(x);
        x = getchar();
    }
    q.push_back(s);
    for(int i = 0; i<q.size(); i++)
        cout<<q[i]<<" ";
    return 0;
}
  1. Doesn't take care of multiple spaces.
  2. If the last word is not immediately followed by newline character, it includes the whitespace between the last word's last character and newline character.
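Both limitations could be avoided by reading the whole line first (for example with std::getline) and letting operator>> do the tokenizing, since it skips any run of whitespace. A minimal sketch, where words_of_line is a hypothetical helper name:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Sketch: split one already-read line into words. operator>> skips any run
// of whitespace, so repeated or trailing spaces do not create empty entries.
std::vector<std::string> words_of_line(const std::string& line)
{
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}
```

For the line "a  b   c " this should give the three words a, b and c, with no empty strings and no trailing space attached to the last word.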
2

Yes, I looked through all 30 examples.

I couldn't find a version of split that works for multi-char delimiters, so here's mine:

#include <string>
#include <vector>

using namespace std;

vector<string> split(const string &str, const string &delim)
{   
    const auto delim_pos = str.find(delim);

    if (delim_pos == string::npos)
        return {str};

    vector<string> ret{str.substr(0, delim_pos)};
    auto tail = split(str.substr(delim_pos + delim.size(), string::npos), delim);

    ret.insert(ret.end(), tail.begin(), tail.end());

    return ret;
}

Probably not the most efficient of implementations, but it's a very straightforward recursive solution, using only <string> and <vector>.

Ah, it's written in C++11, but there's nothing special about this code, so you could easily adapt it to C++98.
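A possible C++98 adaptation might look like the sketch below: explicit types replace auto, and push_back replaces brace initialization. The name split98 is made up for this example, and like the original it assumes a non-empty delimiter.

```cpp
#include <string>
#include <vector>

// C++98-style sketch of the recursive split: no auto, no brace initialization.
// Assumes delim is not empty (an empty delim would recurse forever).
std::vector<std::string> split98(const std::string& str, const std::string& delim)
{
    std::vector<std::string> ret;
    const std::string::size_type delim_pos = str.find(delim);

    if (delim_pos == std::string::npos) {
        ret.push_back(str);             // no delimiter left: one final token
        return ret;
    }

    ret.push_back(str.substr(0, delim_pos));
    const std::vector<std::string> tail =
        split98(str.substr(delim_pos + delim.size()), delim);
    ret.insert(ret.end(), tail.begin(), tail.end());
    return ret;
}
```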

Romário
  • 1,664
  • 1
  • 20
  • 30
2

C++17 version without any memory allocation (except maybe for std::function)

#include <cctype>
#include <functional>
#include <iostream>
#include <string_view>

void iter_words(const std::string_view& input, const std::function<void(std::string_view)>& process_word) {

    auto itr = input.begin();

    auto consume_whitespace = [&]() {
        for(; itr != input.end(); ++itr) {
            if(!isspace(*itr))
                return;
        }
    };

    auto consume_letters = [&]() {
        for(; itr != input.end(); ++itr) {
            if(isspace(*itr))
                return;
        }
    };

    while(true) {
        consume_whitespace();
        if(itr == input.end())
            return;
        auto word_start = itr - input.begin();
        consume_letters();
        auto word_end = itr - input.begin();
        process_word(input.substr(word_start, word_end - word_start));
    }
}

int main() {
    iter_words("foo bar", [](std::string_view sv) {
        std::cout << "Got word: " <<  sv << '\n';
    });
    return 0;
}
balki
  • 26,394
  • 30
  • 105
  • 151
2

Quick version which uses vector as the base class, giving full access to all of its operators:

    // Split string into parts.
    class Split : public std::vector<std::string>
    {
        public:
            Split(const std::string& str, const char* delimList)  // const: string literals must be accepted
            {
               size_t lastPos = 0;
               size_t pos = str.find_first_of(delimList);

               while (pos != std::string::npos)
               {
                    if (pos != lastPos)
                        push_back(str.substr(lastPos, pos-lastPos));
                    lastPos = pos + 1;
                    pos = str.find_first_of(delimList, lastPos);
               }
               if (lastPos < str.length())
                   push_back(str.substr(lastPos, pos-lastPos));
            }
    };

Example used to populate an STL set:

std::set<std::string> words;
Split split("Hello,World", ",");
words.insert(split.begin(), split.end());
enb081
  • 3,831
  • 11
  • 43
  • 66
landen
  • 1
  • 1
2

I use the following

void split(string in, vector<string>& parts, char separator) {
    string::iterator  ts, curr;
    ts = curr = in.begin();
    for(; curr <= in.end(); curr++ ) {
        if( (curr == in.end() || *curr == separator) && curr > ts )
               parts.push_back( string( ts, curr ));
        if( curr == in.end() )
               break;
        if( *curr == separator ) ts = curr + 1; 
    }
}

PlasmaHH, I forgot to include the extra check( curr > ts) for removing tokens with whitespace.

ManiP
  • 713
  • 2
  • 8
  • 19
1

I believe no one has posted this solution yet. Instead of using delimiters directly, it basically does the same as boost::split(), i.e., it allows you to pass a predicate that returns true if a char is a delimiter, and false otherwise. I think this gives the programmer a lot more control, and the great thing is you don't need boost.

template <class Container, class String, class Predicate>
void split(Container& output, const String& input,
           const Predicate& pred, bool trimEmpty = false) {
    auto it = begin(input);
    auto itLast = it;
    while (it = find_if(it, end(input), pred), it != end(input)) {
        if (not (trimEmpty and it == itLast)) {
            output.emplace_back(itLast, it);
        }
        ++it;
        itLast = it;
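    }
    // Emit what follows the last delimiter; the loop above stops before it.
    if (not (trimEmpty and itLast == end(input))) {
        output.emplace_back(itLast, end(input));
    }
}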

Then you can use it like this:

struct Delim {
    bool operator()(char c) {
        return not isalpha(c);
    }
};    

int main() {
    string s("#include<iostream>\n"
             "int main() { std::cout << \"Hello world!\" << std::endl; }");

    vector<string> v;

    split(v, s, Delim(), true);
    /* Which is also the same as */
    split(v, s, [](char c) { return not isalpha(c); }, true);

    for (const auto& i : v) {
        cout << i << endl;
    }
}
LLLL
  • 373
  • 2
  • 5
1

I have just written a fine example of how to split a char array by a symbol, placing each run of chars (the words separated by your symbol) into a vector. For simplicity I made the vector's element type std::string.

I hope this helps and is readable to you.

#include <cstring>  // strlen
#include <vector>
#include <string>
#include <iostream>

void push(std::vector<std::string> &WORDS, std::string &TMP){
    WORDS.push_back(TMP);
    TMP = "";
}
std::vector<std::string> mySplit(char STRING[]){
        std::vector<std::string> words;
        std::string s;
        for(unsigned short i = 0; i < strlen(STRING); i++){
            if(STRING[i] != ' '){
                s += STRING[i];
            }else{
                push(words, s);
            }
        }
        push(words, s);//Used to get last split
        return words;
}

int main(){
    char string[] = "My awesome string.";
    std::cout << mySplit(string)[2];
    std::cin.get();
    return 0;
}
1
// adapted from a "regular" csv parse
std::string stringIn = "my csv  is 10233478 NOTseparated by commas";
std::vector<std::string> commaSeparated(1);
int commaCounter = 0;
for (int i=0; i<stringIn.size(); i++) {
    if (stringIn[i] == ' ') {  // compare against a char, not a string literal
        commaSeparated.push_back("");
        commaCounter++;
    } else {
        commaSeparated.at(commaCounter) += stringIn[i];
    }
}

In the end you will have a vector of strings holding every element of the sentence that was separated by spaces. The only extra resource needed is std::vector (but since a std::string is involved, I figured it would be acceptable).

Empty strings are saved as separate items.
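To illustrate the empty-string behavior, here is the same loop wrapped in a function. This is a sketch; split_on_spaces is a hypothetical name, and it uses back() where the loop above keeps a separate counter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the loop above as a function: every single space starts a new
// element, so two spaces in a row leave an empty string between them.
std::vector<std::string> split_on_spaces(const std::string& in)
{
    std::vector<std::string> parts(1);        // start with one empty element
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == ' ')
            parts.push_back("");              // a space opens a new element
        else
            parts.back() += in[i];            // otherwise extend the current one
    }
    return parts;
}
```

For "my csv  is" (two spaces before "is") this should yield four elements: my, csv, an empty string, and is.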

Cerbrus
  • 70,800
  • 18
  • 132
  • 147
tony gil
  • 9,424
  • 6
  • 76
  • 100
1
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
  string str = "ABC AABCD CDDD RABC GHTTYU FR";
  str += " "; //dirty hack: adding extra space to the end
  vector<string> v;

  for (int i=0; i<(int)str.size(); i++) {
    int a, b;
    a = i;

    for (int j=i; j<(int)str.size(); j++) {
      if (str[j] == ' ') {
        b = j;
        i = j;
        break;
      }
    }
    v.push_back(str.substr(a, b-a));
  }

  for (int i=0; i<v.size(); i++) {
    cout<<v[i].size()<<" "<<v[i]<<endl;
  }
  return 0;
}
torayeff
  • 9,296
  • 19
  • 69
  • 103
1

Just for convenience:

template<class V, typename T>
bool in(const V &v, const T &el) {
    return std::find(v.begin(), v.end(), el) != v.end();
}

The actual splitting based on multiple delimiters:

std::vector<std::string> split(const std::string &s,
                               const std::vector<char> &delims) {
    std::vector<std::string> res;
    auto stuff = [&delims](char c) { return !in(delims, c); };
    auto space = [&delims](char c) { return in(delims, c); };
    auto first = std::find_if(s.begin(), s.end(), stuff);
    while (first != s.end()) {
        auto last = std::find_if(first, s.end(), space);
        res.push_back(std::string(first, last));
        first = std::find_if(last, s.end(), stuff);  // last + 1 could step past end()
    }
    return res;
}

The usage:

int main() {
    std::string s = "   aaa,  bb  cc ";
    for (auto el: split(s, {' ', ','}))
        std::cout << el << std::endl;
    return 0;
}
AlwaysLearning
  • 7,257
  • 4
  • 33
  • 68
1

Loop on std::getline with ' ' as the delimiter.
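A minimal sketch of that loop, assuming the string is wrapped in a std::istringstream; split_getline is a hypothetical helper name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Sketch: read delimiter-terminated pieces from a string stream until the
// stream is exhausted.
std::vector<std::string> split_getline(const std::string& s, char delim = ' ')
{
    std::istringstream in(s);
    std::vector<std::string> out;
    std::string token;
    while (std::getline(in, token, delim))
        out.push_back(token);
    return out;
}
```

Note that, unlike operator>>, this treats every single space as a separator, so "a  b" (two spaces) should produce an empty token between a and b.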

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
lemic
  • 177
  • 4
1

I have a very different approach from the other solutions that offers a lot of value in ways that the other solutions are variously lacking, but of course also has its own down sides. Here is the working implementation, with the example of putting <tag></tag> around words.

For a start, this problem can be solved with one loop, no additional memory, and by considering merely four logical cases. Conceptually, we're interested in boundaries. Our code should reflect that: let's iterate through the string and look at two characters at a time, bearing in mind that we have special cases at the start and end of the string.

The downside is that we have to write the implementation, which is somewhat verbose, but mostly convenient boilerplate.

The upside is that we wrote the implementation, so it is very easy to customize it to specific needs, such as distinguishing left and right word boundaries, using any set of delimiters, or handling other cases such as non-boundary or erroneous positions.

#include <iostream>
#include <string>

#include <cctype>

using namespace std;

typedef enum boundary_type_e {
    E_BOUNDARY_TYPE_ERROR = -1,
    E_BOUNDARY_TYPE_NONE,
    E_BOUNDARY_TYPE_LEFT,
    E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;

typedef struct boundary_s {
    boundary_type_t type;
    int pos;
} boundary_t;

bool is_delim_char(int c) {
    return isspace(c); // also compare against any other chars you want to use as delimiters
}

bool is_word_char(int c) {
    return ' ' <= c && c <= '~' && !is_delim_char(c);
}

boundary_t maybe_word_boundary(const string& str, int pos) {
    int len = str.length();
    // boundary_t{type, pos} is standard C++; C99 compound literals are not
    if (pos < 0 || pos >= len) {
        return boundary_t{E_BOUNDARY_TYPE_ERROR, 0};
    } else {
        if (pos == 0 && is_word_char(str[pos])) {
            // if the first character is word-y, we have a left boundary at the beginning
            return boundary_t{E_BOUNDARY_TYPE_LEFT, pos};
        } else if (pos == len - 1 && is_word_char(str[pos])) {
            // if the last character is word-y, we have a right boundary left of the null terminator
            return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
        } else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
            // if we have a delimiter followed by a word char, we have a left boundary left of the word char
            return boundary_t{E_BOUNDARY_TYPE_LEFT, pos + 1};
        } else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
            // if we have a word char followed by a delimiter, we have a right boundary right of the word char
            return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
        }
        return boundary_t{E_BOUNDARY_TYPE_NONE, 0};
    }
}

int main() {
    string str;
    getline(cin, str);

    int len = str.length();
    for (int i = 0; i < len; i++) {
        boundary_t boundary = maybe_word_boundary(str, i);
        if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
            // whatever
        } else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
            // whatever
        }
    }
}

As you can see, the code is very simple to understand and fine tune, and the actual usage of the code is very short and simple. Using C++ should not stop us from writing the simplest and most readily customized code possible, even if that means not using the STL. I would think this is an instance of what Linus Torvalds might call "taste", since we have eliminated all the logic we don't need while writing in a style that naturally allows more cases to be handled when and if the need to handle them arises.

What could improve this code might be the use of enum class, accepting a function pointer to is_word_char in maybe_word_boundary instead of invoking is_word_char directly, and passing a lambda.
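A rough sketch of those improvements is shown here. It is a simplified reformulation, not the code above: BoundaryType, Boundary, boundary_at and tag_words are hypothetical names, the boundary is examined between str[pos - 1] and str[pos], and the lambda treats every non-whitespace character as a word character, which is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class BoundaryType { None, Left, Right };

struct Boundary {
    BoundaryType type;
    std::size_t pos;
};

// Reports the boundary (if any) between str[pos - 1] and str[pos].
// pos may range over [0, str.size()]; positions outside the string are
// treated as non-word characters. The word test is passed in by the caller.
template <typename IsWordChar>
Boundary boundary_at(const std::string& str, std::size_t pos, IsWordChar is_word_char)
{
    const bool before = pos > 0 && is_word_char(str[pos - 1]);
    const bool after = pos < str.size() && is_word_char(str[pos]);
    if (!before && after) return {BoundaryType::Left, pos};
    if (before && !after) return {BoundaryType::Right, pos};
    return {BoundaryType::None, pos};
}

// Example use: wrap every word in <tag></tag>.
std::string tag_words(const std::string& str)
{
    auto is_word = [](char c) { return !std::isspace(static_cast<unsigned char>(c)); };
    std::string out;
    for (std::size_t pos = 0; pos <= str.size(); ++pos) {
        const Boundary b = boundary_at(str, pos, is_word);
        if (b.type == BoundaryType::Left) out += "<tag>";
        if (b.type == BoundaryType::Right) out += "</tag>";
        if (pos < str.size()) out += str[pos];
    }
    return out;
}
```

Swapping the lambda for a different predicate changes what counts as a word without touching the boundary logic.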

okovko
  • 1,851
  • 14
  • 27
1

A minimal solution is a function which takes as input a std::string and a set of delimiter characters (as a std::string), and returns a std::vector of std::strings.

#include <string>
#include <vector>

std::vector<std::string>
tokenize(const std::string& str, const std::string& delimiters)
{
  using ssize_t = std::string::size_type;
  const ssize_t str_ln = str.length();
  ssize_t last_pos = 0;

  // container for the extracted tokens
  std::vector<std::string> tokens;

  while (last_pos < str_ln) {
      // find the position of the next delimiter
      ssize_t pos = str.find_first_of(delimiters, last_pos);

      // if no delimiters found, set the position to the length of string
      if (pos == std::string::npos)
         pos = str_ln;

      // if the substring is nonempty, store it in the container
      if (pos != last_pos)
         tokens.emplace_back(str.substr(last_pos, pos - last_pos));

      // scan past the previous substring
      last_pos = pos + 1;
  }

  return tokens;
}

A usage example:

#include <iostream>

int main()
{
    std::string input_str = "one + two * (three - four)!!---! ";
    const char* delimiters = "! +- (*)";
    std::vector<std::string> tokens = tokenize(input_str, delimiters);

    std::cout << "input = '" << input_str << "'\n"
              << "delimiters = '" << delimiters << "'\n"
              << "nr of tokens found = " << tokens.size() << std::endl;
    for (const std::string& tk : tokens) {
        std::cout << "token = '" << tk << "'\n";
    }

  return 0;
}

AlQuemist
  • 1,110
  • 3
  • 12
  • 22
1

Yet another way -- continuation passing style, zero allocation, function based delimiting.

 void split( auto&& data, auto&& splitter, auto&& operation ) {
   using std::begin; using std::end;
   auto prev = begin(data);
   while (prev != end(data) ) {
     // fresh names: rebinding prev here would shadow the loop variable
     auto [first, next] = splitter( prev, end(data) );
     operation(first, next);
     prev = next;
   }
 }

Now we can write specific split functions based off this.

 auto anyOfSplitter(auto delimiters) {
   return [delimiters](auto begin, auto end) {
     while( begin != end && 0 == std::string_view(begin, end).find_first_of(delimiters) ) {
       ++begin;
     }
     auto view = std::string_view(begin, end);
     auto next = view.find_first_of(delimiters);
     if (next != view.npos)
       return std::make_pair( begin, begin + next );
     else
       return std::make_pair( begin, end );
   };
 }

we can now produce a traditional std string split like this:

 template<class C>
 auto traditional_any_of_split( std::basic_string_view<C> str, std::basic_string_view<C> delim ) {
   std::vector<std::basic_string<C>> retval;
   split( str, anyOfSplitter(delim), [&](auto s, auto f) {
     retval.emplace_back(s,f);
   });
   return retval;
 }

or we can use find instead

 auto findSplitter(auto delimiter) {
   return [delimiter](auto begin, auto end) {
     while( begin != end && 0 == std::string_view(begin, end).find(delimiter) ) {
       begin += delimiter.size();
     }
     auto view = std::string_view(begin, end);
     auto next = view.find(delimiter);
     if (next != view.npos)
       return std::make_pair( begin, begin + next );
     else
       return std::make_pair( begin, end );
   };
 }

 template<class C>
 auto traditional_find_split( std::basic_string_view<C> str, std::basic_string_view<C> delim ) {
   std::vector<std::basic_string<C>> retval;
   split( str, findSplitter(delim), [&](auto s, auto f) {
     retval.emplace_back(s,f);
   });
   return retval;
 }

by replacing the splitter portion.

Both of these allocate a buffer of return values. We can swap the return values to string views at the cost of manually managing lifetime.

We can also take a continuation that will get passed the string views one at a time, avoiding even allocating the vector of views.

This can be extended with an abort option, so that we can abort after reading a few prefix strings.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
1

Works with some C++20 compilers and most C++23 compilers (uses ranges and string_view):

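// wrap the literal in a string_view so its terminating '\0' is not part of the last word
for (auto word : std::views::split(std::string_view{"Somewhere down the road"}, ' '))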
        std::cout << std::string_view{ word.begin(), word.end() } << std::endl;
Pavan Chandaka
  • 11,671
  • 5
  • 26
  • 34
0

My implementation can be an alternative solution:

std::vector<std::wstring> SplitString(const std::wstring & String, const std::wstring & Seperator)
{
    std::vector<std::wstring> Lines;
    size_t stSearchPos = 0;
    size_t stFoundPos;
    // size() - 1 would drop a final one-character token and wrap around for an empty string
    while (stSearchPos < String.size())
    {
        stFoundPos = String.find(Seperator, stSearchPos);
        stFoundPos = (stFoundPos == std::string::npos) ? String.size() : stFoundPos;
        Lines.push_back(String.substr(stSearchPos, stFoundPos - stSearchPos));
        stSearchPos = stFoundPos + Seperator.size();
    }
    return Lines;
}

Test code:

std::wstring MyString(L"Part 1SEPsecond partSEPlast partSEPend");
std::vector<std::wstring> Parts = IniFile::SplitString(MyString, L"SEP");
std::wcout << L"The string: " << MyString << std::endl;
for (std::vector<std::wstring>::const_iterator it=Parts.begin(); it<Parts.end(); ++it)
{
    std::wcout << *it << L"<---" << std::endl;
}
std::wcout << std::endl;
MyString = L"this,time,a,comma separated,string";
std::wcout << L"The string: " << MyString << std::endl;
Parts = IniFile::SplitString(MyString, L",");
for (std::vector<std::wstring>::const_iterator it=Parts.begin(); it<Parts.end(); ++it)
{
    std::wcout << *it << L"<---" << std::endl;
}

Output of the test code:

The string: Part 1SEPsecond partSEPlast partSEPend
Part 1<---
second part<---
last part<---
end<---

The string: this,time,a,comma separated,string
this<---
time<---
a<---
comma separated<---
string<---
Cerbrus
  • 70,800
  • 18
  • 132
  • 147
hkBattousai
  • 10,583
  • 18
  • 76
  • 124
0

Very late to the party, I know, but I was thinking about the most elegant way of doing this when given a set of delimiter characters rather than just whitespace, using nothing more than the standard library.

Here are my thoughts:

To split words into a string vector by a sequence of delimiters:

template<class Container>
std::vector<std::string> split_by_delimiters(const std::string& input, const Container& delimiters)
{
    std::vector<std::string> result;

    for (auto current = begin(input) ; current != end(input) ; )
    {
        auto first = find_if(current, end(input), not_in(delimiters));
        if (first == end(input)) break;
        auto last = find_if(first, end(input), is_in(delimiters));
        result.emplace_back(first, last);
        current = last;
    }
    return result;
}

to split the other way, by providing a sequence of valid characters:

template<class Container>
std::vector<std::string> split_by_valid_chars(const std::string& input, const Container& valid_chars)
{
    std::vector<std::string> result;

    for (auto current = begin(input) ; current != end(input) ; )
    {
        auto first = find_if(current, end(input), is_in(valid_chars));
        if (first == end(input)) break;
        auto last = find_if(first, end(input), not_in(valid_chars));
        result.emplace_back(first, last);
        current = last;
    }
    return result;
}

is_in and not_in are defined thus:

namespace detail {
    template<class Container>
    struct is_in {
        is_in(const Container& charset)
        : _charset(charset)
        {}

        bool operator()(char c) const
        {
            return find(begin(_charset), end(_charset), c) != end(_charset);
        }

        const Container& _charset;
    };

    template<class Container>
    struct not_in {
        not_in(const Container& charset)
        : _charset(charset)
        {}

        bool operator()(char c) const
        {
            return find(begin(_charset), end(_charset), c) == end(_charset);
        }

        const Container& _charset;
    };

}

template<class Container>
detail::not_in<Container> not_in(const Container& c)
{
    return detail::not_in<Container>(c);
}

template<class Container>
detail::is_in<Container> is_in(const Container& c)
{
    return detail::is_in<Container>(c);
}
enb081
  • 3,831
  • 11
  • 43
  • 66
Richard Hodges
  • 68,278
  • 7
  • 90
  • 142
0

Thank you, @Jairo Abdiel Toribio Cisneros. It works for me, but your function returns some empty elements. To return the result without empty elements, I edited it as follows:

std::vector<std::string> split(std::string str, const char* delim) {
    std::vector<std::string> v;
    std::string tmp;

    for(std::string::const_iterator i = str.begin(); i <= str.end(); ++i) {
        if(i != str.end() && *i != *delim) {  // check for end() before dereferencing
            tmp += *i;
        } else {
            if (tmp.length() > 0) {
                v.push_back(tmp);
            }
            tmp = "";
        }
    }

    return v;
}

Using:

std::string s = "one:two::three";
std::string delim = ":";
std::vector<std::string> vv = split(s, delim.c_str());
enb081
  • 3,831
  • 11
  • 43
  • 66
Kakashi
  • 534
  • 11
  • 16
0

If you want to split a string by several characters, you can use:

#include<iostream>
#include<string>
#include<vector>
#include<iterator>
#include<sstream>
#include<string>

using namespace std;
void replaceOtherChars(string &input, vector<char> &dividers)
{
    const char divider = dividers.at(0);
    int replaceIndex = 0;
    vector<char>::iterator it_begin = dividers.begin()+1,
        it_end= dividers.end();
    for(;it_begin!=it_end;++it_begin)
    {
        replaceIndex = 0;
        while(true)
        {
            replaceIndex=input.find_first_of(*it_begin,replaceIndex);
            if(replaceIndex==-1)
                break;
            input.at(replaceIndex)=divider;
        }
    }
}
vector<string> split(string str, vector<char> chars, bool missEmptySpace =true )
{
    vector<string> result;
    const char divider = chars.at(0);
    replaceOtherChars(str,chars);
    stringstream stream;
    stream<<str;    
    string temp;
    while(getline(stream,temp,divider))
    {
        if(missEmptySpace && temp.empty())
            continue;
        result.push_back(temp);
    }
    return result;
}
int main()
{
    string str ="milk, pigs.... hot-dogs ";
    vector<char> arr;
    arr.push_back(' '); arr.push_back(','); arr.push_back('.');
    vector<string> result = split(str,arr);
    vector<string>::iterator it_begin= result.begin(),
        it_end= result.end();
    for(;it_begin!=it_end;++it_begin)
    {
        cout<<*it_begin<<endl;
    }
return 0;
}
Bushuev
  • 557
  • 1
  • 10
  • 29
0

This is an extension of one of the top answers. It now supports setting a maximum number of returned elements, N. The remainder of the string ends up in the Nth element. The MAXELEMENTS parameter is optional; at its default of 0, an unlimited number of elements is returned. :-)

.h:

class Myneatclass {
public:
    static std::vector<std::string>& split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS = 0);
    static std::vector<std::string> split(const std::string &s, char delim, const size_t MAXELEMENTS = 0);
};

.cpp:

std::vector<std::string>& Myneatclass::split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
        if (MAXELEMENTS > 0 && !ss.eof() && elems.size() + 1 >= MAXELEMENTS) {
            std::getline(ss, item);
            elems.push_back(item);
            break;
        }
    }
    return elems;
}
std::vector<std::string> Myneatclass::split(const std::string &s, char delim, const size_t MAXELEMENTS) {
    std::vector<std::string> elems;
    split(s, delim, elems, MAXELEMENTS);
    return elems;
}
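
To make the "rest of the string lands in the Nth element" behaviour concrete, here is a minimal standalone sketch. The name split_max is hypothetical and not part of the class above, and it searches with std::string::find instead of std::getline, so it differs in two small ways: a limit of 1 returns the whole string as a single element, and a trailing delimiter yields a final empty element.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical free-function variant of the limited split described above.
// max_elements == 0 means "no limit".
std::vector<std::string> split_max(const std::string& s, char delim,
                                   std::size_t max_elements = 0) {
    std::vector<std::string> elems;
    std::size_t start = 0;
    while (true) {
        // Only one slot left: the untouched remainder goes into it.
        if (max_elements > 0 && elems.size() + 1 == max_elements) {
            elems.push_back(s.substr(start));
            break;
        }
        const std::size_t pos = s.find(delim, start);
        if (pos == std::string::npos) {
            // No delimiter left: the tail is the last element.
            elems.push_back(s.substr(start));
            break;
        }
        elems.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return elems;
}
```

With this sketch, split_max("a,b,c,d", ',', 2) gives {"a", "b,c,d"}, and leaving out the limit gives {"a", "b", "c", "d"}.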
Jonny
  • 15,955
  • 18
  • 111
  • 232
0

My general implementation for string and u32string, using the boost::algorithm::split signature.

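#include <algorithm>      // std::find_if
#include <string>
#include <unordered_set>
#include <utility>        // std::swap
#include <vector>

template<typename CharT, typename UnaryPredicate>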
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           UnaryPredicate predicate)
{
    using ST = std::basic_string<CharT>;
    using std::swap;
    std::vector<ST> tmp_result;
    auto iter = s.cbegin(),
         end_iter = s.cend();
    while (true)
    {
        /**
         * edge case: empty str -> push an empty str and exit.
         */
        auto find_iter = std::find_if(iter, end_iter, predicate);
        tmp_result.emplace_back(iter, find_iter);
        if (find_iter == end_iter) { break; }
        iter = ++find_iter; 
    }
    swap(tmp_result, split_result);
}


template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           const std::basic_string<CharT>& char_candidate)
{
    std::unordered_set<CharT> candidate_set(char_candidate.cbegin(),
                                            char_candidate.cend());
    auto predicate = [&candidate_set](const CharT& c) {
        return candidate_set.count(c) > 0U;
    };
    return split(split_result, s, predicate);
}

template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           const CharT* literals)
{
    return split(split_result, s, std::basic_string<CharT>(literals));
}
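
Since the question is about whitespace, here is a hedged sketch of how a predicate-based split like the one above can be driven by std::isspace. The names split_if and is_whitespace are hypothetical, and the compact helper returns its result by value, so it is an illustration of the technique rather than the code above.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical compact helper: split s wherever is_delim(c) is true.
// Like the answer's version, it keeps empty parts between adjacent delimiters.
template <typename CharT, typename UnaryPredicate>
std::vector<std::basic_string<CharT>> split_if(const std::basic_string<CharT>& s,
                                               UnaryPredicate is_delim) {
    std::vector<std::basic_string<CharT>> out;
    auto iter = s.cbegin();
    const auto end_iter = s.cend();
    while (true) {
        auto found = std::find_if(iter, end_iter, is_delim);
        out.emplace_back(iter, found);
        if (found == end_iter) { break; }
        iter = std::next(found);  // continue just past the delimiter
    }
    return out;
}

// Whitespace predicate for narrow strings. The cast avoids undefined
// behaviour when char is signed and the value is negative.
inline bool is_whitespace(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}
```

For example, split_if(std::string("a b\tc"), is_whitespace) yields {"a", "b", "c"}. The argument has to be a std::string (or another std::basic_string), because CharT cannot be deduced from a bare string literal.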
0
#include <iostream>
#include <string>
#include <deque>

std::deque<std::string> split(
    const std::string& line, 
    std::string::value_type delimiter,
    bool skipEmpty = false
) {
    std::deque<std::string> parts{};

    if (!skipEmpty && !line.empty() && delimiter == line.at(0)) {
        parts.push_back({});
    }

    for (const std::string::value_type& c : line) {
        if (
            (
                c == delimiter 
                &&
                (skipEmpty ? (!parts.empty() && !parts.back().empty()) : true)
            )
            ||
            (c != delimiter && parts.empty())
        ) {
            parts.push_back({});
        }

        if (c != delimiter) {
            parts.back().push_back(c);
        }
    }

    if (skipEmpty && !parts.empty() && parts.back().empty()) {
        parts.pop_back();
    }

    return parts;
}

void test(const std::string& line) {
    std::cout << line << std::endl;

    std::cout << "skipEmpty=0 |";
    for (const std::string& part : split(line, ':')) {
        std::cout << part << '|';
    }
    std::cout << std::endl;

    std::cout << "skipEmpty=1 |";
    for (const std::string& part : split(line, ':', true)) {
        std::cout << part << '|';
    }
    std::cout << std::endl;

    std::cout << std::endl;
}

int main() {
    test("foo:bar:::baz");
    test("");
    test("foo");
    test(":");
    test("::");
    test(":foo");
    test("::foo");
    test(":foo:");
    test(":foo::");

    return 0;
}

Output:

foo:bar:::baz
skipEmpty=0 |foo|bar|||baz|
skipEmpty=1 |foo|bar|baz|


skipEmpty=0 |
skipEmpty=1 |

foo
skipEmpty=0 |foo|
skipEmpty=1 |foo|

:
skipEmpty=0 |||
skipEmpty=1 |

::
skipEmpty=0 ||||
skipEmpty=1 |

:foo
skipEmpty=0 ||foo|
skipEmpty=1 |foo|

::foo
skipEmpty=0 |||foo|
skipEmpty=1 |foo|

:foo:
skipEmpty=0 ||foo||
skipEmpty=1 |foo|

:foo::
skipEmpty=0 ||foo|||
skipEmpty=1 |foo|
Oleg
  • 486
  • 6
  • 12
0

There's a much easier way to do this!

#include <iostream>  // needed by printvector below
#include <string>
#include <vector>
std::vector<std::string> splitby(const std::string& string, char splitter) {
    std::vector<std::string> result;
    std::string locresult;
    for (std::size_t i = 0; i < string.size(); i++) {
        if (string[i] != splitter) {
            locresult += string[i];
        }
        else {
            result.push_back(locresult);
            locresult.clear();
        }
    }
    // Whatever follows the last splitter (possibly empty) is the final element.
    result.push_back(locresult);
    return result;
}

void printvector(std::vector<std::string> v) {
    std::cout << '{';
    for (unsigned int i = 0; i < v.size(); i++) {
        if (i < v.size() - 1) {
            std::cout << '"' << v.at(i) << "\",";
        }
        else {
            std::cout << '"' << v.at(i) << "\"";
        }
    }
    std::cout << "}\n";
}
General Grievance
  • 4,555
  • 31
  • 31
  • 45
0

My code is:

#include <cstddef>
#include <list>
#include <string>
template<class StringType = std::string, class ContainerType = std::list<StringType> >
class DSplitString:public ContainerType
{
public:
    explicit DSplitString(const StringType& strString, char cChar, bool bSkipEmptyParts = true)
    {
        size_t iPos = 0;
        size_t iPos_char = 0;
        while(StringType::npos != (iPos_char = strString.find(cChar, iPos)))
        {
            AddPart(strString.substr(iPos, iPos_char - iPos), bSkipEmptyParts);
            iPos = iPos_char + 1;
        }
        // Also keep whatever follows the last delimiter.
        AddPart(strString.substr(iPos), bSkipEmptyParts);
    }
    explicit DSplitString(const StringType& strString, const StringType& strSub, bool bSkipEmptyParts = true)
    {
        size_t iPos = 0;
        size_t iPos_char = 0;
        // An empty strSub would match everywhere and never advance, so skip the search.
        while(!strSub.empty() && StringType::npos != (iPos_char = strString.find(strSub, iPos)))
        {
            AddPart(strString.substr(iPos, iPos_char - iPos), bSkipEmptyParts);
            iPos = iPos_char + strSub.length();
        }
        AddPart(strString.substr(iPos), bSkipEmptyParts);
    }
private:
    void AddPart(const StringType& strPart, bool bSkipEmptyParts)
    {
        // push_back lives in the dependent base class, so it is reached through this->.
        if(!bSkipEmptyParts || !strPart.empty())
            this->push_back(strPart);
    }
};

Example:

#include <iostream>
#include <string>
int main()
{
    DSplitString<> aa("doicanhden1;doicanhden2;doicanhden3;", ';');
    // Standard range-based for instead of the MSVC-only "for each ... in"
    for (const std::string& var : aa)
    {
        std::cout << var << std::endl;
    }
    std::cin.get();
    return 0;
}
Code Lღver
  • 15,573
  • 16
  • 56
  • 75
-1

Here's my approach, cut and split:

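// Returns the text before the first occurrence of del and removes it,
// together with the delimiter, from str.
string cut (string& str, const string& del)
{
    string f = str;
    const size_t pos = str.find(del);

    if (pos != string::npos)
    {
        f = str.substr(0, pos);
        str = str.substr(pos + del.length());
    }
    else
    {
        str.clear();    // no delimiter left: the whole rest is consumed
    }

    return f;
}

// del must not be empty, otherwise cut() never shortens the string.
vector<string> split (const string& in, const string& del=" ")
{
    vector<string> out;
    string t = in;

    while (!t.empty())
        out.push_back(cut(t,del));

    return out;
}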

BTW, if there's something I can do to optimize this, suggestions are welcome.
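
One possible optimization, offered as a sketch and not as a measured improvement: instead of repeatedly copying the shrinking remainder of the string, keep an index into the original and search forward from there. The name split_indexed is hypothetical. Unlike the cut-based version, it also returns a final empty element when the input ends with the delimiter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical index-based variant: walks the input once with find()
// and copies only the parts themselves.
std::vector<std::string> split_indexed(const std::string& in,
                                       const std::string& del = " ") {
    std::vector<std::string> out;
    if (del.empty()) {
        // An empty delimiter cannot split anything; return the input as is.
        out.push_back(in);
        return out;
    }
    std::size_t start = 0;
    std::size_t pos = 0;
    while ((pos = in.find(del, start)) != std::string::npos) {
        out.push_back(in.substr(start, pos - start));
        start = pos + del.length();  // skip the whole delimiter
    }
    out.push_back(in.substr(start));  // text after the last delimiter
    return out;
}
```

For example, split_indexed("a, b, c", ", ") gives {"a", "b", "c"}.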

Khaled.K
  • 5,828
  • 1
  • 33
  • 51
-1

Not that we need more answers, but this is what I came up with after being inspired by Evan Teran.

std::vector<std::string> split(const std::string &input, char delimiter, bool skipEmpty=true) {
  /*
  Splits a string at each delimiter and returns these strings as a string vector.
  If the delimiter is not found then nothing is returned.
  If skipEmpty is true then strings between delimiters that are 0 in length will be skipped.
  */
  bool delimiterFound = false;
  std::size_t pos=0, pPos=0;  // find() returns an unsigned position, not an int
  std::vector<std::string> result;
  while (true) {
    pos = input.find(delimiter,pPos);
    if (pos != std::string::npos) {
      if (!skipEmpty or pos > pPos) // if empty values are to be kept or not
        result.push_back(input.substr(pPos,pos-pPos));
      delimiterFound = true;
    } else {
      if (pPos < input.length() and delimiterFound) {
        // pPos < input.length() already guarantees a non-empty tail
        result.push_back(input.substr(pPos));
      }
      break;
    }
    pPos = pos+1; // the delimiter is a single character
  }
  return result;
}
-5
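void splitString(const string& str, char delim, string array[], const int arraySize)
{
    size_t subStrStart = 0;

    // Stop at arraySize so the caller's array is never overrun.
    for (int index = 0; index < arraySize; index++)
    {
        // find() returns string::npos once no delimiter is left.
        const size_t delimPosition = str.find(delim, subStrStart);
        // substr() clamps the count, so for npos this takes the rest of the string.
        array[index] = str.substr(subStrStart, delimPosition - subStrStart);
        if (delimPosition == string::npos)
            break;
        subStrStart = delimPosition + 1;
    }
}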
enb081
  • 3,831
  • 11
  • 43
  • 66
  • 4
    Welcome to StackOverflow. Your answer would be improved if you described the code a bit further. What differentiates it from the one (very high scoring) answers on this old question? – marko Dec 04 '12 at 23:20
-10

For a ridiculously large and probably redundant version, try a lot of for loops.

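string sequence = "Somewhere down the road";  // example input
string stringlist[10];
int count = 0;

// Stop once the fixed-size array is full.
for (size_t i = 0; i < sequence.length() && count < 10; )
{
    if (sequence[i] == ' ')
    {
        stringlist[count] = sequence.substr(0, i);
        sequence.erase(0, i+1);
        i = 0;  // restart the scan at the front of what is left
        count++;
    }
    else if (i == sequence.length()-1)  // Last word
    {
        stringlist[count] = sequence.substr(0, i+1);
        count++;
        break;
    }
    else
    {
        i++;
    }
}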

It isn't pretty, but by and large (barring punctuation and a slew of other bugs) it works!
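
If the fixed-size array is a concern, a std::vector grows as needed. A minimal sketch, assuming whitespace-separated words as in the question; the name split_words is hypothetical, and it uses stream extraction instead of the loops above:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects whitespace-separated words into a vector,
// so there is no fixed upper limit on the number of tokens.
std::vector<std::string> split_words(const std::string& sequence) {
    std::vector<std::string> words;
    std::istringstream iss(sequence);
    std::string word;
    // operator>> skips leading whitespace and fails cleanly at end of input,
    // so the value is only used after a successful read.
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}
```

Calling split_words("Somewhere down the road") returns four words, and runs of spaces never produce empty entries.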

Cerbrus
  • 70,800
  • 18
  • 132
  • 147
Peter C.
  • 423
  • 2
  • 6
  • 12
  • 38
    I was tempted to +1 this answer for its simple, readable code (which I presume rubbed an elegantophile the wrong way, hence the -1), but then I saw that you allocated a fixed-size array of strings to hold the tokens. Come on, you *know* that's gonna break at the worst possible moment! :) – j_random_hacker Aug 24 '09 at 09:14