# PureBasic Forum

All times are UTC + 1 hour. Page 1 of 2 [ 20 posts ]

Post subject: Word Count | Posted: Tue Jul 29, 2014 9:13 pm
Procedure.i CountWords(a$) ;-Count Words

  ; Turn line feeds into spaces
  While FindString(a$,Chr(10),0)
    a$=ReplaceString(a$,Chr(10)," ")
  Wend

  ; Turn carriage returns into spaces
  While FindString(a$,Chr(13),0)
    a$=ReplaceString(a$,Chr(13)," ")
  Wend

  ; Collapse double spaces into single spaces
  While FindString(a$,"  ",0)
    a$=ReplaceString(a$,"  "," ")
  Wend

  If (Len(a$)>0)
    numwords=CountString(Trim(a$)," ")+1
  Else
    numwords=CountString(Trim(a$)," ")
  EndIf

  ProcedureReturn numwords
EndProcedure

I have a routine that counts the number of words in a document. The problem is that it is a little slow on bigger documents. Can anyone convert this code to asm (x64)
to see if it would be faster?

Post subject: Re: Word Count | Posted: Tue Jul 29, 2014 9:49 pm
spacebuddy wrote:
The problem is that it is a little slow on bigger documents. Can anyone convert this code to asm (x64)
to see if it would be faster?

I can't help you with an asm version. It was slow for many reasons. Some I fixed in the code below. See if it works for you.

Code:
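Procedure.i CountWords(a$) ;-Count Words

  ; Same-length replacements can be done in place, without a new string
  ReplaceString(a$,Chr(10)," ", #PB_String_InPlace)
  ReplaceString(a$,Chr(13)," ", #PB_String_InPlace)

  ; Collapse double spaces into single spaces
  While FindString(a$,"  ",0)
    a$=ReplaceString(a$,"  "," ")
  Wend

  a$=Trim(a$) ; Trim() returns the trimmed string, so assign the result
  If (Len(a$)>0)
    numwords=CountString(a$," ")+1
  Else
    numwords=0
  EndIf

  ProcedureReturn numwords
EndProcedure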

Last edited by Demivec on Tue Jul 29, 2014 10:12 pm, edited 1 time in total.

Post subject: Re: Word Count | Posted: Tue Jul 29, 2014 9:56 pm
Thanks Demivec

I have an old and very slow computer. I will test to see if it helps.

Post subject: Re: Word Count | Posted: Tue Jul 29, 2014 10:25 pm
Here's the simple test code I used:
Code:
Procedure.i CountWords(a$) ;-Count Words

  ReplaceString(a$,Chr(10)," ", #PB_String_InPlace)
  ReplaceString(a$,Chr(13)," ", #PB_String_InPlace)

  While FindString(a$,"  ",0)
    a$=ReplaceString(a$,"  "," ")
  Wend

  a$=Trim(a$) ; Trim() returns the trimmed string, so assign the result
  If (Len(a$)>0)
    numwords=CountString(a$," ")+1
  Else
    numwords=0
  EndIf

  ProcedureReturn numwords
EndProcedure

filename$ = OpenFileRequester("", "", "Text (*.txt)|*.txt;", 0)
If filename$
  If ReadFile(1, filename$) ; the file has to be opened before it can be read
    a$ = ReadString(1, #PB_File_IgnoreEOL)
    CloseFile(1)
  EndIf
EndIf

If a$
  t1 = ElapsedMilliseconds()
  c = CountWords(a$)
  t2 = ElapsedMilliseconds() - t1

  MessageRequester("Results", "For file: '" + GetFilePart(filename$) +"', found " + c + " words in " + t2 + " ms.")
EndIf

I have a faster computer and I tested it with a 1418 KB file. It found 237209 words in 74 ms.

I tested the same file with your procedure and I aborted the program after 4 minutes of waiting.

Post subject: Re: Word Count | Posted: Tue Jul 29, 2014 10:48 pm
Huh? PB's CountString() will find a partial string or a whole word, so there should be no need to worry about other chars.

For speed, assuming you are working with files, load the file into a memory buffer and then use CountString() directly on the buffer.
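
A minimal sketch of the "load the file into a memory buffer" idea, not code from the thread. The helper name `LoadFileToBuffer` and the path `document.txt` are hypothetical, and the file is assumed to hold single-byte (ASCII) text. Since CountString() takes a string argument, the sketch converts the buffer with PeekS() first, which makes a copy; it also counts spaces only, which is not the same as counting words.

```purebasic
; Hypothetical helper (not from the thread): reads a whole file into a
; newly allocated memory buffer. Returns the buffer address, or 0 on
; failure, and stores the number of bytes read in *size.
Procedure.i LoadFileToBuffer(fileName$, *size.Integer)
  Protected file.i, length.q
  Protected *mem = 0

  file = ReadFile(#PB_Any, fileName$)
  If file
    length = Lof(file)                 ; file length in bytes
    If length > 0
      *mem = AllocateMemory(length)
      If *mem
        *size\i = ReadData(file, *mem, length)
      EndIf
    EndIf
    CloseFile(file)
  EndIf

  ProcedureReturn *mem
EndProcedure

; Usage, assuming an ASCII text file at a hypothetical path
Define size.i
Define *buffer = LoadFileToBuffer("document.txt", @size)
If *buffer
  Define text$ = PeekS(*buffer, size, #PB_Ascii)  ; copies the buffer into a string
  Debug CountString(text$, " ")                   ; number of spaces, not words
  FreeMemory(*buffer)
EndIf
```

The caller owns the returned buffer and releases it with FreeMemory().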

Post subject: Re: Word Count | Posted: Tue Jul 29, 2014 11:29 pm
My system is a Q6600 with 1 GB of RAM. Everything runs slowly.

Post subject: Re: Word Count | Posted: Wed Jul 30, 2014 12:17 am
.... the bottleneck would be how you load the file, once loaded, everything should be fast. How big are the files that need to be searched?

Post subject: Re: Word Count | Posted: Wed Jul 30, 2014 1:30 am
IdeasVacuum wrote:
.... the bottleneck would be how you load the file, once loaded, everything should be fast. How big are the files that need to be searched?

Files are around 100-200 MB, which includes pictures and text. Loading is not a problem.

Post subject: Re: Word Count | Posted: Wed Jul 30, 2014 6:21 am
Pictures are binary data, and using strings of 100 MB to 200 MB does not make sense with
functions like 'a$=ReplaceString(a$,"  "," ")', because that allocates a new string
of the same large size before it releases the old one. The same goes for 'CountString(Trim(a$)," ")', which
first creates a new trimmed string of 100 MB to 200 MB and then counts the
words within that string. And counting spaces within binary data doesn't make much sense anyway, does it?
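
One way to avoid the extra allocations described above is to rewrite the string where it already sits in memory. The sketch below is only an illustration under that assumption, not code from the thread; the helper name `CollapseSpaces` is hypothetical. It walks the string with a read pointer and a write pointer, so runs of spaces are collapsed without creating a second large string.

```purebasic
; Hypothetical helper (not from the thread): collapses runs of spaces
; into a single space directly in the string's own memory.
Procedure CollapseSpaces(*text.Character)
  Protected *src.Character = *text   ; read position
  Protected *dst.Character = *text   ; write position, never ahead of *src
  Protected prevSpace.i = #False

  If *text = 0
    ProcedureReturn
  EndIf

  While *src\c
    If *src\c = ' '
      If prevSpace = #False          ; keep only the first space of a run
        *dst\c = ' '
        *dst + SizeOf(Character)
      EndIf
      prevSpace = #True
    Else
      *dst\c = *src\c
      *dst + SizeOf(Character)
      prevSpace = #False
    EndIf
    *src + SizeOf(Character)
  Wend

  *dst\c = 0                         ; terminate the shortened string
EndProcedure

Define s$ = "one   two  three"
CollapseSpaces(@s$)
Debug s$   ; one two three
```

Because the result is never longer than the input, writing through the pointer stays inside the original string.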

Post subject: Re: Word Count | Posted: Wed Jul 30, 2014 6:30 am
Danilo, this could be a big problem for me. I am not sure how to fix it.

Post subject: Re: Word Count | Posted: Wed Jul 30, 2014 6:46 am
What about loading it as binary data into a memory buffer? Then search the buffer for spaces (byte value 32).
It depends on the type of data. For text files, it depends on how the files are saved (ASCII or Unicode). For pictures
or other binary data (.jpg, .png, .bmp), I don't understand why you want to count space characters in them.
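
The buffer scan suggested above could look roughly like this. It is a sketch, not code from the thread: the helper name `CountWordsInBuffer` is hypothetical, the data is assumed to be single-byte (ASCII) text, and tab, LF and CR are treated as separators in addition to byte value 32.

```purebasic
; Hypothetical helper (not from the thread): counts whitespace-separated
; words in a raw byte buffer of the given size (ASCII text assumed).
Procedure.i CountWordsInBuffer(*buffer, size.i)
  Protected *p.Ascii = *buffer
  Protected *limit = *buffer + size  ; first address past the data
  Protected inWord.i = #False
  Protected count.i = 0

  While *p < *limit
    Select *p\a
      Case 9, 10, 13, 32             ; TAB, LF, CR, SPACE end the current word
        inWord = #False
      Default
        If inWord = #False           ; first byte of a new word
          count + 1
          inWord = #True
        EndIf
    EndSelect
    *p + 1
  Wend

  ProcedureReturn count
EndProcedure

; Small in-memory demonstration
Define *demo = AllocateMemory(64)
If *demo
  PokeS(*demo, "This is a test string", -1, #PB_Ascii)
  Debug CountWordsInBuffer(*demo, MemoryStringLength(*demo, #PB_Ascii))   ; 5
  FreeMemory(*demo)
EndIf
```

The procedure never copies or modifies the buffer, so its memory use does not grow with the file size.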

Post subject: Re: Word Count | Posted: Wed Jul 30, 2014 7:53 am
It was a bit of a puzzle to create but I hope this works for you.
It should work on x64 and x86, both ascii and unicode.
Code:
Procedure.l CountWords(*Text.Character); Requires SSE

; init some mmx registers
!mov eax, 1
!movd mm4, eax        ; mm4 = previous comparison result
!pxor mm3, mm3        ; mm3 = 0
!movq mm2, mm4        ; mm2 = counter
!mov eax, 0x200d0a09
!movd mm1, eax
!punpcklbw mm1, mm3   ; mm1 = separation characters (tab, lf, cr, space)
!movq mm0, mm4        ; mm0 = working register

CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
!mov rdx, [p.p_Text]
CompilerElse
!mov edx, [p.p_Text]
CompilerEndIf
!jmp countwords_entry

; main loop
!countwords_loop:
; compare character with separation chars
!pshufw mm0, mm5, 0
!pcmpeqw mm0, mm1
!psrlw mm0, 15
!psadbw mm0, mm3      ; sum the four lanes into a single 0 / 1 value
; at this time mm0 = 1 if a separation char is found otherwise 0
!pandn mm4, mm0       ; mm4 = 1 only when a word is followed by a separation char
!paddd mm2, mm4       ; add that transition to the counter
; make a copy of the comparison result
!movq mm4, mm0

; entry point for first character
!countwords_entry:
; load the current character and advance the pointer
CompilerIf #PB_Compiler_Unicode
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !movzx eax, word [rdx]
    !add rdx, 2
  CompilerElse
    !movzx eax, word [edx]
    !add edx, 2
  CompilerEndIf
CompilerElse
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !movzx eax, byte [rdx]
    !add rdx, 1
  CompilerElse
    !movzx eax, byte [edx]
    !add edx, 1
  CompilerEndIf
CompilerEndIf
!movd mm5, eax

; loop if not end of string
!and ax, ax
!jnz countwords_loop

; correct counter if last character was a separation character
!psubd mm2, mm0
; set result and empty mmx state
!movd eax, mm2
!emms
ProcedureReturn

EndProcedure

Example
Code:
S.s = "This is a test string"
Debug CountWords(@S)

Last edited by wilbert on Wed Jul 30, 2014 1:16 pm, edited 8 times in total.

Post subject: Re: Word Count | Posted: Wed Jul 30, 2014 8:37 am
wilbert's code translated to PB syntax:
Code:
Procedure.l CountWords(*Text.Character)
  Protected wordCount
  If *Text
    While *text\c
      c.c = *text\c                                                    ; get current character
      If c = #TAB Or c = 32 Or c = #CR Or c = #LF                      ; If current char is TAB, SPACE, CR, LF
        *text + SizeOf(Character)                                      ;     ignore it
        Continue                                                       ;     Continue
      Else                                                             ; Else
        wordCount + 1                                                  ;     wordCount + 1
        While c And c <> #TAB And c <> 32 And c <> #CR And c <> #LF    ;     take all characters, except: TAB, SPACE, CR, LF, 0
          *text + SizeOf(Character)                                    ;
          c.c = *text\c                                                ;
        Wend                                                           ;
      EndIf                                                            ; EndIf
    Wend
  EndIf
  ProcedureReturn wordCount
EndProcedure

S.s = "This is a test string"
S.s + #TAB$+Space(10)+#TAB$+#CRLF$+#TAB$+"a bcd"+#LF$+#LFCR$+#CRLF$+Space(10)
Debug CountWords(@S)

Post subject: Re: Word Count | Posted: Wed Jul 30, 2014 6:50 pm
@wilbert
@Danilo

Thank you for sharing.
Both are neater and faster than the one I have been using.

Both are fast enough, but wilbert's is about 4x faster on my machine.

Post subject: Re: Word Count | Posted: Wed Jul 30, 2014 8:21 pm
Wilbert, I tested this on my machine and it is smoking fast
