SPO600 – Strcpy Optimization Testing

In my previous post I discussed the implementation of my optimization. Once it compiled successfully, I was ready to move on to testing.

The first test I did was a very simple strcpy test:

#include <stdio.h>
#include <string.h>

int main()
{
   char src[40];
   char dest[100];
  
   memset(dest, '\0', sizeof(dest));
   strcpy(src, "Very simple test");
   strcpy(dest, src);

   printf("Final copied string : %s\n", dest);
   
   return(0);
}

Compiled using the following command:

gcc -g simple.c -o simple

The expected output would be:

Final copied string : Very simple test

Using the testrun script in the build directory, I ran the following command:

./testrun.sh simple

Bad news! The output I got was:

Final copied string : Very sim

I will need some time to hack away at this and will edit once I’m done.

EDIT:

After some time spent debugging and testing a variety of things, I’ve changed the code to look like this:

BEFORE

        
.p2align 6
        /* Aligning here ensures that the entry code and main loop all lies
           within one 64-byte cache line.  */
L(bulk_entry):
        sub     to_align, to_align, #16
        stp     data1, data2, [dstin]
        sub     src, srcin, to_align
        sub     dst, dstin, to_align

L(vector_entry):
        ld1     {v0.16b}, [src], #16    /*load 16 bytes into vector register*/ 
        uminv   B3, v0.16b              /*find the minimum value in the vector register*/
        umov    w10, v3.16b[0]          /*move it to a 32 bit register to use                 
                                          with cmp instruction*/
        cmp     w10, #0                 /*if null is found then enter byte by                
                                          byte copy*/
        b.eq    L(byte_copy)

L(vector_store):
        st1     {v0.16b}, [dst]      
        b       L(vector_entry)

L(byte_copy):
        ldrb    w1, [src], #1
        strb    w1, [dst], #1
        cmp     w1, #0
        b.eq    L(byte_copy)

AFTER

        
     .p2align 6
     /* Aligning here ensures that the entry code and main loop all lies
        within one 64-byte cache line.  */
L(bulk_entry):
     sub to_align, to_align, #16
     stp data1, data2, [dstin]
     sub src, srcin, to_align
     sub dst, dstin, to_align

L(vector_entry):
     ld1     {v0.16b}, [src], #16     
     uminv   B3, v0.16b
     umov    w10, v3.16b[0] 
     cmp     w10, #0 
     b.eq    L(byte_entry)

L(vector_store):
     st1 {v0.16b}, [dst]      
     b  L(vector_entry)

L(byte_entry):
     sub src, src, #16
     b  L(byte_copy)

L(byte_copy):
     ldrb       w1, [src], #1
     strb       w1, [dst], #1
     cmp w1, #0
     b.ne       L(byte_copy)

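Things to note
1. I added another label called byte_entry so that the src address can be moved back 16 bytes before the byte-by-byte copy starts. This undoes the post-increment of 16 applied to src by the ld1 instruction.
2. The branch at the end of byte_copy is now b.ne instead of b.eq, so the loop continues until the null terminator has been copied.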
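To make the intended control flow easier to follow, here is a rough C model of the loop. It is only a sketch and not the assembly itself: block_strcpy is a hypothetical name, memchr stands in for the uminv/cmp zero check, and memcpy stands in for the 16-byte vector store. Note that this model advances both pointers by 16 after each block. That is an assumption about the intended behaviour rather than a transcription of the listing, where the st1 store has no post-increment.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK 16  /* bytes handled per vector iteration */

/* Hypothetical C model of the vector loop plus byte-by-byte tail. */
static char *block_strcpy(char *dst, const char *src)
{
    char *ret = dst;

    /* Vector part: while the next 16 bytes contain no null, copy them
       as one block. memchr is specified (C11 and later) to behave as if
       it stops at the first match, so it does not read past the
       terminator. */
    while (memchr(src, '\0', BLOCK) == NULL) {
        memcpy(dst, src, BLOCK);
        src += BLOCK;   /* assumption: both pointers advance per block */
        dst += BLOCK;
    }

    /* Byte part: src still points at the start of the block that holds
       the null (the role of byte_entry's rewind), so copy one byte at a
       time up to and including the terminator. */
    do {
        *dst++ = *src;
    } while (*src++ != '\0');

    return ret;
}
```

Calling block_strcpy(dest, "Very simple test") takes one pass through the block loop for the 16 characters and then copies only the terminator in the byte loop, which makes that string a handy boundary case.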

Expected Output:

[image: Simple_expected]

Observed Output:

[image: Simple_observed]

So far so good, but I’ve encountered an issue when the string is longer:

Expected Output:

[image: Simple_longer_expected]

Observed Output:

[image: Simple_longer_observed]

The string is getting cut off, but I am unsure why. I will continue to hack away but more than likely I will require some outside guidance. Hopefully my next post will be more fruitful!
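Since the failure seems to depend on string length, one way to narrow it down might be to sweep over lengths and report the first one that copies incorrectly. The sketch below is only an illustration: first_bad_length is a hypothetical helper, and the buffer sizes and fill characters are arbitrary choices. It calls whichever strcpy the program is linked against, so it would need to be run through testrun.sh to exercise the modified build.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the first length in [0, max_len] for
   which strcpy produces a wrong result, or -1 if every length passes. */
static int first_bad_length(int max_len)
{
    char src[128];
    char dest[128];

    if (max_len > 100)
        max_len = 100;              /* stay well inside the buffers */

    for (int len = 0; len <= max_len; len++) {
        /* Varying pattern so a block written to the wrong offset shows up. */
        for (int i = 0; i < len; i++)
            src[i] = (char)('A' + i % 26);
        src[len] = '\0';

        memset(dest, '#', sizeof(dest));
        strcpy(dest, src);

        /* The copy must match, including the terminator... */
        if (memcmp(dest, src, (size_t)len + 1) != 0)
            return len;
        /* ...and the byte after the terminator must be untouched. */
        if (dest[len + 1] != '#')
            return len;
    }
    return -1;
}
```

Printing the result of first_bad_length(64) from main would show whether the problem starts at a block boundary such as 16 or 32. Compiling with gcc's -fno-builtin-strcpy may help make sure the library routine is called rather than an inlined copy.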
